Activity Data Synthesis

Wednesday, 27 April 2011

Free report from Martin Butler Research on Social Media Analytics Comparison

There is a free report from Martin Butler on Social Media Vendor Analytics.

Included in this brief report are profiles of eleven social media analytics solutions.

Alterian: Sophisticated offering suitable for medium to large organisations, with excellent sentiment analytics.
Brandwatch: High quality social data and good sentiment analysis tools.
Coremetrics: Very sophisticated solution with deep integration into IBM's WebSphere technologies.
IBM Social Media: Combination of services and solutions typically for large corporations.
Lithium: Enterprise level solution with strong collaboration features.
MutualMind: Closed loop model for quick conversion of information into action.
NM Incite: Top-end offering from Nielsen and McKinsey. Many unique features - at a price.
Radian6: Very rich visual analysis environment - just been acquired by Salesforce.
SAS: Large, complex solution with opportunity to perform almost any kind of analysis.
SocialSprout: Easy to use - ideal for agencies, small businesses and individuals.
uberVU: Excellent all-round capability, providing a very cost effective solution.
Viralheat: Very sophisticated analytics for a very modest price.

Report available at

Thursday, 21 April 2011

The cookery metaphor

I had a great day today, visiting the Open University Projects RISE and UCIAD.
RISE has implemented three kinds of recommendations that appear on search pages and has rolled them out to library users. The recommendations are for serial items, and the three types of recommendation are based on
  1. Choices made from prior search results for the same search term
  2. Choices made by users taking the same course
  3. What users went on to select after making the same choice as the current user’s last choice
These are shown on every page where they apply; eg if the user is staff, type 2 is not shown, and if there are no results, nothing is shown.
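
To make the three types concrete, here is a minimal sketch of how they might be computed from a log of user choices. It is not RISE's implementation: the log layout, the field names (user, course, term, item, prev) and the function names are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical activity log: one row per item a user chose from a results page.
# Field names are illustrative assumptions, not RISE's actual schema.
LOG = [
    {"user": "u1", "course": "B100", "term": "ethics", "item": "A", "prev": None},
    {"user": "u2", "course": "B100", "term": "ethics", "item": "A", "prev": None},
    {"user": "u2", "course": "B100", "term": "ethics", "item": "B", "prev": "A"},
    {"user": "u3", "course": "C200", "term": "ethics", "item": "C", "prev": None},
    {"user": "u3", "course": "C200", "term": "law", "item": "B", "prev": "C"},
]


def top_items(rows, exclude_user, limit=3):
    """Rank items by how often users other than exclude_user chose them."""
    counts = Counter(r["item"] for r in rows if r["user"] != exclude_user)
    return [item for item, _ in counts.most_common(limit)]


def recommend(log, user, term, course=None, last_item=None, is_staff=False):
    """Return only the recommendation types that apply to this user."""
    recs = {}

    # Type 1: choices made from prior results for the same search term.
    same_term = top_items([r for r in log if r["term"] == term], user)
    if same_term:
        recs["same_search_term"] = same_term

    # Type 2: choices made by users on the same course (not shown to staff).
    if course and not is_staff:
        same_course = top_items([r for r in log if r["course"] == course], user)
        if same_course:
            recs["same_course"] = same_course

    # Type 3: what others selected next after the user's last choice.
    if last_item:
        went_on = top_items([r for r in log if r["prev"] == last_item], user)
        if went_on:
            recs["went_on_to_select"] = went_on

    return recs
```

Calling recommend(LOG, "u1", "ethics", course="B100", last_item="A") returns all three types, while a staff user gets no course-based list and a search with no matching history gets an empty result, mirroring the "show only what applies" behaviour described above.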

UCIAD is developing the infrastructure to gather, and ontologies to reason over, activity data obtained from multiple sources of different kinds, eg Web sites, blogs, Library services, VLEs. Interestingly, the UCIAD trace ontology is being built from the bottom up while examining linked data created from the logs (using a KMi tool, NeOn). While visiting UCIAD and hearing more about it, I felt that this is an important longer-term project for activity data, in that the linked data approach will allow great flexibility over queries that are made over activity data and, via the ontology, the ability to infer user centric information. For example, following up an interest of mine, we discussed how this approach could be leveraged to “find me people who are interested in learning the same kinds of things as I am.” (Answer: for general purpose uses, this needs a bit of technology to ‘understand’ text, certainly available in the future, we felt.)
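
As a rough illustration of the kind of query discussed, here is a toy sketch that stores activity as subject/predicate/object triples and finds people who share a topic of interest. It uses plain Python rather than a real triple store or SPARQL, and the vocabulary (viewed, hasTopic) and the data are invented for the example; they are not the UCIAD trace ontology.

```python
# Hypothetical triples (subject, predicate, object). The vocabulary is an
# assumption for illustration and is not the UCIAD trace ontology.
TRIPLES = {
    ("alice", "viewed", "page:rdf-intro"),
    ("bob", "viewed", "page:rdf-intro"),
    ("bob", "viewed", "page:sparql"),
    ("carol", "viewed", "page:cookery"),
    ("page:rdf-intro", "hasTopic", "linked-data"),
    ("page:sparql", "hasTopic", "linked-data"),
    ("page:cookery", "hasTopic", "food"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def topics_of(triples, person):
    """Infer a person's topics from the pages they viewed."""
    pages = {o for _, _, o in match(triples, s=person, p="viewed")}
    return {
        topic
        for page in pages
        for _, _, topic in match(triples, s=page, p="hasTopic")
    }


def people_with_shared_interests(triples, person):
    """Find other people whose inferred topics overlap with this person's."""
    mine = topics_of(triples, person)
    others = {s for s, _, _ in match(triples, p="viewed")} - {person}
    return sorted(p for p in others if topics_of(triples, p) & mine)
```

The topic links here are given by hand; in the general case discussed above, working out what a page is about is the part that needs technology to 'understand' text.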

While we were talking, I felt that the cookbook approach needed another component, an ingredient. I remember reading two cookbook authors in particular and noting their interest in ingredients: Firstly, Elizabeth David, doyenne of Mediterranean and French cookery who …

"Born to an upper-class family, David rebelled against social norms of the day. She studied art in Paris, became an actress, and ran off with a married man with whom she sailed in a small boat to Greece. They were nearly trapped by the German invasion of Greece in 1940 but escaped to Egypt where they parted. She then worked for the British government, running a library in Cairo. While there she married, but the marriage was not long lived.

After the war, David returned to England, and, dismayed by the gloom and bad food, wrote a series of articles about Mediterranean food that caught the public imagination. Books on French and Italian cuisine followed, and within ten years David was a major influence on British cooking. She was deeply hostile to second-rate cooking and to bogus substitutes for classic dishes and ingredients. She introduced a generation of British cooks to Mediterranean food hitherto barely known in Britain, such as pasta, Parmesan cheese, olive oil, salami, aubergines, red and green peppers, and courgettes."

Wikipedia on Elizabeth David

No mean feat, to change the eating habits of a country.

More to the point, I remember reading her cookbooks and enjoying lengthy discussions about good ingredients, a proper concern for a cookbook. The second author is Shizuo Tsuji, who wrote a standout work, Japanese Cooking: A Simple Art, and he properly devotes space to ingredients too. I’d show you pictures but it’s all copyrighted, so if you want to, scroll down at Google Books till you get to illustrations of ingredients.

So what’s an ingredient? Something you can use. In the UCIAD case, the first example is their UCIAD ontology.

But an ingredient can be a data set too, like LIDP’s activity data that is collected from multiple institutions, or RISE’s database contents, or UCIAD’s triple store contents. And there are more.

Back in Manchester, I thought about the NeOn toolkit (produced by the NeOn Project). This is a tool being used in the construction of UCIAD’s ontology.

For a cookbook metaphor, what about tools, where do they fit? Clearly as cooking utensils, and of course my cookbook heroes do also talk about utensils. I remember Elizabeth David discussing copper omelet pans, and Tsuji writing about chopsticks for cooking tempura, thick in diameter so as to handle the tempura more gently than standard diameter cooking chopsticks.

Of course NeOn is in this class of objects: one uses it to prepare something, an ingredient, the UCIAD ontology.

So basically what we are using here is a metaphor. It turns out that we can use the metaphor of cookbook, recipes, ingredients, utensils, and dishes (we shouldn’t forget dishes!) for some useful computer system concepts:

  • Utensils – programs or scripts
  • Ingredient – data, be it usage data or data that describes usage data (eg an ontology).
  • Dish – something for the user to consume (or interact with)
  • Recipes – description of a process whereby ingredients are transformed to (or change) another ingredient or produce some dish
  • Cookbook – a bunch of descriptions of processes.
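
For the programmers among us, the mapping above can be sketched as a small data model. This is only one possible reading of the metaphor: the class names follow the list, but the fields, the pantry dictionary and the serve helper are assumptions invented for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Ingredient:
    """Data: usage data, or data that describes usage data."""
    name: str
    data: object


@dataclass
class Recipe:
    """A described process: named ingredients in, a new ingredient out."""
    name: str
    needs: List[str]
    utensil: Callable[..., object]  # the program or script that does the work
    produces: str

    def cook(self, pantry: Dict[str, Ingredient]) -> Ingredient:
        inputs = [pantry[n].data for n in self.needs]
        return Ingredient(self.produces, self.utensil(*inputs))


@dataclass
class Cookbook:
    """A bunch of recipes, applied in order to a shared pantry."""
    recipes: List[Recipe] = field(default_factory=list)

    def cook_all(self, pantry: Dict[str, Ingredient]) -> Dict[str, Ingredient]:
        for recipe in self.recipes:
            result = recipe.cook(pantry)
            pantry[result.name] = result
        return pantry


def serve(pantry: Dict[str, Ingredient], name: str) -> str:
    """A dish: an ingredient presented for the user to consume."""
    return f"{name}: {pantry[name].data}"


# Hypothetical example: raw loan records become a count, then a single answer.
pantry = {"loans": Ingredient("loans", ["A", "B", "A"])}
book = Cookbook([
    Recipe("count loans", ["loans"], Counter, "loan_counts"),
    Recipe("pick top item", ["loan_counts"],
           lambda counts: counts.most_common(1)[0][0], "most_borrowed"),
])
book.cook_all(pantry)
dish = serve(pantry, "most_borrowed")
```

Note that the output of one recipe becomes an ingredient for the next, which is the "ingredients are transformed to another ingredient" case in the list.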

Pretty good, although we didn’t intend to have as complete an activity-data computer-system description as offered by the metaphor, but those canny cooks have thought of it all. Perhaps it’s a consequence of the nature of making something; perhaps we could have chosen any process of making something and used that as the metaphor. For example, we might have used flat-pack furniture construction instructions as the central part of the metaphor. Then in the metaphor we also have tools, bits of disassembled furniture, the piece of furniture assembled by following the instructions, and of course a collection of flat-pack assembly instructions. (Don’t ask, but yes, really, honest I lie not :) this is a believable metaphor; of course I keep all my flat-pack instructions just in case I find a need to disassemble and re-assemble in the proper order.) No, don’t believe me? Me neither! So it doesn’t take any consideration: the cooking metaphor is viable, perhaps because food is a feature of everyday life, and flat-pack assembly as a metaphor is plainly just weird, probably because those instructions are often close to useless, and, additionally, how often do we get new flat-pack furniture?

The point to all of this meandering is that cookbooks give the synthesis team a way of looking at project outputs, surfacing information that will inform whatever synthesis we do. One of the interesting things about synthesis activities is that one can’t say where the activity ends up; much depends on the way the informing information flows together and the patterns it reveals. Of course one can say things like “we can do some work in architecture”, or “we can attempt to build a taxonomy of activity data based systems”, or “we can pull together different projects’ user feedback into recommendations”, but one can’t foretell precisely what will turn up, and what turns the synthesis will take.

What I am hoping is that the process of eliciting and gathering recipes will yield a good amount of data about the systems being developed. For example, it was revelatory to assist in recording three recipes for the three types of recommendations in RISE: by recording salient features of each of the three processes, they suddenly stood, at a conceptual level, in sharp relief to each other by virtue of differences explicated in the recipes.

And that’s all folks, have a happy Easter break!

- - - -

Footnote: For completeness from a computer scientist, but perhaps ignore this, an ingredient could also parameterise a process or specify how a program or script operates.

But something to mull over (vegetarians excepted, and apologies to you):

Tuesday, 19 April 2011

Tabbloid #6: 18 Apr 2011

There is so much activity data related aceness in this week's Tabbloid that it's hard to know where to begin ... << deep breath >>

Dave Pattern tweeted about the gender differences he'd noticed in one of the #LIDP datasets:
- females have a stronger book & grade correlation than males [src]; and
- males have a stronger e-resource usage & grade correlation than the females [src].
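
For anyone wanting to try this sort of comparison on their own data, here is a minimal sketch of computing a usage/grade correlation separately for each group. The rows below are invented purely to make the code runnable; they are not LIDP data and say nothing about the real findings. The field names are assumptions too.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_by_group(rows, group_key, x_key, y_key):
    """Compute the x/y correlation separately for each value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        group: pearson([r[x_key] for r in members], [r[y_key] for r in members])
        for group, members in groups.items()
    }


# Made-up rows for illustration only; NOT taken from any #LIDP dataset.
ROWS = [
    {"gender": "F", "loans": 2, "grade": 52},
    {"gender": "F", "loans": 10, "grade": 61},
    {"gender": "F", "loans": 25, "grade": 68},
    {"gender": "M", "loans": 3, "grade": 58},
    {"gender": "M", "loans": 12, "grade": 55},
    {"gender": "M", "loans": 30, "grade": 66},
]

by_gender = correlation_by_group(ROWS, "gender", "loans", "grade")
```

Swapping "loans" for an e-resource usage column would give the second comparison in the same way. A real analysis would also need far more rows per group and a significance test before reading anything into the difference.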

Serendipitously, Paul Bacsich shared a link to Elly Broos' discussion paper: 'Gender Perspective on e-learning and information sharing' which adds some additional context and apparently is generating a good level of debate on the Instructional Technology Forum email list.

The #LIDP project team have been having all sorts of fun with data this past week:
- Dave Pattern has swapped his self-proclaimed 'shambrarian' title for 'shamistician' and has been playing around on the 'R Project for Statistical Computing' website and also sharing some interesting graphs.
- De Montfort shared their guide to 'stitching together library data with Excel' which makes it look so simple that I'm almost tempted to have a go myself.

The #JISCSALT project team shared the somewhat surprising news that extracting a sample of data from the LMS at John Rylands had been easier than expected, which bodes well for extracting the remaining 3.5 million records. Janine Rigby also shared her thoughts on what shape the user evaluation of their recommender tool will take - the plan is to gauge users' attitudes to data privacy as much as their thoughts on the tool itself, and to gain a deeper understanding of the subtle hierarchies of trust and perception of value that are in play when users evaluate recommendations.

Finally, a couple of 'wider world' links worth taking a look at:
- Dave Pattern (again, I know!) flagged up a virtual event being run by The National Federation of Advanced Information Services (NFAIS). The event was called 'Information Access and Usage Behavior in Today's Academic Environment' and it's worth taking a browse through their archived tweets from April 15th to eavesdrop on some of their interesting discussions - I particularly like the fact that one of the attendees managed to reference Flaubert.
- This Chronicle of Higher Education article on 'higher education's Netflix Effect' profiles a 'number-crunching provost' in the US who has embraced the power of data to provide course recommendations to students. [spoiler alert: I'm giving them extra brownie points for including the word 'zeitgeist' at the end of the article]

Monday, 18 April 2011

Mind maps of problems and solutions from Birmingham

We've built MindMeister mind maps of all the impediments and solutions that were surfaced at the Birmingham programme inception meeting.

The main map shows all identified impediments and solutions. You can jump to a project specific map for the project that contributed a particular item or solution by clicking on a node (that contains a small arrow).

Some interesting items were potential sources of impediment/solution crossover as in these (clickable) screenshots

  • Visualisation crossover

  • 'This article is like that article' crossover (which is very interesting to me outside of the Activity Data Programme for an e-learning application)

  • I can't but feel that there are some interesting scalability lessons to be learned, though at a guess most are outside of the current timeframe. (I note here that UCIAD is somewhat specialist, with triple store based experience; however, I believe that there is a growth path for activity data into the realms of triple stores with query operations via SPARQL end points.)

Here is an example of two projects sharing a similar impediment; there are other examples in the data.

There are also a gratifying number of known solutions in the data: There were about 20 known solutions vs 29 impediments/problems.

Please add a comment below if you see any similarities and patterns; this is very interesting for our synthesis activities (and, who knows, may lead to active project collaboration now or in the future).

Project mind maps are
  • Aberystwyth AEIOU Activity data to Enhance and Increase Open-access Usage Wales
  • Cambridge EVAD Exposing VLE activity data
  • Edinburgh OpenURL Using OpenURL Activity Data
  • Huddersfield LIDP Library Impact Data Project
  • Leeds Met STAR-Trak STAR-Trak Next Generation
  • Manchester AGtivity Exploiting Access Grid Activity Data
  • Manchester SALT Surfacing the Academic Long Tail
  • Open University RISE Recommendations Improve the Search Experience
  • Open University UCIAD User Centric Integration of Activity Data

And our previous activity data project has a mind map too, of the most relevant things for the current purposes
  • Sero and Hedtek MOSAIC Making Our Scholarly Activity Information Count

Chocolate fudge brownies


Tom Franklin


To earn brownie points for your project.


Most programme managers like chocolate, chocolate cake etc. However, to make the point clear we recommend that you serve brownies to indicate that you deserve brownie points.


  • 100g Chocolate (plain or milk chocolate to taste)
  • 100g Butter or Margarine (if you must or are vegan)
  • 200g Self-raising flour
  • 200g Sugar (Soft brown sugar works best but caster sugar still works well)
  • 3 large eggs
  • 2 tablespoons cocoa powder


You have the ingredients, and you are making these shortly before the arrival of the programme manager so that they are still fresh.

You are not vegan as this recipe contains eggs.


Do not serve when old, stale or mouldy.

Allergy advice:

  • This recipe contains eggs
  • This recipe contains gluten


  • Melt the chocolate and butter together in a sauce pan over a low heat
  • Turn off the heat and gradually stir in the sugar and eggs
  • Add the self-raising flour and cocoa powder
  • Pour the mix into a greased 30cm x 35cm tin
  • Bake for about 15 minutes at 180°C; the middle should be soft.
  • Let the brownies cool in the tin.
  • Cut into 5cm squares

Individual steps

See method

Output data

42 5cm x 5cm brownies

Appendix A: Sample output

See cook for a sample of the output data, or come to one of our workshops

Tuesday, 12 April 2011

Tabbloid #5: 11 April 2011

This week's Tabbloid is a bit of a bumper edition. The highlights for me are:
- The RISE project have shared a demo of their MyRecommendations prototype and reported a positive reaction from their first Project Board meeting which I noticed was attended by a representative of their 'Improving the Student Experience' programme. Some of their discussions were (perhaps inevitably) around privacy issues which (unsurprisingly) looks set to emerge as one of the overarching themes of this programme.
- LJMU reported on some of the initial hurdles they've faced in trying to extract the desired data from their systems and in their efforts to herd students into a focus group - suffice to say that the Royal Wedding is not really helping matters. The 'what would we do differently' angle they've added to the post is very useful and highlights the difficulty of gauging how much detail is needed in early discussions with internal stakeholders.
- The Leeds Met STAR-Trak team have been blogging with vim and alacrity this past week. A gem of insight stood out to me in their post about users:
"We are in the fortunate position of already having a proof of concept application. Having something to look at makes it far easier for end users to grasp the potential uses for the application and thus come up with requirements."

- There was also a whir of synthesis team activity - namely, running the first 'virtual event' which I then blogged about, and also Tom Franklin released his guide to writing a robust business case.
- The focus on costs vs. benefits within Tom's guide links nicely to a point noted by Martin Turner on the AGtivity blog relating to questions about "finding and defining the usage and benefit access gives the user" which emerged from last week's #INF11 Programme Meeting in Birmingham. Will we get further down the road on the Activity Data donkey if we focus on finding tastier carrots and do what we can to hide the 'risk stick' out of view (albeit not out of mind)?

Friday, 8 April 2011

Developing a business case

Purpose of Business case

The most important thing to remember when developing a business case is that its purpose is to persuade someone to release resources for the proposed activity. This will primarily be money or staff time in most cases. It makes little difference whether this is re-allocating staff to a new project, employing new staff or contracting for the service.

The person who will have to make the decision has a wide variety of competing requests and demands on the available resources, so what they need to know is how the proposed project will benefit them. Note that while this should be about benefits to them in their role, they are people with particular interests and desires (including keeping their job and their next promotion). The question that they need the answer to is: why should I use the resources on this project rather than on some of the others?

The answer to this question should be that it helps them move towards their strategic goals. So the first thing that you need to find out is what their strategic goals are. These could be around cost savings (very likely in the current economic climate), improving the student experience, or increasing the use of resources. You should then select one (or at most two) of these goals and explain how the project will help to meet this goal (or goals).

You may feel that the project can address many goals, and it is possible that this is true, but you need to choose the ones that are central to the person you are writing the case for. Aligning the project to many goals has the danger of diluting each of them and having less impact than a strong case for a single goal.

The business case is intended to enable the decision maker to make an informed decision based on the (potential) benefits, the costs and the risks and how this particular project would further their strategic goals. It doesn’t require academic rigour, but it does require evidence.


A business case, like a bid for funding or a student essay, must answer the questions that the person reading it has. A typical business case of the type we need here will have the following structure:

Title. Should be short and descriptive (eg Proposal to use data from the VLE and SRS to identify students at risk of dropping out).

Intended audience. While this would not normally form part of a business case, it will be helpful for other people reading this business case and considering how they might use it for themselves. Essentially, the intended audience should be the budget holder who will benefit from the project; examples might include the head of student services, the librarian, or the PVC for teaching and learning. Why would the librarian want to pay for a project that only benefits student services when there are plenty of projects of benefit to the library that they would like to fund?

Brief description of what is being proposed. This should focus on the effect that will be achieved when the result is delivered. It is best to avoid technical descriptions, and to give the reader some concrete benefits from the project. (See panel for a brief example.) The level of detail should be appropriate to the size of the project and the audience that it is intended for. The description will be fleshed out further in the main part of the business case; the idea is that the reader can quickly understand what the proposal covers, and that it provides some context for all that follows, which will include the details needed to support an informed decision.

Brief description
For a VLE activity data project
The University of Wigan has a significant problem with non-completion by students. Currently 21% of all undergraduate students fail to complete, with half of those dropping out in the first year. Evidence demonstrates that early identification of students at risk of dropping out, followed by active intervention, could reduce this to 18%, gaining the University £3 million per year in additional HEFCE and student income. This project will use data that is collected by the attendance system and VLE to automatically identify students displaying at-risk patterns of behaviour, thereby enabling student services and personal tutors to focus their efforts where they will have the greatest impact.

Alternative options. There are always alternative options that could be implemented to achieve the same business goal, and it is important to show that you have considered them and explain why the proposed option is superior. Note that these are not technical alternatives (or at least not only technical alternatives), but alternative approaches to achieving the same benefit that this project is seeking to achieve. For instance, an alternative to using activity data for the early identification of students at risk might be reports from tutors. The important thing in these business cases is to demonstrate that you have considered alternatives and have valid reasons for the choice that you made. This may be because the cost is lower, the benefits are higher or the risks are lower.

For each alternative you should:

  • Provide a brief description of the approach, highlighting the key differences from the approach proposed,
  • Describe the benefits of the alternative approach (lower cost, less staff development, fewer risks, no legal implications ...)
  • Describe the costs and risks associated with the alternative
  • Summarise the reasons for rejecting this approach in favour of the selected approach.

Do not “over egg the pudding” in terms of overstating the costs and risks or understating the benefits. If the alternatives are not credible then the business case may be rejected as it appears to be not offering realistic alternatives.

Benefits. This is where you outline the benefits of the project. Remember that the business case is aimed at a particular person (role), who will be funding the project either with cash or by allocating staff time. They are primarily interested in benefits that address their strategic goals. Or, to put it another way, why would they pay for a project if the benefits fall elsewhere? They would expect the beneficiary to pay for the project.

The benefits should be realistic and quantifiable. If your project will, if successful, reduce student drop-out from 28% to 27% then that is what you should claim. If you over claim you might be asked to deliver that, and then when you deliver what is actually very worthwhile the project may be seen as a failure as it did not deliver what was promised. It may even be worth under claiming the benefits (so long as they still exceed the costs), as delivering more than promised is usually seen as a good thing. Wherever possible the benefits should be quantified in monetary terms; this allows the decision maker to compare the benefits and costs (which usually can be expressed in monetary terms), and so see the return on investment.

Return on investment

Formally this can be calculated as:

ROI = benefit - cost

But more usefully it can be considered to be the amount of time taken for the investment (cost) to be covered by the benefit.
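
As a worked illustration of both views, here is a minimal sketch. The net benefit follows the simple definition given here, and the payback period is the "time taken for the cost to be covered" view. The ratio form, (benefit - cost) / cost, is included because it is also commonly used, though it is not the definition given above. The £250,000 project cost is a made-up figure for the example; only the £3 million annual benefit comes from the sample description.

```python
def net_benefit(benefit, cost):
    """The simple definition used in this guide: benefit minus cost."""
    return benefit - cost


def roi_ratio(benefit, cost):
    """A common alternative form: net benefit as a proportion of cost."""
    return (benefit - cost) / cost


def payback_years(cost, annual_benefit):
    """Years until the cumulative benefit covers the cost.

    Returns None when there is no positive annual benefit, because the
    investment would then never be recovered.
    """
    if annual_benefit <= 0:
        return None
    return cost / annual_benefit


# Hypothetical figures: the cost is an assumption invented for this sketch.
project_cost = 250_000
annual_benefit = 3_000_000

first_year_net = net_benefit(annual_benefit, project_cost)
payback = payback_years(project_cost, annual_benefit)
```

With these assumed figures the cost is covered within the first year. A fuller treatment would also account for ongoing running costs, which this sketch leaves out.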

Costs. These can be expressed in financial terms or in terms of staff effort (which can easily be turned into financial costs). A breakdown of the main headings is useful. If you want to relate the costs directly to the project plan then this may be more appropriately put after the project plan.

Project plan. You have all produced these in the past, so I don’t think there is any need to go into any great detail. I would expect a fuller description of the project, the main tasks involved and how much effort or cost each will take.

Risks. This is similar to the type of risk register that you might include in a JISC bid, though it is useful to give an indication of the cost that risk will incur if it occurs. This could be expressed financially (pounds) or as effort (person days).

Risk | Owner | Likelihood | Cost if it occurs | Mitigation
Data formats incompatible | IT manager | Low | 5 days | Map between formats
Sued for breach of privacy | Data protection officer | Very low | >£10,000 | Ensure agreements are in place and signed by students

Recommendation. This is likely either to be to take action (ROI is positive) or not (ROI is negative).

Additional information

Guidance and templates


Dispatches From the Ether: virtual event #1 [6/4/2011]

While pondering what I would write about the first of our 'virtual events' the lyrics of an overlooked 80s pop tune drifted into my peripheral thoughts:
"I'm sorry to interrupt your conversation, but we are experiencing violent storm conditions in the asteroid belt at this time. We may lose this valuable deep space communication link.
Please, be as brief as possible. Thank you."
Despite fully testing the web conferencing system (GoToMeeting) on Sunday the session had more of a 'wing and a prayer' feel than we'd been hoping for. Aside from a few glitches regarding audibility and less interactivity than we had planned the session went well overall (thanks in no small measure to Mark van Harmelen's technology wrangling skills) and the text chat facility turned out to be very useful for trading snippets of knowledge and relevant urls.

At the start of the session a spokesperson for each of the JISC Activity Data projects gave a 1 minute pitch describing the aim of their project and their key challenge at this moment in time. Some familiar themes emerged around issues of data protection and also the technical challenges of extracting useful data (particularly when it's being mined from different sources i.e. system or institution). I was particularly interested to hear that the AEIOU project are planning to run focus groups - it will be good to swap notes on effective focus group methodology between them and projects such as LIDP who are also planning to run them.

We then got to hear from our guest speakers who were kind enough to spare the time to tell us about their projects:

David Weinberger talked about the < whispers dramatically > Harvard LibraryCloud project which is currently in 'semi-stealth mode' and aims to take in a range of library data, normalise it and then release it in a format that can be as widely exploited as possible. They are currently looking at issues around the sustainability of providing a global library cloud service and they will be opening the project up to a wider audience as soon as the core infrastructure issues (such as hardware capacity) have been tackled.

Steve Midgeley and Dan Rehak gave an insight into The Learning Registry which is currently creating the infrastructure which will enable projects to share their data in public (or within a secure environment if required). The technical wizardry in the background is a schema-free database, which means that projects can donate their datasets without needing to transform them to meet the requirements of a specific database schema first. They plan to move into 'production' phase next month and are currently offering to give technical support to anyone who has data that they'd like to put into the registry. You can also get involved via their community discussion group or their developer's discussion group. On a personal note, it was good to hear them talk about 'paradata' which for me has recently replaced 'metadata' as a strong indicator of how geeky a conversation is likely to get.
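
To show what 'schema-free' buys you, here is a toy sketch of a store that accepts records of any shape. It is an in-memory illustration only: the DocumentStore class, its donate and find methods, and the sample records are all invented for the example and are not the Learning Registry's actual API or data.

```python
import json


class DocumentStore:
    """A toy schema-free store: any JSON-serialisable dict is accepted as-is."""

    def __init__(self):
        self._docs = []

    def donate(self, doc):
        """Store a copy of the document and return its position in the store."""
        # Round-tripping through JSON rejects non-serialisable values and
        # keeps the stored copy independent of the caller's object.
        self._docs.append(json.loads(json.dumps(doc)))
        return len(self._docs) - 1

    def find(self, **criteria):
        """Return documents whose fields equal all of the given criteria."""
        return [
            doc for doc in self._docs
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


# Two hypothetical projects donate records with quite different fields.
store = DocumentStore()
store.donate({"resource": "intro-to-stats", "views": 120})
store.donate({"resource": "intro-to-stats", "rating": 4, "audience": "teachers"})
store.donate({"resource": "cell-biology", "views": 45})

about_stats = store.find(resource="intro-to-stats")
```

Neither donor had to reshape its records to fit the other's fields, yet both can be found by the one field they share. The trade-off is that anyone querying the store has to cope with fields that may be missing.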

We were also joined on the call by Susan Van Gundy who was representing the U.S. National Science Digital Library (NSDL) who are currently working in partnership with the Learning Registry on work which includes a demonstration project called STEM Exchange. Note also their useful definition of paradata.

Other noteworthy links and discussions from the backchannel:
- (A collaborative project between The Open University and Carnegie Mellon University which "aims to bring researchers and educators together in an intelligent social network to share knowledge on the development of Open Educational Resources (OER)").
- (the Open Bibliographic Data guide to rights and licensing)
- (who are "at the forefront of global initiatives related to the exchange and interoperability of digital learning resources.")
- There was a useful reminder via the text chat backchannel that the complex issue of data protection differs between countries and that in the US 'privacy concerns trump all other issues, no matter how unlikely the risk.' ... which means that David Kay's assertion about taking a 'sensible approach' to using activity data to improve pedagogical practice is likely to get lost in translation if a project takes data from outside of the UK [I wonder where this leaves UK universities who have students accessing their library services from non-UK campuses?]

There was a very positive response to the suggestion that we invite speakers from the world of supermarket loyalty cards to speak at the next virtual event so it will be our mission to make that happen. Our colleagues from the project also requested a speaking slot at the next session so we'll be including them on the agenda, assuming that our scheduling stars are in alignment.

Wednesday, 6 April 2011

Tabbloid #4: 4 April 2011

A tweet from the #LIDP project caught my eye - it announced the publication of a report by Deborah Goodall and the venerable Dave Pattern: 'Academic library non/low use and undergraduate student achievement: a preliminary report of research in progress' which explores the University of Huddersfield's finding that
"in some subjects, students who ‘read’ more, measured in terms of borrowing books and accessing electronic resources, achieve better grades."

A couple of other things that my internal Activity Data radar has picked up in recent days:
- - a pan-European competition which is offering a total prize fund of €20,000, including €1000 for the "Talis Award for Linked Data", which may well get some of you data geeks' hearts racing.
- The Government's ICT Strategy, which was released this past week, mentions open data more than once - and announced plans to establish the 'Public Data Corporation' to support the opening up of Government data and interfaces. There is a planned action within 6 months: "To ensure that appropriate data is transparent and shared rather than duplicated, the Government will implement engagement processes for open data standards activity and crowd-source priority areas for data standards." It will be interesting to see whether this burst of activity has a ripple-through effect on the world of academic data.