Activity Data Synthesis

Wednesday, 31 August 2011

Tabbloid: 31 August 2011

A couple of updates from the project blogs this week:
In the wider world I stumbled across a mention that 'big data' has now made it onto the Gartner Hype Cycle for the first time, which seems significant even if, like me, you find yourself wondering where the Gartner Hype Cycle itself falls on their chart.

Next week the synthesis team will be reunited when we head to Leeds to run one of the ALT-C pre-conference workshops, 'Improving processes by using activity data', where we'll be joined by the geographically convenient Rob Moores, who'll be sharing knowledge and experience from the Leeds Met STAR-Trak project with those who attend.

Wednesday, 24 August 2011

Tabbloid: 24 August 2011

This week there's an interesting post over on the EVAD project blog about the problem of finding the right 'data munging' tool, and how they ended up developing their own custom Perl script instead. They've publicly released the script, so it will be interesting to see whether it suits the needs of other projects or whether a new bespoke tool needs to be fashioned for every project going.

The LIDP project have been presenting at, and attending, the Performance Measurement in Libraries and Information Services conference, a week-long event taking place at York University [#pm9york]. Word on the twittersphere is that the LIDP toolkit will be released next week, so I'll probably be linking to it in next week's digest.

The OpenURL Router Data project launched their article recommender prototype, and it's just as well that I don't have an Athens log-in because I was quickly drawn into all sorts of intriguing-looking material, including an article entitled 'Getting a Grip on Strangles'.

Out in the wider world, relevant links have been flying into my twitterstream from unexpected quarters, which suggests to me either that a tipping point in wider awareness of activity data is coming our way, or that I am getting more creative in my interpretation of what is relevant to the programme. In any case, here are a few highlights that I've picked out of this week's Tabbloid:

Wednesday, 10 August 2011

Five Filters news digest: 10 August 2011

A Tabbloid did in fact wend its way into my inbox this morning but it was a little bereft of life so I've turned to the trusty Five Filters website to create this week's blog digest. As before, you can generate a digest on the fly but I'll also be sending it out via email.

Just a couple of project updates this week:
  • the UCIAD project published their final project blogpost, including a video which gives a demo of the UCIAD platform, with an accompanying written commentary nestled below the video [and I can confirm that it's in with a good chance of winning both the 'techiest video I've watched' and 'longest video without a soundtrack' awards in my imaginary video award ceremony at the end of the year]. It's a shame we haven't got any more online exchanges planned because it would have been a good opportunity to get Mathieu to talk through the demo. I'll be interested to hear the results of the user feedback that the project plans to gather as part of their post-JISC project activity.
News from the twittersphere:
News from the synthesis team is that we've finalised the programme for the ALT-C pre-conference workshop, 'Improving processes by using activity data', which we're running on 5 September in Leeds. The workshop is free, includes lunch, and you don't need to be going to ALT-C in order to attend.

Wednesday, 3 August 2011

Tabbloid: 3 August 2011

Some more final blogposts have emerged this week:
Some other project blogposts worth visiting if you're interested in the more technical side of what they've achieved:
A couple of other interesting reads I saw flying at high velocity around the twittersphere today:

Tuesday, 2 August 2011

Draft Guide: 'Dealing with Activity Data'

[This is a draft Guide that will be published as a deliverable of the synthesis team's activities. Your comments are very welcome and will inform the final published version of this Guide. We are particularly interested in any additional examples you might have for the 'Additional Resources' section]

The problem:

A project that aims to make use of activity data from sources such as those in the Identifying Activity Data draft Guide can’t avoid the fact that it will have to roll up its collective sleeves and get hands-on with various data sources. The data you hope to extract and manipulate is likely to be hard to reach, unwieldy, incompatible, incomplete, downright uncooperative, or all of the above. This guide shares some helpful hints from the experiences of the JISC Activity Data projects and the wider world of library data hacking.

The solution:

Dealing with activity data relies on embracing a pioneering mindset, requiring equal measures of experimentation and hacking, together with a sixth sense of how far down one route you should go before accepting that a different solution is needed. Unfortunately there are no hard and fast rules to follow, but here are some helpful principles and pointers that have come out of the JISC AD projects and beyond:

Taking it further:

If you are releasing open data with the hope that people outside of the project and the institution will do something with that data, it’s worth taking steps to remove any unnecessary barriers. Many of those barriers will be the same things that made it a challenge for you to deal with the data in the first place:

  • create small sample files that enable potential end-users to get a feel for the scope and structure of the data you’re sharing.
  • use lowest-common-denominator, widely accepted formats, e.g. CSV
  • publish the scripts you yourself used to manipulate the data. If you adapted someone else’s script/code then share what you’ve done with them to create a virtuous cycle of iterative improvements.
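As an illustration of the first two pointers, here is a minimal Python sketch that turns a large CSV export into a small sample file. It assumes the data is already in CSV with a header row; the function name `write_sample` and its parameters are hypothetical, not taken from any of the projects. It uses reservoir sampling (Algorithm R) rather than simply taking the first rows, since activity logs are often sorted by date and the top of a file may not be representative of the whole.

```python
import csv
import random
from typing import Optional


def write_sample(src_path: str, dest_path: str, n: int = 100,
                 seed: Optional[int] = 42) -> int:
    """Hypothetical helper: copy the header plus a random sample of up to n
    data rows from src_path into dest_path, keeping the original row order.
    Returns the number of data rows written."""
    rng = random.Random(seed)  # fixed seed -> the same sample every run
    reservoir = []  # (original_row_index, row) pairs
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            raise ValueError("source file is empty")
        for i, row in enumerate(reader):
            if i < n:
                # Fill the reservoir with the first n rows.
                reservoir.append((i, row))
            else:
                # Algorithm R: keep row i with probability n / (i + 1).
                j = rng.randint(0, i)
                if j < n:
                    reservoir[j] = (i, row)
    # Restore the original order so the sample reads like the full file.
    reservoir.sort(key=lambda pair: pair[0])
    with open(dest_path, "w", newline="", encoding="utf-8") as dest:
        writer = csv.writer(dest)
        writer.writerow(header)
        writer.writerows(row for _, row in reservoir)
    return len(reservoir)
```

For example, `write_sample("loans_full.csv", "loans_sample.csv", n=50)` (hypothetical file names) would write the header plus 50 randomly chosen rows. Fixing the seed means the published sample can be regenerated exactly from the script, which fits neatly with the third pointer about publishing the scripts you used.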

Additional resources:

Tony Hirst’s Online Exchange presentation covers some of the issues mentioned in the section above. Tony’s blog is also a robust source of further information:

This twinset of AEIOU project blogposts was the initial inspiration for this guide:

The EVAD project is handling a vast dataset and has blogged about the data, as well as publishing a Guide to Using Pivot Tables in Open Office. They’ve also shared their thoughts around taking a user-centric approach to their data:

The OU RISE project documented their thoughts about how they could most usefully format the recommender data they plan to release: