Activity Data Synthesis

Tuesday, 15 November 2011

Adieu from JISC AD

The launch of our final report website this week brings the work of the Activity Data synthesis team to a close and, therefore, this blog will now be mothballed and there are no further updates planned.

You can browse the final report using the web interface (expertly designed for us by Dan Moat) or download a pdf of the final report.

There is still activity happening on the individual project blogs and via the twitter hashtag #jiscad which you can keep up to date with by creating a Five Filters news digest on the fly whenever you get the urge.

So for now it's a heartfelt adieu from me and the rest of the project synthesis team. < waves >

Notes and Photos from Library Camp 2011

Back in October I joined a gathering of library workers, geeks, advocates and enthusiasts in Birmingham for the Library Camp 2011 unconference.

There were a few unusual things about the event from my point of view ...

To start with, it was the first time I'd been at a library event with such a mixed crowd - public library folks rubbing shoulders with folks from academic libraries doesn't happen as frequently as it should.

Secondly, I have never seen so much cake.

Thirdly, it was the first unconference event I've been to where everyone introduced themselves at the start of the day.

Fourthly, I've never seen more sessions get proposed than the slots available.

Fifthly, there was a poet-in-residence which is, again, a first for me (though strangely I've been at another event since then which had a poet-in-residence too).

The idea of 150ish people introducing themselves one by one at the start of an event might seem like lunacy but I have to say I found it very moving and uplifting to hear everyone's reasons for being in the same place.

Here are some of the reasons I managed to scribble down as folks introduced themselves:
"... to capture the libgeist."
"I'm here to start the revolution."
"... lured by cake and curiosity."
"I'm looking for library lovers."
"... critique, collaboration and revolution."
"... to steal people's enthusiasm, passion and, hopefully, anger."
"Gratuitous hugging."
"Libation in the library."
"To show the rage and passion for libraries." 

Dave Pattern kindly agreed to be my wing man and ran a session on Activity Data and Recommender Services with me. Dave shared the good work he’s been involved with at Huddersfield University and I talked a bit about this programme and also shared some of the great (open source) applications that have come out of the JISC MOSAIC and Discovery developer competitions. Hopefully my interpretative dance representation of Alex Parker's Book Galaxy serendipitous search interface persuaded a few of those present to take a look at whether they can exploit any of the applications that are sitting there waiting to be plundered for a good cause. Part of the discussion in our session was around the challenge facing libraries that don’t have developers on their staff when it comes to taking advantage of opportunities like those. One of the solutions we discussed was to check who else is using the same library systems as your institution and to look for opportunities to form alliances around shared development goals.

All in all it was an invigorating day full of positive conversations and rapidly shared ideas. My only regrets are that a) I couldn't stay on into the evening to continue the conversations and b) I didn't have a large tupperware box with me to take some of the cake home :)

Wednesday, 28 September 2011

A round-up of recent JISC Activity Data activity

The synthesis team have been busy honing the content of our final deliverable for this programme: namely, a one-stop shop which gathers together the projects' collected wisdom on identifying, collecting, managing and sharing activity data within UK HE. The amount of content we have to share means it is a challenge to present it in an intuitive way, with easily navigable routes into (and out of) the information for end users, but we're hopeful that it's achievable. You'll be able to judge the fruits of our labour next month when we launch it.

Earlier this month we ran a pre-conference workshop at ALT-C which (talking of navigation issues) seemed to go very well once the initial challenge of finding the room itself was conquered. The session was entitled 'Improving processes by using activity data' and featured the following sessions:
- Introduction to activity data [presented by Tom Franklin]
- Challenges raised by activity data [presented by Mark van Harmelen]
- [Case Study] Leeds Met STAR-Trak: NG - using activity data in support of student success [presented by Rob Moores]
- Discussion of potential use case [facilitated by David Kay]
- [workshop session] Building a business case [facilitated by Tom Franklin and myself]
- [workshop session] Working with activity data - technical discussion of value and challenges [facilitated by Mark van Harmelen and David Kay]

I've combined (i.e. wrestled) all the slides from the workshop into one presentation and uploaded it as a pdf which you can view below:

The Tabbloid digest for that week captures the tweets from the workshop:

Here are some of my twitter highlights from the day:

Hidden amongst all the workshop tweets is a gem from the AGtivity project: a blogpost sharing what they've worked out about handling timestamps between UNIX, GNUplot and Excel during the course of their project. Another gem came from the direction of AGtivity in the shape of Martin Turner bringing CritterVRE to my attention. Martin used it to capture the twitter activity from the day and it looks like a useful tool to add into my event amplification / capture toolbox. The service was developed to use alongside Access Grid sessions but it looks useful for other purposes too (if that's allowed, I haven't had a go yet).
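For anyone who hasn't wrestled with this before, here is a minimal Python sketch of the kind of timestamp juggling involved. It is not taken from the AGtivity blogpost; it simply assumes Excel's default 1900 date system (where 1 January 1970 is serial day 25569) and the older gnuplot convention of counting seconds from 1 January 2000 (newer gnuplot versions count from 1970, so check which one you have). The function names are my own.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400
# Excel (1900 date system) serial number for 1970-01-01: an assumption
# that holds for Excel's default settings on Windows.
EXCEL_SERIAL_FOR_UNIX_EPOCH = 25569
# Seconds between 1970-01-01 and 2000-01-01 UTC, the epoch used
# internally by older (version 4.x) gnuplot releases.
GNUPLOT4_EPOCH_OFFSET = 946684800


def unix_to_excel_serial(unix_seconds):
    """Convert Unix seconds (UTC) to an Excel serial date (days)."""
    return unix_seconds / SECONDS_PER_DAY + EXCEL_SERIAL_FOR_UNIX_EPOCH


def excel_serial_to_unix(serial):
    """Convert an Excel serial date (days) back to Unix seconds (UTC)."""
    return (serial - EXCEL_SERIAL_FOR_UNIX_EPOCH) * SECONDS_PER_DAY


def unix_to_gnuplot4(unix_seconds):
    """Convert Unix seconds to seconds since 2000-01-01 (older gnuplot)."""
    return unix_seconds - GNUPLOT4_EPOCH_OFFSET


if __name__ == "__main__":
    ts = 1315180800  # an arbitrary example timestamp
    print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
    print(unix_to_excel_serial(ts))
    print(unix_to_gnuplot4(ts))
```

Note that none of these conversions know anything about time zones or daylight saving: Excel serial dates carry no zone information, so it is worth deciding up front whether everything is held in UTC or in local time.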

September was a busy month on the Exposing VLE Activity Data project blog as their project extension period drew to a close:
- A guide to their Perl analysis tool.
- A guide to using Gephi to visualise a bipartite network of users and websites [including a discussion of their approach and a technical recipe/guide].
- Analysis of their VLE event logs.
- A discussion of releasing anonymised VLE event log data [including a link to the dataset they've released (4 GB download)].
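
To give a flavour of what a bipartite network of users and websites involves, here is a rough Python sketch (the project's own tool is written in Perl, and this is not their recipe). It assumes the event log has already been boiled down to (user, site) pairs, counts how often each pair occurs, and writes an edge list with Source, Target and Weight columns, which is a layout Gephi's spreadsheet import accepts. The function names and the `user:` / `site:` prefixes are hypothetical.

```python
import csv
import io
from collections import Counter


def build_edge_weights(events):
    """Count how many times each (user, site) pair appears in the log."""
    return Counter((user, site) for user, site in events)


def edges_to_csv(weights):
    """Render the weighted edges as CSV text with a Gephi-style header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (user, site), count in sorted(weights.items()):
        # Prefix node ids so users and sites can never collide and the
        # two halves of the bipartite graph are easy to tell apart.
        writer.writerow([f"user:{user}", f"site:{site}", count])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [("u1", "jstor"), ("u1", "jstor"), ("u2", "jstor"), ("u2", "wiki")]
    print(edges_to_csv(build_edge_weights(sample)))
```

The user ids in a real export would want to be anonymised before anything is shared.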

The LIDP project have now released data as well as a Library Impact Data toolkit (both of which are published under open licences).

Wednesday, 14 September 2011

Draft Guide: 'Legal issues relating to sharing data'

The problem

If you want to share activity data with others then you have to make sure that you have the right to do so, that you share it in an appropriate way and that the terms under which you share it are appropriate.

In order to share data you have to have the right to do so. In practice this means ensuring that you hold appropriate intellectual property rights (IPR) in the data. If the data subjects might be identifiable (i.e. you are releasing full data rather than statistical data) then they need to have been informed, when they agreed to the data being collected, that sharing can happen (and to have had a real ability to opt out of this). Finally, you will need to select an appropriate licence under which to release the data.

The options

Intellectual property rights (IPR)

It is likely that you will own the data from any systems that you are running, though it may be necessary to check the licence conditions in case the supplier is laying any claim to the data. However, if the system is externally hosted then it is also possible that the host may lay some claim to the log-file data, and again you may need to check with them.

  • JISC Legal has a section addressing copyright and intellectual property right law

Data protection

Data protection, which addresses what one may do with personal data, is covered by the Data Protection Act 1998, and there is much advice available.

An alternative approach to addressing the needs of data protection is to anonymise the data.
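
As a rough illustration only, the Python sketch below shows one common first step: dropping directly identifying fields and replacing the user id with a keyed hash. Strictly speaking this is pseudonymisation rather than full anonymisation, since people may still be re-identifiable from patterns in the remaining data, so it should not be read as sufficient for data protection purposes. The field names (`name`, `email`, `ip_address`, `user_id`) and function names are assumptions made for the example.

```python
import hashlib
import hmac


def pseudonymise(user_id, secret_key):
    """Replace a user id with a keyed hash (HMAC-SHA256, truncated)."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymise_record(record, secret_key,
                     drop_fields=("name", "email", "ip_address")):
    """Return a copy of a log record without direct identifiers."""
    cleaned = {k: v for k, v in record.items() if k not in drop_fields}
    cleaned["user_id"] = pseudonymise(str(record["user_id"]), secret_key)
    return cleaned


if __name__ == "__main__":
    key = b"keep-this-secret-and-out-of-the-release"
    row = {"user_id": "s1234567", "name": "A. Student",
           "email": "a.student@example.ac.uk", "resource": "book-42"}
    print(anonymise_record(row, key))
```

Using a secret key (rather than a plain hash) means that someone who knows a user id cannot simply recompute the hash to find that user's records, provided the key itself is never released.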

Licensing the data

Any data automatically comes with copyright, and therefore you need to license the data in order for other people to legitimately use it. There is a wide variety of licence types that you can use, though the most common is likely to be some form of Creative Commons licence.

Guidance is available from a wide variety of places including:

- Introduction to licensing and IPR

- Creative Commons licences

Wednesday, 31 August 2011

Tabbloid: 31 August 2011

A couple of updates from the project blogs this week:
In the wider world I stumbled across a mention that 'big data' has now made it onto the Gartner Hype Cycle for the first time, which seems significant even if, like me, you find yourself wondering where the Gartner Hype Cycle itself falls on their chart.

Next week the synthesis team will be reunited when we head to Leeds to run one of the ALT-C pre-conference workshops, 'Improving processes by using activity data', where we'll be joined by the geographically convenient Rob Moores, who'll be sharing knowledge and experience from the Leeds Met STAR-Trak project with those who attend.

Wednesday, 24 August 2011

Tabbloid: 24 August 2011

This week there's an interesting post over on the EVAD project blog about the problem of finding the right 'data munging' tool and how they ended up developing their own custom Perl script instead. They've publicly released the script, so it will be interesting to see whether it suits the needs of another project or whether a new bespoke tool needs to be fashioned for every project going.

The LIDP project have been presenting to, and in attendance at, the Performance Measurement in Libraries and Information Services conference which is a week-long event taking place at York University [#pm9york]. Word on the twittersphere is that the LIDP toolkit will be released next week so I'll probably be linking to that next week.

The OpenURL Router Data project launched their article recommender prototype and it's just as well that I don't have an Athens log-in because I was quickly drawn into all sorts of intriguing looking material, including an article entitled 'Getting a Grip on Strangles'.

Out in the wider world there have been relevant links flying into my twitterstream from unexpected quarters, which suggests to me that either a tipping point is coming our way in terms of a wider awareness of activity data, or I am getting more creative in my interpretation of what is relevant to the programme. In any case here are a few highlights that I've picked out of this week's Tabbloid:

Wednesday, 10 August 2011

Five Filters news digest: 10 August 2011

A Tabbloid did in fact wend its way into my inbox this morning but it was a little bereft of life so I've turned to the trusty Five Filters website to create this week's blog digest. As before, you can generate a digest on the fly but I'll also be sending it out via email.

Just a couple of project updates this week:
  • the UCIAD project published their final project blogpost, including a video which gives a demo of the UCIAD platform, with an accompanying written commentary nestled below the video [and I can confirm that it's in with a good chance of winning both the 'techiest video I've watched' and 'longest video without a soundtrack' awards in my imaginary video award ceremony at the end of the year]. It's a shame we haven't got any more online exchanges planned because it would have been a good opportunity to get Mathieu to talk through the demo. I'll be interested to hear the results of the user feedback that the project plans to gather as part of their post-JISC project activity.
News from the twittersphere:
News from the synthesis team is that we've finalised the programme for the pre-conference ALT-C ['Improving processes by using activity data'] workshop which we're running on 5 September in Leeds. The workshop is free, includes lunch, and you don't need to be going to ALT-C in order to attend.