Activity Data Synthesis

Wednesday, 28 September 2011

A round-up of recent JISC Activity Data activity

The synthesis team have been busy honing the content of our final deliverable for this programme: a one-stop shop which gathers together the projects' collected wisdom on identifying, collecting, managing and sharing activity data within UK HE. The amount of content we have to share makes it a challenge to present it intuitively, with easily navigable routes into (and out of) the information for end users. We're hopeful that it's achievable, and you'll be able to judge the fruits of our labour next month when we launch it.

Earlier this month we ran a pre-conference workshop at ALT-C which (talking of navigation issues) seemed to go very well once the initial challenge of finding the room itself was conquered. The session was entitled 'Improving processes by using activity data' and featured the following sessions:
- Introduction to activity data [presented by Tom Franklin]
- Challenges raised by activity data [presented by Mark van Harmelen]
- [Case Study] Leeds Met STARTrak: NG - using activity data in support of student success [presented by Rob Moores]
- Discussion of potential use case [facilitated by David Kay]
- [workshop session] Building a business case [facilitated by Tom Franklin and myself]
- [workshop session] Working with activity data - technical discussion of value and challenges [facilitated by Mark van Harmelen and David Kay]

I've combined (i.e. wrestled) all the slides from the workshop into one presentation and uploaded it as a PDF, which you can view below:



The Tabbloid digest for that week captures the tweets from the workshop:


Here are some of my twitter highlights from the day:







Hidden amongst all the workshop tweets is a gem from the AGtivity project: a blog post sharing what they've worked out about handling timestamps between UNIX, gnuplot and Excel during the course of their project. Another gem from the AGtivity direction came from Martin Turner, who brought CritterVRE to my attention. Martin used it to capture the Twitter activity from the day, and it looks like a useful tool to add to my event amplification / capture toolbox. The service was developed for use alongside Access Grid sessions, but it looks useful for other purposes too (if that's allowed; I haven't had a go yet).
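For anyone who hasn't met the timestamp problem before, here is a minimal sketch of the kind of conversion involved. It is not taken from the AGtivity post; it just uses the well-known epoch offsets and assumes everything is in UTC. The function names are my own, and the gnuplot offset applies only to older (4.x) releases, which count seconds from 1 January 2000 rather than 1970.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400
# Excel's 1900 date system counts days; serial 25569 is 1970-01-01.
EXCEL_SERIAL_OF_UNIX_EPOCH = 25_569
# Older gnuplot releases (4.x) count seconds from 2000-01-01 00:00:00 UTC.
GNUPLOT_LEGACY_OFFSET = 946_684_800


def unix_to_excel(ts: float) -> float:
    """Convert UNIX seconds to an Excel serial date (assumes UTC)."""
    return ts / SECONDS_PER_DAY + EXCEL_SERIAL_OF_UNIX_EPOCH


def excel_to_unix(serial: float) -> float:
    """Convert an Excel serial date back to UNIX seconds (assumes UTC)."""
    return (serial - EXCEL_SERIAL_OF_UNIX_EPOCH) * SECONDS_PER_DAY


def unix_to_gnuplot_legacy(ts: float) -> float:
    """Convert UNIX seconds to seconds since 2000-01-01 (gnuplot 4.x)."""
    return ts - GNUPLOT_LEGACY_OFFSET


def unix_to_text(ts: float) -> str:
    """Format UNIX seconds as text, which all three tools can parse."""
    moment = datetime.fromtimestamp(ts, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    ts = 1_317_168_000  # midnight UTC on 28 September 2011
    print(unix_to_text(ts), unix_to_excel(ts), unix_to_gnuplot_legacy(ts))
```

Excel stores no time zone with its serial dates, so it is worth checking whether a given spreadsheet holds local time or UTC before relying on a conversion like this.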

September was a busy month on the Exposing VLE Activity Data project blog as the project reached the end of its extension period. New posts included:
- A guide to their Perl analysis tool.
- A guide to using Gephi to visualise a bipartite network of users and websites [including a discussion of their approach and a technical recipe/guide].
- Analysis of their VLE event logs.
- A discussion of releasing anonymised VLE event log data [including a link to the dataset they've released (4 GB download)].
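
To give a flavour of what a bipartite users-and-websites network looks like as data, here is a rough sketch. It is not the project's own recipe. It assumes the log has already been reduced to (user, site) pairs, and the sample events and function names are made up for illustration. It writes an edge list with Source, Target and Weight columns, which Gephi's spreadsheet import recognises.

```python
import csv
import io
from collections import Counter


def build_edge_weights(events):
    """Count how many times each (user, site) pair appears in the log."""
    return Counter((user, site) for user, site in events)


def write_gephi_edges(weights, out):
    """Write one CSV row per user-site edge, sorted for stable output."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (user, site), count in sorted(weights.items()):
        writer.writerow([user, site, count])


# Hypothetical, already-anonymised events: (user, site visited).
sample_events = [
    ("user_a", "vle.example.ac.uk"),
    ("user_a", "vle.example.ac.uk"),
    ("user_a", "library.example.ac.uk"),
    ("user_b", "vle.example.ac.uk"),
]

if __name__ == "__main__":
    buffer = io.StringIO()
    write_gephi_edges(build_edge_weights(sample_events), buffer)
    print(buffer.getvalue())
```

Users only ever link to sites and never to each other, which is what makes the network bipartite.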

The LIDP project have now released data as well as a Library Impact Data toolkit (both of which are published under open licences).

Wednesday, 14 September 2011

Draft Guide: 'Legal issues relating to sharing data'

The problem

If you want to share activity data with others then you have to make sure that you have the right to do so, that you share it in an appropriate way and that the terms under which you share it are appropriate.

In practice, having the right to share data means holding appropriate intellectual property rights (IPR) in it. If the data subjects could be identified (i.e. you are releasing full data rather than statistical data), then they need to have been informed, when they agreed to the data being collected, that it might be shared (and they need to have had a real ability to opt out of this). Finally, you will need to select an appropriate licence under which to release the data.

The options

Intellectual property rights (IPR)

It is likely that you will own the data from any systems that you are running, though it may be necessary to check the licence conditions in case the supplier is laying any claim to the data. However, if the system is externally hosted then it is also possible that the host may lay some claim to the log-file data, and again you may need to check with them.

- JISC Legal has a section addressing copyright and intellectual property rights law

Data protection

Data protection, which addresses what one may do with personal data, is covered by the Data Protection Act 1998, and much advice is available.

An alternative approach to addressing the needs of data protection is to anonymise the data.
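
As a rough illustration of one common first step, the sketch below replaces user identifiers with keyed hashes before release. This is an assumption about how one might go about it, not a recommended or legally sufficient procedure: strictly speaking it is pseudonymisation, and other fields (timestamps, IP addresses, rare course codes) may still identify people. The function names, field names and key are all hypothetical.

```python
import hashlib
import hmac


def pseudonymise(user_id: str, secret_key: bytes) -> str:
    """Map a user ID to a stable token that cannot be reversed without the key."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymise_rows(rows, secret_key: bytes):
    """Return copies of log rows with the 'user' field replaced by a token."""
    return [{**row, "user": pseudonymise(row["user"], secret_key)} for row in rows]


if __name__ == "__main__":
    # Hypothetical key: keep the real one private and never release it.
    key = b"replace-with-a-long-random-secret"
    log = [
        {"user": "jsmith", "action": "view", "resource": "module-101"},
        {"user": "jsmith", "action": "download", "resource": "module-101"},
    ]
    for row in pseudonymise_rows(log, key):
        print(row)
```

Using a keyed hash rather than a plain one means that someone who knows the list of usernames cannot simply hash them all and match them up, provided the key itself is never shared.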

Licensing the data

Any data automatically comes with copyright, and therefore you need to license the data in order for other people to use it legitimately. There is a wide variety of licences that you can use, though the most common is likely to be some form of Creative Commons licence.

Guidance is available from a wide variety of places including:

- Introduction to licensing and IPR http://xerte.plymouth.ac.uk/play.php?template_id=352

- Creative Commons licence: http://xerte.plymouth.ac.uk/play.php?template_id=344