Activity Data Synthesis

Wednesday, 29 June 2011

Draft Guide: 'Anonymising data'

The problem:
Data protection requirements mean that we cannot release personal data to other people without the data subjects' permission. Much of the activity data that is collected and used contains information which can identify the person responsible for its creation. It may contain their username, the IP address from which they were working or other information including patterns of behaviour that can identify them.

Therefore, where information is to be released as open data for anyone to use, consideration needs to be given to anonymising the data. This may also be required when sharing data with partners in a closed manner, depending on the reasons for sharing and the nature of the data, together with any consent provided by the user.

The options:
Two main options exist if you want to share data.

The first is to share only statistical data. As the Information Commissioner recently wrote:
"Some data sharing doesn’t involve personal data, for example where only statistics that cannot identify anyone are being shared. Neither the Data Protection Act (DPA), nor this code of practice, apply to that type of sharing."

The second is to anonymise the personal data so that it cannot be traced back to an individual. This can take a number of forms. For instance, some log files store user names while others store IP addresses; where a user works from a fixed IP address, either could be traced back to them. Anonymising the user name or IP address through some algorithm prevents this. A further problem may arise where rare data can be used to identify an individual. For instance, a pattern of accessing certain rare books could identify someone with a particular research interest.
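As a minimal sketch of this second option, the snippet below replaces a username or IP address with a keyed hash, so the same user always maps to the same token but the token cannot be reversed without the secret key. It uses only the Python standard library; the log fields are invented for illustration.

```python
import hmac
import hashlib

# Secret key known only to the data owner; a keyed hash (rather than a plain
# hash) prevents anyone recomputing tokens from guessed usernames or IPs.
SECRET_KEY = b"replace-with-a-long-random-secret"

def pseudonymise(identifier: str) -> str:
    """Replace a username or IP address with a stable, non-reversible token."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

# Hypothetical log record: identifying fields are replaced before release.
record = {"user": "jsmith", "ip": "192.168.0.42", "action": "download", "item": "e-journal-123"}
safe_record = dict(record, user=pseudonymise(record["user"]), ip=pseudonymise(record["ip"]))
print(safe_record)
```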

Taking it further:
If you want to take it further then you will need to consider the following as a starting point:
  • Does the data you are considering releasing contain any personal information?
  • Are the people you are sharing the data with already covered by the purpose for which the data was collected (e.g. a student's tutor)?
  • Is personal information held directly in the data (user name, IP address)?
  • Does the data enable anyone to deduce who generated it (only X could have borrowed those two rare books – so what else have they borrowed)? A sketch of one safeguard against this follows the list.
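On the last point, a common safeguard against re-identification through rare patterns is to suppress records whose attribute combination occurs fewer than k times – the idea behind k-anonymity. A minimal sketch in Python, with invented loan records:

```python
K = 3  # minimum number of distinct borrowers before an item's loans are released

# Hypothetical loan records: (anonymised user token, item identifier)
loans = [
    ("a1", "rare-folio-17"),  # a single borrower: potentially identifying
    ("b2", "core-textbook"), ("c3", "core-textbook"), ("d4", "core-textbook"),
]

# Count distinct borrowers per item, then withhold low-incidence items.
borrowers = {}
for user, item in loans:
    borrowers.setdefault(item, set()).add(user)

released = [(user, item) for user, item in loans if len(borrowers[item]) >= K]
print(released)  # loans of 'rare-folio-17' are suppressed
```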
Additional resources:

Friday, 24 June 2011

Draft Guide: 'Developing a Business Case'

[This is a draft Guide that will be published as a deliverable of the synthesis team's activities. Your comments are very much welcomed and will inform the final published version of this Guide. We are particularly interested in any additional examples you might have for the 'Additional Resources' section]

The problem:
Getting senior management buy-in for projects which use activity data to enhance the user experience or the management of facilities is key if projects are to get the go-ahead in the first place and become sustainable services in the long term. There is a lack of persuasive business cases to refer to in the public realm. This guide gives some high-level advice on developing a solid business case.

In the current programme, activity data is being used to enhance the learner experience by recommending additional material, to manage resources effectively, and to increase student success by helping students improve their online practices. Each of these is a powerful strategic benefit.

The solution:
The most important thing to remember when developing a business case is that its purpose is to persuade someone to release resources (primarily money or staff time) for the proposed activity. The person who has to make the decision faces a wide variety of competing requests and demands on the available resources, so what they need to know is how the proposed project will benefit them.

The answer to this question should be that it helps them move towards their strategic goals. So the first thing that you need to find out is what their strategic goals are. Typically these are likely to include delivering cost savings, improving the student experience or making finite resources go further. You should then select one (or at most two) of these goals and explain how the project will help to meet it. Aligning the project to many goals risks diluting each of them and having less impact than a strong case for a single goal.

Structure of a business case:
- Title
- Intended audience
- Brief description
- Alternative options
- Return on investment
- Costs
- Project plan
- Risks
- Recommendation

Do not 'over-egg the pudding' by understating the costs and risks or overstating the benefits. If the costs or benefits are not credible, the business case may be rejected because it does not appear to offer realistic alternatives.

The benefits should be realistic and quantifiable and, wherever possible, quantified in monetary terms. This allows the decision maker to compare the benefits with the costs (which can usually be expressed in monetary terms), see the return on investment clearly, and weigh this business case against other calls on their funding and staff.
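As a worked illustration of that comparison, a few lines of Python are enough to express the calculation a decision maker will make; all the figures here are invented for the example:

```python
# Invented figures: one-off development cost, annual running cost, and an
# estimated annual benefit (e.g. staff time saved, expressed in money).
development_cost = 40_000
annual_running_cost = 5_000
annual_benefit = 25_000
years = 3

total_cost = development_cost + annual_running_cost * years
total_benefit = annual_benefit * years
roi = (total_benefit - total_cost) / total_cost

print(f"Three-year return on investment: {roi:.0%}")  # ~36% on these figures
```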

Taking it further:
If the sector is to build a higher-level picture of the business cases for exploiting activity data, and for pursuing the path towards open data, then it is important to share knowledge of what works in convincing key decision makers to give sustained support to using activity data.

The programme has produced some example business cases which can be used to understand the type of information that it is sensible to include, and which may form the basis for your business case. However, the business case must relate to the local circumstances in which you are writing it, and the audience for which you are writing it.

Additional resources:

Guidance and templates

Examples and further reading

Thursday, 23 June 2011

JISC online consultation

JISC is currently undertaking a consultation exercise and wrote:

As part of our institutional engagement work, the JISC Organisation and User Technologies Team is carrying out an online consultation (using moodle) to identify emerging issues and concerns in UK Higher Education that we may, in the future, be looking to develop programmes of activity around. There are five top-level areas, each with a discussion forum attached; please feel free either to post a new concern or issue or to respond to someone else's post.
The site is at http://scenarios.conted.ox.ac.uk/course/view.php?id=2 – it's a moodle site, so there is a quick and simple two-part registration before you post.
Anything you can contribute will be helpful in shaping our future plans.


I have added the following post on analytics - you may wish to comment or add others...


One of the key factors for both students and universities will be student success; though at times they may have different definitions of what this means.

There are two key areas here: retention and outcome (loosely, results, but also whether the student has achieved what they set out to do). Retention is already good by international standards, but this gives no grounds for complacency, and there is much that can be (and is being) done to improve it.

It is arguable that student success is also one of the factors in the student experience.

In this posting I want to look at one tool that can be used to enhance student success, where JISC is already doing some work, but much more could be done and would have a very positive return on investment for institutions. This is data analytics to support student success.

Universities and colleges are already collecting vast amounts of data about their students, but making very little use of it. Every time a student logs on to the VLE, undertakes a search of the library resources, accesses an e-journal, or swipes their card through the library turnstile or lecture theatre door, the event is recorded in logs on servers at the university. Most of the time this information simply sits there gathering electronic dust until it is archived or deleted. However, there is much valuable information that could be used to help students to help themselves.

For example, there are patterns of behaviour which may give early indications that a student is at risk of dropping out (non-attendance or declining use of the VLE, perhaps), where early intervention to support students may help them to achieve the results they wanted.

Similarly, there are patterns of behaviour which may indicate that students are not studying as effectively as they might; again, early intervention could be of great assistance to the student.
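To make this concrete, here is a minimal sketch of how a declining-activity pattern might be flagged, assuming weekly VLE login counts per student have already been extracted from the logs. The window and threshold are illustrative only, not validated indicators:

```python
# Hypothetical weekly VLE login counts per student, oldest week first.
weekly_logins = {
    "student-001": [9, 8, 7, 8, 9, 8],
    "student-002": [7, 6, 5, 3, 1, 0],  # steady decline: worth an early conversation
}

def at_risk(counts, window=3, drop_ratio=0.5):
    """Flag a student whose recent average activity has fallen below
    drop_ratio times their earlier average."""
    if len(counts) < 2 * window:
        return False  # not enough history to judge
    earlier = sum(counts[:window]) / window
    recent = sum(counts[-window:]) / window
    return earlier > 0 and recent < earlier * drop_ratio

for student, counts in weekly_logins.items():
    if at_risk(counts):
        print(f"{student}: declining VLE use, consider early intervention")
```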

There are a number of areas where intervention at the national level would be of great value to the sector. These include:

  • Understanding the information that universities have available to them
  • Identifying patterns associated with success and failure. Note that these are likely to be discipline-dependent: some disciplines make much more use of the library than others. They are also likely to be institution-dependent as, for instance, some universities make much more use of VLEs than others.
  • Developing algorithms to identify students at risk or with sub-optimal study patterns
  • Researching methods of intervention that actually support students to succeed. There is evidence that some approaches may be counter-productive

These methods can form part of the way in which to enhance student learning and success, and where national support will enable all universities and colleges to achieve more than they could by developing the tools and algorithms for themselves.

Wednesday, 22 June 2011

Tabbloid: 22 June 2011


As you'll see from the Tabbloid digest, the synthesis team have had a busy week here on the blog. We've shared the first draft of the recommendations we've submitted to JISC. We've also published the following draft Guides:
Your comments on the draft guides and recommendations are very much welcomed between now and the end of August when we will be submitting final versions of them to JISC.

The projects have been busy too:
A retweet by Dave Pattern about Derek Rodriguez's article on 'Understanding library impacts on student learning' led me off on a small trail of oblique serendipity:
And, in other news, a couple of things relating to anonymisation hit our radar this week

Tuesday, 21 June 2011

Activity Data Synthesis Project: Recommendations

The following are the recommendations that we have submitted to JISC. Your comments would be most welcome to both JISC and us.

Introduction

This is an informal report outlining the likely recommendations from the Activity Data projects, to help JISC to determine future work in the area. It is not intended as a public document; rather, it is meant to stimulate discussion and lead to a more formal document at a later stage.

There are two things to note at this stage.

  • Activity data can serve a wide variety of functions, as exemplified by the range of projects in this programme. However, the greatest impact (and return on investment) will come from supporting student success.
  • We suggest that the next call explicitly funds other universities to pick up the techniques and/or software systems developed in this programme, in order to see if they are useful beyond the initial institution and, in the process, to discover what issues arise in making effective use of them. However, this may not be in accordance with JISC's standard practice and is not an essential part of the recommendations.

The recommendations appear under the following topic areas:

  • Student success
  • Student and researcher experience
  • Collection management.

Student success

“It is a truth universally acknowledged that”[1] early identification of students at risk and timely intervention must[2] lead to greater success. It is believed that some of the patterns of behaviour that can be identified through activity data will indicate students who are at risk and could be supported by early intervention. It has also been demonstrated in work in the US that it can help students in the middle to improve their grades[3].

Recommendations:

  1. In year 2, JISC should fund research into what is needed to build effective student success dashboards
    Work is needed at least in the following areas:
    • Determination of the most useful sources of data that can underpin the analytics
    • Identification of effective and sub-optimal study patterns that can be found from the above data.
    • Design and development of appropriate algorithms to extract this data. We advise that this should include statisticians with experience in relevant areas such as recommender systems.
    • Watching what others are doing in related areas such as learning analytics, including developments by VLE developers.

At this stage it is not clear what the most appropriate solutions are likely to be; therefore, it is recommended that this is an area where we need to "let a thousand flowers bloom". However, it is then essential that projects collaborate, to ensure that both the projects and the wider community learn any lessons.

  2. In year 2 or 3, JISC should pilot some of the systems developed under the current programme.

Student and researcher experience

This area is primarily concerned with using recommender systems to help students and (junior) researchers locate useful material that they might not otherwise find, or would find much harder to discover.

Recommendations

  1. It is recommended that in year 2, JISC fund additional work in the area of recommender systems for resource discovery.
    In particular work is needed in the following areas:
    • Investigation of the issues and tradeoffs inherent in developing institutional versus shared services recommender systems. For instance there are likely to be at least some problems associated with recommending resources which are not available locally.
    • Investigating and trialling the combination of activity data with rating data. In doing this, there needs to be acknowledgement that users are very frequently disinclined to provide ratings, and that ways to reduce barriers to participation and increase engagement with rating need to be discovered in the context of the system under development and its potential users.
    • Investigation and implementation of appropriate algorithms. This should look at existing algorithms in use and their broader applicability. We advise that this should include statisticians with experience in areas such as pattern analysis and recommender systems. (A minimal sketch of the classic co-occurrence approach follows this list.)
    • Some of the systems developed under this programme should be piloted elsewhere.
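As a point of reference for the algorithm discussion above, the simplest item-to-item approach counts how often two resources are used by the same person ('people who used this also used that'). A minimal sketch over invented circulation data; a production system would, at minimum, normalise for item popularity:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical usage data: the set of items each (anonymised) user has borrowed.
usage = {
    "u1": {"book-A", "book-B", "book-C"},
    "u2": {"book-A", "book-B"},
    "u3": {"book-B", "book-C"},
}

# Count co-occurrences: how many users borrowed both items.
co_counts = defaultdict(int)
for items in usage.values():
    for a, b in combinations(sorted(items), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def recommend(item, top_n=3):
    """Items most often borrowed alongside the given item."""
    scores = {b: n for (a, b), n in co_counts.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("book-A"))  # ['book-B', 'book-C'] on this data
```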

Collection management

Activity data provides information on what is actually being used or accessed. The opportunity exists to use data on how and where resources are being used at a much finer level of granularity than is currently available. Activity data can therefore be used to help inform collection management.

Note that this is an area where shared or open data may be particularly valuable in helping to identify important gaps in a collection.

  1. It is recommended that in the coming year JISC should fund work to investigate how activity data can support collection management.
    In particular work is needed in the following areas:
    • Consider how activity data can supplement data that libraries are already obtaining from publishers, through projects such as JUSP.
    • Work with UK Research Reserve.
    • Assess the potential to include the Open Access publications domain in this work.
    • Pilot work from this programme to see if the data that they are using is helpful in this area.

Other areas

The following are important areas that JISC should pursue.

  1. It is recommended that JISC continue work on open data for activity data, and in particular investigate appropriate standard formats.
  2. It is recommended that one or more projects in year 2 should investigate the value of a mixed activity data approach in connection with NoSQL data stores, in order to maximise flexibility in the accumulation, aggregation and analysis of activity data and supporting data sets; the US Learning Registry project may be relevant.
  3. It is recommended that JISC ask appropriate experts (such as Naomi Korn / Charles Oppenheim or JISC Legal) to provide advice on the legal aspects such as privacy and data sharing, similar to Licensing Open Data: A Practical Guide (written for the Discovery project).

Other

The Activity Data Synthesis Project is not in a position to make any recommendation over the use of linked data in this area in the absence of any compelling use.



[1] Austen J, Pride and Prejudice

[2] In reality – “highly likely” – but that does not fit with the quote

[3] Arnold K, Signals: Applying Academic Analytics, Educause Quarterly, 2010, Vol 33 No 1 http://www.educause.edu/EDUCAUSE+Quarterly/EDUCAUSEQuarterlyMagazineVolum/SignalsApplyingAcademicAnalyti/199385 or http://bit.ly/c5Z5Zu

Monday, 20 June 2011

Online Exchange #2: Event Recording [2 June 2011]

On 2nd June we held the second of our Activity Data virtual meetings, using the WebEx online conferencing tool. The hour-long session can be downloaded or streamed using the following links:
We heard from Richard Nurse who talked us through the Open University RISE project and shared the progress they've made so far. [Richard's presentation starts at the 11min mark]


We also heard from Sheila Fraser who presented an overview of EDINA's Using OpenURL Activity Data project and touched on how the data might be used, as well as inviting participants to suggest ideas and discuss the issues around using other institutions' data. [Sheila's presentation starts at the 20min, 20secs mark]


We also had speakers lined up to share information and experience about the Journal Usage Stats Portal (JUSP), Metridoc and the RAPTOR project, but unfortunately a technical glitch in WebEx meant that we had to postpone their contributions to a future session.

[NB: you can view the in-session chat box by selecting 'View' >> 'Chat' from the menu at the top of the WebEx playback window]

Draft Guide: 'Bringing activity data to life'

[This is a draft Guide that will be published as a deliverable of the synthesis team's activities. Your comments are very much welcomed and will inform the final published version of this Guide. We are particularly interested in any additional examples you might have for the 'Additional Resources' section]

The problem:
Activity and attention data is typically large scale and may combine data from a variety of sources (e.g. learning, library, access management) and events (turnstile entry, system login, search, refine, download, borrow, return, review, rate, etc). It needs methods to make it amenable to analysis.

It is easy to think of visualisation simply as a tool to help our audiences (e.g. management) 'see' the messages (trends, correlations, etc) that we wish to highlight from our datasets. However, experience with 'big' data indicates that visualisation and simulation tools are equally important for the expert, assisting in the formative steps of identifying patterns and trends to inform further investigation, analysis and, ultimately, the development of measures such as Performance Indicators.

The options:
Statisticians and scientists have a long history of using computer tools, which can be complex to drive. At the other extreme, spreadsheets such as Excel have popularised basic graphical display for relatively small data sets. However, a number of drivers (ranging from cloud processing capability to software version control) have led to a recent explosion of high-quality visualisation tools capable of working with a wide variety of data formats, and therefore accessible to all skill levels (including the humble spreadsheet user).
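For readers who prefer scripting to the packaged tools listed below, here is a minimal sketch in Python with matplotlib (our choice for illustration, not a tool from the programme), assuming daily event counts have already been extracted from a server log:

```python
import matplotlib.pyplot as plt

# Invented daily download counts extracted from a server log.
days = list(range(1, 11))
downloads = [120, 135, 128, 160, 170, 90, 85, 155, 165, 180]

plt.plot(days, downloads, marker="o")
plt.xlabel("Day of month")
plt.ylabel("E-journal downloads")
plt.title("Daily download activity (sample data)")
plt.show()
```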

Taking it further:
YouTube is a source of introductory videos for tools in this space, ranging from Microsoft Excel features, to cloud-based processing from Google and IBM, to tools such as Gource, which originated in the visualisation of software version control history. Here are some tools recommended by people like us:
Excel Animated Chart - http://www.youtube.com/watch?v=KWxemQq10AM&NR=1
Excel Bubble Chart - http://www.youtube.com/watch?v=fFOgLe8z5LY

Google Motion Chart - http://code.google.com/apis/chart/interactive/docs/gallery/motionchart.html

IBM Many Eyes - http://www.youtube.com/watch?v=aAYDBZt7Xk0
Use Many Eyes at http://www-958.ibm.com/software/data/cognos/manyeyes/

Gapminder Desktop - http://www.youtube.com/watch?v=duGLdEzlIrs
See also http://www.gapminder.org/

Gephi - http://www.youtube.com/watch?v=bXCBh6QH5W0

Gource - http://www.youtube.com/watch?v=E5xPMW5fg48

Additional resources:
To grasp the potential, watch Hans Rosling famously using Gapminder in his TED talk on third world myths - http://www.youtube.com/watch?v=RUwS1uAdUcI&NR=1
UK-based Tony Hirst (@psychemedia) has posted examples of such tools in action – see his YouTube channel - http://www.youtube.com/profile?user=psychemedia. Posts include Google Motion Chart using Formula 1 data, Gource using EDINA OpenURL data and a demo of IBM Many Eyes.
A wide-ranging introduction to hundreds of visualisation tools and methods is provided at http://www.visualcomplexity.com/vc/

Draft Guide: 'Enabling student success'

[This is a draft Guide that will be published as a deliverable of the synthesis team's activities. Your comments are very much welcomed and will inform the final published version of this Guide. We are particularly interested in any additional examples you might have for the 'Additional Resources' section]

The problem:
Universities and colleges are focused on supporting students, both generally and individually, to ensure retention and success. The associated challenges are exacerbated by large student numbers and by teaching and learning becoming more 'virtualised'. Institutions are therefore looking for indicators that assist in the timely identification of, for example, 'at-risk' learners, so that they can be proactively engaged with the appropriate academic and personal support services.

The options:
Whilst computer-enabled systems may be part of the problem, they can certainly contribute significantly to the solution: through identification of patterns of learning and associated activity that highlight 'danger signs' and sub-optimal practice, and by the automation of 'alarms' (e.g. traffic-light indicators, alerts) triggered by one or more indicators. This approach forms part of the field of 'learning analytics', which is increasingly popular in North America.

Well-chosen indicators do not necessarily imply a cause and effect relationship, but they do provide a means to single out individuals using automatically collected activity data, typically combining a bundle of indicators (e.g. Students who do not visit the library in Term 1 may be at risk; students who also do not download content from the VLE are highly likely to be at risk).
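A minimal sketch of how a bundle of indicators like those in the example above might drive a traffic-light alert; the indicators and thresholds are purely illustrative and would need local validation:

```python
# Invented per-student indicator values for Term 1.
students = [
    {"id": "s1", "library_visits": 0, "vle_downloads": 0},
    {"id": "s2", "library_visits": 0, "vle_downloads": 12},
    {"id": "s3", "library_visits": 8, "vle_downloads": 30},
]

def traffic_light(s):
    """Combine indicators: each 'danger sign' present moves the flag one step."""
    danger_signs = (s["library_visits"] == 0) + (s["vle_downloads"] == 0)
    return ["green", "amber", "red"][danger_signs]

for s in students:
    print(s["id"], traffic_light(s))
# s1 -> red (both signs), s2 -> amber, s3 -> green
```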

Taking it further:
Institutions wishing to develop these capabilities may be assisted by this checklist:
  • Consider how institutions have developed thinking and methods in programmes such as the JISC Activity Data programme - see resources below
  • Identify where log information about learning-related system 'events' is already collected (e.g. learning, library, turnstile and logon/authentication systems);
  • Understand the standard guidance on privacy and data protection relating to the processing and storage of such data
  • Engage the right team, likely to include key academic and support managers as well as IT services; a statistician versed in analytics may also be of assistance, as this is relatively large-scale data
  • Decide whether to collect data relating to a known or suspected indicator (like the example above) or to analyse the data more broadly to identify whatever patterns exist
  • Run a bounded experiment to test a specific hypothesis
Additional resources:
Three projects in the JISC Activity Data programme investigated these opportunities at Cambridge, Huddersfield and Leeds Met universities.

See Activity Data Guide on ‘Data Strategies’ to maximise your potential to identify and track indicators

More about Learning Analytics in the 2011 Educause Horizon Report - http://www.educause.edu/node/645/tid/39193?time=1307689897

Academic Analytics: The Uses of Management Information and Technology in Higher Education, Goldstein P and Katz R, ECAR, 2005 - http://www.educause.edu/ers0508

Draft Guide: 'Identifying activity data in the library service'

[This is a draft Guide that will be published as a deliverable of the synthesis team's activities. Your comments are very much welcomed and will inform the final published version of this Guide. We are particularly interested in any additional examples you might have for the 'Additional Resources' section]

The problem:
Libraries use a range of software systems through which users interact with premises, services and resources. The LMS is far from the only source; the OPAC and the LMS circulation module represent increasingly partial views of user attention, activity and usage in a changing world. Libraries wishing to build a picture of user interactions therefore face the challenge of identifying the appropriate data, depending on their purpose, which may range from collection management (clearing redundant material, building 'short loan' capacity) to providing student success performance indicators (if correlation can be established) to developing recommender services (students who used this also used that, searched for this retrieved that, etc).
Let's break the problem down. In this guide we consider the variety of sources available within library services – a list to which you may add more. In other guides we consider strategies for deriving intelligence from 'anything that moves', as well as from targeted data extraction and aggregation with reference to specific goals.

The options:
Libraries already working with activity data have identified a range of sources and purposes – Collection Management, Service Improvement, Student Success and Recommender Services. Potential uses of data will be limited where the user is not identified in the activity (‘No attribution’). Here are some key examples:

| Data Source | What can be counted | Value of the intelligence |
| --- | --- | --- |
| Turnstile | Visits to library | Service improvement, Student success |
| Website | Virtual visits to library (no attribution) | Service improvement |
| OPAC | Searches made, search terms used, full records retrieved (no attribution) | Recommender system, Student success |
| Circulation | Books borrowed, renewed | Collection management, Recommender system, Student success |
| URL Resolver | Accesses to e-journal articles | Recommender system, Collection management |
| COUNTER stats | Downloads of e-journal articles | Collection management |
| Reading Lists | Occurrence of books and articles – a proxy for recommendation | Recommender system |
| Help Desk | Queries received | Service improvement |


Taking it further:
Here are some important questions to ask before you start to work with user activity data:
  • Can our systems generate that data?
  • Are we collecting it? Sometimes these facilities exist but are switched off.
  • Is there enough of it to make any sense? How long have we been collecting data and how much data is collected per year?
  • Will it serve the analytical purpose we have in mind? Or could it trigger new analyses?
  • Should we combine a number of these sources to paint a fuller picture? If so, are there reliable codes held in common across the relevant systems, such as a User ID? (A minimal joining sketch follows this list.)
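On the last question, here is a minimal sketch of combining two sources on a shared User ID, using pandas; the column names and figures are invented:

```python
import pandas as pd

# Invented extracts from two systems that share a user identifier.
turnstile = pd.DataFrame({"user_id": ["u1", "u2", "u3"], "library_visits": [14, 0, 6]})
circulation = pd.DataFrame({"user_id": ["u1", "u3"], "loans": [9, 2]})

# An outer join keeps users who appear in only one system;
# missing loan counts become 0 rather than dropping the user.
combined = turnstile.merge(circulation, on="user_id", how="outer").fillna({"loans": 0})
print(combined)
```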
Additional resources:
Consider also the Guides on Student Success and Data Strategies

The Library Impact Data Project (LIDP) led by the University of Huddersfield - http://library.hud.ac.uk/blogs/projects/lidp/

Draft Guide: 'Strategies for collecting and storing activity data'

[This is a draft Guide that will be published as a deliverable of the synthesis team's activities. Your comments are very much welcomed and will inform the final published version of this Guide. We are particularly interested in any additional examples you might have for the 'References' section]

The problem:
Activity data typically comes in large volumes that require processing to be useful. The challenge is where to start, at what stage to become selective (e.g. analyse student transactions and not staff) and when to aggregate (add transactions together – e.g. one record per day for books borrowed).
If we are driven by information requests or existing Performance Indicators, we will typically manipulate (select, aggregate) the raw data early. Alternatively, if we are searching for whatever the data might tell us, then maintaining granularity is essential (e.g. if you aggregate by time period, by event or by cohort, you may be burying vital clues). There is also the added dimension of data protection: raw activity datasets probably contain links to individuals, so aggregation may be a good safeguard (though only a partial one, as you may still need to throw away low-incidence groupings that could betray individual identity).

The options:
It is therefore important to consider the differences between two approaches before you start burning bridges by selection/aggregation or unnecessarily filling terabytes of storage. (A minimal sketch contrasting the two approaches follows the examples below.)
Approach 1 - Start with a pre-determined performance indicator or other statistical requirement and therefore selectively extract, aggregate and analyse a subset of the data accordingly; for example:
  • Analyse library circulation trends by time period or by faculty or …
  • Analyse VLE logs to identify users according to their access patterns (time of day, length of session)
Approach 2 - Analyse the full set (or sets) of available data in search of patterns using data mining and statistical techniques. This is likely to be an iterative process involving established statistical techniques (and tools), leading to cross-tabulation of discovered patterns, for example:
  • Discovery 1 – A very low proportion of lecturers never post content in the VLE
  • Discovery 2 – A very low proportion of students never download content
  • Discovery 3 – These groups are both growing year on year
  • Pattern – The vast majority of both groups are not based in the UK (and the surprise is very low subject area or course correlation between the lecturers and the students)
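To contrast the two approaches, here is a minimal sketch in Python with pandas, over invented event records; Approach 1 aggregates early to answer a fixed question, while Approach 2 keeps the granular records and cross-tabulates them in search of patterns:

```python
import pandas as pd

# Invented granular VLE event records.
events = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u3", "u3", "u3"],
    "role": ["student", "student", "lecturer", "student", "student", "student"],
    "action": ["download", "login", "post", "download", "download", "login"],
    "date": pd.to_datetime(["2011-05-02", "2011-05-02", "2011-05-03",
                            "2011-05-03", "2011-05-04", "2011-05-04"]),
})

# Approach 1: aggregate early against a fixed question (events per day).
print(events.groupby(events["date"].dt.date).size())

# Approach 2: keep granularity and cross-tabulate in search of patterns.
print(pd.crosstab(events["role"], events["action"]))
```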
Additional resources:
Approach 1 – The Library Impact Data Project (#LIDP) had a hypothesis and went about collecting data to test it - http://library.hud.ac.uk/blogs/projects/lidp/
Approach 2 - The Exposing VLE Data project (#EVAD) was faced with the availability of around 40 million VLE event records covering 5 years and decided to investigate the patterns - http://vledata.blogspot.com/

Recommender systems (a particular form of data mining used by supermarkets and online stores, among others) typically adopt Approach 2, looking for patterns using established statistical techniques - http://en.wikipedia.org/wiki/Recommender_system and http://en.wikipedia.org/wiki/Data_Mining

Monday, 13 June 2011

Tabbloid news digest: 13 June 2011

Lo and behold, my Tabbloid resuscitation skills have worked this week:


Some of the projects have reported on unexpected project hiccups which will no doubt resonate with anyone who has worked on a similar project:
- A 'regime change' at Leeds Met means that they're having to regain buy-in for the project, and the project team have been asked to submit a paper to the Vice Chancellor's Group containing a proposal for an extended trial of STAR-Trak.
- The EVAD team have been retrieving archived data and dealing with corrupt and missing data. As they say, their experience "illustrates the problems of dealing with data that’s collected but not looked at very often", which reminded me of discussions around digital preservation and 'data rot' – the observation that stored data becomes less reliable even as our capacity to store it grows. Unfortunately, it's only when we find a use for that data that we discover whether the data we think we've been collecting is actually there at all, and how intact it is.

One of the news highlights last week was the release of OpenURL data, and it's good to see that an initial exploration of that data has already happened. Tony Hirst shared how he's been using nothing more than the command line to explore the hefty OpenURL dataset. Mark van Harmelen was inspired by Tony's efforts to have a play with the data himself, and selected Ruby as his data-digging weapon of choice. What struck me as interesting was that both Tony's and Mark's curiosity was piqued by the fact that there was data on Mendeley (which makes me wonder how long it will be before someone at Mendeley gets tempted into digging around in the data themselves). Also of interest was that, because more than one person was delving into the data and publishing what they found, they could cross-check their findings against each other's results - very useful!

Tony Hirst has also been using a tool called Gource to create hypnotically watchable videos of OpenURL data visualisations [see post one and post two on Tony's blog for further information], e.g.:



It certainly puts a new spin on the 'let a thousand flowers bloom' phrase that I hear so often in the world of open data.

A couple of other highlights from the last week of JISC AD project blogs:
- The AGtivity project published a 'recipe' for producing a basic activity diary report. They also shared their thoughts on users, serendipity and use cases.
- The AEIOU project reported on the suggestions which came out of their first focus group.

Monday, 6 June 2011

Another unTabbloid news digest

'My personal battle with Tabbloid' seems to be emerging as an overriding theme for my synthesis posts and, alas, this/last week was no different. As before, you can produce a Five Filters PDF newspaper on the fly, but I noticed that not everything is showing up and only the last few tweets are included (seemingly due to a glitch with the Twitter Atom feed). Happily, there is a Twapper Keeper archive of the #jiscad tweets which you can browse as a supplement to the projects' blogposts. Perhaps next week I'll print out all the new blogposts and tweets and handcraft my own news digest from paper and glue, before scanning it in and adding all the necessary hyperlinks ... let's hope it doesn't come to that. Anyhow, here are some highlights I've drawn out from the last couple of weeks:

The OpenURL Project announced the release of OpenURL data under the ODC-PDDL licence with an ODC-by-SA attribution clause. Full details of the data the project has released, and the data itself, are available on their website.

The OU RISE Project are holding a one day 'Innovations in Activity Data' workshop at the Open University in Milton Keynes on 4th July. The day includes presentations from the RISE, SALT and LIDP projects and a presentation from Tony Hirst on visualizing activity data.

Over the past few weeks the OU RISE team have been knee-deep in early user feedback, ahead of their main evaluation activity planned for July. First came feedback from focus groups asked about the usefulness of recommendations (as part of a wider OU Library search evaluation). That feedback suggests that the provenance of a recommendation was key in determining its usefulness, and that how provenance is judged differs with a student's level of study. Second, they've been analysing the results of an ongoing user survey. The results look encouraging so far but are also raising more questions for them to delve into.

The UCAID project have written an interesting post on how their project differs from the rest of the Activity Data projects: namely, their focus is on making activity data available to individual users for their own benefit. It got me thinking about the range of personal versus public motivations and benefits behind the projects, and whether the focus sometimes drifts too quickly towards the open data agenda. Maybe we could do with thinking deep thoughts about the Drucker principle / Pearson's Law, which holds that what is measured improves - this seems particularly apt for those projects with a keen interest in building data visualisations. Those are just some off-the-cuff ponderings from me - I might try to corral them into a future blogpost.

The LIDP team discuss how they've been tackling one of their project's big issues: liaising with JISC Legal to ensure that they're complying with legal guidelines around accessing and releasing data. On a related note, the Discovery initiative recently published the timely 'Licensing Open Data: A Practical Guide' [pdf] by Naomi Korn and Professor Charles Oppenheim.

Lastly, news from the e-research JISC programme in the shape of a press release about the RAPTOR project which has just released v0.1 of their e-resources usage statistics tool.