Activity Data Synthesis

Monday, 23 May 2011

Academic analytics resources from Educause


Traffic light warning system used at Purdue University


This post looks at some of the material available on the Educause web site relating to the use of what they call academic analytics (and we seem to be calling activity data) to support student success.

What Educause (http://www.educause.edu/) calls academic analytics is very similar to what we are calling activity data. The focus clearly differs: with activity data the focus is on the data, while with analytics it is on the tools, presentation and use, so I guess that I prefer the analytics term. In one report (Academic Analytics: The Uses of Management Information and Technology in Higher Education) they say that academic analytics “describe[s] the intersection of technology, information, management culture, and the application of information to manage the academic enterprise.”

Anyhow, over the last few years Educause has produced some very useful material, most of which is available from their Academic Analytics page: http://www.educause.edu/Resources/Browse/Academic%20Analytics/16930

Here I will pick out some of the things that might be of interest to you.

7 Things You Should Know About Analytics http://www.educause.edu/ir/library/pdf/ELI7059.pdf Educause produce a series of reports entitled 7 things you should know about x. These are very brief (about two sides) and include a story / case study, a definition and some of the key issues. They can be a very useful introduction for those who do not already know about what you are doing, and they come from an independent authoritative source.

2011 Horizon Report

http://www.educause.edu/Resources/2011HorizonReport/223122

The Horizon report is produced annually by Educause and looks at technologies that are going to have an impact over the next year, two to three years and four to five years. Not all technologies make it from important in four to five years to important now (the joys of futurology).

It is partly based on a survey and partly on expert discussion, and provides a very broad brush overview of the technology. This year one of the areas that they have picked out for the four to five year time frame is learning analytics, discussed on pp28-30. The report provides a two page overview, some examples and further reading.

Signals: Applying Academic Analytics http://www.educause.edu/EDUCAUSE+Quarterly/EDUCAUSEQuarterlyMagazineVolum/SignalsApplyingAcademicAnalyti/199385 or http://bit.ly/c5Z5Zu This is a fascinating case study from Purdue University, where they say that the use of analytics has improved results, and led those in greatest danger of failing to switch courses earlier. They ran the trial using a control group, so their results have some validity, and courses were sufficiently large for the results to be meaningful. Their statistics include:

“Over the succeeding weeks, 55 percent of the students in the red category moved into the moderate risk group (in this case, represented by a C), 24.4 percent actually moved from the red to the green group (in this case, an A or B), and 10.6 percent of the students initially placed in the red group remained there. In the yellow group, 69 percent rose to the green level, while 31 percent stayed in the yellow group”

Although they don’t say how the outcome compares with the control group.
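A quick tally of the quoted figures is worth doing before relying on them. The sketch below uses only the percentages in the quote above (the labels and the helper name are mine) and shows that the three red-group outcomes add up to 90 percent, so ten percent of that group is not accounted for in the quoted passage, while the yellow-group figures add up to 100 percent.

```python
from decimal import Decimal

# Outcomes quoted above for students initially flagged red (percent).
red_outcomes = {
    "moved to yellow (C)": Decimal("55"),
    "moved to green (A or B)": Decimal("24.4"),
    "remained red": Decimal("10.6"),
}
# Outcomes quoted above for students initially flagged yellow (percent).
yellow_outcomes = {
    "rose to green": Decimal("69"),
    "stayed yellow": Decimal("31"),
}


def unaccounted(outcomes):
    """Return the percentage of a group not covered by the quoted figures."""
    return Decimal("100") - sum(outcomes.values())


red_gap = unaccounted(red_outcomes)
yellow_gap = unaccounted(yellow_outcomes)
print(f"Red group unaccounted for: {red_gap}%")
print(f"Yellow group unaccounted for: {yellow_gap}%")
```

Decimal is used rather than floats so that the sums are exact.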

Academic Analytics: The Uses of Management Information and Technology in Higher Education http://www.educause.edu/ers0508, a book discussing analytics in HE. Although dated 2005, it still has interesting stuff in it.

Among the things to note are the sources of data people are (were) using in their analytics:

Table 6-3. Information Contained in Data Stores or Warehouses (N = 213)

Source                                  Percentage
Student information system              93.0%
Financial system                        84.5%
Admissions                              77.5%
HR system                               73.7%
Advancement                             36.2%
Course management system                29.5%
Ancillary systems (e.g., housing)       28.2%
Grants management                       27.7%
Department-/school-specific system      22.5%
Comparative peer data                   20.2%
Feeder institutions (high schools)       9.4%

Note that there is no mention of Library systems of any type and the strong emphasis on administrative systems rather than academic systems.

Presentation on analytics: Academic Analytics: A New Tool for a New Era http://www.educause.edu/Resources/AcademicAnalyticsANewToolforaN/162057 The slides themselves are a bit thin, but No 19 is interesting:

Results to Date

Typically 10-20% of students receive a message

Highest Risk:

  • Most remained “at risk”
  • Still unlikely to take advantage of resources

Lower Risk:

  • Majority were able to leave the “at risk” status
  • More likely to take advantage of resources

How the ICCOC Uses Analytics to Increase Student Success

Case study of the use of analytics at Iowa Community College Online Consortium to improve student retention: http://www.educause.edu/EDUCAUSE+Quarterly/EDUCAUSEQuarterlyMagazineVolum/HowtheICCOCUsesAnalyticstoIncr/219112 or http://bit.ly/m2HgnZ. Over the period 2005-9 they increased the student success rate from 77% to 85%. What is not clear is how much of the improvement relates to the analytics and how much derives from other work to improve student success.


There is much more and I will post some other summaries of articles that I think will be useful later.

Wednesday, 18 May 2011

A Slight Return - Tabbloid #7-ish



So you'll see that I managed to get Tabbloid up and running again - I asked it to arrive weekly, first thing on a Wednesday and it arrived on Monday instead and it goes all the way back to 12 April but I suppose you can't have everything - I'm optimistic that it will settle down next week and we can move on and forget all about this sorry episode ;-)

Last week the Information Commissioner's Office published the 'Data Sharing Code of Practice' and Tom wrote a useful overview of the report here on this blog.

The #OURISE project team hit another project milestone with the launch of their RISE Google Gadget prototype. It will certainly be interesting to see how many users take the plunge and add the gadget without much provocation and what the pattern of uptake looks like in the weeks and months ahead. Maybe they'll be able to compare notes with the #CULwidgets team at Cambridge who developed all manner of Google plug-ins as part of the JISC LMS programme.

In addition to completing the relatively herculean task of exporting usage data for 46,575 graduates for the #LIDP project, Dave Pattern appeared to have the audience slightly aflutter following his presentation at the CILIP Cymru Conference. You can access a copy of his presentation via his blog and I'll keep an eye out for the post-conference conversations as they emerge on the LIDP project blog.

How To Guide [Draft]: 'How to inform your users about data processing'

[This is a draft of a How To Guide that will be published as a deliverable of the synthesis team's activities. Your comments are very much welcomed and will inform the final published version of this How To Guide.]

The problem:

In planning the OpenURL Router activity data project, EDINA became aware that by processing activity data generated by the OpenURL Router service it effectively acts as a ‘data processor’. Even the act of deletion of data constitutes processing so it is difficult to avoid the status of data processor if activity is logged. In the project, EDINA is collecting, anonymising and aggregating activity data from the Router service but has no direct contact with end users. Thus, it can only discharge its data protection duties through individual institutions that are registered with the Router.

The solution:

After taking legal advice, EDINA drafted a paragraph to supply to institutions that use the OpenURL Router service for them to add into their institutional privacy policies:
“When you search for and/or access bibliographic resources such as journal articles, your request may be routed through the UK OpenURL Router Service (openurl.ac.uk), which is administered by EDINA at the University of Edinburgh. The Router service captures and anonymises activity data which are then included in an aggregation of data about use of bibliographic resources throughout UK Higher Education (UK HE). The aggregation is used as the basis of services for users in UK HE and is made available so that others may use it as the basis of services. The aggregation contains no information that could identify you as an individual.”
EDINA wrote to the institutional contacts for the OpenURL Router service giving them the opportunity to ‘opt out’ of this initiative, i.e. to have data related to their institutional OpenURL resolver service excluded from the aggregation. Institutions opting out had no need to revise their privacy policies. Fewer than 10% of institutions that are registered with the OpenURL Router opted out and several of those only did so temporarily, pending revision of their privacy policies.

Taking it further:

If you plan to process and release anonymised activity data, you may use the EDINA example above as the basis of a paragraph in your own privacy policy – in consultation with your institution’s legal team. If your institution has already incorporated the paragraph because you are registered with the OpenURL Router, you may simply amend it to reflect the further activities that you undertake.
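To make the "collect, anonymise and aggregate" step more concrete, here is a minimal sketch of one way it might be done. It is not EDINA's actual process: the field names, the opt-out set, the secret key and the small-count threshold are all assumptions made up for illustration. Note too that a keyed hash is strictly pseudonymisation, so whether the output counts as anonymised is a question for your legal team.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical inputs: field names, the opt-out set and the threshold are
# assumptions for illustration, not EDINA's actual schema or process.
SECRET_KEY = b"replace-with-a-secret-kept-off-the-released-data"
OPTED_OUT = {"inst-b"}
MIN_COUNT = 2  # suppress rows seen fewer times than this


def pseudonymise(value: str) -> str:
    """Replace an identifier with a keyed hash so the raw value is not released."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def aggregate(log_rows):
    """Count requests per (institution token, resource), dropping user-level fields."""
    counts = Counter()
    for row in log_rows:
        if row["institution"] in OPTED_OUT:
            continue  # honour the institutional opt-out
        # The IP address is deliberately not carried forward.
        counts[(pseudonymise(row["institution"]), row["issn"])] += 1
    # Drop small counts, which are easier to link back to an individual.
    return {key: n for key, n in counts.items() if n >= MIN_COUNT}


raw_log = [
    {"institution": "inst-a", "ip": "192.0.2.10", "issn": "1234-5678"},
    {"institution": "inst-a", "ip": "192.0.2.11", "issn": "1234-5678"},
    {"institution": "inst-a", "ip": "192.0.2.10", "issn": "8765-4321"},
    {"institution": "inst-b", "ip": "198.51.100.7", "issn": "1234-5678"},
]

released = aggregate(raw_log)
for (institution_token, issn), n in released.items():
    print(institution_token, issn, n)
```

With the sample log, only the resource requested twice by the non-opted-out institution survives. The single request and everything from the opted-out institution are excluded from the release.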

Additional resources:

The research undertaken by EDINA and the advice received prior to adopting this approach: http://edina.ac.uk/projects/Using_OpenURL_Activity_Data_Initial_Investigation_2011.pdf

The University of Edinburgh’s Data Protection policies and definitions: http://www.recordsmanagement.ed.ac.uk/InfoStaff/DPstaff/DataProtection.htm
http://www.recordsmanagement.ed.ac.uk/InfoStaff/DPstaff/DPDefinitions.htm

The University of Edinburgh’s Website Privacy policy:
http://www.ed.ac.uk/about/website/privacy-policy

JISC Legal’s ‘Data Protection Code of Practice for FE & HE’ [2008]:
http://www.jisclegal.ac.uk/Portals/12/Documents/PDFs/DPACodeofpractice.pdf

Information Commissioner’s Office’s ‘Privacy by design’ resources:
http://www.ico.gov.uk/for_organisations/data_protection/topic_guides/privacy_by_design.aspx

Information about the EDINA 'Using OpenURL Activity Data' project:
http://edina.ac.uk/projects/Using_OpenURL_Activity_data_summary.html

[n.b. revised on 23 May with minor amendments]

Wednesday, 11 May 2011

Information Commissioner’s Office publishes UK code of practice on data sharing

Those of you thinking of sharing or publishing personal data as a result of these projects may be interested in the "Data sharing code of practice" from the Information Commissioner's Office.

A mere 59 pages available as a pdf from the Information Commissioner's office.

A few quotes may give you a little of the flavour:


"As I said in launching the public consultation on the draft of this code, under the right circumstances and for the right reasons, data sharing across and between organisations can play a crucial role in providing a better, more efficient service to customers in a range of sectors – both public and private. But citizens’ and consumers’ rights under the Data Protection Act must be respected."

"Organisations that don’t understand what can and cannot be done legally are as likely to disadvantage their clients through excessive caution as they are by carelessness."

"the code isn’t really about ‘sharing’ in the plain English sense. It’s more about different types of disclosure, often involving many organisations and very complex information chains; chains that grow ever longer, crossing organisational and even national boundaries."

The code covers activities such as:

  • two departments of a local authority exchanging information to promote one of the authority’s services;
  • a school providing information about pupils to a research organisation;

By ‘data sharing’ we mean the disclosure of data from one or more organisations to a third party organisation or organisations, or the sharing of data between different parts of an organisation. Data sharing can take the form of:

  • a reciprocal exchange of data;
  • one or more organisations providing data to a third party or parties;
  • several organisations pooling information and making it available to each other;
  • several organisations pooling information and making it available to a third party or parties;
  • different parts of the same organisation making data available to each other.

When we talk about ‘data sharing’ most people will understand this as sharing data between organisations. However, the data protection principles also apply to the sharing of information within an organisation – for example between the different departments of a local authority or financial services company.


When deciding whether to enter into an arrangement to share personal data (either as a provider, a recipient or both) you need to identify the objective that it is meant to achieve. You should consider the potential benefits and risks, either to individuals or society, of sharing the data. You should also assess the likely results of not sharing the data. You should ask yourself:

  • What is the sharing meant to achieve? ...
  • What information needs to be shared? ....
  • Who requires access to the shared personal data? .....
  • When should it be shared? ....
  • How should it be shared? ....
  • How can we check the sharing is achieving its objectives? ....
  • What risk does the data sharing pose? ....
  • Could the objective be achieved without sharing the data or by anonymising it? [my emphasis] It is not appropriate to use personal data to plan service provision, for example, where this could be done with information that does not amount to personal data.
  • Do I need to update my notification?
  • Will any of the data be transferred outside of the European Economic Area (EEA)?


Whilst consent will provide a basis on which organisations can share personal data, the ICO recognises that it is not always achievable or even desirable.

If you are going to rely on consent as your condition you must be sure that individuals know precisely what data sharing they are consenting to and understand its implications for them. They must also have genuine control over whether or not the data sharing takes place.

-- it goes on to say where consent is most appropriate and what other conditions allow sharing (p14-15), with some examples of what is permissible

The general rule in the DPA is that individuals should, at least, be aware that personal data about them has been, or is going to be, shared – even if their consent for the sharing is not needed.



The Data Protection Act (DPA) requires organisations to have appropriate technical and organisational measures in place when sharing personal data.

followed by lots of useful guidance on this area covering both physical and technical security


It is good practice to have a data sharing agreement in place, and to review it regularly, particularly where information is to be shared on a large scale, or on a regular basis.

and outlines what should be covered by the agreement (p25)

it is good practice to carry out a privacy impact assessment.


Agree common retention periods and deletion arrangements for the data you send and receive.

Things to avoid

  • Misleading individuals about whether you intend to share their information.
  • Sharing excessive or irrelevant information about people.
  • Sharing personal data when there is no need to do so.
  • Not taking reasonable steps to ensure that information is accurate and up to date before you share it.
  • Using incompatible information systems to share personal data, resulting in the loss, corruption or degradation of the data.
  • Having inappropriate security measures in place.

Section 14 is on data sharing agreements pp41-3

Section 15 provides a data sharing checklist p46

the case study on p 55 covers research using data from other organisations


Tabbloid is dead, long live the tabloid


Technology has not been a friend of mine these past few weeks and after several attempts to resuscitate the failed weekly Tabbloid service I accepted defeat and looked for an alternative. So it is with great pleasure that I introduce you to the new weekly digest using FiveFeeds' open source MakePDF PDF newspaper maker. Alas it rendered as a series of mostly blank pages when I uploaded it to Issuu.com so the battle is not quite won, but if you click on the image above then it will dynamically generate the PDF on the spot for you. I'll also be emailing a copy out as usual.

So anyway onto the round up of the latest happenings in the world of #JISCAD ... if you look beyond my 'curses to all technology' headline tweets you will be treated to a rather uplifting post from the #JISCSALT project blog which reports a positively excited response from users to the prospect of academic library recommendations ... I don't have a data server but if I did then I would print out their article and tape it above :)

The #OURISE project have been dipping their toes into the robust anonymisation pool and delving beyond into the technical depths to look at how they will release their recommender data openly. They're looking for feedback as to the most useful format for the data they release but their current thinking is both XML and as a MySQL database. They're also soliciting feedback on their XML record format (which is based on the one developed by Mark van Harmelen as part of the MOSAIC project) so it looks like we have the makings of another 'recipe' emerging for our cookbook.
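For anyone wondering what an XML release of anonymised records might look like in practice, here is a minimal sketch using Python's standard library. The element and field names are invented for illustration and are not the RISE or MOSAIC record format, and the tokens stand in for whatever anonymised identifiers a project actually produces.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout: these element names are invented for
# illustration and are NOT the RISE or MOSAIC schema.
records = [
    {"user_token": "u-3f9a", "resource_id": "r-1001", "course": "X101"},
    {"user_token": "u-77c2", "resource_id": "r-1001", "course": "X205"},
]


def to_xml(rows):
    """Build an XML tree with one <useRecord> element per anonymised row."""
    root = ET.Element("useRecords")
    for row in rows:
        rec = ET.SubElement(root, "useRecord")
        for field, value in row.items():
            ET.SubElement(rec, field).text = value
    return root


xml_text = ET.tostring(to_xml(records), encoding="unicode")
print(xml_text)
```

The same rows could just as easily be loaded into a database table, which is presumably why the project is considering offering both formats.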

The #OURISE project have also shared some useful information regarding how they're making use of Google analytics to segment the behaviour of their users. And as if that wasn't enough, they reported that development of the RISE Google Gadget is complete and ready to be put through its paces in the user evaluations. I don't have the authority to hand out gold stars but if I did then the RISE project team would get one this week ;-)

The rest of the stories in this week's newspaper are from previous weeks and you'll be glad to know that I won't be treating you to a re-synthesis of those stories. Hopefully by next week technology will be behaving more co-operatively!

Wednesday, 4 May 2011

Set doors to manual

This week's round-up of activity on the project blogs and twitter will have a distinctly rustic and hand-cranked feel because the Tabbloid service that usually does a mighty fine job of collating it all for me appears to be on strike.

First off I'd like to send a, slightly belated, message of congratulations to the OU RISE team who went live with their My Recommendations tool towards the end of April. It was also pleasing to see that they found Mark van Harmelen's synthesis project visit a useful process to go through. It sounds like they've provided Mark with plenty of 'food for thought' for the synthesis Cookbook (don't worry, there's plenty more cookery-based puns where that came from).

While I'm on the topic of the Cookbook it's probably a good time to bring Mark's explanation of the cookery metaphor to your attention. Tom Franklin kindly submitted a preliminary recipe for chocolate fudge brownies which hopefully you'll all have a go at ... and I hasten to add that I will gladly volunteer for any user testing that you carry out on that particular recipe. Joking aside, we will be releasing and refining the 'recipes' over the coming months and your input will be much appreciated.

The AEIOU project team have been in a ponderous mood on their blog where they've been contemplating what will be the best tool to use for digesting and regurgitating data for their recommender service. The conclusion of their pondering appears to be an FLA soup served on a bed of SQL database (n.b. it's *really hard* to drop the cookery metaphor once you start). The AEIOU project have been able to make use of a DSpace/EPrints patch that the PIRUS2 project released - to be honest the technical side of what they're doing is slightly beyond me but it is good to see a project benefitting from a previous JISC project's open innovation in this way.

The twitter feed for #jiscad has been quiet - which is understandable given the ratio of bank holidays to work days over the last couple of weeks. One of the (tangential) things I tweeted about that I think is worth highlighting again is the Digging into Data Challenge which is open for applications until 16 June 2011. There is also a free conference in June but unfortunately it's in Washington DC - hopefully there will be a livestream or at least a good deal of tweeting that we can follow. The challenge has been covered on the Times Higher website where they highlight the challenge of technology as enabling researchers to navigate the vast ocean of data that technology is making available through digitisation etc. This places data in an ecosystem where the data needs to be made usable in order for a cycle of virtuosity to be unleashed.