It’s been a couple of weeks since our start-up meeting in Birmingham and I’m sure your thinking is developing rapidly. However, some reflections on the ‘sum of the parts’ presented by all 9 projects may still be of value – certainly as a marker for the Synthesis Project. So here are the high-level observations that we shared at the end of the Birmingham meeting:
Variety & Volume – Our group of projects is particularly impressive in terms of the variety of data sources (a wide range of library, learning, repository and admin applications), the available volumes of data (including multi-year datasets) and the potential aggregations (with opportunities in library, repository and VLE spaces). What’s more, many of you had already secured formal commitments to supply data as part of the bidding process. Given this potential feast of possibility, the next points are particularly important …
Time constraints – Given the timeline (5 months from 1 March), it may be best to plan backwards from the end point as well as forwards from the start; it’s always a useful sanity check. Whilst projects using an agile methodology could fit several sprints or iterations of activity into the period, you are likely to be constrained by major milestones such as getting hold of the data (and ingesting and amalgamating it) in the first place.
Technical priorities – It may be wise to park the technical challenges of scalability and repeatability and to concentrate on low-cost agile experiments that will prove your hypothesis … or not! Provided that something worthwhile emerges, it is highly likely that second-order issues of performance and automation can be addressed afterwards, and with relative ease given the available tools.
Algorithmic investigations – The experience of projects such as MOSAIC indicates that investigating (theoretically and practically) the algorithms that underpin data processing (e.g. ingest), analysis, filtering and presentation will be really important, and increasingly so as the data scales beyond an initial experiment. And if you come up with a rule or algorithm (like the Huddersfield example of discarding course activity data where there are fewer than 35 students), please share it.
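To show how small and shareable such a rule can be, here is a minimal sketch of a minimum-cohort filter in the spirit of the Huddersfield example. It assumes each course’s activity data has been aggregated into one record carrying a student count; the `CourseActivity` record, its fields, the `filter_small_courses` function and the sample course codes are all hypothetical illustrations, not Huddersfield’s actual implementation.

```python
from dataclasses import dataclass

# Threshold from the Huddersfield example: courses with fewer than
# 35 students are discarded. Everything else here is hypothetical.
MIN_COURSE_SIZE = 35


@dataclass
class CourseActivity:
    course_id: str      # hypothetical course identifier
    student_count: int  # number of students on the course
    loans: int          # hypothetical activity measure, e.g. library loans


def filter_small_courses(records, min_size=MIN_COURSE_SIZE):
    """Keep courses with at least min_size students; discard the rest."""
    return [r for r in records if r.student_count >= min_size]


if __name__ == "__main__":
    records = [
        CourseActivity("A101", 120, 340),
        CourseActivity("B202", 20, 15),   # below threshold, discarded
        CourseActivity("C303", 35, 60),   # exactly 35, kept
    ]
    kept = filter_small_courses(records)
    print([r.course_id for r in kept])  # ['A101', 'C303']
```

Making the threshold a named parameter rather than a buried constant means the rule can be documented, shared and tuned by other projects without rewriting the filter.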
Legal concerns – Whilst data protection and other legal issues do not appear to be showstoppers for this work right now, it will benefit us all to catalogue issues and responses, risks and mitigations as they emerge. I’m going to start a legal issues register (without calling it a risk register!) with the points raised in Birmingham, and I hope you’ll contribute more as we progress. We also agreed that we should address these challenges (ghost busting?) in a ‘can do’ manner, evidencing where institutions are taking affirmative action (e.g. upgrading privacy statements, as highlighted by the Edina project) rather than diving for cover.