Activity Data Synthesis

Wednesday, 29 June 2011

Draft Guide: 'Anonymising data'

The problem:
Data protection requirements mean that we cannot release personal data to other people without the data subjects' permission. Much of the activity data that is collected and used contains information which can identify the person responsible for its creation. It may contain their username, the IP address from which they were working or other information including patterns of behaviour that can identify them.

Therefore, where information is to be released as open data for anyone to use, consideration needs to be given to anonymising the data. Anonymisation may also be required when sharing data with partners in a closed manner, depending on the reasons for sharing, the nature of the data and any consent provided by the user.

The options:
Two main options exist if you want to share data.

The first is to share only statistical data. As the Information Commissioner recently wrote:
"Some data sharing doesn’t involve personal data, for example where only statistics that cannot identify anyone are being shared. Neither the Data Protection Act (DPA), nor this code of practice, apply to that type of sharing."
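
As an illustration of sharing statistics rather than records, the sketch below reduces a log to per-item counts and discards the user name and IP address altogether. The record layout (user, IP address, item) and the function name loans_per_item are assumptions made for this example, not part of any particular system. Very small counts may still point to an individual, so this is a starting point rather than a guarantee.

```python
from collections import Counter


def loans_per_item(records):
    """Count how often each item appears, discarding who accessed it.

    `records` is assumed to be an iterable of (user, ip_address, item)
    tuples; this layout is hypothetical.
    """
    return Counter(item for _user, _ip, item in records)


# Example: only the item totals survive, no user names or IP addresses.
sample = [
    ("alice", "10.0.0.1", "book-A"),
    ("bob", "10.0.0.2", "book-A"),
    ("alice", "10.0.0.1", "book-B"),
]
totals = loans_per_item(sample)
```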

The second is to anonymise the personal data so that it cannot be traced back to an individual. This can take a number of forms. For instance, some log files store user names while others store IP addresses; where a user has a fixed IP address, it could be traced back to them. Anonymising the user name or IP address through some algorithm would prevent this. A further problem may arise where rare data can be used to identify an individual. For instance, a pattern of accessing some rare books could be linked to someone with a particular research interest.
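
One possible algorithm for replacing user names and IP addresses is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the choice of technique, the function names pseudonymise and anonymise_log, the 16-character truncation and the (user, IP address, item) record layout are all assumptions for illustration. A keyed hash is used rather than a plain hash because the range of possible IP addresses is small enough that an unkeyed hash could be reversed by trying every value.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a stable token derived from a secret key.

    The same identifier always yields the same token under the same key,
    so patterns of use are preserved while the original value is hidden.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability (an assumption)


def anonymise_log(rows, secret_key: bytes):
    """Replace the user and IP fields of (user, ip, item) rows with tokens."""
    return [
        (pseudonymise(user, secret_key), pseudonymise(ip, secret_key), item)
        for user, ip, item in rows
    ]
```

The key must be kept secret, or destroyed once the data has been processed, because anyone holding it can recompute the token for a known user name or IP address. Note also that this only hides the identifier itself; it does nothing about rare patterns of behaviour.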

Taking it further:
If you want to take it further then you will need to consider the following as a starting point:
  • Does the data you are considering releasing contain any personal information?
  • Are the people that you are sharing the data with already covered by the purpose the data was collected for (eg a student’s tutor)?
  • Is the personal information directly held in the data (user name, IP address)?
  • Does the data enable one to deduce who used that data (only x could have borrowed those two rare books – so what else have they borrowed)?
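
The last question can be explored with a simple check for items used by very few people. The sketch below is one rough approach, assuming records are (user token, item) pairs and that an item used by fewer than k distinct users counts as rare; the threshold k, the function names and the record layout are hypothetical and would need to be chosen for the data in question.

```python
from collections import defaultdict


def rare_items(records, k=3):
    """Return the items accessed by fewer than k distinct users."""
    users_per_item = defaultdict(set)
    for user, item in records:
        users_per_item[item].add(user)
    return {item for item, users in users_per_item.items() if len(users) < k}


def suppress_rare(records, k=3):
    """Drop every record that refers to a rare item, keeping the rest in order."""
    rare = rare_items(records, k)
    return [(user, item) for user, item in records if item not in rare]
```

Removing rare items reduces the risk in the rare-books example but does not eliminate it, since unusual combinations of common items may also single someone out.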