This talk is the second talk in a series about "The impact of the GDPR in the Higher Education sector". You can view the other talks by following this link.
The General Data Protection Regulation (GDPR) is a European Union (EU) regulation intended to strengthen and unify data protection for all individuals within the EU. The enforcement date is 25th May 2018, at which time organisations in non-compliance will face heavy fines.
What will that mean for Higher Education Institutes (HEI) and other public institutions that need to deal with students' and prospects' personal data on a daily basis?
These and other questions were the focus of an event organised by FULL FABRIC, where a series of subject matter experts explain the impacts and the opportunities created by this regulation.
This is the second video in a series of six, with Lothar Fritsch, researcher and professor at Karlstad University. Lothar's research focuses on privacy-enhancing technologies, and he is a key advocate of incorporating GDPR principles into universities' IT stacks.
This video was filmed at FULL FABRIC's conference The Impact of the GDPR in Higher Education. The event was held at Imperial College London on 22 June 2017 as part of London EdTech Week.
Thank you very much. I guess it is warm in here, and we're all enduring the heat. At least I can stand, and that pumps my blood pressure up. Unfortunately, I'm the guy who is in between you and lunch, so I'll have to rush my presentation. Let me just set the clock up here, so I know where I am.
[00:00:30] I'm a technology person. Most of my academic life, I've been looking at how to implement data protection in technical measures, or how to manage computers and databases in a way that they actually fulfil requirements coming from data protection. My talk is going to focus on managing IT systems from a data protection perspective. It will be somewhat technical, discussing measures on how to set up a project team and make it work to increase technical compliance inspired by the GDPR.
[00:01:00] I must say, I'm one of those white, elderly male computer scientists in front of you. Unfortunately, still 95% of our computer engineering students are male and no politician ever talks about putting a quota on the first year. I guess we're not solving the problem anytime soon, unfortunately. What are we talking about when we talk about higher education and personal data?
[00:01:30] We actually talk about research data, on which there will be a presentation later this afternoon, I suppose, which is surveys, experimentation, medical data, anything you can imagine about research and people. We talk about staffing records, which is human resources, salaries, career development, problems of individuals in the workplace and how they're being managed, lots of sensitive data. There are enough legal proceedings about this as well.
[00:02:00] There's student data, which has probably been mentioned this morning already, about all aspects of being a student, mostly grade profiling and performance tracking, but there's student life in addition. There are door cards, access cards, there's signing up for student activities, there's partying, there are committees. There's a lot more personal data than just tracking the grades and performance of students that actually accumulates on campus, the log files of the door lock system, for example. There are phone records showing the call patterns of anyone on campus.
[00:02:30] People are using phones everywhere. There are metadata and logs, there are email headers that tell a lot about what people do, and they get archived in log files, they get archived in databases. Again, door access logs, library records. I'm not digging into that. Library records have been a subject of tensions between, say, homeland security inspired agencies and data protection agencies. Who is reading what chemistry book for what purpose? It's all of a sudden a matter of national interest, and not a matter of private interest anymore.
[00:03:00] There are hospital medical records, which are mostly regulated in all countries and known as specific sensitive personal data, no matter whether it's just treatment or research and experimentation. We get all kinds of records. Thinking as an IT manager on campus, there are spheres of influence for privacy management. On the left side, we talk about systems that are maintained by the institution on campus. These are its own databases with its own administrators, where we could actually have an influence on how they work, if we want to make them more compliant.
[00:03:30] There are systems that are shared with other higher education institutions. Most countries have a local network of universities that host systems that they all use, so they are shared applications developed by universities, or bought but operated by universities, shared with other universities. There the sphere of influence stretches over the whole group of universities.
There are systems purchased in binary or hardware, where we don't even know how they work, what they do, or how they comply, we just use them.
[00:04:00] Anything from Microsoft would be an example, or any database from Oracle where we configure some aspects, but we don't know the inner workings. We put the box here.
There are systems maintained by individuals on or off campus, and this seems to be the compliance manager's nightmare at universities, think research. It starts with master students, or maybe student helpers for a master student doing interviews. They end up in a desktop database on somebody's personal computer on campus.
[00:04:30] The IT manager doesn't even know this database exists, but it's actually their responsibility to govern it.
It keeps going on with PhD students, or independent postdoc researchers. What do you do if you start a new investigation on some preexisting server? You make a copy of it and then work on an extended database, and then oops, there's another database nobody knows about. This is happening all the time, and you've got researcher mobility, you've got visiting researchers. Here it gets really messy.
[00:05:00] There are systems outsourced to the commercial cloud, where you pay a service provider to host a system to deal with your problems. You stay administrators, but the systems are elsewhere. You don't care where they are basically, you put your data there and compute.
Hopefully, by next year you will have very proper contracts with such providers about what they're allowed to do and what not, to mitigate some problems. Those five issues are five different spheres of influence a local campus privacy manager has to deal with, with very different approaches.
[00:05:30] I'm going through a number of possible solutions, mostly in the local influence sphere. There are issues in privacy management that are serious, that are resource conflicts about budgets, priority, and staff. Hardly any university I have talked to so far has a dedicated team of several persons to deal with GDPR compliance. They have one IT person with part of his time allocated to come up with a suggestion to the director of the university, but they have no one working operatively on it.
[00:06:00] There are conflicting goals between scientific open access, and archiving, and tracking student grades, and doing bibliometric evaluation of your researchers, and actual data protection. There are hard conflicts we know from medical research between verifying scientific data and the privacy interests of medical subjects. There are legacy systems that will need to come into the new operation.
[00:06:30] This is a major headache, actually. You cannot just buy everything new and keep operations up. Nobody can do it anywhere.
Then there are cross-organisation systems, service composition on the web. When the university starts including Facebook in its communication strategy, the teachers start interacting through social media platforms hosted by someone else. Suddenly things blend together without any explicit policy or consent. The question is, when a teacher is interacting with Facebook in their university role, who actually is the data controller?
[00:07:00] So is it the university encouraging the student to go to Facebook, get on, and talk to the teacher, or is it Facebook that actually provides the communication channel to the teacher? It's not easy to sort out, actually.
Then there is legacy data of uncertain provenance, that is, all the old databases that existed before GDPR and that have been collected under various, or maybe under no, regimes of compliance or policies. They are still there, and that's the way research works. Keep them for 20 - 50 years and see if you can have new conclusions from them once you have new methods.
[00:07:30] Now you have to have a governance regime for them, and find out about their provenance, actually.
Privacy by design is talked about everywhere. It is a rather undefined methodology that is prominently quoted in GDPR discussions, and it's basically about finding out the threats, finding out the attackers or the accidents that can happen, and getting an idea about the risks that personal data collection and processing creates.
[00:08:00] Make a multilevel analysis for the security infrastructure that respects the interests of all involved, and then implement and test a design, and then reiterate in the next round. Whenever the infrastructure changes or the purpose changes, the cycle starts from zero.
There are a number of suggestions how to do that. This is the one from the OASIS consortium, as a continuous management process. Those of you familiar with ISO 27000 security management will see that this is somewhat overlapping with the information security management system from the ISO standard.
[00:08:30] There are other standards, and they all are somewhat inspired by this ever ongoing cycle of definition of scenario, analysis of stakeholders, analysis of risk, selection of controls to contain the risks, implementation, evaluation. In the ISO standard it's called the plan, do, check, act cycle and it's running internally. It needs permanent dedicated staff resources.
[00:09:00] One issue we are facing in privacy that is not in information security is that there are two perspectives on risk. If you talk about compliance management you would often talk about a person or a team whose mission is to protect the company from damage against whoever, or protect the company from losses against whoever. That is the side dealing with the business risks. But GDPR actually looks at the risks for the users, the data subjects, as well.
[00:09:30] If you take a glance at those risks, the risks are substantially different. The company risks are about reputation loss, lost business, lost opportunities, collisions with compliance, maybe being excluded from government tenders, maybe losing money to fines. Those are all well-experienced risks, quantifiable risks. If you move over to the dimension of the data subjects on the right side, you're getting into the unknown. We have very poorly defined proceedings out there.
[00:10:00] You have a student, you keep student records, you have your local social media for student life, where they of course post their wild party pictures from student times. 20 years later the person turns famous for some reason, politician, manager, and the party pictures are still lying around in the university system, but now they have become very valuable to certain corners of the press, for example, or maybe for blackmail, or foreign secret services, or whoever.
[00:10:30] At the time of posting the student pictures, the risk you classify is substantially low, and then 10 years later the context for the data subject changes. If you rerun the risk analysis you will find those party pictures pose a major risk. We don't have any way of working with that these days. The only advice I can give you is minimize the amount of that data in your system while you can. We don't have any routines that foresee the future and flip the system when people that are not on campus anymore suddenly change their life context.
[00:11:00] From a technical perspective there is a Dutch researcher, at Radboud University, and he looked at how to actually get privacy hammered into the system so you can demonstrate how to actually have privacy by design. He came up with privacy implementation strategies as part of, again, a cycle, where you start with the privacy goals and you find some patterns for how to implement them. You select privacy-enhancing technologies that actually implement the patterns.
[00:11:30] Then you test the system, evaluate it, and then you match your concept development to the evaluation result.
There is a very interesting paper to read. We do not have much time for that. Basically, on this side, you see the privacy strategies that are defined based on privacy law and GDPR, which are enforcement, demonstration of compliance, control of information processing, informing the data subject, minimise data collection, abstract
[00:12:00] the data into less risky, less detailed data, separate data from each other, or hide data from inappropriate use or access.
This table from this paper is very interesting. You start on the left side, when you think about your system and think what shall we do? Well, we can abstract the data by making it less detailed, we can separate it in different data pools in different places, which is commonly done in medical research, you have the blood results here, but the person's identity in a paper file locked away in the basement so it can not be stolen by hackers, for example.
[00:12:30] That will be an implementation of a separate regime.
If you go through the table you actually get some hints on the actions that you should do when planning, or changing IT systems. Look at the paper. It's on the slides and references in the end.
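As an illustration of the "separate" and "abstract" strategies mentioned above, here is a minimal Python sketch. It is not taken from the paper or the slides; the field names, the `separate` and `abstract_birth_date` helpers and the in-memory stores are all hypothetical, and a real deployment would put the two stores in different systems with different access rules.

```python
import secrets
from datetime import date


def separate(record, identity_fields):
    """Split one record into an identity part and a data part.

    The two parts share only a random link key, so neither store alone
    ties the measurements to a named person.
    """
    link_key = secrets.token_hex(16)  # random; carries no personal information
    identity = {k: v for k, v in record.items() if k in identity_fields}
    data = {k: v for k, v in record.items() if k not in identity_fields}
    return link_key, identity, data


def abstract_birth_date(birth):
    """Abstract an exact birth date into a coarser decade band."""
    return f"{birth.year // 10 * 10}s"


# Hypothetical study record, invented for illustration only.
record = {
    "name": "Alice Example",
    "email": "alice@example.org",
    "birth_date": date(1994, 3, 17),
    "blood_glucose_mmol_l": 5.4,
}

link_key, identity, data = separate(record, {"name", "email"})
data["birth_decade"] = abstract_birth_date(data.pop("birth_date"))

identity_store = {link_key: identity}  # kept apart, e.g. under stricter access
research_store = {link_key: data}      # what analysts actually work with
```

The research store still relates to a person through the link key, so it remains personal data; the separation only makes misuse harder.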
But I'm rushing through some suggestions on how to deal with GDPR compliance. The first one is that privacy is more than just encryption. There's this idea that privacy is just security, access control, encryption and that's it, and it's not.
[00:13:00] It's about having a number of processes and a number of different concepts involved.
Unobservability, or anonymity, is one of those concepts that's not done by encryption, because you still see that A talks to B when they send encrypted messages. Unlinkability is about a returning customer doing a transaction, but maybe you don't want the bookshop on the web to know every book you buy, you would very much like to come as a different buyer next time. That's unlinkability. That's not easily solved just with encryption, or access control.
[00:13:30] There is transparency, which we heard about, which is basically defining information processes and increased information awareness in all systems. That's nothing about classical IT security, it's about data management. Then there's multiparty policy enforcement with those meshed services on the web and in social media, where you then don't have any control. We have to make sure that the whole production chain of the service complies with a policy.
[00:14:00] Privacy management is actually a process that administers processors, data, data provenance, data subjects and the relationships to all the processors involved.
[00:14:30] What do you have to do if your system evolves, your business evolves, your university evolves, and you changed your policy? You have to have version numbers on the policies and say oh, this data subject has actually given consent to policy version 2.5, but now we upgrade to 2.6. What do you do? You basically have to get new consent, so that people agree to the change to 2.6 as well. You have to have a database of your old policies and have to have an idea of how to match old consent with old policies to be aware of what you're allowed to do with the data that is there, actually.
[00:15:00] This might be a major headache.
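The bookkeeping described here, versioned policies plus a record of which subject consented to which version, can be sketched in a few lines of Python. This is only an illustration under assumptions: the `ConsentRegistry` class, its method names and the policy texts are hypothetical and not from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRegistry:
    """Keeps every published policy version and who consented to which one."""
    policies: dict = field(default_factory=dict)  # version -> policy text
    consents: dict = field(default_factory=dict)  # subject id -> set of versions

    def publish(self, version, text):
        # Old versions are never deleted, so old consent stays interpretable.
        self.policies[version] = text

    def record_consent(self, subject_id, version):
        if version not in self.policies:
            raise KeyError(f"unknown policy version {version!r}")
        self.consents.setdefault(subject_id, set()).add(version)

    def may_process(self, subject_id, version):
        """True only if the subject consented to exactly this policy version."""
        return version in self.consents.get(subject_id, set())

    def needs_reconsent(self, version):
        """Subjects who have consented before, but not to this version."""
        return sorted(s for s, seen in self.consents.items() if version not in seen)


registry = ConsentRegistry()
registry.publish("2.5", "Grades are used for course administration.")
registry.publish("2.6", "Grades are also used for learning analytics.")
registry.record_consent("student-1", "2.5")
registry.record_consent("student-2", "2.5")
registry.record_consent("student-2", "2.6")

print(registry.needs_reconsent("2.6"))
```

Processing under version 2.6 would then be limited to subjects for whom `may_process(subject, "2.6")` holds, while the others are asked for new consent.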
Then actually the policy has to correspond to the system policy, and by system policy I refer to what the system is actually configured to do. I think the data protection authorities have quite some experience with finding interesting privacy policies on the web, and then going to look at the systems, and they look completely different.
[00:16:00] Make sure the policy matches the system that is there and, as a consequence, whenever the system changes substantially, rerun the policy compliance cycle: systems, risks, asset inventory, stakeholder analysis, updating of the policy, fetching consent again.
The transparency issue is another one. It is the knowledge of data subjects about the processing applied and the parties involved. All kinds of questions can come from that. What country does my data travel to?
[00:16:30] Maybe I don't like the country my data is traveling to, and it's all around you. Just hire a rental car and your data is probably processed by a United States data processor outsourced to India. If you have the wrong kind of name scheme on your personal name you don't want your data to travel that way, possibly.
Meaningful reporting to data subjects, that's a major issue. We heard in the last presentation that there might be a fee to prevent abuse, but if you're the data controller for tens of thousands of data subjects, and you expect
[00:17:00] thousands of them to inquire after a data breach becomes known, this costs human resources to comply with if you do not have an interface that accommodates that in more or less automated ways.
Intervenability. How do you deal with data subjects who are asking for corrections of wrongly registered data? Manually with human staff, that will be expensive in the end. You want some form of customer self-care interfaces, where they can correct things like name changes once they get married, misspelled email addresses or whatever.
[00:17:30] Otherwise you'll get this unformatted email and need someone to do it manually for you. Then that's a question of efficiency, basically.
Audit evidence and breach notification are a major headache. Eventually you want to get audits for privacy compliance, so you want to demonstrate that the implemented measures are effective. You want to demonstrate mechanisms for consent collection and withdrawal, and for updating and enforcing policies.
[00:18:00] Policy enforcement is the part where having a good IT security regime is included. It's not just, oh, we encrypt a database, nothing can happen. You have to have governance rules and the encryption.
On the breach reporting readiness, there must be awareness of actually having lost personal data, and then you have to find out how risky the loss is, what dangers it may create, and you have to be able to send the message to the data subjects affected in the end, if it's a major loss.
[00:18:30] You have to identify whose data we lost and tell them, and if you have no way to tell them and access them you might have to have an ad in the newspaper in the end, if you cannot send personal messages anymore. That is a major screw up if that happens.
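The readiness step described here, working out whose data was lost and who can still be reached, could be sketched as follows. This is a hypothetical illustration, not a procedure from the talk: the function name, the record keys and the two lookup tables are assumptions.

```python
def plan_breach_notifications(breached_keys, owner_of, contact_of):
    """Work out which data subjects a breach affects and how to reach them.

    breached_keys: keys of the records known to be lost
    owner_of:      record key -> data subject id
    contact_of:    data subject id -> current contact address (may be missing)
    Returns (direct, unreachable): subjects with a contact channel, and
    subjects for whom only a public notice remains.
    """
    affected = sorted({owner_of[k] for k in breached_keys if k in owner_of})
    direct, unreachable = {}, []
    for subject in affected:
        contact = contact_of.get(subject)
        if contact:
            direct[subject] = contact
        else:
            unreachable.append(subject)
    return direct, unreachable


# Hypothetical inventory, invented for illustration only.
owner_of = {"rec-1": "alumna-7", "rec-2": "alumna-7", "rec-3": "student-9"}
contact_of = {"student-9": "student9@example.org"}

direct, unreachable = plan_breach_notifications(
    ["rec-1", "rec-3", "rec-404"], owner_of, contact_of
)
print(direct)
print(unreachable)
```

The sketch only works if the mapping from records to subjects and contact details is maintained before the breach happens, which is the point of the readiness argument.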
Privacy controls, managing consent. Today we have these boxes, so by being here I already have consented, or by walking into this room I already have consented to being recorded.
[00:19:00] I would say this is a matter that might actually end up in court. There is the burden of proof in court, so I would rather try to document consent in a way that is going to be accepted by a court, which online is some form of electronic signature consent at least. What if you cannot prove that consent was given? I mean, with this click-online-to-proceed thing, what do people do if they're challenged in court? They will say it wasn't me, the computer was on, I went to the bathroom and it was one of my fellow students who clicked it, and you will never be able to brush that off.
[00:19:30] Go for documented consent in some way that would really convince a court in case of conflict.
Create identifiers for data subjects. That's identity management. If you keep long time records, long life records, you want to access your alumni students you have to have a handle for them to identify them. Even if they come and ask and send data protection, transparency requests, the first thing you do is ask them what proof do you have that it is you?
[00:20:00] We cannot give you this data before we know it's you, otherwise we create a data breach. You have to get into this business of verifying identities and matching them with the identifiers you're keeping. Of course the identifiers should be kept in separate databases so that they are not easily matched with the other data. This is a major design decision for the ID architecture to be made. That's no easy game here. The data must be somewhat mapped to the identifiers, but in a way that it doesn't create a privacy problem.
[00:20:30] I'm not talking about trivial design decisions here.
[00:21:00] It's meant to automate the processing of personal data and policies in corporations. There was a mechanism in browsers, P3P, that would negotiate with the end user's browser about privacy preferences. It's not practically used anymore, because so far data processors think: this is our one policy, there is nothing to be negotiated, please click yes.
There is a technology called Obligations Management and Sticky Policies that was invented by HP in Bristol, where actually you deploy cryptographic databases and you have the privacy obligations travel with the data.
[00:21:30] If some processor requests the data you generate a new encrypted version of the data exclusively for that processor, and have encrypted obligations that tell the database what it's allowed to do with the data. It's a very complex system. It is probably designed for secret government databases and comes with a big bill, so I'm not sure how this applies to university research at all. It might take some years until this enters everyday products.
[00:22:00] My conclusion to this is mostly: try to reduce the complexity of your policies and the data you're collecting to a minimum, to escape having to devise such systems.
[00:22:30] authorities, or a consortium of data protection authorities, and you stop having your legal department come up with interesting prose on privacy policies, to say we use those proven 30 policies and reference them. It would be possible as a network of education institutions to agree upon those policies and just reference them instead of inventing them locally on your own all the time.
Another word on transparency: audit data, provenance data, metadata, and
[00:23:00] identity data all create new privacy problems if you manage them. That's not trivial. There were research projects about privacy friendly logging, creating encrypted log files that can only be decrypted by the data subject, assuming a world where data subjects have software to collect traces, in case they're interested. Those are complex systems again, and again you have to identify people when they get in and access.
[00:23:30] I would just say have a concept of automatic expiry of consent and data, unless you have those databases that shall be there for 20 to 50 years in research. Otherwise the complexity explodes, you get these big databases, you get these ecosystems that get poisoned by old data that you have to handle forever. I actually think automatic expiry is good, because it keeps the size of your problem constant in a world that collects more and more data.
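A minimal sketch of such automatic expiry, assuming each record carries the date consent was given and an optional retention period. The field names, the one-year default and the `archival` flag for long-lived research data are hypothetical choices for illustration, not recommendations from the talk.

```python
from datetime import datetime, timedelta


def purge_expired(records, now, default_retention=timedelta(days=365)):
    """Return only the records that may still be kept at time `now`.

    A record expires once its retention period, counted from the moment
    consent was given, has passed. Records flagged as archival research
    data are exempt from automatic expiry.
    """
    kept = []
    for record in records:
        if record.get("archival"):
            kept.append(record)
            continue
        expires = record["consent_given"] + record.get("retention", default_retention)
        if expires > now:
            kept.append(record)
    return kept


# Hypothetical records, invented for illustration only.
records = [
    {"id": "party-photo", "consent_given": datetime(2015, 9, 1)},
    {"id": "course-signup", "consent_given": datetime(2017, 3, 1)},
    {"id": "cohort-study", "consent_given": datetime(2000, 1, 1), "archival": True},
    {"id": "exam-result", "consent_given": datetime(2014, 6, 1),
     "retention": timedelta(days=3650)},
]

remaining = purge_expired(records, now=datetime(2017, 6, 22))
print([r["id"] for r in remaining])
```

Run periodically, a job like this keeps the amount of old personal data bounded, while the exemption flag makes the long-lived research databases an explicit decision rather than a default.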
[00:24:00] A final word: in Karlstad we run an open master's course on the internet in January, funded by a Swedish foundation, which will be about what I talked about. It's a 7.5 ECTS course, and there is an email list on this web link, in case you're interested to sign up and get an update on that one. We go into much more detail on all the aspects I talked about. It's there basically for IT staff and administrators that should get into the course, and get some idea of what they should do next May and how to evolve their systems.
[00:24:30] Summary of the talk. You do need some active privacy management cycle with dedicated staff from management, and hopefully with a manager involved, otherwise it gets deprioritised all the time. You should have really dedicated budgets and management resources for that, otherwise it will be like the fifth wheel on the wagon: send a compliance report once a year, but we don't have money to change the infrastructure, sorry. If you don't want to get into these complexities of ever evolving policies, databases,
[00:25:00] obligations, privacy friendly logging mechanisms, I think data minimisation and minimal processing is the only way out, to keep the complexity of the databases to the necessary minimum, actually.
You must have good bookkeeping about personal data processing. We academics say, "oh, let's write a paper about X. I just copy the database and run the analysis", and it creates a new database, it violates consent. This is a major issue in academic culture, and I do not know how to manage it.
[00:25:30] It will not work to impose military hierarchies on information processing in university research. You probably have to have training for new students and staff on who they should talk to before they copy a database.
In the end, consider privacy and security technologies and transparency technology where they are feasible. I just dropped the names of a few methodologies to manage security and privacy. There's the OASIS method I had on the slide,
[00:26:00] the ISO 27000 family for security management, and there is a Belgian method called LINDDUN, which starts with an asset inventory, a threat inventory, and then goes through controls and technical mitigations. It is a very technical method. Yep. I came to the end now. I'm open for discussion and questions.
Speaker 2: Thank you very much Lothar. Do you have any questions from the audience before we go into this set? All right. So, First question. Do you have an example of how data can be mapped to the subject without there being a privacy issue?
[00:26:30] If someone in the audience asked that question, I'm sure they can elaborate, if that's helpful.
Lothar Fritsch: It's a very general question. You keep an identifier that keeps mapping, pointing to information, and you keep it in separate database tables, hopefully in several places that would detach the data from the identifier, but the data still is identifiable. I'm not sure.
[00:27:00] As long as you have to have identifiable data subjects you're not getting away from the problem, but by, say, storing it in a database without the person's details and having the details elsewhere and referencing the database, you can make it harder to abuse the data. You just compute hash values over the database records and attach them to the identifier in the other database. That's one of the simple solutions. It is still person relatable data, but that is what you want in the first place in this scenario.
[00:27:30] As soon as it's not person relatable anymore the data protection problems are going away for a large part.
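One way to make the hash-based mapping from this answer concrete is sketched below. It is an assumption-laden illustration, not the speaker's implementation: a keyed hash (HMAC-SHA256) is swapped in for a plain hash, because plain hashes of guessable identifiers can be reversed by trying all candidates, and the names `pseudonym`, `secret_key` and the example rows are hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonym(secret_key, subject_id):
    """Derive a stable pseudonym from a subject identifier with a keyed hash.

    Without the key, the pseudonym cannot be recomputed from the identifier,
    so the research table cannot be joined back to the identity table.
    """
    digest = hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# The key lives with the identity database, not with the research data.
secret_key = secrets.token_bytes(32)

identity_table = {"student-1": {"name": "Alice Example"}}
research_table = {pseudonym(secret_key, "student-1"): {"grade": "B", "course": "CS101"}}

# Only someone holding the key can answer a subject access request:
row = research_table[pseudonym(secret_key, "student-1")]
print(row)
```

As the answer notes, the research rows remain person-relatable for whoever holds the key and the identity table; the design only narrows who that is.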
Speaker 2: Okay. Hopefully that answered whoever's question that was. What questions should universities be asking of new ed tech companies who wish to interact with their student data, to make sure they're truly compliant with GDPR? I'm sure they will say they are, so with which questions can you really confirm that that is happening?
[00:28:00] what they're doing, how they're sharing, and then look if that is compatible with campus policies, if it matches the consent you received. If it doesn't match you can't go on. If it matches you probably want them to have a well-understandable policy about what they are doing and where the data is going, for how long, and what the consequences will be.
Speaker 2: It's an iteration, so you check again if they update their services, as you mentioned, to make sure.
Lothar Fritsch: That as well. That may be part of a contract,
[00:28:30] and beyond that you probably want some proof of audit. I would prefer vendors as data processors that can show me an audit certificate proving that they actually spent some effort on being compliant.
Speaker 2: By an independent party as well.
Lothar Fritsch: Yeah. There is no standardised regime yet. You could hire a consultant to audit you, or get a certificate from somewhere, or you could show that you're only using certified products in the production chain of the service. There are many options for that.
Speaker 2: Okay. Any final questions in the audience before we go to lunch? Okay. That's it. Thank you very much, Lothar.