Security versus privacy — when should we choose to forget?
My consultancy firm has been advising media, internet technology, fintech and central/local UK government clients on Data Protection & Privacy (among other things) for a number of years.
I like to think we’re pro-privacy advocates, championing the otherwise unrepresented ‘Data Subject’ while working towards balance on the security-v-privacy conundrum.
We help our clients work through challenges around data retention for analytical (think Google Analytics) and security (think security/transaction monitoring; fraud detection and technical cybersecurity defence) purposes.
We’re having one of those conversations at the moment so I thought I’d talk about it as a post.
What has changed?
- The General Data Protection Regulation (GDPR) (EU) 2016/679 and the UK Data Protection Act 2018 have expanded the definition of ‘personal data’ to a much wider scope.
- As a result of the above, we need to pay a lot more attention to tertiary identifiers that previously didn’t receive a lot of focus — such as online identifiers.
- GDPR programmes made many organisations realise they had never really been compliant with previous data protection legislation, so they did a huge catch-up which took more effort than it should have done.
What hasn’t changed?
- We should already have been thinking about how long we keep data for, regardless of purpose: function, analytics, security and so on.
- The rights of the Data Subject being important.
- Organisational needs for analytics and security being important.
- Personally Identifiable Information (PII) only being a subset of ‘Personal Data’.
- Personal Data should only be collected legally and retained for as long as reasonably required.
What does this all mean?
Data retained for analytics
Using Google Analytics as my example, unique online identifiers are generated in order to track user behaviour (where they came from, what links they clicked, how long they spent reading a page etc).
This kind of data helps you improve (in this example, your website) whether that is making information easier to find through to increasing sales revenue by promoting popular products.
We should be considering the identifiers created by analytical systems as personal data and responding accordingly: establishing the correct lawful and disclosed purposes for creating that data, then deciding, disclosing and enforcing how long we retain it.
Ultimately this means you have to think about the data in analytical systems a lot more than you have before and delete data (after setting and disclosing the associated data retention policy) instead of hoarding it forever.
You might think the requirement is unreasonable (it is not) but in reality the burden has always been on your organisation to create anonymous data sets or conduct analytical work and then delete the original data used — none of that is new.
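As a rough illustration of "do the analytical work, then delete the original data", here is a minimal Python sketch. The event shape, the `client_id` and `page` field names and the `aggregate_page_views` helper are all my own assumptions for the example, not anything Google Analytics gives you. Bear in mind that aggregate counts are not automatically anonymous (very small counts can still single someone out), so treat this as a starting point rather than a finished approach.

```python
from collections import Counter


def aggregate_page_views(events):
    """Reduce raw events to per-page counts, dropping the client identifiers."""
    return Counter(event["page"] for event in events)


# Hypothetical raw analytics events, each carrying an online identifier.
raw_events = [
    {"client_id": "hypothetical-id-1", "page": "/pricing"},
    {"client_id": "hypothetical-id-2", "page": "/pricing"},
    {"client_id": "hypothetical-id-1", "page": "/contact"},
]

page_counts = aggregate_page_views(raw_events)

# Stand-in for deleting the original identifiable records once the
# aggregate has been produced.
raw_events.clear()
```

The aggregate keeps the bit you wanted for improving the website (which pages are popular) without keeping who looked at them.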
We can talk another time about Google Analytics refusing to act as a Data Processor and actively requiring you not to send it personal data (inc PII), even though you probably are sending it, can’t stop, and need to treat the identifiers in Google Analytics as personal data for which you are Data Controller…
Data retained for security
‘Security’ needs data, whether that’s IP addresses, transactional monitoring during a live event, or understanding whether there is a nefarious or otherwise undesired trend by a person or group over a period of time.
Retaining data for cybersecurity and fraud detection requirements can reasonably override the Data Subject’s interest in you not holding that data: see GDPR Recital 49 for network and information security and Recital 47 for fraud prevention (law enforcement and financial regulation being separate).
In my view, answering the “how long should we keep this data for?” question can often be hard because of the associated known and unknown risk decisions you’re making at the same time: not having the data you need to analyse or fix a security problem would be bad.
Security incidents are mostly discovered retrospectively (how many times has your organisation caught a security breach while it is happening as opposed to after it stopped and/or once damage was already done?).
A pseudo-informed stab at data retention timings
The timings I am about to describe are based on what various clients have settled on over the years — they aren’t perfect or one-size-fits-all.
The timings assume that you’re pseudonymising and anonymising along the way, taking out of scope a bunch of data that you can in theory keep forever.
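One common way of pseudonymising an identifier is to replace it with a keyed hash, so that you can still correlate events for the same person without storing the raw value. The sketch below uses Python's standard `hmac` and `hashlib` modules; the `pseudonymise` function name, the key value and the example IP address are hypothetical. Pseudonymised data is still personal data for as long as someone holds the key, so this on its own does not get you to "keep forever".

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of the identifier as a hex string."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key: in practice it would live in a secrets manager and
# be kept apart from the data it protects.
key = b"hypothetical-key-kept-in-a-secrets-manager"

# The same input and key always give the same token, so events can
# still be linked together without the raw IP address.
token_a = pseudonymise("203.0.113.7", key)
token_b = pseudonymise("203.0.113.7", key)
```

A keyed hash is used here rather than a plain hash because the space of IP addresses is small enough to brute force without a secret.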
The timings also don’t consider context such as external retention requirements stemming from the Payment Card Industry, legal justice system or financial regulation (and so on).
For simplicity I am using a broad-stroke categorisation method. In reality you will (and should) have many different retention periods for different data fields and/or the same data field for different purposes.
Data retention period: around 2 years.
System scope (quick example list):
- administrative access and activity through administrative functions like SSH Bastions and AWS Console
- access and merge history on repository platforms such as GitHub.com
- access to email accounts or document services
Why? Inside actors are in my ‘top 5’ of cybersecurity breach vectors, whether intentional/coerced or inadvertent/accidental. If/when you discover an issue, it is likely you will need to go back far enough in time to understand the potential impact and whether it has happened before.
Don’t forget: while the internal team or contractor HR journey and rights are a little different, colleagues are Data Subjects as well!
Data retention period: around a year.
System scope (quick example list):
- end-user logins
- material end-user actions (for example, making a payment or what you would consider a ‘transaction’)
- contact and password changes (not the data itself, but the changing thereof)
Why? A year is a reasonable period of time to understand how a user account is interacting with your service and retain enough data to build a picture of malicious use or provide tailored customer services/advertising.
There are other personal and anonymous data scopes (for example, Content Delivery Network logs) which you can delete much faster, as they lose their value much more quickly and are easier to convert into anonymous form.
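To make the two broad categories above a bit more concrete, here is a minimal Python sketch of a retention table and a purge step. The category names, the record shape and the `is_expired` and `purge` helpers are my own inventions for illustration, and I have rounded "around 2 years" and "around a year" to 730 and 365 days. A real implementation would sit in whatever storage or logging platform you use and would carry many more categories.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical categories mapped to the broad-stroke periods described above.
RETENTION = {
    "admin_activity": timedelta(days=730),     # around 2 years
    "end_user_activity": timedelta(days=365),  # around a year
}


def is_expired(category, recorded_at, now):
    """True when a record is older than the retention period for its category."""
    return now - recorded_at > RETENTION[category]


def purge(records, now):
    """Return only the records that are still inside their retention period."""
    return [
        record
        for record in records
        if not is_expired(record["category"], record["recorded_at"], now)
    ]


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    {"category": "end_user_activity", "recorded_at": now - timedelta(days=400)},
    {"category": "admin_activity", "recorded_at": now - timedelta(days=400)},
    {"category": "admin_activity", "recorded_at": now - timedelta(days=800)},
]

# Only the 400-day-old admin record is still within its period.
kept = purge(records, now)
```

Keeping the periods in one table means the disclosed retention policy and the deletion job can be checked against each other.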
You *should* collect the data you *need*
Not collecting data is a very simple way of being very compliant, but on the flip side it makes little sense if you’re not offering your users or organisation purpose and value. A popular eCommerce site that doesn’t offer user accounts (retaining data to save address history, show previously bought items etc) is likely a worse user experience.
Do the hard work to get data protection and privacy ‘right’: find your relevant sweet spots somewhere between a website that is awkward to use and doesn’t function properly at one extreme, and creepy data hoarding and data sales at the other.
Use the data you have
Collecting the data is one thing (along with fulfilling disclosure requirements, answering access/portability/erasure queries etc), but it remains important that you’re also using the data well. There is very little point in collecting internal system data if you aren’t using it for security monitoring (triggering alerts and so on) and then responding to those alerts.
The concepts are simple, implementation is hard
Unfortunately data retention decisions can require a lot of nuance (but nuance isn’t a good reason for ‘rounding up’ retention times).
The underlying and overriding principle
Remember, you should only retain data legally and for as long as you can reasonably justify doing so.