Hello,
I am embarking on a data classification process at my organization, but I am curious how this works in practice - specifically how to classify data depending on context.
As a made-up example of what I'm confused about, let's say I want to consider revealing gender of specific customers as confidential but reporting on gender in anonymous contexts might be public.
How would I indicate that the data classification of the gender attribute would depend on the context of what data it's shared with? e.g. gender + names is confidential but gender + number of units purchased is public? Ideally without needing to document every combination of attributes.
EDIT: Am I complicating this? Is it just noting gender as public and having an overall caveat that the data classification for any document/report is based on the level of the most restrictive attribute?
Thank you,
Ian
Hi Ian, I have had the privilege of heading many awareness programs and campaigns at many different companies, and I can only encourage you to dumb it down.
You need to consider your success criteria carefully, meaning which risks you are mitigating. Some just do awareness to satisfy stakeholders, but the real value comes from careful consideration of what value you bring.
Another piece of advice would be to pay attention to the recipients. Usually they do not fancy security as much as we do and don't get the why. So be clear about your why and get the message across as simply as possible 🙂
Number one piece of advice -- keep it simple. No more than 4 classifications.
Classifications guide how the data is handled. It is not about what the data is (e.g. "HR" is not a classification).
I would start with something like this and see how it goes:
If you primarily use one tool to manage your data, I would align your controls with its capabilities and how that vendor applies labels. It will greatly reduce your workload. For example, here are the O365 "sensitivity labels".
Also, don't mix different classification levels within one document or database. Having one database column be "confidential" and another "public" invites escalation of privilege compromises. Much better is to extract the public information into a separate read-only database which the public apps access.
Thanks for the reply. That's a good point to keep it simple and focus on what success looks like! Thanks! -Ian
Thanks for the reply! I have four similar classification levels so that part I'm good on. I was thinking about how to classify data when it's not just about an individual attribute but what attributes it's with.
For example, let's say in my industry I'm legally required to publicly report on attributes a, b and c. So those attributes would all be classified as public.
Now let's say that someone has a report with attributes a and d (and d is classified). I want to be clear that in this context attribute a would also be classified.
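The "most restrictive attribute wins" rule for a report with attributes a and d could be sketched roughly as below. This is only an illustration: the four level names, the per-attribute baseline table, and the function name `classify_report` are all made up for the example, and treating unknown attributes as the most restrictive level is an assumption rather than anything stated in this thread.

```python
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical four-level scheme; the names are placeholders only.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical baseline level of each attribute when it stands alone.
ATTRIBUTE_LEVELS = {
    "a": Level.PUBLIC,
    "b": Level.PUBLIC,
    "c": Level.PUBLIC,
    "d": Level.CONFIDENTIAL,
}


def classify_report(attributes):
    """Return the most restrictive level among the attributes in a report."""
    # Assumption: an attribute missing from the table is treated as RESTRICTED.
    return max(
        (ATTRIBUTE_LEVELS.get(name, Level.RESTRICTED) for name in attributes),
        default=Level.PUBLIC,  # an empty report carries nothing sensitive
    )


print(classify_report(["a", "b", "c"]).name)
print(classify_report(["a", "d"]).name)
```

With this table, a report holding only a, b and c stays public, while adding d lifts the whole report (including a) to d's level.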
I think I'm understanding better what I need to do, but I need to make sure I communicate it more clearly than I'm doing here!
As a fake but more concrete example, let's say I need to publicly report on the ages of people that purchased my product.
18 year olds: 5000 units
19 year olds: 2000 units
etc.
so in that context the age attribute is public.
But the customer name is, let's say, confidential, so I want to be clear that the age, when tied to a specific person, is very much not public, e.g.
John Doe 18 years old
Jane Doe 19 years old
Let me know if I'm making any sense and if you have any suggestions.
Thanks!
Ian
Another aspect of the KISS principle. Classify at the "document" or "table" level, not the "paragraph" or "attribute" level.
The entire purchase history database would remain confidential, including its "age" attribute.
From that confidential database, you generate a "sales by age" report. This extract is itself newly created data, which has its own public classification. The original (non-aggregated) data does not change classification.
This can scale to wholesale copying the non-confidential columns into a new table that has a different classification. Even though the age data is "the same", the confidentiality level can vary based upon the table from which it was retrieved.
Kinda like when we are under NDA -- we can not confirm/deny the secret, but we can quote a press release that does just that.
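As a rough sketch of the extract idea above, assuming a SQLite database and made-up table and column names (`purchases`, `sales_by_age`, `customer_name`, `age`, `units`), the public "sales by age" data could be built as a separate table that never carries the names. The rows are invented purely for illustration.

```python
import sqlite3


def build_public_extract(conn):
    """Create an aggregated table that contains no customer names."""
    conn.execute(
        """
        CREATE TABLE sales_by_age AS
        SELECT age, SUM(units) AS units
        FROM purchases
        GROUP BY age
        """
    )


conn = sqlite3.connect(":memory:")

# Confidential source table: the whole table is confidential, age included.
conn.execute(
    "CREATE TABLE purchases (customer_name TEXT, age INTEGER, units INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("John Doe", 18, 3), ("Jane Doe", 19, 2), ("Sam Roe", 18, 1)],
)

build_public_extract(conn)

# The extract is new data with its own (public) classification.
rows = conn.execute(
    "SELECT age, units FROM sales_by_age ORDER BY age"
).fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(sales_by_age)")]
print(columns, rows)
```

In a real setup the extract would more likely live in a separate read-only database that the public-facing apps query, so nothing public ever touches the confidential table directly.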
This is how I would approach it.
I'm sure I missed something, but I hope this helps. Number 4 can be tailored to identify what's acceptable instead, if you find that's less work and makes more sense. I'd say it would probably be a waste of time to identify every combination. Instead, state that no direct identifiers will be made public and that no more than 3 indirect identifiers (or whatever number you settle on after conducting the risk assessment) can be included together as public information.
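A rule like that can be checked mechanically instead of documenting every combination. The sketch below is only an illustration: the identifier lists, the limit of 3, and the function name `may_be_public` are placeholders, and the real lists and threshold would come out of your own risk assessment.

```python
# Hypothetical identifier lists; replace with the results of your own review.
DIRECT_IDENTIFIERS = {"name", "email"}
INDIRECT_IDENTIFIERS = {"age", "gender", "postcode", "occupation"}

# Placeholder threshold, to be settled by the risk assessment.
MAX_INDIRECT = 3


def may_be_public(attributes):
    """Return True if this combination of attributes may be shared publicly."""
    attrs = set(attributes)
    # Any direct identifier makes the whole combination non-public.
    if attrs & DIRECT_IDENTIFIERS:
        return False
    # Otherwise allow up to MAX_INDIRECT indirect identifiers together.
    return len(attrs & INDIRECT_IDENTIFIERS) <= MAX_INDIRECT


print(may_be_public(["gender", "units_purchased"]))
print(may_be_public(["gender", "name"]))
```

Under these placeholder lists, gender with units purchased passes, gender with name does not, and all four indirect identifiers together would also be rejected.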