Data Classification in Practice?

ian · ‎09-01-2021

Hello,

I am embarking on a data classification process at my organization, but I am curious how this works in practice - specifically how to classify data depending on context.

As a made-up example of what I'm confused about, let's say I want to consider revealing gender of specific customers as confidential but reporting on gender in anonymous contexts might be public.

How would I indicate that the data classification of the gender attribute would depend on the context of what data it's shared with? e.g. gender + names is confidential but gender + number of units purchased is public? Ideally without needing to document every combination of attributes.

EDIT: Am I complicating this? Is it just noting gender as public and having an overall caveat that the data classification for any document/report is based on the level of the most restrictive attribute?
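A minimal sketch of that "most restrictive attribute wins" rule, just to make the idea concrete. The level names, the attribute labels, and the `classify_document` helper are all assumptions for illustration, not an established scheme:

```python
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical levels, ordered from least to most restrictive.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    HIGHLY_CONFIDENTIAL = 3


# Hypothetical per-attribute labels (made up for this example).
ATTRIBUTE_LEVELS = {
    "gender": Level.PUBLIC,
    "units_purchased": Level.PUBLIC,
    "name": Level.CONFIDENTIAL,
}


def classify_document(attributes):
    """Return the most restrictive level among the attributes in a document.

    Attributes with no label are treated as HIGHLY_CONFIDENTIAL so that
    unlabeled data fails closed. Expects at least one attribute.
    """
    return max(
        ATTRIBUTE_LEVELS.get(attr, Level.HIGHLY_CONFIDENTIAL)
        for attr in attributes
    )


print(classify_document(["gender", "units_purchased"]).name)
print(classify_document(["gender", "name"]).name)
```

With these made-up labels, gender + units purchased comes out public, while gender + names comes out confidential, without listing every combination.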

Thank you,
Ian

jzwicki · ‎09-02-2021

Hi Ian, I have had the privilege of heading many awareness programs and campaigns at many different companies, and I can only urge you to dumb it down.

You need to consider your success criteria carefully, meaning which risks you are mitigating. Some just do awareness to satisfy stakeholders, but the real value comes from careful consideration of what value you bring.

Another piece of advice would be to pay attention to the recipients. Usually they do not fancy security as much as we do and don't get the why. So be clear about your why and get the message across as simply as possible 🙂

denbesten · ‎09-02-2021

Number one piece of advice -- keep it simple. No more than 4 classifications.

Classifications guide how the data is handled. It is not about what the data is (e.g. "HR" is not a classification).

I would start with something like this and see how it goes:

Public (may share with the general public)
Internal-Use (may share with other employees)
Confidential (only the data owner may add new recipients)
Highly-Confidential (requires "secret handshakes" to discuss)

If you primarily use one tool to manage your data, I would align your controls with its capabilities and how that vendor applies labels. It will greatly reduce your workload. For example, here are the O365 "sensitivity labels".

Also, don't mix different classification levels within one document or database. Having one database column be "confidential" and another "public" invites privilege-escalation compromises. A much better approach is to extract the public information into a separate read-only database which the public apps access.

ian · ‎09-02-2021

Thanks for the reply. That's a good point to keep it simple and focus on what success looks like! Thanks! -Ian

ian · ‎09-02-2021

Thanks for the reply! I have four similar classification levels so that part I'm good on. I was thinking about how to classify data when it's not just about an individual attribute but what attributes it's with.

For example, let's say in my industry I'm legally required to publicly report on attributes a, b and c. So those attributes would all be classified as public.

Let's say that someone has a report with attributes a and d (and d is classified), I want to be clear that in this context attribute a would also be classified.

I think I'm better understanding what I need to do, but I need to make sure I communicate it more clearly than I'm doing here!

As a fake but more concrete example, let's say I need to publicly report on the ages of people that purchased my product.

18 year olds: 5000 units

19 year olds: 2000 units

etc.

so in that context the age attribute is public.

But the customer name is, let's say, confidential, so I want to be clear that the age, when identified with a specific person, is very much not public, e.g.

John Doe 18 years old

Jane Doe 19 years old

Let me know if I'm making any sense and if you have any suggestions.

Thanks!

Ian

denbesten · ‎09-02-2021

Another aspect of the KISS principle. Classify at the "document" or "table" level, not the "paragraph" or "attribute" level.

The entire purchase history database would remain confidential, including its "age" attribute.

From that confidential database, you generate a "sales by age" report. This extract is itself newly created data, which has its own public classification. The original (non-aggregated) data does not change classification.

This can scale to wholesale copying the non-confidential columns into a new table that has a different classification. Even though the age data is "the same", the confidentiality level can vary based upon the table from which it was retrieved.
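A rough sketch of that extract step using Python's built-in `sqlite3` module. The table names, column names, and sample rows are made up for illustration, and a real setup would put the public table in a separate read-only database rather than the same in-memory one:

```python
import sqlite3

# In-memory database standing in for the confidential purchase history.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_name TEXT, age INTEGER, units INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("John Doe", 18, 3), ("Jane Doe", 19, 2), ("Jim Roe", 18, 4)],
)

# The "sales by age" extract is a new table with its own (public)
# classification. It contains no customer names, only aggregates.
conn.execute(
    "CREATE TABLE sales_by_age AS "
    "SELECT age, SUM(units) AS units FROM purchases GROUP BY age"
)

rows = conn.execute(
    "SELECT age, units FROM sales_by_age ORDER BY age"
).fetchall()
columns = [info[1] for info in conn.execute("PRAGMA table_info(sales_by_age)")]

for age, units in rows:
    print(f"{age} year olds: {units} units")
```

The `purchases` table stays confidential as a whole; only `sales_by_age` would be exposed to public-facing apps.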

Kinda like when we are under NDA -- we can not confirm/deny the secret, but we can quote a press release that does just that.

tmekelburg1 · ‎09-02-2021

This is how I would approach it.

1. The entire row that includes all of the fields would be considered PII and confidential.
2. Identify which fields are direct identifiers (e.g., names, phone numbers, and other information that unambiguously identifies an individual) and which are indirect/quasi identifiers (e.g., fields that match multiple individuals but can be used to triangulate on a specific individual).
3. Determine whether any fields need to be omitted due to local laws, federal regulations, or anything not covered under contract.
4. Identify which identifiers, and combinations of identifiers, are not appropriate for public use (this should be in a policy).
5. Perform a risk assessment to determine the likelihood and impact of a linkage attack for the identifiers deemed public access.
6. Create specific canned or read-only reports in which the fields can't be edited, as @denbesten suggested.
7. Provide awareness training, as @jzwicki suggested.

I'm sure I missed something but I hope this helps. Number 4 can be tailored to identify what's acceptable instead if you find it's less work and makes more sense. I'd say it would probably be a waste of time to identify every combination but rather state that no direct identifiers will be made public and only 3 indirect identifiers (or whatever number you settle on after conducting the risk assessment) can be included together as public information.
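As a rough illustration of that kind of policy, here is a small sketch. The field names, the limit of 3, and both helper functions are assumptions made up for the example. The group-size check is a k-anonymity-style measure, which is one common way to gauge linkage risk but is not something prescribed above:

```python
from collections import Counter

# Hypothetical field lists; a real policy would define these.
DIRECT_IDENTIFIERS = {"name", "phone", "email"}
QUASI_IDENTIFIERS = {"age", "gender", "zip_code", "birth_date"}
MAX_QUASI_IDENTIFIERS = 3  # whatever number the risk assessment settles on


def may_publish(fields):
    """True if a report has no direct identifiers and few enough quasi ones."""
    fields = set(fields)
    if fields & DIRECT_IDENTIFIERS:
        return False
    return len(fields & QUASI_IDENTIFIERS) <= MAX_QUASI_IDENTIFIERS


def smallest_group(rows, quasi_fields):
    """Size of the smallest group of rows sharing the same quasi-identifier
    values. A group of 1 means someone is uniquely identifiable by that
    combination, which suggests a higher linkage risk."""
    counts = Counter(tuple(row[f] for f in quasi_fields) for row in rows)
    return min(counts.values(), default=0)


sample = [
    {"age": 18, "gender": "F"},
    {"age": 18, "gender": "F"},
    {"age": 19, "gender": "M"},
]
print(may_publish(["age", "units_purchased"]))
print(may_publish(["name", "age"]))
print(smallest_group(sample, ["age", "gender"]))
```

A check like `may_publish` could run before a canned report is released, and `smallest_group` could feed into the risk assessment in step 5.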