Hi All
Even synthetic data has privacy needs...
Companies and governments alike demand masses of data for operational, policy and research uses, but between large scale data breaches and the re-identification of published datasets, privacy risks abound.
Worth thinking about.
https://www.salingerprivacy.com.au/2023/07/06/synthetic-data/
Regards
Caute_Cautim
Synthesizing fake data from a real data set for development and training isn't a bad idea, but it's also not a very new one. This is the challenge we have always had. It's only out of laziness ("the developers need 100,000 records, let's just use this chunk of real data") that bad things happen. A good example is the 2006 Fidelity laptop breach: for a demo, a laptop was loaded with real data on HP employees, and then the laptop was stolen. Just stupid.
Yes, part of the problem is coming up with synthetic data, but as is the case with a lot of security, the problems lie more in strategy and design. The first question that should always be asked is "Do you need all this data?" Those driving the business tend to think "Let's collect everything we can and then figure out what we need." But that is a risk-laden flaw. They either don't do the risk calculation or pay no heed to it. It creates a bit of a paradox for fake data. If you need fake data because you're afraid of what will happen to real data, well, what is it about your production environment that eliminates that worry? Probably not as much as you think or want.
A good complement to using synthetic data would be looking at how you can normalize your data in such a way that you diminish its value if attacked. Taking the personally identifiable information and splitting it across different tables, hashing certain fields, etc. can help. Granted, a thorough enough attack will still give up everything, but there's a lot that can be done on the design end that often isn't.
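To make the splitting and hashing idea concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not anything taken from the blog: the field names, the `PEPPER` key, and the `pseudonym` and `split_record` helpers are all hypothetical. It uses a keyed hash (HMAC-SHA256) rather than a plain hash, since a plain hash of something guessable like an email address can be reversed by simply hashing candidate values.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Set, Tuple

# Hypothetical secret key. In a real design it would be kept in a key store,
# separate from both tables, not generated at import time like this.
PEPPER = secrets.token_bytes(32)


def pseudonym(value: str, key: bytes = PEPPER) -> str:
    """Return a keyed hash (HMAC-SHA256) of an identifier as a hex string."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def split_record(record: Dict, pii_fields: Set[str], id_field: str) -> Tuple[Dict, Dict]:
    """Split one record into a PII row and a non-PII row linked only by a token."""
    token = pseudonym(str(record[id_field]))
    pii_row = {"token": token}
    data_row = {"token": token}
    for field, value in record.items():
        # PII goes to one table, everything else to the other.
        target = pii_row if field in pii_fields else data_row
        target[field] = value
    return pii_row, data_row


if __name__ == "__main__":
    # Made-up example record, purely for illustration.
    customer = {"email": "jane@example.com", "name": "Jane Doe", "plan": "gold", "balance": 120}
    pii, data = split_record(customer, {"email", "name"}, "email")
    print(sorted(pii))   # field names stored in the PII table
    print(sorted(data))  # field names stored in the operational table
```

The two rows could then live in separate tables (or separate databases with different access controls), so someone who only gets the operational table sees tokens instead of names and emails. As noted above, an attacker who gets both tables and the key still gets everything.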
I think the blog kind of inadvertently leads to this point: it's not the data so much as what you do with it.
@JoePete Well, given there are 100 million subscribers on Meta's Threads, there is sufficient synthetic data sitting there to be sifted and analysed, depending on whether you want fake or semi-real data to deal with - and it might have human beings' actual life patterns attached too.
Regards
Caute_Cautim