Caute_cautim
Community Champion

LLM Steganography

Hi All

 

Well, we know steganography has been used in pictures and documents; now, as this study discovered, it can be encoded within large language models (generative AI models), and it is dangerous for many reasons:

 

1). Hiding their reasoning

2). Lack of transparency

3). Undermining monitoring of AI systems

 

https://venturebeat.com/ai/language-models-can-use-steganography-to-hide-their-reasoning-study-finds...

 

Regards

 

Caute_Cautim

3 Replies
Early_Adopter
Community Champion

I think that encoding tokens with minimal characters also improves efficiency over models that don’t. Maybe we could count that as stego, maybe not.

We also have techniques in code, such as obfuscation, to prevent analysis of the code itself.

I guess there isn’t too much of an issue unless it removes transparency, explainability, etc. - so maybe model developers have to share their schemes with regulators.

Caute_cautim
Community Champion

@Early_Adopter   The key issue is AI ethics and governance: if models are intent on hiding techniques or information, they cannot be transparent and therefore cannot be trusted. All sorts of nefarious methods could be applied, ready to strike out or carry out activities without the owner realising what is going on.

 

I certainly would add it to the security issues surrounding LLMs and generative AI models.

 

Given the lawsuit raised recently, OpenAI has a lot to answer for.

 

Regards

 

Caute_Cautim

Early_Adopter
Community Champion

@Caute_cautim

Intent is everything here.

It doesn’t 100% follow that the encoding of the data is done to hide the operation of the model.

“where an LLM could encode intermediate steps of reasoning in the generated text in a way that is not understandable to human readers.”

It’s stated here as a “could” - but in practice it’s almost certainly a “would”, just to save memory etc.
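
To make the idea concrete, here is a toy sketch of the general technique the article describes - hiding information in otherwise natural-looking word choices. This is my own illustration, not the scheme from the study, and the synonym pairs are invented for the example:

```python
# Toy illustration (not the paper's scheme): hiding bits in synonym choices.
# Each position offers two interchangeable words; picking the first encodes a 0,
# the second a 1. The surface text still reads normally to a human.

SYNONYM_SLOTS = [
    ("big", "large"),
    ("quick", "fast"),
    ("method", "approach"),
    ("shows", "demonstrates"),
]

def encode(bits):
    """Return a word list whose choices carry the hidden bits."""
    return [pair[bit] for pair, bit in zip(SYNONYM_SLOTS, bits)]

def decode(words):
    """Recover the hidden bits from the word choices."""
    return [pair.index(word) for pair, word in zip(SYNONYM_SLOTS, words)]

hidden = [1, 0, 1, 1]
cover = encode(hidden)          # e.g. ['large', 'quick', 'approach', 'demonstrates']
assert decode(cover) == hidden
print(" ".join(cover))
```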


You’re almost certainly going to encode lots of things in your model - starting with tokenisation - and as long as you publish your tokenisation schema, humans will still be able to follow the process easily. No ethical issues.
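
For example, with a published scheme the text-to-token mapping is fully inspectable. A minimal sketch using OpenAI’s open-source tiktoken library (assumes `pip install tiktoken`; any published tokeniser would make the same point):

```python
# With a published tokenisation scheme, the mapping between text and token IDs
# can be inspected end to end.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")    # a published encoding
tokens = enc.encode("Language models can hide reasoning in text.")
print(tokens)                                 # the integer token IDs
print([enc.decode([t]) for t in tokens])      # each ID maps back to readable text
```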

Then you could have someone not publish their encoding schema - OK, more of a challenge, but I can still work out what they’re doing with statistical analysis and some frequency information, especially if the LLM is running on my machine. This is debatable: did you forget, or hope we wouldn’t notice?
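
As a rough sketch of what I mean, the starting point is just counting how often each token turns up in captured output and looking for skew. The sample token IDs below are hypothetical:

```python
# Basic frequency analysis over a model's output tokens - a first step towards
# reverse-engineering an unpublished encoding scheme.
from collections import Counter

def token_frequencies(token_stream):
    """Return each token ID's relative frequency across the sampled outputs."""
    counts = Counter(token_stream)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.most_common()}

# Hypothetical sample of token IDs captured from a locally running model.
sample = [101, 7, 7, 42, 101, 7, 13, 42, 7, 101]
for tok, freq in token_frequencies(sample).items():
    print(f"token {tok}: {freq:.2f}")
```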

If the intent is to encode to deceive and obfuscate, then it gets much harder to have any charity, and we’re talking about proprietary things we just won’t share, where all the gubbins is encoded/encrypted and the model doesn’t record its working out.

Anything in the third case is for sure teaching it to lie - and I think the standards need to be able to run with assurance turned on. For some reason my mind wandered back to FIPS 140-2 validation and integrity checks.
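
By way of illustration, an integrity check in that spirit could be as simple as comparing a digest of the model artefact against a published reference value before loading it. A minimal sketch - the file name and expected digest are placeholders, not a real product’s values:

```python
# Weight-file integrity check in the spirit of FIPS 140-2 style self-tests:
# verify a SHA-256 digest of the model artefact against a published reference
# value before loading it.
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

EXPECTED = "0000...published-reference-digest...0000"   # placeholder value
if sha256_of("model.safetensors") != EXPECTED:          # placeholder file name
    raise RuntimeError("Model weights failed integrity check; refusing to load.")
```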

So I wouldn’t assume bad intent just because we’re encoding, but I’d look at what is provided with the model for explainability of results, and then at whether the techniques as applied look like they were designed to deceive.

I’ll be careful about what I say given all the lawsuits flying around, but “OpenAI” does seem on the surface to be a very large misnomer…