@Caute_cautim Intent is everything here.
It doesn’t 100% follow that the encoding of the data is done to hide the operation of the model.
“where an LLM could encode intermediate steps of reasoning in the generated text in a way that is not understandable to human readers.”
It’s stated here as a “could”, but in practice it’s almost certainly a “would”, if only to save memory etc.
You’re almost certainly going to encode lots of things in your model, starting with tokenisation. As long as you publish your tokenisation schema, humans will still be able to follow the process easily. No ethical issues.
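To make that concrete, here’s a toy sketch in Python. The vocabulary is made up for illustration and isn’t any real tokeniser’s schema; the point is just that once the mapping is published, anyone can read the token stream back out.

```python
# Toy "published schema": a tiny, made-up vocabulary, not any real tokeniser.
VOCAB = {"the": 1, "model": 2, "adds": 3, "two": 4, "numbers": 5, "<unk>": 0}
INV_VOCAB = {v: k for k, v in VOCAB.items()}

def encode(text: str) -> list[int]:
    """Map whitespace-separated words to token ids using the published vocabulary."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]

def decode(ids: list[int]) -> str:
    """Reverse the mapping: anyone with the schema can follow what was encoded."""
    return " ".join(INV_VOCAB[i] for i in ids)

ids = encode("The model adds two numbers")
print(ids)          # [1, 2, 3, 4, 5]
print(decode(ids))  # "the model adds two numbers"
```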
Then you could have someone not publish their encoding schema. OK, more of a challenge, but I can still work out what they’re doing with statistical analysis and some frequency information, especially if the LLM is running on my machine. This case is more debatable: did you forget to publish it, or hope we wouldn’t notice?
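Something like this is the flavour of frequency analysis I mean. The reference frequencies and the test strings below are purely illustrative, not a real detector, but a crude score like this already separates ordinary prose from an opaque encoding.

```python
from collections import Counter

# Approximate English letter frequencies for the most common letters (illustrative only).
ENGLISH_FREQ = {
    'e': 0.127, 't': 0.091, 'a': 0.082, 'o': 0.075, 'i': 0.070,
    'n': 0.067, 's': 0.063, 'h': 0.061, 'r': 0.060, 'd': 0.043,
}

def frequency_anomaly_score(text: str) -> float:
    """Sum of absolute deviations between observed letter frequencies and the
    English reference. Higher scores suggest the output isn't ordinary prose,
    e.g. an opaque encoding hiding intermediate reasoning."""
    letters = [c.lower() for c in text if c.isalpha()]
    if not letters:
        return 0.0
    counts = Counter(letters)
    total = len(letters)
    return sum(abs(counts.get(ch, 0) / total - ref) for ch, ref in ENGLISH_FREQ.items())

# Ordinary prose scores low; something that looks encoded scores high.
print(frequency_anomaly_score("The model explains each step of its reasoning in plain English."))
print(frequency_anomaly_score("qxzv kkpw zzqj xvxq wqqk zjxx pkvq"))
```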
If the intent is to encode in order to deceive and obfuscate, then it gets much harder to extend any charity: now we’re talking about proprietary things that just won’t be shared, where all the gubbins is encoded/encrypted and the model doesn’t record its working out.
Anything in the third case is for sure teaching it to lie, and I think the standards need to be able to run with assurance turned on. For some reason my mind wandered back to FIPS 140-2 validation and integrity checks.
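Roughly the kind of “assurance turned on” check I have in mind, FIPS-style. The weights path and expected digest are placeholders, not anything a real standard specifies; it just shows the shape of an integrity check before the model is allowed to run.

```python
import hashlib
import sys

def sha256_of_file(path: str) -> str:
    """Compute a SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical values: the weights path and vendor-published digest are placeholders.
WEIGHTS_PATH = "model-weights.bin"
EXPECTED_DIGEST = "<digest published by the model vendor>"

if sha256_of_file(WEIGHTS_PATH) != EXPECTED_DIGEST:
    sys.exit("Integrity check failed: weights do not match the published digest.")
print("Integrity check passed; model may run with assurance enabled.")
```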
So I wouldn’t assume bad intent just because we’re encoding, but I’d look at what is provided with the model for explainability of results, and then ask whether the techniques as applied look like they were designed to deceive.
I have to be careful about what I say given all the lawsuits flying around, but “OpenAI” does seem on the surface to be a very large misnomer…