Persistent Threat in Large Language Models

Caute_cautim · ‎08-13-2024

Hi All

Prompt injection has become a prominent area of focus in AI security. Despite extensive discussions on the subject, the actual business risks posed by prompt injection remain unclear. For instance, what are the potential disruptions if an LLM provides incorrect/dangerous information? Can a single bad response propagate through a system and cause significant damage?

To illustrate this, let's consider a real-world application. An AI-powered recruiting system could use Retrieval-Augmented Generation (RAG) to fetch relevant CVs and then ask an LLM to summarize and score these CVs. To mitigate prompt injection, the system might employ a second LLM to validate the output from the first LLM. How can prompt injections embedded in a CV persist throughout this pipeline?

https://www.linkedin.com/pulse/persistent-threat-large-language-models-chenta-lee-mxwge/?trackingId=...

Regards

Caute_Cautim