AI, have you been drinking?
For the last couple of months I have been working on AI security: first with the general architecture and data flows for Generative and Agentic AI systems, and lately more with prompt & response security techniques. These later topics are where AI systems offer a greenfield for attackers to apply all the old attack techniques, plus a select few new ones. But while researching how to coerce AI into misbehaving, as part of my introduction to prompt engineering, I kept stumbling across cases where we do not need attackers at all: the AI systems seem eager to misbehave all on their own.

What do I mean by this? Lying, for starters. In the last week I have run across:

* AI scanning a cloud environment, then describing work to be completed based on compliance requirements which do not exist
* Co-contributor David Mortman had AI creating & checking code into source code control; after the AI stopped making check-ins, the assistant was specifically asked why, and it asserted that check-ins were still being made
* A veterinarian friend, who one moment was worried about his career because AI provided a perfect analysis of a complex canine disease he was treating, the next moment received a detailed treatment plan for a condition which does not exist in the species specified
* AI providing analysis of preferred investment opportunities to a fund manager friend of mine, citing a fictitious press release which provably did not exist

Inventing false information to deceive is not a hallucination; it's fabrication. It's lying. I asked one AI engine for the technical definition of hallucinations, a topic everyone using AI is cautioned about, and it responded: “Factual: Inventing historical events or scientific facts.” I have always viewed hallucination as a human behavior: misinterpreting something seen, or seeing something that is not there, probably confusion in the brain from not understanding a sensory input. Assigning that term to AI kind of made sense, as AI is meant to mimic human reasoning, and GenAI acts upon what it has reasoned. But when your drunk uncle shares a vivid hallucination, the family ignores him and changes the subject. AI is happy to go all in with your drunk uncle and build a business case from the hallucination; it is this unchecked propagation of errors that has me worried.

As someone who presently benefits from AI's ability to expand my research and comb vast amounts of data, I get value. For generating output on mundane tasks like coding snippets, it's a time saver. But make no mistake: it is my responsibility to ensure that anything I use which GenAI created is factual. It's on me to test the code or fact-check the content. That is all possible as long as AI assists a human, and the human reviews generated content for accuracy.

Putting that into a security context: how do you stop AI from building atop a hallucination? With Agentic AI, who is the fact checker? If you can infuse lies into a document or prompt, AI will learn from them, and (potentially) weave them into the fabric of future decisions. And if AI introduces its own errors into the mix, finding each error and where it propagated will be difficult at best. A common vision of the future has agents talking to other agents: who validates that their output is accurate, at scale? More agents? My gut tells me we have a giant new data security & reliability challenge.
