Researcher Johann Rehberger discovered a vulnerability in OpenAI's ChatGPT that exploits its long-term conversation memory feature through indirect prompt injection. By embedding malicious instructions in untrusted content like emails or websites, attackers could trick the AI into storing false instructions in its persistent memory, influencing all future interactions. Rehberger demonstrated that these false memories could be implanted via methods like storing files on cloud services or browsing compromised websites, effectively allowing attackers to manipulate the AI's behavior without the user's knowledge.
After initially reporting the issue to OpenAI without resolution, Rehberger provided a proof-of-concept showing how the ChatGPT macOS app could be manipulated to send all user inputs and outputs to an attacker's server using a malicious image link. The most remarkable aspect of this exploit is that it is persistent across new conversations due to ChatGPT’s long-term memory feature:
While OpenAI has implemented a fix to prevent memory abuse for data exfiltration, the risk of prompt injections planting false memories persists. OpenAI provided some guidance on how to better control the memory feature.
This is a good opportunity to re-iterate our advice not to share personal data with ChatGPT and similar commercial systems. This is in part due to exploits such as this one, but mostly because OpenAI uses the content of your chats to train its algorithm. Only Team and Enterprise customers of the platform get to have their data excluded from training data:
This behavior illustrates two alarming trends concerning online privacy. The first is privacy as a premium paid feature. The second is that even when paying for a service, in this case, ChatGPT Pro subscription, you can still end up as the product.