Back to Issue 4
    The Threat Room Issue 4

    The Reprompt Attack on Microsoft Copilot

    February 11, 2026
    The Reprompt Attack on Microsoft Copilot
    [ AI Cyber Security & Threat Landscape ]

    The Reprompt Attack on Microsoft Copilot

    A user clicks a legitimate Microsoft Copilot link shared in an email. The page loads, a prompt executes, and the interface appears idle. The user closes the tab. Copilot continues executing instructions embedded in that link, making outbound requests that include user-accessible data, without further interaction or visibility.

    One click. No downloads, no attachments, no warnings. A user opens a link to Microsoft Copilot, watches the page load, and closes the tab. The interaction appears to end there.

    It doesn't. Behind the scenes, Copilot continues executing instructions embedded in that URL, querying user-accessible data and sending it to an external server. The user sees nothing.

    This is Reprompt, an indirect prompt injection vulnerability disclosed in January 2026. Security researchers at Varonis Threat Labs demonstrated that by chaining three design behaviors in Copilot Personal, an attacker could achieve covert, single-click data exfiltration. Microsoft patched the issue on January 13, 2026. No in-the-wild exploitation has been confirmed.

    Reprompt affected only Copilot Personal, the consumer-facing version of Microsoft's AI assistant integrated into Windows and Edge. Microsoft 365 Copilot, used in enterprise tenants, was not vulnerable. The architectural difference matters: enterprise Copilot enforces tenant isolation, permission scoping, and integration with Microsoft Purview Data Loss Prevention. Consumer Copilot had none of these boundaries.

    This distinction is central to understanding the vulnerability. Reprompt did not exploit a flaw in the underlying language model. It exploited product design decisions that prioritized frictionless user experience over session control and permission boundaries.

    Varonis Threat Labs identified the vulnerability and disclosed it to Microsoft on August 31, 2025. Microsoft released a patch as part of its January 2026 Patch Tuesday cycle, and public disclosure followed. The vulnerability was assigned CVE-2026-21521.

    Reprompt belongs to a broader class of indirect prompt injection attacks, where instructions hidden in untrusted content are ingested by an AI system and treated as legitimate commands. What made Reprompt notable was not a new model-level technique, but a practical exploit path created by compounding product choices.

    How the Mechanism Works

    Reprompt relied on three interconnected behaviors.

    1. Parameter-to-prompt execution

    Copilot Personal accepted prompts via the q URL parameter. When a user navigated to a URL such as copilot.microsoft.com/?q=Hello, the contents of the parameter were automatically executed as a prompt on page load. This behavior was intended to streamline user experience by pre-filling and submitting prompts.

    Researchers demonstrated that complex, multi-step instructions could be embedded in this parameter. When a user clicked a crafted link, Copilot executed the injected instructions immediately within the context of the user's authenticated session.

    2. Double-request safeguard bypass

    Copilot implemented protections intended to prevent data exfiltration, such as blocking untrusted URLs or stripping sensitive information from outbound requests. However, these safeguards were enforced primarily on the initial request in a conversation.

    Attackers exploited this by instructing Copilot to repeat the same action twice, often framed as a quality check or retry. The first request triggered safeguards. The second request, executed within the same session, did not consistently reapply them. This allowed sensitive data to be included in outbound requests on the second execution.

    3. Chain-request execution

    Reprompt also enabled a server-controlled instruction loop. After the initial prompt executed, Copilot was instructed to fetch follow-on instructions from an attacker-controlled server.

    Each response from Copilot informed the next instruction returned by the server. This enabled a staged extraction process where the attacker dynamically adjusted what data to request based on what Copilot revealed in earlier steps. Because later instructions were not embedded in the original URL, they were invisible to static inspection of the link itself.

    What an Attack Could Look Like

    Consider a realistic scenario based on the technical capabilities Reprompt enabled.

    An employee receives an email from what appears to be a colleague: "Here's that Copilot prompt I mentioned for summarizing meeting notes." The link points to copilot.microsoft.com with a long query string. Nothing looks suspicious.

    The employee clicks. Copilot opens, displays a brief loading state, then appears idle. The employee closes the tab and returns to work.

    During those few seconds, the injected prompt instructed Copilot to search the user's recent emails for messages containing "contract," "offer," or "confidential." Copilot retrieved snippets. The prompt then instructed Copilot to summarize the results and send them to an external URL disguised as a logging endpoint.

    Because the prompt used the double-request technique, Copilot's outbound data safeguards did not block the second request. Because the session persisted, follow-on instructions from the attacker's server continued to execute after the tab closed. The attacker received a structured summary of sensitive email content without the user ever knowing a query occurred.

    The employee saw a blank Copilot window for two seconds. The attacker received company data.

    This scenario is hypothetical, but every capability it describes was demonstrated in Varonis's proof-of-concept research.

    Why Existing Safeguards Failed

    The Reprompt attack exposed several structural weaknesses.

    Instruction indistinguishability

    From the model's perspective, there is no semantic difference between a prompt typed by a user and an instruction embedded in a URL or document. Both are treated as authoritative text. This is a known limitation of instruction-following language models and makes deterministic prevention at the model layer infeasible.

    Session persistence without revalidation

    Copilot Personal sessions remained authenticated after the user closed the interface. This design choice optimized for convenience but allowed background execution of follow-on instructions without renewed user intent or visibility.

    Asymmetric safeguard enforcement

    Safeguards were applied inconsistently across request sequences. By focusing validation on the first request, the system assumed benign conversational flow. Reprompt violated that assumption by automating malicious multi-step sequences.

    Permission inheritance without boundaries

    Copilot Personal operated with the full permission set of the authenticated user. Any data the user could access, Copilot could query. There was no least-privilege enforcement or data scoping layer comparable to enterprise controls.

    CVE Registration and Classification

    The vulnerability was registered as CVE-2026-21521 with the following characteristics:

  1. Vulnerability type: Improper neutralization of control sequences (CWE-150)
  2. Attack vector: Network
  3. User interaction: Required (clicking a crafted link)
  4. Privileges required: None
  5. Scope: Changed
  6. Confidentiality impact: High
  7. Integrity impact: Low to none
  8. Availability impact: None
  9. A separate CVE, CVE-2026-24307, addressed a different information disclosure issue in Microsoft 365 Copilot and is unrelated to the Reprompt root cause.

    Microsoft’s Defense-In-Depth Response

    Microsoft patched the Reprompt vulnerability in January 2026 and described a broader defense strategy against indirect prompt injection.

    Key elements in Microsoft's security guidance include:

  10. Hardened system prompts to reduce the likelihood that models follow instructions embedded in untrusted content
  11. Spotlighting techniques that mark or encode untrusted input so models can distinguish it from user instructions
  12. Prompt Shields, classifier-based detection integrated with Azure AI Content Safety and Defender for Cloud
  13. Deterministic blocking of known exfiltration channels such as malicious image tags or markdown links
  14. Human-in-the-loop controls for high-risk actions, requiring explicit user approval
  15. Microsoft characterized indirect prompt injection as an inherent risk of probabilistic language models and positioned mitigation as a layered control problem rather than a single fix.

    Enterprise Impact and Recommended Actions

    Who was affected

  16. Users of Microsoft Copilot Personal on consumer devices
  17. Any user who clicked a malicious Copilot link during the vulnerability window
  18. Who was not affected

  19. Microsoft 365 Copilot enterprise tenants, which enforce tenant boundaries, permission scoping, and Purview DLP controls
  20. Recommended enterprise actions

  21. Verify January 13, 2026 Patch Tuesday deployment on systems running Copilot Personal
  22. Issue user guidance discouraging work-related use of Copilot Personal
  23. Mandate Microsoft 365 Copilot for enterprise AI use cases
  24. Enable and test Microsoft Purview DLP policies for Copilot
  25. Apply sensitivity labels to high-risk content
  26. Monitor audit logs for anomalous Copilot access patterns
  27. Review data sharing and permission hygiene to reduce inherited access risk
  28. Risks and Open Questions

    Indirect prompt injection remains unresolved at a foundational level. Microsoft's mitigations reduce risk but are probabilistic, not absolute.

    Consumer AI tools continue to operate outside enterprise device management and monitoring. Future vulnerabilities may have similar disclosure-to-patch windows.

    Reprompt also raises unresolved attribution questions. When an AI acts on smuggled instructions using valid credentials, distinguishing user intent from system behavior becomes technically and legally complex.

    Further Reading

  29. Varonis Threat Labs Reprompt analysis
  30. Microsoft Security Response Center blog on indirect prompt injection
  31. National Vulnerability Database entry for CVE-2026-21521
  32. Microsoft Purview documentation for Copilot and DLP
  33. Security research on indirect prompt injection defenses
  34. [ From the Issue ]

    The Enterprise AI Brief | Issue 4

    View all articles in this issue
    [ Keep Reading ]

    More from The Threat Room

    Issue 7

    When AI Code Security Tools Become Part of the Supply Chain

    AI coding assistants have moved beyond autocomplete. Claude Code Security can scan full repositories, verify vulnerability findings, and propose patches directly in the pull request workflow. That puts it alongside CI servers and build pipelines as a component with its own credentials, configuration surfaces, and access to sensitive code. Security teams that have not yet accounted for it in their supply chain governance probably should.

    Read article
    Issue 6

    LLMjacking: The Credential Leak That Becomes an AI Bill

    LLMjacking takes a familiar attack pattern — stolen cloud credentials — and points it at a new target: managed LLM inference. Recent incident writeups document a repeatable workflow, from stolen keys to quiet AI API probing to sustained model invocations that can drain budgets and exhaust quotas. For organizations where AI usage is growing faster than logging and cost controls, this attack class can turn a routine credential leak into an operational incident quickly.

    Read article
    Issue 5

    BitBypass: Binary Word Substitution Defeats Multiple Guard Systems

    BitBypass hides one sensitive word as a hyphen-separated bitstream, then uses system-prompt instructions to make the model decode and reinsert it. In testing across five frontier models, this approach substantially reduced refusal rates and bypassed multiple guard layers. All five tested models produced phishing content at rates between 68-92%. If your safety controls assume plain-language detection will catch malicious intent, this research deserves close attention.

    Read article

    Have a Project in Mind?

    Talk to our team about how we can put these ideas to work in your organization.

    Contact Us