QuestK2 Technologies

Author name: Dhanush

Uncategorized

BitBypass: Binary Word Substitution Defeats Multiple Guard Systems

BitBypass changes model behavior by hiding a single sensitive word as binary bits. The method requires no model weights, no gradients, and no complex adversarial optimization. It works by encoding one keyword as a hyphen-separated bitstream and instructing the model to decode it. In testing across five frontier models, the technique dropped refusal rates from 66-99% down to 0-28% and induced all five models to generate phishing content at rates between 68% and 92%.

The BitBypass paper (“BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage”) was posted to arXiv on June 3, 2025 (arXiv:2506.02479) and accepted to EACL 2026. Findings are based on a Texas A&M SPIES Lab post dated January 5, 2026.

The authors evaluate BitBypass against five LLMs: GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet, Llama 3.1 70B, and Mixtral 8x22B. They test bypass behavior against multiple guard systems: OpenAI Moderation, Llama Guard (original), Llama Guard 2, Llama Guard 3, and ShieldGemma. The evaluation uses standard harmful-instruction benchmarks (AdvBench and Behaviors) plus a phishing-focused benchmark introduced by the authors, PhishyContent, consisting of 400 prompts across 20 phishing categories and hosted on Hugging Face. The evaluation includes a refusal judge and an LLM-based judge for harmfulness and quality, with phishing-specific classification handled by a dedicated harm judge.

How the mechanism works

BitBypass operates under an “Open Access Jailbreak Attack” threat model. The attacker has API access to a commercial LLM and control over inference-time parameters, including the system prompt, user prompt, and decoding settings. The attacker does not need model weights, gradients, or training data. The core idea is to hide a single sensitive word in a harmful instruction by encoding it as bits while keeping the rest of the instruction in natural language.
1. Bitstream camouflage in the user prompt

The attacker selects one sensitive keyword in an otherwise harmful request. That word is converted into an ASCII binary representation and formatted as a hyphen-separated bitstream. In the natural-language instruction, the sensitive word is replaced with a placeholder token such as [BINARY_WORD]. The user message includes both the bitstream and the partially redacted instruction, so the model has enough context to reconstruct the original request. The result is an input that looks like benign “data plus template text” to both humans and simple filters, because the sensitive token is no longer present in plain language.

2. A system prompt that forces decoding and reconstruction

Three system-prompt components drive the attack:

Curbed Capabilities: System-level instructions that explicitly constrain or redirect default safety behavior and push the model to prioritize the decoding and task-following instructions. The ablation study shows this is the most critical component; effectiveness drops sharply when it is removed.

Program-of-Thought: The system prompt includes a Python-like function (named bin_2_text) and instructions that guide the model to conceptually decode the bitstream back into text. The decoding is conceptual rather than executed in an actual interpreter, and the function does not fully handle the hyphenation format, relying on the model’s reasoning to bridge that gap.

Focus Shifting: After the model decodes and reconstructs the request internally, the prompt sequence shifts it into subsequent steps or tasks. This reduces the chance that safety behavior triggers at the moment the reconstructed sensitive term becomes salient again.

3. Why guard models can miss it

Guard models are independent filters that classify prompts for policy violations. BitBypass exploits a gap: the guard model sees bitstrings and a placeholder rather than the reconstructed sensitive word and completed harmful request.
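Before moving on, the encoding in step 1 is easy to make concrete. A minimal sketch (the function names are mine, not the paper’s, and the example keyword is benign):

```python
def word_to_bitstream(word: str) -> str:
    # Encode each character as 8-bit ASCII, joined by hyphens,
    # e.g. "ab" -> "01100001-01100010"
    return "-".join(format(ord(c), "08b") for c in word)

def bitstream_to_word(bits: str) -> str:
    # Inverse: split on hyphens and decode each 8-bit group
    return "".join(chr(int(b, 2)) for b in bits.split("-"))

def camouflage(instruction: str, keyword: str) -> tuple[str, str]:
    # Replace the sensitive keyword with a placeholder and return the
    # redacted instruction plus the bitstream the model is told to decode
    redacted = instruction.replace(keyword, "[BINARY_WORD]")
    return redacted, word_to_bitstream(keyword)

redacted, bits = camouflage("explain the firewall rules", "firewall")
```

The point of the sketch is how little machinery is involved: the “attack” is a deterministic substitution plus an instruction to invert it, which is why simple keyword filters never see the sensitive term.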
Some guard models show more resilience than others (notably Llama Guard 2 and 3), but meaningful bypass rates remain across all tested systems.

Why this matters

BitBypass works because it is simple and repeatable, not because it is sophisticated. It uses a deterministic encoding of a single word and relies on the model’s general ability to interpret structured representations when instructed. That simplicity is the problem. Direct harmful instructions trigger refusal; BitBypass substantially reduces refusal and increases unsafe output generation across multiple models. Testing shows a shift from high refusal rates (roughly 66-99% under direct instructions) to much lower refusal rates under BitBypass (0-28%), with corresponding increases in attack success rates (roughly 48-78% on harmful-instruction benchmarks).

The phishing results map directly to enterprise abuse patterns. Under BitBypass, all five tested models produced phishing content at high rates on the PhishyContent benchmark (a 68-92% phishing content rate across models). This is not a theoretical risk: phishing infrastructure, credential harvesting, and business email compromise are operational threats that enterprises face daily.

Implications for enterprises

1. System prompt control is now a first-order security control

The BitBypass threat model assumes an attacker can influence the system prompt. Many enterprise deployments do not allow this directly, but agent frameworks, tool routers, multi-tenant prompt templating, and “bring your own system prompt” features can unintentionally widen that surface area. If untrusted users can shape or inject system instructions, BitBypass-style patterns become feasible.

2. Input screening that relies on natural-language semantics has structural limits

BitBypass is an example of “non-natural-language adversarialism,” where the disallowed intent is split between an encoded fragment and a decoding procedure.
Controls that focus on keyword triggers, typical jailbreak phrases, or standard natural-language toxicity signals will underperform if they do not address structured encodings and transformation steps.

3. Guard models help, but their coverage varies

Testing shows wide bypass-rate ranges for guard systems under BitBypass (roughly 22-93%, depending on guard model and dataset), with Llama Guard 2 and 3 showing more robustness than some alternatives. For enterprise architecture, this means measured evaluation of the specific guard model in use, plus continuous testing against encoding-based attacks, rather than assuming “a moderation layer” is sufficient.

4. Testing needs to include encoded and reconstruction-based abuse cases

The evaluation’s use of AdvBench, Behaviors, and PhishyContent points to a practical testing direction: jailbreak evaluation suites should include structured encodings, reconstruction steps, and mixed-format prompts, not only straightforward malicious instructions and roleplay-based jailbreaks.

Risks and open questions


NIST’s Cyber AI Profile Draft: How CSF 2.0 Is Being Extended to AI Cybersecurity

A security team is asked to “do a CSF assessment” for a new AI assistant that connects to internal content and external model APIs. Everyone agrees CSF is the right backbone, but the team keeps getting stuck on the same questions: What counts as an AI asset? Where do prompts, model access, and training data fit? How do you describe AI-specific threats without creating a parallel framework? NIST’s new draft profile is an attempt to make that mapping concrete.

In December 2025, NIST published an Initial Preliminary Draft of the Cybersecurity Framework Profile for Artificial Intelligence (NIST IR 8596 iprd), positioned as a CSF 2.0 Community Profile focused on AI-related cybersecurity risk. The draft ran a public comment period from December 16, 2025, through January 30, 2026. NIST also scheduled a follow-on workshop on January 14, 2026, to discuss the preliminary draft.

The Cyber AI Profile is designed to integrate into existing cybersecurity programs rather than replace them. It is organized around the NIST Cybersecurity Framework (CSF) 2.0 and coordinated with other NIST risk frameworks that organizations already use. Two pieces of context matter for how to read the document. First, it is an “Initial Preliminary Draft”: NIST explicitly framed it as an early release to share current thinking and solicit feedback before an Initial Public Draft and a final profile. Second, it intentionally avoids a narrow definition of “AI”: the draft uses “AI systems” broadly, covering stand-alone AI systems and AI embedded into other applications, infrastructure, and processes. NIST ties the profile into a larger set of NIST AI risk work, including the AI Risk Management Framework (AI RMF 1.0), released January 26, 2023, and the Generative AI Profile (NIST AI 600-1), published July 26, 2024.

How The Mechanism Works

At its core, the Cyber AI Profile is a structured overlay on CSF 2.0. It starts with CSF 2.0 outcomes.
The profile is organized by the CSF 2.0 Functions and their Categories and Subcategories. In the draft, this is implemented as a set of tables aligned to each CSF Function: GOVERN, IDENTIFY, PROTECT, DETECT, RESPOND, and RECOVER.

It adds three AI Focus Areas. For each CSF outcome, the profile layers AI cybersecurity considerations through three Focus Areas: Secure (cybersecurity of AI system components and the ecosystem they rely on), Defend (use of AI capabilities to improve cyber defense activities), and Thwart (resilience against adversaries using AI to enhance attacks). These Focus Areas are meant to structure AI-related cybersecurity risk without creating a separate framework taxonomy.

It uses table columns to connect each outcome to AI-specific guidance. For each CSF Subcategory, the draft provides: general considerations (baseline cybersecurity considerations); focus-area-specific considerations that describe AI-relevant threats, mitigations, and implementation details under Secure, Defend, and Thwart; proposed priority signals for focus-area work (the draft uses a 1-3 scale to indicate where organizations may focus first); and example informative references, with NIST noting the list is incomplete and undergoing further literature review.

It explicitly solicits feedback on usability and structure: how stakeholders would use the profile, whether Focus Areas should be presented together or separately, preferred delivery formats (including tooling-oriented formats), and what glossary terms and informative references should be added.

What This Actually Forces Into The Open

This draft matters because it takes a problem many enterprises already have and forces it into a consistent control language: how to treat AI systems as part of normal cybersecurity risk management while still acknowledging that AI introduces distinct attack surfaces and failure modes. The immediate consequence is visibility.
Teams that have been running AI pilots without formal asset classification now have to answer: where is the model hosted, who can access it, what data does it touch, and what happens if it is compromised or starts behaving unexpectedly? The profile does not allow those questions to stay vague. CSF mapping requires explicit answers, which means AI systems that were treated as “innovation projects” become governed infrastructure with incident response obligations.

The structure is also a signal. By publishing this as a CSF 2.0 Community Profile, NIST is making a specific governance move: AI cybersecurity risk is expected to map to the same enterprise cybersecurity outcomes used for everything else, including governance, asset identification, protective controls, detection, response, and recovery. Organizations that built AI security programs in parallel to their existing cybersecurity frameworks now have a forcing function to consolidate.

The timing is deliberate. The draft was published in December 2025, with an immediate comment window and a January 2026 workshop, indicating NIST is actively pulling industry input to refine both the content and the practical form factor before the next draft stage. The speed suggests NIST expects this to move quickly from draft to operational guidance.

Implications for Enterprises

Operational Implications

Program integration work becomes clearer, but also more explicit. Teams that already operate CSF-based assessments can use the profile to structure AI cybersecurity discussions in familiar CSF terms instead of inventing AI-only assessment categories. The trade-off is that AI systems can no longer be evaluated in isolation. If a marketing team deploys a chatbot that connects to a third-party API, that deployment now requires the same level of asset documentation, access control review, and incident response planning as any other system that handles enterprise data.

Inventory and dependency mapping pressure increases.
The profile’s CSF alignment pushes organizations toward an explicit view of AI systems and their dependencies as governed assets, including embedded AI, not only obvious stand-alone deployments. This is where the friction shows up. Teams have to identify not just the chatbot, but the API it calls, the authentication mechanism it uses, the data sources it accesses, and the logging infrastructure that captures its behavior. Many organizations do not have that level of visibility today, especially for AI integrations that were deployed quickly or embedded into existing tools.

Incident response and recovery planning must include AI artifacts. The profile’s RESPOND and RECOVER alignment


Structured Outputs Are Becoming the Default Contract for LLM Integrations

A team ships an LLM feature that returns JSON for downstream automation. In testing, it mostly works. In production, a small percentage of responses include an extra sentence, a missing field, or a value outside an enum. Each case becomes a validation failure, a retry, or brittle parsing code that quietly eats into the system’s reliability budget.

For the past two years, production LLM integrations have relied on a fragile contract: ask the model politely to return JSON, then write defensive parsing code for when it doesn’t. That pattern is being replaced. Across provider APIs and open-source inference stacks, structured outputs are becoming first-class infrastructure, with schema enforcement moved into the decoding layer rather than application code. OpenAI moved from JSON mode, which guarantees valid JSON but not schema adherence, to Structured Outputs, which enforce a supplied JSON Schema when strict mode is enabled. In parallel, vLLM and adjacent tooling have made structured outputs a core serving feature, with explicit migration away from older guided parameters toward a unified structured outputs interface.

The old pattern looked reasonable in demos: prompt the model to output JSON, parse the response, validate against a schema, retry on failure. JSON mode reduced syntax breakage but left schema drift, missing required keys, and invalid values as application problems. Every production system that depended on reliable structured data ended up with the same stack of validation logic, retry loops, and error handling. OpenAI’s Structured Outputs reframe this as an API contract: when strict mode is used with a JSON Schema, the model output is constrained to match the schema. On the open-source serving side, vLLM treats structured outputs as a core capability with multiple constraint types and server-side enforcement.
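OpenAI’s strict-mode contract can be illustrated by the shape of the request body. This is a sketch: the schema and model snapshot are illustrative, and only the payload is built here, with no API call made.

```python
# Sketch of an OpenAI Structured Outputs request (strict mode).
# The schema and model snapshot are illustrative; "strict": True is
# what turns schema adherence into an API-level guarantee.
payload = {
    "model": "gpt-4o-2024-08-06",
    "messages": [
        {"role": "user", "content": "Extract the total from this invoice."},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "total": {"type": "number"},
                    "currency": {"type": "string", "enum": ["USD", "EUR"]},
                },
                "required": ["total", "currency"],
                "additionalProperties": False,
            },
        },
    },
}
```

Under this contract, a response that omits currency or invents an extra field is no longer something application code has to defend against; the decoder cannot produce it.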
Maintainer discussions and redesign work in vLLM’s V1 engine are explicitly motivated by performance and throughput concerns as structured output requests arrive at scale.

How the mechanism works

Structured output enforcement is implemented as constrained decoding. Instead of letting the model sample any next token from its full vocabulary, the decoder restricts the set of allowable next tokens so that the growing output remains consistent with a formal constraint such as a JSON Schema, regex, or grammar. Implementations commonly compile the constraint into a state machine or grammar matcher that can decide, at each step, which tokens would keep the output valid. The decoding loop applies those constraints while generating tokens. vLLM’s documentation and engineering writeups describe this as structured output support with backends such as xgrammar or guidance-based approaches. At the library layer, projects such as llguidance describe constrained decoding as enforcing context-free grammars efficiently, and Outlines positions itself as guaranteeing structured outputs during generation across multiple model backends. The technical shift is straightforward: move the validation problem from your application into the inference engine.

Analysis

This matters now because structured outputs are moving from nice-to-have prompt hygiene to contract-level infrastructure that toolchains are standardizing around. OpenAI’s Structured Outputs make schema conformance an explicit API-level behavior in strict mode, which removes the operational burden of validation and retry loops for schema shape issues. In inference stacks, vLLM’s V1 engine work treats structured outputs as a feature that must not degrade system throughput, and maintainers explicitly call out performance as a blocker to feature parity. Constrained decoding is being measured and benchmarked as a standard production technique.
A 2025 evaluation paper on structured generation reports that constrained decoding can improve generation efficiency relative to unconstrained decoding while guaranteeing constraint compliance. The API surface is converging: vLLM now warns about deprecated fields and directs users to a unified structured_outputs interface, and server-side protocol definitions mark older guided knobs as deprecated with planned removal timelines. The ecosystem is settling on a shared approach.

Implications for enterprises

Operational implications

Fewer format incidents, more content incidents. When schema shape errors drop, the remaining failures are semantic: incorrect extracted values that still fit the schema. Structured outputs improve reliability of form, not correctness of meaning. This shifts QA effort toward evaluation of content quality and downstream controls rather than parsing resilience. The failure modes change, not the failure rate.

Platform standardization pressure. As provider APIs and inference stacks converge on schema-driven interfaces, platform teams will face pressure to offer a standard contract mechanism across internal products rather than letting each team invent its own parsing and retry logic. The pattern is becoming infrastructure, which means it needs infrastructure-level support.

Migration work is real work. Deprecations and interface changes become part of platform lifecycle management, with version pinning, integration testing, and rollout planning. Teams that built on older guided parameters now have migration paths to follow and timelines to track.

Technical implications

Schema design becomes an integration surface. If the schema is the contract, it needs the same discipline applied to internal APIs: explicit compatibility expectations, careful changes, and documented consumer assumptions. OpenAI’s strict schema enforcement and vLLM’s structured outputs both make the schema a first-class input to the generation pipeline.
A breaking schema change is a breaking API change.

Backend behavior and failure modes matter. vLLM issue discussions document cases where the structured output finite state machine can fail to advance in the xgrammar backend, and the engine may abort the request in response. That is a production failure mode enterprises need to monitor, alert on, and handle with fallbacks where appropriate. The guarantee is stronger, but the failure is harder.

Performance is part of the contract. vLLM’s structured outputs work and RFCs explicitly treat performance challenges as a blocker to feature parity. Constrained decoding is not free, even if overhead is trending toward minimal in mature implementations. Teams need to measure throughput impact when enabling structured outputs at scale.

Risks and open questions

Schema compliance can hide semantic failure. A perfectly valid JSON object can still contain incorrect or low-quality values. Structured outputs reduce certain classes of brittleness but do not guarantee the correctness of the underlying facts or extraction decisions. The risk is that teams treat schema conformance as


When Prompts Started Breaking Production

A team updates a system prompt to reduce hallucinations. The assistant sounds better in demos, but a downstream parser starts failing because formatting shifted in subtle ways. Nothing in the application code changed, so traditional tests stay green. The only signal is a rising error rate and escalations. This is the operational shape of prompt regressions: the system is up, but its behavior is outside contract.

By early 2026, prompts were breaking production systems often enough that engineering teams stopped treating them as configuration and started treating them like code. The pattern: version prompts, define regression suites, run automated evals in CI/CD, and block deployments when metrics fall below gates. This is test-driven prompt engineering.

In early prompt workflows, iteration looked like trial and error in a playground, validated by a handful of manual examples. By 2025, that approach had produced enough incidents that multiple sources described the same shift: prompt test suites and evaluation loops that resemble software QA and release engineering. Several strands converged. “Test-Driven Prompt Engineering” writeups framed prompts and evals as code and tests, with explicit versioning and regression practices. Platform tooling emphasized dataset-based evaluation runs triggered by prompt changes in CI systems. Product teams documented evaluation-driven refinement on real assistants. And incident narratives kept highlighting the same failure mode: prompt modifications, unauthorized or accidental, created safety failures, format breakage, or drift that traditional QA never caught. In parallel, evaluation extended beyond single-turn correctness to agent behavior, including tool use and multi-step workflows. The bar for what “tested” means in LLM systems went up.

How the mechanism works

Evaluation-driven prompt engineering is a lifecycle that treats prompts as managed release assets with measurable acceptance criteria.
Five practices define it:

1. Versioned artifacts

Instead of embedding prompts as string literals, teams store them as distinct files or registry entries and version them, often with semantic versioning. Some workflows pin prompts to specific model snapshots to avoid surprises from provider alias updates. The practical effect is traceability: teams can answer which prompt version produced a given output and roll back quickly.

2. Test suites and datasets

A prompt test suite is a structured set of test cases that represent expected behavior. Test cases may include explicit expected outputs, but often they include evaluation criteria: format constraints, required elements, tool-call correctness, tone requirements, or groundedness against provided context. Golden datasets are curated from core workflows and failure cases. Some systems enrich them with security probes or scenario generation to expand coverage. Research on multi-prompt evaluation argues that single-prompt testing misses variance caused by small wording differences, which supports suites that evaluate multiple prompt variants per case.

3. Scoring models

Common checks include format and schema compliance (for example, JSON parseability or contract adherence), plus keyword, regex, or structural checks for required elements; task success scoring, sometimes as a percentage of cases that meet criteria; hallucination or faithfulness scoring, often using an LLM-as-judge approach against the provided context; safety and policy checks, including red-team-style probes for jailbreak and prompt injection patterns; and operational metrics like latency distributions and token cost per case. Because LLM behavior is nondeterministic, many workflows use pass rates, thresholds, and slice-based evaluation rather than single binary assertions.

4. CI/CD gates

When prompt files or templates change, CI triggers the evaluation suite.
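A minimal sketch of such a gate, with an illustrative case format and threshold; real suites layer semantic and safety scorers on top of the format check shown here:

```python
import json

# Golden test cases pairing inputs with contract checks
# (format is illustrative; real suites also score tone, groundedness, etc.)
CASES = [
    {"input": "Summarize ticket #123", "required_keys": ["summary", "priority"]},
    {"input": "Summarize ticket #456", "required_keys": ["summary", "priority"]},
]

def check(output: str, required_keys: list) -> bool:
    # Contract check: valid JSON containing every required key
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return all(k in data for k in required_keys)

def gate(outputs: list, threshold: float = 0.95) -> bool:
    # Pass rate over the suite; because model outputs are
    # nondeterministic, gate on a rate rather than per-case assertions
    passed = sum(check(o, c["required_keys"]) for o, c in zip(outputs, CASES))
    return passed / len(CASES) >= threshold

# In CI: run the changed prompt against each case, collect the outputs,
# then fail the pipeline (nonzero exit) when gate(outputs) is False
```

The gate is deliberately a rate comparison, not a set of exact-match assertions, which is what makes it usable as a blocking CI step for nondeterministic outputs.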
If key metrics regress beyond thresholds, the pipeline fails and the change is blocked from deployment. Some playbooks include post-deploy monitoring and automated rollback if production metrics fall below guardrails.

5. Production feedback

Several sources describe monitoring prompt quality alongside traditional SRE metrics. The insight is that prompt-related failures can be silent: the service is healthy by uptime metrics while semantic quality degrades. Teams address this by tracking quality metrics over time and feeding new failure cases back into the evaluation dataset.

Analysis

This pattern emerged because prompts are no longer a side input to a model. In many enterprise systems, prompts define behavior, policy constraints, and output contracts. When that interface changes, you can get outages, compliance issues, or workflow breakage without a code diff that triggers standard QA. Late-2025 incident narratives sharpened the problem from multiple angles. In May 2025, an unauthorized prompt change at xAI’s Grok service created a safety failure that made headlines. LinkedIn posts from November and December 2025 documented system prompt QA gaps and a Gemma hallucination incident where model behavior drifted without any prompt change at all. These are representative examples, not isolated cases. They clarified the risk: unauthorized or poorly controlled prompt changes can create safety and policy failures, turning prompt governance into a change-management problem.

Model and tool behavior can drift, producing regressions without prompt changes. This motivates continuous regression testing and parallel evaluation across versions. Multi-provider failover improves availability but increases evaluation workload, because prompts must be validated across the fallback chain, not just the primary provider. And prompt changes intended to improve one dimension, like hallucination reduction, can degrade another dimension, like format stability.
Without contract-aware tests, downstream systems take the hit. The consistent theme is operational accountability: if prompts can trigger production incidents, they need the same discipline as other production configuration.

Implications for enterprises

Operational implications

Release management: Prompt changes need an approval and promotion workflow, with versioning, diffing, and rollback. This includes system prompts, not just user-visible templates, since system prompt drift can bypass traditional QA.

Incident response: Prompt versions must be observable during incidents so teams can correlate behavioral changes to a specific prompt or model update and roll back fast. The teams that caught regressions quickly in 2025 had prompt versioning already in place; the teams that struggled were still hunting through code commits to find what changed.

Vendor resilience: If you implement provider failover, your eval footprint increases because you now need confidence in behavior across multiple model families and configurations. One source described this as the hidden cost of resilience: you pay for availability in evaluation work.

Quality budgeting: Teams should plan for evaluation as a recurring operational cost, not a one-time integration


NIST Launches Initiative to Define Identity and Security Standards for AI Agents

Here is a scenario playing out in enterprises right now. An AI agent is deployed to automate routine work: scheduling meetings, querying databases, updating tickets, and calling external APIs. It operates across internal systems, makes decisions about which tools to use, and executes actions on behalf of users. Then an auditor asks a simple question: who authorized the agent to access those systems, and how can its actions be traced? For most organizations today, there is no consistent answer. NIST wants to change that.

The Problem With How We Think About Software Identity

Traditional software identity models were built around clearly defined applications and services. You have a service account, a set of permissions, and an audit log. The boundaries are known. AI agents break that model. These systems don’t just execute predefined tasks. They operate with a degree of autonomy, deciding which tools to call, which APIs to query, and which actions to take in sequence. An agent handling a user request might touch five internal systems and two external APIs in a single session, none of which were explicitly anticipated when the agent was deployed.

That creates a governance gap that existing identity and access management frameworks weren’t designed to address. Who, or what, is the agent? What is it authorized to do? If something goes wrong midway through a chain of actions, where does accountability land? These are operational questions that enterprises deploying agent systems are already encountering.

What NIST Announced

In February 2026, NIST’s Center for AI Standards and Innovation launched the AI Agent Standards Initiative, a coordinated program to develop security, identity, and interoperability standards for autonomous AI agents. The initiative is organized around three areas of work. The first is facilitating industry-led standards development and maintaining U.S.
participation in international standards bodies working on AI agent specifications. The second is fostering open source protocol ecosystems for agent interoperability, supported in part through research programs like the National Science Foundation’s work on secure open source infrastructure. The third is foundational research in agent security and identity: authentication models, authorization frameworks, and evaluation methods for autonomous systems.

Two early deliverables are already in progress. NIST issued a Request for Information on AI Agent Security, with responses due March 9, 2026, seeking ecosystem input on threats, mitigations, and evaluation metrics. Separately, the National Cybersecurity Center of Excellence published a concept paper titled Accelerating the Adoption of Software and AI Agent Identity and Authorization, open for public comment until April 2, 2026. Sector-focused listening sessions covering healthcare, finance, and education are planned for April.

The Four Technical Problems NIST Is Trying to Solve

Agent identity. The concept paper explores treating AI agents as distinct entities within enterprise identity systems, with unique identifiers and authentication mechanisms similar to how service accounts or non-human identities are managed today. This would give agents a traceable presence in the systems they interact with, rather than operating invisibly under a user’s credentials or a shared service account.

Authorization and access control. Scoping agent permissions is harder than it sounds. An agent authorized to read a database shouldn’t necessarily be authorized to write to it, even if the underlying task seems to require it. The concept paper examines how existing IAM standards could be extended to support more granular, dynamic permission models for agents operating across multiple systems.

Action traceability and logging.
If an agent takes a sequence of actions across several systems in a single session, organizations need logs that can reconstruct what happened, in what order, and under what authorization. This is foundational for security monitoring, incident response, and audit. Without it, agentic systems are effectively a black box from a governance perspective.

Interoperability and protocol design. Agents built on different platforms need consistent ways to communicate with tools and services. Without shared protocols, every enterprise deployment becomes a bespoke integration problem, and security practices fragment across vendors. NIST plans to engage industry and open source communities to identify barriers and support shared technical approaches.

Why This Matters for Enterprise Teams

The initiative is early-stage, but the direction is clear: AI agent architecture is going to be subject to standardization pressure, and the technical decisions being made now will influence what those standards look like. A few practical implications are worth tracking.

Non-human identity management is becoming a first-class problem. Enterprises already manage service accounts, API keys, and OAuth tokens. Agents add a new layer of complexity because their behavior is less predictable than traditional software. IAM teams will likely need to treat agent identities the way they treat privileged service accounts, with tighter scoping and more aggressive monitoring.

Audit requirements are coming. Whether driven by regulation or internal governance, detailed logs of agent actions are going to become a baseline expectation for any enterprise deploying autonomous systems in sensitive environments. Building that logging infrastructure now is easier than retrofitting it after an incident.

Fragmentation is the near-term risk. Until interoperability standards mature, enterprises integrating agents from multiple vendors are effectively building on unstable ground.
The NIST initiative signals that this will eventually be addressed, but the timeline for stable, broadly adopted standards is measured in years, not months. The Honest Limitations The initiative is a coordination and research effort, not a standards release. The technical models for agent identity and authorization are still being shaped by the RFI and comment process. How quickly interoperable protocols will emerge, and how broadly they will be adopted across vendors and platforms, is genuinely uncertain. What NIST is doing is establishing the organizational and research infrastructure to make standardization possible. For a governance gap this foundational, that is probably the right starting point. But enterprises deploying agents today are doing so ahead of the standards, which means they must design their own governance and control approaches. Further Reading NIST Center for AI Standards and Innovation: AI Agent Standards Initiative announcement NIST NCCoE

Uncategorized

When Code Scanners Don’t Understand What Code Does

When Code Scanners Don’t Understand What Code Does Application security testing has a structural problem that has been quietly tolerated for years. Static analysis tools are pattern matchers. They scan code looking for shapes they recognize: a known SQL injection fingerprint, a hardcoded credential, a weak cipher reference. If the vulnerability fits a known rule, they catch it. If it doesn’t, it passes through. That model worked well enough when applications were monolithic and most vulnerabilities were obvious. It works considerably less well when your application is a mesh of distributed services, third-party APIs, and shared libraries, where the dangerous condition only appears when several components interact in a specific way. A new category of tools is approaching this differently. Instead of scanning for patterns, they reason about behavior. The early architecture suggests the distinction is meaningful. The False Positive Problem False positive rates in static analysis tools have been studied extensively. In some enterprise environments, 50 to 60 percent of alerts turn out to be noise. Security teams know this. Developers know this. The result is alert fatigue: scanners keep running, dashboards fill up, and findings get ignored. The issue is not the tool itself but the detection model it relies on. Rule-based detection is precise only when the rule perfectly describes the vulnerability. The moment a vulnerability is novel, contextual, or logic-based, the rule doesn’t fire. The problem is compounding. AI coding assistants now contribute a meaningful share of enterprise code changes. Development pipelines that once pushed code weekly now push hourly. The backlog of unreviewed code is growing faster than security teams can clear it with current tooling. How Reasoning-Based Analysis Works Anthropic’s Claude Code Security is an early production implementation of this approach.
The core premise: instead of asking whether code matches a known bad pattern, ask what the code does and whether that behavior creates risk. The system uses Claude Opus 4.6 to analyze repositories through a multi-stage pipeline. Each stage differs from traditional pattern-based scanning. Stage 1: Context construction. Before analysis begins, the system builds a representation of the application: selected files, diffs, call chains, architectural summaries. The model gets a picture of how components relate to each other, not just what each file contains in isolation. Cross-component vulnerabilities require cross-component context. Stage 2: Behavioral reasoning. The model traces how data enters the system, how it propagates across components, and what controls are applied along the way. Authentication flows. Authorization checks. Where sensitive operations occur. This approach is intended to detect vulnerabilities that rule-based scanners often miss: a broken access control path that only appears when one service makes an assumption about what another already validated, or a business logic error that is perfectly valid code doing exactly the wrong thing. Stage 3: Self-adversarial verification. After the model proposes candidate vulnerabilities, additional reasoning passes attempt to disprove them. The system challenges its own findings before surfacing them. Candidates that fail this adversarial check are discarded. What remains gets a severity rating and a confidence score, both presented to the developer alongside the finding. Suggested patches are generated for each confirmed finding, but the system does not apply them automatically. A developer must review and approve every proposed change before it is committed. Why the Timing Is Right Application security tooling was built for a different era: monolithic applications, slower release cycles, and security reviews that happened after code was written. The development landscape changed. Much of the tooling did not.
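The three-stage pipeline described under "How Reasoning-Based Analysis Works" can be sketched as a simple control flow. This is an illustrative outline only: the function names and data shapes are hypothetical stand-ins, not Anthropic's actual implementation, and the model call is faked with canned output.

```python
# Illustrative sketch of a multi-stage reasoning pipeline:
# context construction -> behavioral reasoning -> self-adversarial
# verification. All functions are hypothetical stand-ins for model calls.

def build_context(repo_files: dict) -> dict:
    """Stage 1: assemble cross-component context, not isolated files."""
    return {"files": repo_files, "call_chains": [], "summaries": {}}

def propose_findings(context: dict) -> list:
    """Stage 2: trace data flows and propose candidate vulnerabilities.
    A real system would invoke a reasoning model; we return one canned
    candidate to keep the sketch runnable."""
    return [{"id": "F1", "claim": "unvalidated input reaches query",
             "severity": "high", "confidence": 0.9}]

def survives_adversarial_check(finding: dict) -> bool:
    """Stage 3: try to disprove each candidate; discard what fails.
    Here a confidence threshold stands in for the adversarial pass."""
    return finding["confidence"] >= 0.5

def scan(repo_files: dict) -> list:
    context = build_context(repo_files)
    candidates = propose_findings(context)
    # Only findings that survive the self-adversarial pass are surfaced,
    # each carrying severity and confidence; patches would be suggested
    # to a human reviewer, never applied automatically.
    return [f for f in candidates if survives_adversarial_check(f)]

findings = scan({"app.py": "..."})
```

The structural point is that verification is a distinct stage that filters the model's own output before anything reaches a developer, rather than surfacing every raw candidate.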
Modern applications are architecturally complex in ways that challenge rule-based detection at a fundamental level. Vulnerabilities emerge from interactions between distributed services, not from a single bad line. A data flow passes through five microservices before it touches a database. Writing a rule that reliably catches an injection across that path is often not tractable. Reasoning-based analysis attempts to sidestep the rule-writing problem and asks the question directly: given how this system behaves, where can it be exploited? That framing may scale better as architectures grow more distributed and codebases grow faster than rules can be written to cover them. What Changes for Security Teams The workflow implications are significant. CI/CD pipelines that currently run static analysis as a gate check will likely need to be redesigned. The pattern shifts from detection-only to a full loop: detection, diagnosis, patch suggestion, human approval, deployment. The security tool becomes an active participant in remediation, not just a reporter of violations. Security analysts will see fewer alerts but more detailed findings. Instead of triaging hundreds of rule violations, they review a smaller set of findings that each include a reasoning narrative, an exploit path, and a proposed fix. The role shifts from alert triage toward verification and governance. For organizations running microservices, context integration becomes a real infrastructure requirement. These systems need repository structure, dependency graphs, and architecture metadata to work well. Some organizations will need to build cross-repository context layers before reasoning-based analysis can operate effectively at scale. Risks That Deserve Attention Non-determinism is a genuine concern. The same analysis run twice may produce slightly different findings. That complicates auditability and reproducibility for enterprises with compliance requirements around security tooling. 
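One way teams might partially address the non-determinism concern is to record enough run metadata that each analysis is auditable even if it is not repeatable bit-for-bit: the exact model version, a digest of the inputs, and the findings produced. A minimal sketch under that assumption; all field names are hypothetical, not a vendor or compliance standard.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical sketch: pin and record what an analysis run saw, so a
# non-deterministic tool still leaves an auditable trail.

def run_record(model_version: str, analyzed_files: dict, findings: list) -> dict:
    """Hash the inputs and bundle them with the model version and findings,
    so auditors can tie a specific report to a specific run."""
    inputs_digest = hashlib.sha256(
        json.dumps(analyzed_files, sort_keys=True).encode()
    ).hexdigest()
    return {
        "model_version": model_version,      # pin the exact model used
        "inputs_sha256": inputs_digest,      # what the run actually saw
        "findings": findings,                # what it reported
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

record = run_record("model-2026-01", {"app.py": "source..."}, [{"id": "F1"}])
```

This does not make two runs produce identical findings; it only makes each run's inputs and outputs independently verifiable, which is usually what a compliance review actually needs.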
Automation bias is already documented in adjacent contexts. Studies have recorded developers rapidly approving large AI-generated pull requests without thorough review. The same dynamic could appear with AI-generated security patches. A well-formatted, confident patch suggestion can still be wrong. The human approval loop only works if the approval is substantive. Hallucinated artifacts present a specific risk worth flagging. Language models can invent package names and API references that do not exist. Attackers have already exploited this in other contexts by registering hallucinated package names in public repositories. A security tool that hallucinates a remediation dependency could introduce the very type of vulnerability it was trying to address. Resource cost is also a practical constraint. Running a large model across an entire repository on every commit is computationally expensive. For large codebases with high commit frequency, the cost and latency profile may require architectural changes to

Uncategorized

Treasury’s New AI Risk Framework Gives the Financial Sector a Governance Playbook

Treasury’s New AI Risk Framework Gives the Financial Sector a Governance Playbook Here is the problem every bank technology team knows. The model gets built fast. Governance takes forever. And when regulators ask for evidence that someone thought carefully about bias testing, data lineage, and explainability before deploying an AI credit underwriting system, the answer is usually a collection of emails, a few internal memos, and a hope that nobody asks too many follow-up questions. In February 2026, the U.S. Department of the Treasury released the Financial Services AI Risk Management Framework (FS AI RMF) to close that gap. Built with the Cyber Risk Institute and shaped by input from more than 100 financial institutions and public agencies, the framework does something most AI governance guidance has not attempted. It gets specific. Where the NIST Framework Ends The NIST AI Risk Management Framework, released in 2023, gave the industry a solid four-function structure: Govern, Map, Measure, and Manage. It defined the vocabulary of AI risk. What it deliberately did not do was tell a bank’s compliance team what evidence to produce when an examiner walks in the door. That is not a criticism. The NIST framework was designed to be cross-sector and principles-based, which is exactly what made it adoptable across industries. But financial services are not a cross-sector environment. Banks operate under model risk management requirements. They have consumer protection obligations. They run under continuous supervisory scrutiny. Abstract principles do not satisfy an OCC examiner asking for validation documentation. The FS AI RMF takes the NIST structure and fills in that operational layer. The same functions remain, now with substantially more operational detail about how they can be implemented. 230 Controls Is a Lot. Here Is How It Works.
The framework introduces roughly 230 control objectives, which may appear complex until you understand how implementation is structured. Institutions do not start by reading all 230. They start by taking a questionnaire. The AI Adoption Stage Questionnaire classifies an organization by the extent and risk profile of its current AI deployments. The answers determine which controls apply. A community bank running a single vendor fraud detection tool has a very different control set than a large bank with internal model development teams building credit and trading systems from scratch. From there, the toolkit has four main components: Risk and Control Matrix: Maps risk statements to the relevant control objectives, organized by adoption stage. This is where institutions figure out which controls actually apply to them. Guidebook and Control Reference Guide: Operational guidance on how to implement each control, including concrete examples of evidence that satisfies each requirement, intended to support audit preparation and supervisory review. Quick Start Guide: A smaller control set for institutions early in their AI adoption. It establishes a governance baseline without requiring the full framework on day one. Adoption Stage Questionnaire: The entry point that determines scope. Everything else flows from this. The implementation sequence follows a five-step loop: Assess, Customize, Implement, Integrate, Evolve. The last step matters. AI systems drift, get retrained, expand into new use cases. The framework expects controls to evolve alongside the systems they govern. What the Controls Actually Cover The control set spans the full AI lifecycle across several operational domains. For a bank deploying AI in lending decisions, this translates into concrete governance requirements across four areas. Model lifecycle management Controls address model design, testing, monitoring, drift detection, explainability thresholds, and rollback procedures.
For credit underwriting, this means documented processes for catching when a model stops performing as expected and clear steps for what happens next. Consumer protection Fairness, explainability, and data documentation requirements sit here. If your model is making credit decisions, you need to be able to explain those decisions, document what data went into training it, and demonstrate that fairness testing was actually performed rather than assumed. Resilience and security Cybersecurity exposure, adversarial risks, and vendor dependencies all get coverage. AI systems introduce additional security considerations, including model inversion, adversarial inputs, and dependencies on external foundation models. Third-party governance This one is particularly relevant for smaller institutions. Most community banks and regional institutions are not building models in-house. They are buying or licensing them from technology vendors. The framework requires meaningful oversight of those vendor systems, which creates a real operational challenge when vendors are not forthcoming about their internal model practices. Why This Matters Right Now The framework arrives as AI moves from internal automation into decisions that directly affect consumers: credit approvals, fraud flags, customer service responses. As these systems take on higher-stakes functions, the gap between governance-as-principle and governance-as-practice becomes a regulatory liability. What the FS AI RMF represents, beyond its specific controls, is a coordination mechanism. Treasury and the Cyber Risk Institute developed it with input from financial institutions, regulatory bodies, and standards organizations. The goal appears to be establishing a shared understanding of what adequate AI risk management looks like before formal regulation arrives and forces the conversation. For institutions that have already built robust model risk management programs, much of this is not new territory. 
The framework was deliberately designed to align with existing MRM practices rather than replace them. What it adds is a structured way to extend those practices to AI-specific risks that traditional model validation was not designed to catch: algorithmic fairness, adversarial robustness, foundation model dependencies. The Open Questions The framework is voluntary. That status will matter a great deal for how quickly institutions adopt it. Voluntary frameworks in financial services have a history of becoming effectively mandatory once regulators begin referencing them in examination guidance, but that process takes time and the current framework offers no guarantee of that trajectory. Vendor transparency remains an unsolved problem. The framework correctly identifies third-party AI oversight as a priority control area. It does not solve the practical reality that many vendors treat their model internals as proprietary and are not interested in producing the documentation their customers’ regulators want to see. For institutions with global operations,

Uncategorized

When AI Code Security Tools Become Part of the Supply Chain

When AI Code Security Tools Become Part of the Supply Chain A development team merges a pull request. The patch came from an AI security assistant. It fixed a real vulnerability and passed every test. Days later, a deeper audit turns up something strange: buried inside the fix is a subtle change to authentication logic. Nothing obviously broken. Nothing flagged during review. Just a quiet, coherent-looking alteration that slipped through because the code looked right. That scenario reflects the kind of threat model security teams now need to build around AI coding assistants. What This Is and Why It Matters Claude Code Security is an AI-driven analysis layer integrated into developer workflows and CI pipelines. It can scan full repositories, verify vulnerability findings through multiple reasoning stages, and propose patches with associated confidence scores. Results surface through dashboards, pull request comments, or automated pipeline checks. As these capabilities extend further into the development lifecycle, security teams need to account for them not simply as developer utilities, but as components with credentials, configuration surfaces, and access to sensitive code. That puts them in the same governance category as CI servers and build pipelines, and raises questions about how they should be managed. The Tool That Became Infrastructure The system sits alongside CI servers, build pipelines, and source repositories. It holds credentials, reads proprietary code, and proposes changes through pull request comments or suggestions. That profile is meaningfully different from earlier developer tools. Build systems and package registries occupy a similar position in the supply chain, and the security community spent years developing governance practices to account for their access and influence. AI code assistants present a similar set of considerations.
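Treating the assistant as supply-chain infrastructure implies inventorying it the way CI servers and registries are inventoried: what credentials it holds, which files change its behavior, and what it can do to code. A minimal sketch of such an inventory entry; the field names and values are my own illustration, not any standard schema.

```python
from dataclasses import dataclass

# Hypothetical sketch: AI code assistants tracked in the same supply-chain
# inventory as CI servers, with credentials and config surfaces recorded.

@dataclass
class SupplyChainComponent:
    name: str
    holds_credentials: list    # secrets this component can use
    config_surfaces: list      # files whose modification changes its behavior
    reads_source: bool         # access to proprietary code
    can_propose_changes: bool  # e.g. PR comments or suggested patches

inventory = [
    SupplyChainComponent(
        name="ci-server",
        holds_credentials=["deploy-key"],
        config_surfaces=["pipeline config"],
        reads_source=True,
        can_propose_changes=False,
    ),
    SupplyChainComponent(
        name="ai-code-assistant",
        holds_credentials=["model API key (via the CI runner)"],
        config_surfaces=["project settings", "MCP server definitions"],
        reads_source=True,
        can_propose_changes=True,  # suggested patches in pull requests
    ),
]

# Components that both read source and propose changes sit at the new
# trust boundary and deserve the tightest monitoring.
high_scrutiny = [c.name for c in inventory
                 if c.reads_source and c.can_propose_changes]
```

The query at the end is the governance point: the combination of source access and change-proposal capability is what distinguishes the assistant from older, read-only or deterministic components.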
How It Actually Works When Claude Code Security scans a codebase, it receives code context through API calls. It analyzes repository structure and source files, detects potential vulnerabilities, and verifies findings through multiple reasoning stages before generating recommended fixes. The underlying model is Claude Opus 4.6. By default, the tool operates in read-only mode. Write operations, command execution, and network requests all require explicit approval. File writes are constrained to the working directory. High-risk commands are blocked unless deliberately unlocked. Cloud deployments run inside isolated virtual machines. Access to services like GitHub is routed through a scoped proxy using short-lived credentials. Session logs are kept, and the VM is destroyed when the session ends. CI integrations work differently. When configured through a GitHub Action, the pipeline sends pull request diffs and contextual files directly to the Claude API. The model analyzes the changes and returns findings as comments or merge suggestions. The CI runner holds both the Anthropic API key and the repository permissions. Claude Code Security receives code through API calls but does not directly hold repository credentials. Configuration files, including .claude project settings and MCP server definitions, control which tools the assistant can call and what it is permitted to do. Where the Old Threat Model Falls Short Traditional supply chain components execute deterministic logic. A CI pipeline runs the same commands the same way every time. A package registry serves the same artifact to every caller. When they misbehave, it is usually because an attacker modified the input, not because the component reinterpreted the situation. AI assistants operate differently. They reason across context. They generate output that reflects the full content of what they were given, including documentation, comments, and instructions embedded in the codebase itself. 
That creates a convergence of three characteristics that previously appeared in separate components: access to sensitive data, external communication with model provider infrastructure via API, and consumption of untrusted inputs from repositories and files the assistant has no independent way to verify. Together, those three things constitute a new trust boundary in the development pipeline, one that existing threat models typically do not account for. What This Means for Enterprise Security Teams Identity and credential hygiene. API keys used in CI integrations should be treated with the same rigor as production secrets. These keys grant read access to proprietary source code and the ability to interact with live development pipelines. Compromise of the CI environment or a workflow configuration file extends to whatever the AI assistant can reach. Configuration as policy. Files that govern the assistant’s behavior, including project policies, MCP server configurations, and CI workflow definitions, function as policy code. Unauthorized modification can expand permissions, introduce malicious tool definitions, or alter what the assistant is permitted to do during automated runs. Structured review for AI-generated patches. AI-generated code changes should be clearly labeled in the pull request workflow and subject to review processes that account for their origin. Some organizations are exploring semantic diff tools and attestation-based merge policies to ensure that logic changes receive appropriate scrutiny. Supply chain inventory. These assistants belong in threat models alongside CI servers, dependency registries, and developer endpoints. That means tracking what they can access, what credentials they hold, and what configuration surfaces exist for each deployment. Risks and Open Questions API key exposure. CI workflows that integrate AI services rely on API keys stored as repository or organization secrets.
Compromise of the CI environment or a misconfigured workflow definition can expose those keys to misuse. Configuration tampering. Changes to assistant configuration files or MCP server definitions can alter the tool’s available capabilities or elevate its permissions during automated runs, without triggering typical security monitoring or alerts. Prompt injection. Documentation, comments, and embedded instructions in a repository can influence the assistant’s reasoning. Content in a README or inline comment could cause the assistant to surface misleading findings or generate code suggestions that appear legitimate but are not. Patch manipulation. If an attacker introduces misleading context into a repository, the assistant may generate suggested fixes that appear correct but weaken security controls in ways that are difficult to detect during standard review. This risk emerges from combining probabilistic reasoning with review workflows designed around human-authored code. Data handling. Prompts and outputs sent to the API are subject to provider logging policies and are not encrypted

issue-7
Newsletter, Prompt Vault Resources, Uncategorized

The Enterprise AI Brief | Issue 7

The Enterprise AI Brief | Issue 7 Inside This Issue The Threat Room When AI Code Security Tools Become Part of the Supply Chain AI coding assistants have moved beyond autocomplete. Claude Code Security can scan full repositories, verify vulnerability findings, and propose patches directly in the pull request workflow. That puts it alongside CI servers and build pipelines as a component with its own credentials, configuration surfaces, and access to sensitive code. Security teams that have not yet accounted for it in their supply chain governance probably should. → Read the full article The Operations Room Treasury’s New AI Risk Framework Gives the Financial Sector a Governance Playbook The Treasury’s new Financial Services AI Risk Management Framework turns the abstract ideas of trustworthy AI into something financial institutions can actually implement. Instead of principles alone, it introduces more than 200 concrete control objectives and a toolkit built for real governance workflows. For banks deploying AI in lending, fraud detection, and customer systems, the question is no longer whether governance exists. It is whether governance holds up under examination. → Read the full article The Engineering Room When Code Scanners Don’t Understand What Code Does Static code scanners have spent decades searching for patterns. A new generation of security tools is trying something different. Anthropic’s Claude Code Security analyzes repositories by reasoning through data flows and component interactions, then challenges its own findings before surfacing vulnerabilities. The shift from rule-based detection to reasoning-based analysis is beginning to change how security teams review code in modern AI-driven development pipelines. 
→ Read the full article The Governance Room NIST Launches Initiative to Define Identity and Security Standards for AI Agents AI agents are already operating inside enterprise systems, calling APIs, accessing internal data, and executing actions across multiple services autonomously. That creates an unsolved governance problem: how do you authenticate an agent, scope its permissions, and audit what it did? In February 2026, NIST launched an initiative to establish identity, security, and interoperability standards for autonomous agents. The work is early-stage, but agent identity, authorization, and traceability are emerging as targets for standardization. For enterprises deploying agents ahead of those standards, the governance gap is theirs to close. → Read the full article

Whitepapers LLM post
Whitepapers

The LLM Security Gap 

The LLM Security Gap  The LLM Security Gap: Why Blocking Isn’t Protection, and What Enterprises Actually Need  Executive Summary  Large Language Models are transforming enterprise productivity. They’re also creating a data security problem that existing tools weren’t designed to solve.  The instinct is to block. Restrict LLM access. Sanitize everything. But blocking doesn’t protect; it just pushes the problem underground. Employees route around restrictions. Shadow AI proliferates. The data leaks anyway, just without visibility or audit trails.  Meanwhile, the tools marketed as “LLM security” fall into two failure modes: they either break workflows (making LLMs useless for real work) or fail silently (letting sensitive data through while appearing to work).  This whitepaper explains:  The bottom line: Enterprises need to let authorized people work with sensitive data in LLM workflows, while preventing unauthorized exposure and maintaining audit trails. That’s a different problem than “block all sensitive data from LLMs,” and it requires a different solution.  The Problem in Plain Language  Every day, employees use LLMs with sensitive data: fraud analysts investigating customers, compliance officers reviewing filings, support agents drafting responses, lawyers analyzing contracts.  This isn’t misuse. This is the use case.  LLMs are valuable because they work with real business contexts. An AI that can’t see the customer’s complaint can’t help draft a response. An AI that can’t see transaction history can’t identify fraud patterns.  The question isn’t whether sensitive data will enter LLM workflows. It will. The question is: what controls exist when it does?  What Buyers Get Wrong Today  Wrong Assumption #1: “We’ll just block LLMs.”  Some enterprises restrict LLM access entirely. No ChatGPT. No Copilot. No AI tools.  Why this fails:  Blocking doesn’t eliminate risk. It eliminates visibility into risk.  
Wrong Assumption #2: “Sanitization solves it.”  Other enterprises deploy sanitization tools that scan prompts and mask or redact sensitive data before it reaches the LLM.  Why this fails:  Sanitization protects the LLM from your data. It doesn’t protect your data while letting people use it.  Wrong Assumption #3: “Anonymization is enough.”  Some enterprises anonymize data before LLM processing: replace real names with fake ones, remove identifiers.  Why this fails:  Anonymization is the right tool for the wrong job.  Wrong Assumption #4: “Our existing DLP handles it.”  Traditional DLP tools monitor network traffic, email, and file transfers. Some assume these cover LLM workflows.  Why this fails:  DLP protects perimeters. LLM security requires protecting data within workflows.  Why Current Tools Fail Silently  The most dangerous failure isn’t the one that breaks your workflow. It’s the one that appears to work.  Silent failure means sensitive data escapes protection, and no one knows.  How this happens:  An auditor asks: “Prove no customer SSNs were exposed to the LLM last quarter.”  With sanitization tools, the honest answer is: “We can prove what we detected. We cannot prove what we missed.”  That’s not compliance. That’s hope.  The Regulatory Pressure  Regulators are paying attention. LLMs create new data exposure vectors that existing frameworks didn’t anticipate, but existing obligations still apply.  GDPR / Privacy: Data minimization, purpose limitation, right to access/erasure. Internal LLM workflows often require identifiable data, triggering all usual obligations.  HIPAA: Minimum necessary, audit controls, business associate agreements. Sanitization blocks PHI but also blocks clinicians from legitimate care purposes.  Financial (SOX, PCI-DSS, GLBA): Access controls, audit trails, data retention. Can you demonstrate segregation of duties when AI is involved?  Blocking LLMs doesn’t eliminate regulatory risk. It means employees use uncontrolled channels instead.  
The Enterprise Risk  Beyond compliance, LLM security gaps create direct business risk:  The Operational Friction  Security controls that break workflows aren’t just inconvenient. They’re counterproductive.  When security tools block legitimate work:  The goal isn’t to prevent all access to sensitive data. It’s to enable authorized access with appropriate controls.  What Enterprises Actually Need  Enterprises successfully deploying LLMs have figured out something: the problem isn’t preventing access, it’s governing access.  Governed access means:  This is the model that actually works: protection without disruption for authorized work.  What this looks like in practice:  A fraud analyst and a marketing intern both submit prompts containing a customer’s SSN. Same data. Same LLM. Different outcomes.  The fraud analyst’s role is authorized for SSN access. The system recognizes this, allows detokenization, and the analyst sees the real SSN in the response. Workflow continues. Investigation proceeds.  The marketing intern’s role is not authorized. The system recognizes this, denies detokenization, and the intern sees a meaningless token instead of the SSN. They can’t access data they shouldn’t have. But the analyst sitting next to them can.  Same prompt. Same data. Same system. Different access based on role. Both workflows continue appropriately. That’s governed access.  Why the Market Isn’t Solving This Yet  The GenAI security market is young. Most solutions were adapted from adjacent problems rather than built for this one.  Gap 1: No Multi-Modal Governed Access  Solutions exist for text, but enterprise data lives in images, PDFs, and audio. A tool that protects text but ignores screenshots isn’t comprehensive.  Gap 2: Agentic AI Is Uncharted Territory  LLMs are evolving from chat interfaces to autonomous agents that take actions, call APIs, and chain decisions. Security models designed for single prompts don’t address multi-step workflows.  
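The fraud-analyst versus marketing-intern scenario described under "What this looks like in practice" maps naturally to role-based detokenization. A minimal sketch, assuming a token vault keyed by role entitlements; every name, token, and value here is a hypothetical illustration, not a real product API.

```python
# Hypothetical sketch of governed access: the same token detokenizes
# only for roles entitled to the underlying data type.

TOKEN_VAULT = {
    "tok_7f3a": ("SSN", "123-45-6789"),  # token -> (data type, real value)
}

ROLE_ENTITLEMENTS = {
    "fraud_analyst": {"SSN"},   # role authorized for SSN access
    "marketing_intern": set(),  # role not authorized
}

def detokenize(token: str, role: str) -> str:
    """Return the real value for authorized roles; otherwise return the
    opaque token, so unauthorized users see nothing meaningful."""
    data_type, value = TOKEN_VAULT[token]
    if data_type in ROLE_ENTITLEMENTS.get(role, set()):
        return value
    return token

# Same prompt, same data, same system -- different access based on role.
analyst_view = detokenize("tok_7f3a", "fraud_analyst")    # sees the real SSN
intern_view = detokenize("tok_7f3a", "marketing_intern")  # sees only the token
```

Both calls succeed, which is the point: neither workflow is blocked, but only the authorized role ever sees the sensitive value, and the decision point is a natural place to emit an audit record.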
Agents break the assumptions current tools rely on:  Prompt-level controls evaluate a single input at a single moment. Agentic workflows require access decisions that persist, adapt, and audit across an entire task.  Gap 3: Detection Accuracy Is Unverified  Vendors claim high detection rates. Few publish benchmarks. Buyers are taking accuracy on faith.  Gap 4: No Standard Audit Format  Every solution logs differently. No industry-standard format for LLM security audit trails exists.  Gap 5: Role-Based Access Is Rare  Most tools are binary: block or allow. Few support “allow for this role, with this purpose, for this time window.”  Gap 6: Prompt-Only Security Is Insufficient  Many “AI firewall” solutions focus on scanning prompts for malicious input: jailbreaks, injection attacks. This matters, but it’s the wrong problem.  The primary risk isn’t malicious users crafting adversarial prompts. It’s legitimate users doing legitimate work with sensitive data. Prompt scanning can’t distinguish authorized access from unauthorized access. It treats all sensitive data as a threat.  The problem isn’t malicious input; it’s governing legitimate access.  The Decision Framework  When evaluating LLM security, ask these questions. If you don’t like the answers, you’re looking at a tool that will fail in production.  1. Does it preserve workflow for authorized users?  If the tool breaks legitimate work, users will work around it. Security that gets bypassed isn’t security; it’s theater.  Red flag: “All sensitive data is blocked/sanitized regardless of user role.”  2. Does it support role-based access?  Different users have different authorization levels. A tool that treats a fraud analyst and a marketing intern the same way doesn’t fit enterprise governance.  Red flag: “Access decisions are based on data type, not user authorization.”  3. Does it produce audit evidence