Key Takeaways 
  • Federal AI adoption is accelerating faster than governance and approved security tooling. Risk now spans models, infrastructure, and the software supply chain. 
  • AI threats often mimic normal usage, which makes it difficult to detect with static methods. 
  • Meeting mandated federal standards requires continuous, evidence-based validation, rather than point-in-time checks. 
  • AI security now depends on unified visibility, risk prioritization, and closed-loop remediation. 
  • Qualys TotalAI brings these elements together by unifying testing, risk scoring, and remediation into a single workflow within the FedRAMP Moderate Authorized framework. 

Why Federal AI Security Requires More Than Standard Scanning 

AI systems require a security paradigm distinct from traditional IT. Safeguarding these assets requires end-to-end visibility and audit-ready evidence derived from a platform engineered for the full AI stack. Effective federal AI security extends beyond server-level scanning to continuous behavioral testing of the LLM itself, by identifying prompt manipulation, data leakage, and misuse at the inference level. By deploying AI-specific capabilities, agencies can satisfy federal AI modernization mandates while generating the rigorous compliance evidence and measurable risk reduction necessary to protect the mission.

Federal AI adoption has accelerated sharply. An early GAO report found that AI use cases nearly doubled in one year across 11 major agencies, from 571 in 2023 to 1,110 in 2024. Even more recently, ExecutiveGov’s analysis highlights the latest Federal Agency AI use case inventory, doubling to 3,611 across 56 agencies. This shows that AI is no longer experimental; it is an operational imperative.  

Executive Order 14179 and OMB M-25-21 require mandated AI modernization, while prioritizing accelerated deployment and reduction of regulatory barriers. However, the push for broader adoption is currently constrained by a lack of FedRAMP-authorized tools capable of securing the unique risks of LLMs and their infrastructure. CISA Binding Operational Directive (BOD) 23-01 and BOD 26-02 elevate these requirements from best practices to legal mandates: federal agencies are now required to maintain absolute visibility over every asset, meaning AI models, model endpoints, and the edge-based Model Context Protocol (MCP) servers connecting them must be identified and secured to remain in compliance. 

Most agencies struggle to meet these requirements because their existing security tools lack the necessary FedRAMP-authorized, AI-specific oversight. The goal is to gain end-to-end visibility, reduce risk, and produce audit-ready evidence from a single platform built for the entire AI stack. 

How to Secure Federal AI Systems: Understanding the 3-Layer Attack Surface

When a federal agency deploys an AI system, it’s deploying an interconnected architecture: models, training data, APIs, supporting infrastructure, and pipelines connecting them, each with its own exposure. Federal mandates now presume continuous, defensible security across all of it. 

The AI layer 

This is the layer most agencies focus on: models, LLMs, prompts, datasets, APIs, agentic AI tool connectors (notably MCP servers), and inference services. The main attack vector here isn’t malicious code, but natural language. 

A simple text command can manipulate an LLM to leak sensitive data, bypass safety filters, or misuse its functions in ways that look like a normal user request. Standard scanners and firewalls are blind to these semantic threats, because the exploit looks like a conversation. Securing this layer requires continuous behavioral testing for prompt manipulation, jailbreaks, data leaks, multimodal exploits, hallucinations, and model denial-of-service. 

For agencies running AI in mission-critical workflows like fraud detection or cyber defense, a compromised model isn’t just another vulnerability; it’s a core system failure with direct mission impact. 

The infrastructure layer 

This is where AI workloads run: GPU hosts, vector databases, the Python and CUDA libraries powering them, MCP servers bridging models to agency data, containers, cloud services, data pipelines, APIs, and edge devices. It’s large, dynamic, and often shared. 

If the systems powering AI are vulnerable, attackers don’t need to target the model directly; they can reach it through the infrastructure beneath it. Securing this layer means assessing and prioritizing vulnerabilities across every component, and extending risk governance to the APIs, MCP connectors, and data services feeding the models. 

The exposure layer 

Unpatched dependencies and misconfigurations in the AI software supply chain create silent entry points. Modern AI systems depend on open-source frameworks and packages, many of which have known vulnerabilities that are deprioritized as low-severity on their own. But a misconfigured endpoint or outdated library can expose an AI system; risk model-level tools miss entirely. 

The combined effect is what makes this architecture dangerous. An attacker who understands the full stack can chain a path across all three layers, and no single-layer view captures that risk. 


Qualys Insights


Legacy tools were built for a different threat model, one where assets are relatively static, and risk is measured by known CVEs against patched software versions. AI workloads don’t fit that model, and the gaps are systematic. 

Standard vulnerability scanning misses AI-specific threats 

Traditional scanners check the LLM environment, host OS, packages, and container images against a CVE database. If everything is patched, the result is clean. But threats like prompt manipulation, data leakage, multimodal exploits, and model misuse aren’t in that database. These are behavioral, AI-specific threats that require continuous testing, not static analysis of the host. 

Infrastructure scanners stop at the model boundary 

Most agencies have robust vulnerability management programs for hosts, networks, containers, and cloud infrastructure. But these weren’t designed to discover and inventory AI-specific assets like LLM endpoints, model services, datasets, and MCP servers. As a result, the AI workloads running on the infrastructure are systematically unaccounted for. 

Shadow AI doesn’t just live in the cloud 

It lives in local GPU clusters, unmanaged containers, developer workstations running self-hosted models, MCP servers bound to localhost, and third-party SaaS applications operating outside official oversight. Eliminating these blind spots requires visibility across every layer: 

  • On the host: Self-hosted models and the local Python and CUDA libraries that power them. 
  • In the cloud: Unmanaged AI services and vector databases in AWS, Azure, and GCP. 
  • On the network: Unauthorized SaaS AI services and LLM APIs, even outside the official stack. 
  • In the supply chain: MCP SDK usage in application code, signaling AI integrations spreading earlier in the development lifecycle. 

No combination of manual processes and fragmented tools can produce a complete, continuously updated AI inventory at the speed and scale federal environments demand. 

Fragmented tools produce fragmented risk 

In most federal environments, AI teams manage models, infrastructure teams manage hardware, and security teams manage vulnerabilities — each producing its own output, in its own format, without a shared risk model. Prioritization defaults to the loudest alert rather than actual mission impact. When CAIOs and CISOs need to report AI risk, there’s no single source of truth. 

Point-in-time assessments fall short of continuous monitoring requirements 

OMB M-25-21, CISA BOD 26-02, CISA BOD 23-01, and the NIST AI RMF all mandate a continuous, real-time security posture. AI workloads are dynamic — models update, new APIs get exposed, dependencies shift. A periodic scan captures a snapshot of a moving target. By the time a report is finalized, the environment has already changed. 

The Case for Unified AI Risk Management Under FedRAMP 

Combining AI security, infrastructure security, and compliance reporting into a unified risk model is the only way to address both threats and mandates. Attackers don’t operate in silos. A threat that spans the AI layer, an insecure dependency, and the underlying infrastructure is a single risk — siloed tools might address it in the wrong order or miss it entirely. The answer isn’t more tools; it’s a common risk model connecting them. 

OMB M-25-21 requires continuous, enterprise-scale AI risk management with evidence for governance boards. That requires continuous discovery, risk-based prioritization, and audit-ready evidence from a single authoritative source, not manually assembled reports. Zero Trust reinforces this. AI models, model endpoints, MCP servers, APIs, and datasets are all assets. If they aren’t assessed and hardened with the same rigor as the rest of the infrastructure, they become unmanaged weak points in a framework designed to have none. 

Qualys TotalAI with FedRAMP Moderate Authorization: Platform Capabilities

Qualys TotalAI is now FedRAMP Moderate Authorized and available on the Qualys Cloud Platform, which unifies TotalAI for infrastructure risk management, and TruRisk Eliminate™ for operational hardening and attack surface reduction.

TotalAI’s FedRAMP Moderate Authorization removes the main barrier to deployment for civilian agencies, DoD components, and defense contractors. It works with existing Qualys agents, requiring no new tools or disruptions to your systems. 

AI Asset Discovery and Inventory 

Continuously discover and inventory all AI assets, including models, APIs, prompts, datasets, MCP servers, and supporting infrastructure. Our three purpose-built discovery engines, Cloud Agents on the host, Cloud Connectors in the cloud, and Network Detection and Response (NDR) on the network, eliminate shadow AI blind spots and produce a continuously updated, audit-ready inventory without manual effort. A “confirmed vs. potential” split lets governance teams distinguish models that have been scanned and assessed from models that exist in cloud accounts but have not yet been approved. 

AI/LLM Security Testing 

Continuously test LLM behavior for prompt manipulation, data leakage, and misuse. We cover 38–40 distinct attack scenarios, including prompt injection, jailbreak techniques, multilingual exploits, bias amplification, package hallucinations, and model denial-of-service, with 650+ AI-specific detections aligned to the OWASP Top 10 for LLM Applications, MITRE ATLAS, and the EU AI Act. 

Multimodal threat detection finds prompts and perturbations hidden within images, audio, and video, exploiting cross-modal features that text-only scanners miss entirely. For sensitive models, an on-premises scanner runs the same tests behind your firewalls. We support any OpenAI-compatible API, plus AWS Bedrock, Azure OpenAI, Google Vertex, and more. Integrate security results into your MLOps pipelines to ensure only validated models reach production. 

Securing the Systems That Power AI (VMDR + ETM) 

Assess and prioritize vulnerabilities across GPU hosts, containers, and cloud services. TruRisk™ maps asset criticality to mission impact, focusing remediation on what matters most. 

One Score from API to LLM 

Get a single TruRisk™ score across all infrastructure, applications, and AI systems. We provide executive-ready summaries for leadership and detailed technical findings for engineering teams. 

Enforcing Remediation and Hardening (TruRisk Eliminate™) 

Automate baseline enforcement and remove attack pathways. Apply patchless mitigation for vulnerabilities where a patch is not available, or a reboot is not an option. Integrate with ITSM workflows, such as ServiceNow, to speed up response times. 

Continuous Monitoring, Evidence-Backed Detections for Audit Readiness 

Every detection includes a full evidence trail, prompt, response, analysis, mapped to OWASP LLM, MITRE ATLAS, and EU AI Act categories. Reports log severity, failed checks, jailbreak counts, scan mode, and option profile for repeatability: same scope, same settings, measurable drift over time.

That evidence feeds RMF, ATO, and ConMon workflows through continuous monitoring and automated checks against NIST AI RMF, NIST SP 800-53, and CMMC 2.0 run on every posture change. 

Securing AI Means Securing Everything It Touches 

Federal agencies must adopt AI quickly, while maintaining a defensible security posture. This requires continuous visibility and risk-based remediation, not siloed tools. Qualys TotalAI provides this on a platform agencies already trust. 

With FedRAMP Moderate Authorization, it’s ready to deploy today. 


See your agency’s AI attack surface before an auditor does.


Frequently Asked Questions (FAQs) 

Which federal mandates govern AI security for U.S. agencies?

OMB M-25-21, CISA BOD 23-01, and CISA BOD 26-02 set the baseline. They require continuous visibility, risk-based controls, and audit-ready evidence — not annual reviews. TotalAI aligns with NIST AI RMF, NIST SP 800-53, FISMA, and CMMC 2.0 through automated, continuous checks, so agencies don’t have to manually assemble compliance evidence before every reporting cycle.

How should federal agencies secure AI infrastructure?

Continuous assessment across GPU hosts, containers, cloud services (AWS, Azure, GCP), serverless functions, and data pipelines, prioritized by mission impact, not just CVSS score. TotalAI maps asset criticality to operational risk, so agencies close the attack paths that matter, not just the ones that score highest.

Why is LLM security testing different from traditional application security?

Traditional AppSec finds code vulnerabilities. LLMs introduce inference-time threats, prompt injection, data leakage, and model misuse that look like normal interactions to static analysis tools. TotalAI’s 650+ AI-specific detections continuously test LLM behavior at runtime, aligned with the OWASP Top 10 for LLM Applications.

How do federal agencies manage AI supply chain risk?

Most federal AI systems pull from open-source components with known, unpatched vulnerabilities. The risk isn’t always at the model layer; it’s in the dependencies underneath it. TotalAI continuously assesses the full AI software supply chain, surfacing misconfigurations and silent entry points that model-level tools miss entirely.

How is AI security deployed in federal environments?

TotalAI deploys via federally approved Cloud Agents, no code changes, no system disruption. Discovery of AI assets across host, cloud, network, and SaaS environments starts immediately.

What does FedRAMP authorization mean for AI security platforms?

FedRAMP Moderate Authorization removes the primary deployment barrier. For federal agencies, it means immediate deployment eligibility with no additional ATO effort, a continuously authorized compliance baseline across NIST AI RMF, CMMC 2.0, FISMA, and NIST SP 800-53, and continuously generated audit-ready evidence.

How does Qualys TotalAI unify AI risk management for federal agencies?

Most agencies manage AI risk across disconnected tools; one for infrastructure, another for AppSec, another for compliance. TotalAI connects AI models, infrastructure, and dependencies into a single risk model, ensuring consistent prioritization and an always-current compliance view.



Source link