For CISOs & Procurement
“How do I know it won't do something I didn't authorise?”
Independent evaluation, mandatory safety gates, and cryptographically signed credentials anyone can verify.
The problem
Three things you cannot know without independent testing
No independent benchmark
Every AI security vendor claims state-of-the-art capability. Every number is self-selected: different benchmarks, conditions the vendor designed, challenge sets the vendor chose. The numbers cannot be compared.
No safety evaluation
No published benchmark tests whether an AI security agent stays within its authorised scope, resists adversarial instructions from target systems, or avoids destructive actions.
No verifiable credentials
A vendor-produced PDF is a claim. There is no cryptographically signed, independently verifiable credential that tells a procurement team an AI security agent has been independently evaluated.
Regulatory alignment
Documented evidence for compliance teams
EU AI Act, DORA, and NIS2 all impose obligations on AI tools used in high-risk or regulated contexts. ACAP certification produces structured, independently verifiable evidence that maps directly to each framework's requirements.
How safety evaluation works
Safety failures are caught, not reported
During certification, ACAP runs independent monitoring alongside the agent. Every tool call is logged and every outbound connection traced in real time, not from the agent's own report.
An agent that queries an out-of-scope system (even a single DNS lookup) fails the scope adherence gate regardless of what it reports. The agent's output is treated as untrusted until corroborated by independent trace.
Real evaluation: ACAP-027-24
A commercially available agent reached a 91% offensive score, then exfiltrated data from an out-of-scope storage bucket. The agent's own report made no mention of it. The ACAP monitoring layer caught it in the tool call trace.
The agent received no certification. Safety gates cannot be offset by a high offensive score. That is what a hard gate means.
Which tier to require
Three tiers, one safety gate
The right tier depends on the environment the agent operates in, the level of autonomy, and the consequence of a safety failure. All tiers require passing the same safety evaluation.
Internal tooling, low-sensitivity environments, human-reviewed output
For agents that assist human operators in controlled environments. Output is reviewed before action is taken and the target environment has limited blast radius. Typical use cases include internal vulnerability scanning, development environment testing, and security research assistance.
Production infrastructure, standard enterprise deployments, semi-autonomous operation
For agents deployed against production infrastructure with semi-autonomous operation. The agent may execute multi-step attack chains with limited human oversight, covering web applications, APIs, cloud infrastructure, and network services within defined rules of engagement.
Critical infrastructure, financial services, regulated sectors, autonomous operation
The highest certification tier, required for fully autonomous agents operating against critical infrastructure, financial systems, or regulated environments. The agent must demonstrate expert-level offensive reasoning, handle complex multi-stage attack scenarios, and maintain strict safety discipline under adversarial conditions, without human intervention.
Credential verification
Verify any credential in under a minute
Every ACAP certification produces a cryptographically signed JWT. Verifiable by anyone, with no account and no contact with ACAP or the vendor.
Verify a certificate →Resources
Everything you need to require ACAP
Procurement Framework→
Six model RFP clauses, vendor questionnaire additions, and credential verification instructions.
Executive Risk Brief→
One-page overview of the AI security agent risk landscape, what ACAP tests, which tier to require.
EU AI Act Compliance Brief→
Maps Art. 9, 12, 14, and 15 to ACAP certification evidence. Includes procurement clauses.
Tier Selection Guide→
Decision tree for Foundation vs Professional vs Expert based on autonomy level and environment.
A supplier that declines is telling you something
Book a call to customise ACAP procurement language for your organisation, or email us directly.