01 Auditing AI Is Different
A national health insurer discovered this difference during a state examination. Its internal audit team had included AI systems in the standard IT audit program, verifying access controls, change management, and system availability: the usual controls. When the state insurance commissioner's examiner asked about algorithmic discrimination testing, the audit director pointed to the processing integrity controls. The examiner clarified: "Those controls verify that the system produces consistent outputs. I'm asking whether you've tested whether those consistent outputs systematically disadvantage protected groups." The company had never audited for bias, its IT controls could not answer the regulator's question, and the examination became significantly more intensive than planned.
This gap between traditional IT audit and AI audit is common, and increasingly costly.
AI systems in production need audit, but traditional audit methods don't transfer directly. AI systems have characteristics that require adapted approaches: probabilistic outputs that vary even with identical inputs, learned behavior that can't be traced to written specifications, emergent bias that develops through data patterns, drift over time as the world changes, and complex interactions that are difficult to fully characterize.
Standard IT audits verify that systems operate according to specifications. AI systems learn their behavior from data, so there is often no specification to audit against; the question becomes whether the learned behavior stays within acceptable bounds.
02 The AI Audit Framework
Inventory and Classification
What to verify: Does the organization have a complete inventory of AI systems? Are systems appropriately classified by risk?
How to verify: Request the AI system inventory. Compare against other sources (contracts, infrastructure records, development backlogs). Verify classification methodology. Sample systems and validate classification.
Common gaps: Missing systems, especially those embedded in vendor products. Inconsistent classification criteria. No process for updating inventory as systems change.
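To make the cross-check concrete, the reconciliation can be scripted. The sketch below is a minimal example, assuming each source can be exported as a CSV with a system identifier column; the file names and column names are hypothetical.

import csv

def load_system_ids(path: str, column: str) -> set[str]:
    # Read one column of a CSV export into a normalized set of identifiers.
    with open(path, newline="") as f:
        return {row[column].strip().lower() for row in csv.DictReader(f)}

# Declared inventory versus other records that mention AI systems.
inventory = load_system_ids("ai_inventory.csv", "system_id")
contracts = load_system_ids("vendor_contracts.csv", "system_id")
infra = load_system_ids("infrastructure_records.csv", "system_id")

# Anything referenced elsewhere but absent from the inventory is a
# candidate finding: a system the organization runs but does not govern.
for system_id in sorted((contracts | infra) - inventory):
    print(f"FINDING: {system_id} appears in other records but not in the inventory")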
Governance Structure
What to verify: Does governance exist with clear roles, processes, and authority?
How to verify: Review governance documentation. Interview governance participants. Observe governance in action. Verify that governance decisions are implemented.
Common gaps: Governance on paper but not in practice. Unclear accountability. Governance that doesn't cover all AI systems.
Development and Validation
What to verify: Were AI systems developed with appropriate rigor? Was validation sufficient for the system's risk level?
How to verify: Review development documentation. Examine validation methodology and results. Verify that validation addressed relevant risks. Compare development practices to stated policies.
Common gaps: Undocumented development decisions. Insufficient validation for high-risk systems. Missing records of model selection rationale.
Data Governance
What to verify: Is training data appropriately sourced and documented? Are data quality controls in place? Is data provenance traceable?
How to verify: Review data sourcing documentation. Examine data quality metrics and monitoring. Trace sample data from source to model. Verify consent and authorization for data use.
Common gaps: Undocumented data sources. Missing data quality validation. Inability to trace data provenance.
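Tracing provenance is easier to verify when data sources are recorded with cryptographic hashes at ingestion. A minimal sketch of that check, assuming a hypothetical manifest file that records a SHA-256 hash per source file:

import hashlib
import json

def sha256_of(path: str) -> str:
    # Hash a file in chunks so large datasets don't exhaust memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# training_data_manifest.json (assumed format):
# {"files": [{"path": "claims_2023.csv", "sha256": "..."}, ...]}
with open("training_data_manifest.json") as f:
    manifest = json.load(f)

for entry in manifest["files"]:
    status = "OK" if sha256_of(entry["path"]) == entry["sha256"] else "MISMATCH"
    print(f"{status}: {entry['path']}")

A MISMATCH means the data a model was trained on cannot be shown to be the data that was approved, which is exactly the provenance gap the audit is probing.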
Deployment Controls
What to verify: Are deployment processes controlled? Is configuration documented? Are access controls appropriate?
How to verify: Review deployment procedures and approval records. Examine configuration management. Test access controls against policy. Verify separation of duties.
Common gaps: Uncontrolled deployment paths. Undocumented configuration changes. Excessive access privileges.
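Separation of duties can be tested mechanically against an access export. A minimal sketch, assuming a hypothetical dump of (user, role) assignments from the identity system; the role names are illustrative:

from collections import defaultdict

# Roles that one person should not hold simultaneously.
CONFLICTING = {"model_developer", "deployment_approver"}

assignments = [  # sample export; in practice read from the IAM system
    ("alice", "model_developer"),
    ("alice", "deployment_approver"),  # violates separation of duties
    ("bob", "model_developer"),
    ("carol", "deployment_approver"),
]

roles_by_user = defaultdict(set)
for user, role in assignments:
    roles_by_user[user].add(role)

for user, roles in sorted(roles_by_user.items()):
    if CONFLICTING <= roles:  # user holds every conflicting role
        print(f"FINDING: {user} can both develop models and approve deployments")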
Operational Monitoring
What to verify: Is AI system behavior monitored? Are anomalies detected and investigated? Is performance tracked against expectations?
How to verify: Review monitoring dashboards and alert configurations. Examine sample alert investigations. Compare actual performance to documented expectations. Verify escalation procedures.
Common gaps: Insufficient monitoring coverage. Unaddressed alerts. Missing performance baselines.
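Comparing actual performance to documented expectations is straightforward once expectations are written down with tolerances. A minimal sketch, with metric names, baselines, and tolerances as illustrative assumptions; in practice these should come from the system's validation report:

baselines = {
    # metric: (documented expectation, allowed deviation)
    "precision": (0.90, 0.03),
    "recall": (0.85, 0.03),
    "latency_p95_ms": (200.0, 50.0),
}

observed = {"precision": 0.86, "recall": 0.84, "latency_p95_ms": 310.0}

for metric, (expected, tolerance) in baselines.items():
    value = observed[metric]
    if abs(value - expected) > tolerance:
        print(f"FINDING: {metric} = {value} outside {expected} +/- {tolerance}")

If no documented baseline exists to feed a check like this, that absence is itself a finding.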
Decision Logging and Audit Trails
What to verify: Are AI decisions logged comprehensively? Are logs immutable? Are retention requirements met?
How to verify: Review logging configurations. Attempt log retrieval for sample decisions. Verify log integrity controls. Compare retention to requirements.
Common gaps: Incomplete logging. Mutable log storage. Premature log deletion.
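One common way to make logs tamper-evident is hash chaining, where each record stores a hash of its content combined with the previous record's hash. The sketch below shows both sides of that scheme; it is a minimal illustration, not a description of any particular platform's log format:

import hashlib
import json

def record_hash(prev_hash: str, payload: dict) -> str:
    # Hash the previous hash together with a canonical form of the record,
    # so editing, reordering, or deleting any record breaks the chain.
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + canonical).encode()).hexdigest()

def verify_chain(records: list[dict]) -> bool:
    prev = "0" * 64  # fixed genesis value
    for i, rec in enumerate(records):
        if rec["hash"] != record_hash(prev, rec["payload"]):
            print(f"FINDING: log record {i} fails integrity check")
            return False
        prev = rec["hash"]
    return True

# Build a small chained log, then verify it.
records, prev = [], "0" * 64
for payload in [{"decision_id": 1, "outcome": "approve"},
                {"decision_id": 2, "outcome": "deny"}]:
    prev = record_hash(prev, payload)
    records.append({"payload": payload, "hash": prev})

print("chain intact:", verify_chain(records))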
Human Oversight
What to verify: Is human oversight proportionate to risk? Are oversight actions documented? Are overrides captured and analyzed?
How to verify: Review oversight procedures. Examine sample oversight decisions. Verify documentation of human review. Analyze override patterns.
Common gaps: Rubber-stamp oversight. Undocumented review decisions. Ignored override patterns.
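Override analysis lends itself to a simple statistical check: reviewers who never override and spend only seconds per case are candidates for rubber-stamp oversight. A minimal sketch, with the record format and thresholds as illustrative assumptions:

from collections import Counter

# Hypothetical export: (reviewer, overrode_ai_decision, seconds_spent)
reviews = [
    ("dana", False, 4), ("dana", False, 3), ("dana", False, 5),
    ("eli", True, 95), ("eli", False, 120), ("eli", False, 80),
]

totals, overrides, seconds = Counter(), Counter(), Counter()
for reviewer, overrode, spent in reviews:
    totals[reviewer] += 1
    overrides[reviewer] += overrode
    seconds[reviewer] += spent

for reviewer in sorted(totals):
    rate = overrides[reviewer] / totals[reviewer]
    avg = seconds[reviewer] / totals[reviewer]
    # Thresholds are illustrative; calibrate them to the decision type.
    if rate == 0 and avg < 10:
        print(f"FINDING: {reviewer} never overrides, averaging {avg:.0f}s per review")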
Bias and Fairness
What to verify: Is bias monitored for protected characteristics? Are fairness metrics appropriate? Is remediation effective?
How to verify: Review bias monitoring methodology. Examine fairness metrics and thresholds. Analyze outcomes by protected groups. Verify remediation actions.
Common gaps: Incomplete protected class coverage. Inappropriate fairness metrics. Unaddressed bias findings.
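One widely used outcome check is the disparate impact ratio: compare each group's favorable-outcome rate to the most favored group's rate. The 0.8 threshold below follows the common "four-fifths rule"; whether that is the right metric and threshold depends on the use case and applicable law, so treat this as a sketch rather than a compliance test:

# group -> (favorable decisions, total decisions); sample data
outcomes = {
    "group_a": (480, 1000),
    "group_b": (360, 1000),
}

rates = {group: fav / total for group, (fav, total) in outcomes.items()}
reference = max(rates.values())  # rate of the most favored group

for group, rate in rates.items():
    ratio = rate / reference
    if ratio < 0.8:
        print(f"FINDING: {group} ratio {ratio:.2f} falls below the 0.8 threshold")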
Incident Response
What to verify: Are AI-specific incident procedures in place? Are incidents documented and analyzed? Is remediation tracked?
How to verify: Review incident response procedures. Examine sample incident records. Verify root cause analysis. Track remediation completion.
Common gaps: Generic procedures that miss AI-specific issues. Incomplete incident documentation. Untracked remediation.
03 Audit Methods for AI Systems
Document Review
Examine policies, procedures, and records. For AI systems, this includes AI governance policies and standards, model development documentation, validation and testing records, deployment approvals, operational procedures, and incident records.
Testing
Verify that documented controls work as described. For AI systems, this includes input validation testing, output monitoring verification, access control testing, log retrieval testing, and alert response testing.
Sampling
Examine specific instances to verify general compliance. For AI systems, this includes sample decisions for audit trail completeness, sample models for documentation compliance, sample incidents for procedure compliance, and sample data sources for governance compliance.
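Sampling should be reproducible so that findings can be re-derived. A minimal sketch of decision sampling for audit-trail completeness, with a fixed seed and a hypothetical set of required fields:

import random

REQUIRED_FIELDS = {"decision_id", "timestamp", "model_version",
                   "inputs", "output", "reviewer"}

def audit_sample(decision_log: dict, sample_size: int, seed: int = 2024) -> None:
    # A fixed seed lets another auditor re-draw exactly the same sample.
    rng = random.Random(seed)
    for decision_id in rng.sample(sorted(decision_log), sample_size):
        missing = REQUIRED_FIELDS - decision_log[decision_id].keys()
        if missing:
            print(f"FINDING: decision {decision_id} missing {sorted(missing)}")

# Illustrative log with one deliberately incomplete record.
log = {i: {"decision_id": i, "timestamp": "2025-01-01T00:00:00Z",
           "model_version": "v3", "inputs": {}, "output": "approve",
           "reviewer": "dana"} for i in range(100)}
del log[7]["reviewer"]
audit_sample(log, sample_size=20)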
Observation
Watch processes as they occur. For AI systems, this includes observing human oversight in practice, incident response exercises, operational monitoring routines, and deployment processes.
Inquiry
Interview personnel to understand processes and identify gaps. For AI systems, this includes interviewing data scientists about development practices, operators about monitoring practices, oversight personnel about review processes, and leadership about their understanding of governance.
04 Audit Evidence for AI Systems
What Constitutes Good Evidence
Evidence should be complete (capturing all relevant information), accurate (reflecting actual events correctly), timely (recorded contemporaneously with events), immutable (cannot be altered after creation), accessible (can be retrieved efficiently), and understandable (interpretable by auditors).
Evidence Challenges Specific to AI
Model behavior is difficult to capture completely. Probabilistic outputs vary even when nothing is wrong, which makes anomalies harder to distinguish from expected noise. Drift occurs gradually and may not trigger discrete events. System complexity may exceed auditor understanding.
Addressing Evidence Challenges
Capture decision context, not just outputs. Establish expected variation ranges and document departures. Implement drift detection with documented thresholds. Provide auditor training on AI-specific concepts.
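Drift detection with documented thresholds can be as simple as tracking a distribution-shift statistic against a written policy. The sketch below uses the population stability index (PSI), a common choice for score distributions; the bin values and the 0.25 alert threshold are illustrative conventions, and the right threshold should live in the system's monitoring policy:

import math

def psi(expected: list[float], actual: list[float]) -> float:
    # Population Stability Index over shared bins; inputs are bin
    # proportions that each sum to 1. Epsilon guards empty bins.
    eps = 1e-6
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

baseline = [0.25, 0.25, 0.25, 0.25]  # score distribution at validation
current = [0.05, 0.15, 0.30, 0.50]   # score distribution in production

value = psi(baseline, current)
if value > 0.25:  # documented alert threshold (illustrative)
    print(f"FINDING: PSI = {value:.3f} exceeds the documented drift threshold")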
05 How Platforms Like Veratrace Support Audits
AI governance platforms provide infrastructure that makes audits more effective: comprehensive audit trails that satisfy evidence requirements, system inventories that demonstrate coverage, approval workflows that create accountability records, monitoring dashboards that demonstrate operational oversight, and reporting that supports audit sampling and analysis.
The goal is to make audit evidence a byproduct of normal operations rather than a special effort.
06 Conclusion
Auditing AI systems requires adapted approaches that address their unique characteristics. Organizations that build auditability into their AI systems from the start will find audits less disruptive and more effective.
The key is treating audit readiness as an operational requirement, not an afterthought. AI systems should generate audit evidence continuously, making audits a matter of analysis rather than reconstruction.

