9 ways CISOs can combat AI hallucinations

AI hallucinations are a well-known problem, and when it comes to compliance work, these convincing but inaccurate outputs can cause real damage through flawed risk assessments, incorrect policy guidance, or inaccurate incident reports.
Cybersecurity leaders say the real problem starts when the AI moves beyond summarizing text and starts making judgment calls. This is where it is asked to determine things like whether security controls are doing their job, whether the company meets compliance standards, or whether an incident was handled properly.
Here are nine ways CISOs can tackle the AI hallucination problem.
Keep people in the loop for top decisions
Fred Kwong, vice president and CISO at DeVry University, says his team is carefully evaluating AI’s role in governance, risk, and compliance, particularly for third-party risk assessment. He notes that while AI helps review vendor questionnaires and the supporting evidence used to assess those vendors’ security posture, it doesn’t replace humans.
“What we see is that the interpretation is not as good as I would like it to be, or it is different from the way we interpret it as people,” said Kwong.
He explains that AI often interprets regulatory requirements differently than experienced security professionals do. As a result, his team is still reviewing the results manually. At the moment, AI doesn’t save much time because trust in the technology isn’t there yet, he said.
Mignona Coté, senior vice president and CISO at Infor, agrees that human oversight is important, especially for risk scoring, control testing, and incident assessment. “Keep a human in the loop, full stop,” says Coté, who sees AI as a productivity tool, not something that should make final decisions on its own.
Treat AI outputs as drafts, not finished products
One of the biggest risks is over-trusting AI, according to security experts. Coté says her organization changed its policy so that AI-generated content cannot go into compliance documents without human review.
“The moment your team starts treating AI-generated output as a finished work product, you have a problem,” she says. “Treat all AI output as a first draft, not a final one. There will be times when the same question comes back with different answers. By labeling those answers and time-stamping them at the point of generation, they can be addressed during review.”
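To make that concrete, here is a minimal Python sketch of one way to label AI-generated compliance text as an unreviewed, time-stamped draft that a human must sign off on. The field names and status values are illustrative assumptions, not part of any tool Coté’s team actually uses.

```python
# Minimal sketch: wrap AI-generated compliance text in a record that marks it
# as an unreviewed draft. Field names here are illustrative, not a standard.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AIDraft:
    content: str                      # the AI-generated text
    source_model: str                 # which model produced it
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    status: str = "DRAFT_UNREVIEWED"  # only a human reviewer may change this
    reviewed_by: str | None = None

    def approve(self, reviewer: str) -> None:
        """Record the human sign-off required before the draft can be used."""
        self.reviewed_by = reviewer
        self.status = "APPROVED"

draft = AIDraft(content="Control AC-2 appears satisfied ...", source_model="grc-assistant")
# Downstream systems can refuse anything whose status is still DRAFT_UNREVIEWED.
```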
Srikumar Ramanathan, chief solutions officer at Mphasis, says this overconfidence often stems from what he calls “automation bias.” People naturally think that something written clearly and confidently must be right.
To combat this, he says companies must create a culture of “skepticism”. “[That means] viewing AI results as unverified documents that require a signature of human accountability before they are valid,” he explains.
Look for evidence, not polished prose, from vendors
When vendors claim their AI can “check compliance” or “validate controls,” security leaders say buyers need to ask tough questions.
Kwong says he’s pushing vendors to provide traceability of AI responses so his team can see how the AI reached its conclusions. “Without that traceability, it’s very difficult for us to trust it,” he said.
Ramanathan says buyers should ask if the program can identify specific evidence behind its response, such as a time-stamped log entry or a specific configuration file. If it can’t, the tool is probably generating text that sounds good.
Puneet Bhatnagar, a cybersecurity and identity leader, says the key question is whether the AI is actually analyzing live operational data or simply summarizing documents. “If a tool can’t show a consistent evidence trail behind its conclusion, it’s probably generating a narrative, not doing analysis,” said Bhatnagar, who recently served as SVP and head of identity management at Blackstone. “Compliance isn’t about language. It’s about evidence.”
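One way to operationalize that demand, shown here as a hypothetical Python sketch, is to reject any AI conclusion that does not cite addressable, time-stamped evidence. The response structure below is an assumption for illustration, not a real vendor API.

```python
# Illustrative check: accept an AI compliance conclusion only if every claim
# points at concrete, time-stamped evidence. The response structure is assumed.
from datetime import datetime

def has_evidence_trail(ai_response: dict) -> bool:
    """Return True only if each finding cites dated, addressable evidence."""
    for finding in ai_response.get("findings", []):
        evidence = finding.get("evidence", [])
        if not evidence:
            return False  # a conclusion with no cited artifact is just prose
        for item in evidence:
            # Require a locator (log path, config file) and a parseable timestamp.
            if "source" not in item:
                return False
            try:
                datetime.fromisoformat(item["timestamp"])
            except (KeyError, ValueError):
                return False
    return True

response = {
    "findings": [
        {"claim": "MFA enforced for admins",
         "evidence": [{"source": "idp/policy_export.json",
                       "timestamp": "2024-05-01T09:12:00"}]}
    ]
}
print(has_evidence_trail(response))  # True; an empty evidence list would fail
```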
Stress test models before extending trust
Kwong recommends stress testing AI tools to see how consistent they are. For example, send the same data twice and compare the results.
“If you send the same data again, does it return the same result?” he asks.
If the answers change a lot, that’s a red flag. He also suggests removing key evidence to see how the model reacts. If it still gives you a confident answer, that may indicate hallucinations.
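A rough Python sketch of those two stress tests might look like the following, where ask_compliance_tool is a hypothetical stand-in for whatever interface the AI tool actually exposes.

```python
# Sketch of the two stress tests described above. ask_compliance_tool() is a
# hypothetical stand-in for whatever API the AI tool actually exposes.

def consistency_check(ask_compliance_tool, evidence: dict) -> bool:
    """Same input twice: materially different conclusions are a red flag."""
    first = ask_compliance_tool(evidence)
    second = ask_compliance_tool(evidence)
    return first["conclusion"] == second["conclusion"]

def missing_evidence_check(ask_compliance_tool, evidence: dict, key: str) -> bool:
    """Drop a key artifact: a still-confident verdict suggests hallucination."""
    reduced = {k: v for k, v in evidence.items() if k != key}
    answer = ask_compliance_tool(reduced)
    # A sound tool should lower its confidence or flag the gap it can no longer verify.
    return bool(answer.get("flagged_gaps")) or answer.get("confidence", 1.0) < 0.5
```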
Coté says her team evaluates AI results against other tools, including scanning systems and external penetration test results. “And we don’t recommend trusting any AI tool until it proves it can match known results over and over again,” she said.
Measure hallucinations and monitor drift
Security leaders say organizations need to track how accurate AI is over time. Kwong says teams should regularly compare AI-generated reviews with human reviews and document the differences. That process should happen at least quarterly.
Ramanathan suggests tracking metrics like “drift rate,” which measures how often the AI’s conclusions differ from human reviews. “A model that was 92% accurate six months ago and is 85% accurate today is more dangerous than one that has always been at 80%, because your team’s trust was calibrated to the higher number,” he notes.
He also recommends measuring how often the cited evidence actually supports the AI’s claims. If hallucination rates climb too high, organizations should reduce the authority the AI has, for example, by moving it to a less autonomous role in their governance models.
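A simple way to track both numbers is sketched below in Python, under the assumption that each review record stores the AI conclusion, the human conclusion, and whether the cited evidence held up; the record format is illustrative.

```python
# Sketch of the two metrics discussed above. Each review record is assumed to
# hold the AI conclusion, the human conclusion, and whether cited evidence
# actually supported the claim; the format is illustrative.

def drift_rate(reviews: list[dict]) -> float:
    """Fraction of assessments where the AI and the human reviewer disagreed."""
    if not reviews:
        return 0.0
    disagreements = sum(1 for r in reviews if r["ai_conclusion"] != r["human_conclusion"])
    return disagreements / len(reviews)

def evidence_support_rate(reviews: list[dict]) -> float:
    """Fraction of AI claims whose cited evidence held up under human checking."""
    if not reviews:
        return 0.0
    supported = sum(1 for r in reviews if r.get("evidence_supported", False))
    return supported / len(reviews)

# Example quarterly check: rising drift or falling evidence support is a signal
# to pull the tool back to a less autonomous role.
quarter = [
    {"ai_conclusion": "pass", "human_conclusion": "pass", "evidence_supported": True},
    {"ai_conclusion": "pass", "human_conclusion": "fail", "evidence_supported": False},
]
print(drift_rate(quarter), evidence_support_rate(quarter))  # 0.5 0.5
```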
Watch for contextual blind spots in compliance determinations
Bhatnagar says the most dangerous hallucinations occur when AI is asked to make judgment calls about control effectiveness, control gaps, or the impact of an incident.
AI can produce what he calls “plausible compliance”: answers that sound convincing but are wrong because they lack real-world context. Compliance often depends on technical details, compensating controls, and operational realities that documentation alone cannot reveal.
Ramanathan adds that AI often struggles with permissive language (“may,” “can”) versus restrictive language (“must,” “shall”).
“For example, AI often misinterprets permissive language such as ‘employees can access the system after completing training’ as a strict, enforceable rule, treating optional permissions as mandatory controls,” explains Ramanathan. “This causes AI to overstate the strength of permissive or ambiguous language, leading to incorrect assumptions about whether policies are being properly enforced or security measures are working.”
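A lightweight pre-check along those lines, sketched in Python with illustrative word lists rather than any formal taxonomy, can flag policy statements whose modal verbs are permissive before an AI conclusion about enforcement is accepted.

```python
import re

# Illustrative pre-check: flag policy statements whose modal verbs are
# permissive, so reviewers know an AI claim of "enforced" may rest on
# ambiguous wording. Word lists are examples, not a standard taxonomy.
PERMISSIVE = {"may", "can", "should", "could"}
RESTRICTIVE = {"must", "shall", "will", "required"}

def classify_policy_statement(text: str) -> str:
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & RESTRICTIVE:
        return "restrictive"   # plausibly enforceable as written
    if words & PERMISSIVE:
        return "permissive"    # do not treat as a mandatory control
    return "unclear"

print(classify_policy_statement(
    "Employees can access the system after completing training"))  # permissive
```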
Watch out for generic or boilerplate assessments
Some vendors are tight-lipped about what their AI tools actually do. Bhatnagar says many tools summarize documents or generate gap reports, but vendors market those features as if they perform full, automated compliance checks.
The risk increases when multiple customers receive nearly the same assessment. Organizations may believe their controls have been thoroughly tested when the AI has only performed a high-level document review.
Ramanathan says this creates false confidence and a wider industry risk. If one popular model has a flaw, that blind spot can spread widely.
Bhatnagar adds that he’s seen vendors marketing AI tools as if they test organizations for compliance, even when multiple customers receive the same or nearly identical assessments.
In those cases, the tool may not actually be analyzing company-specific policies or controls but instead producing text that appears to be customized without any basis in fact, he says. “We’re in the early stages of separating AI narrative generation from AI-based verification,” he says. “That difference will define the next phase of AI governance tools.”
Strengthen accountability in audits and legal reviews
From a regulatory perspective, AI does not remove responsibility, according to experts. Ramanathan says regulators are clear that the duty of care rests with corporate executives.
“If an AI-generated assessment misses a material weakness, the organization is responsible for ‘failure to monitor,’” he says. “We’re already in an era where relying on unproven AI results can be seen as gross negligence. If the audit findings are wrong due to an AI error, you don’t just fail the audit, you’re liable for filing a misleading regulatory statement. ‘AI told me so’ is not a defense.”
Coté says being able to show that someone reviewed and approved each AI-assisted decision is important during an audit. “The important thing is to prove that a person was in the loop for each decision, with a time stamp and a rationale to support it,” she notes.
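One hypothetical way to capture that proof is an append-only sign-off log, sketched below in Python; the file name and fields are illustrative, not a prescribed format.

```python
import json
from datetime import datetime, timezone

# Minimal sketch of an append-only sign-off log: one line per AI-assisted
# decision, recording who approved it, when, and why. File name and fields
# are illustrative.
def record_signoff(decision_id: str, reviewer: str, rationale: str,
                   path: str = "ai_decision_signoffs.jsonl") -> None:
    entry = {
        "decision_id": decision_id,
        "reviewer": reviewer,
        "rationale": rationale,
        "signed_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")

record_signoff("vendor-risk-2024-031", "j.doe", "Verified evidence against SIEM export")
```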
Be careful with automated control mapping
Ramanathan says one of the biggest compliance risks comes when companies rely on AI to map internal controls to regulatory frameworks, such as GDPR or SOC 2.
“The biggest compliance danger lies in automated regulatory mapping,” he notes. “AI may confidently say that a control exists or satisfies a requirement based on a linguistic pattern rather than an operational or functional fact.”
For example, an AI tool might see an encryption setting listed in the database configuration and assume that encryption is enabled, even if that feature is turned off in the system.
Ramanathan says this can create a “huge security gap where a company believes it is prepared for an audit, only to discover during a breach that defenses the AI assumed were in place were missing or poorly configured.”
To mitigate that risk, he says organizations need to clearly codify their policies and controls and link them to enforceable technical checks rather than relying solely on AI to interpret documents.
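As an illustration of that principle, the Python sketch below compares an AI-mapped control claim against the live setting; fetch_live_setting and the control identifier are hypothetical stand-ins for whatever query and naming your environment actually uses (cloud API, CMDB, or config export).

```python
# Sketch: before accepting an AI claim that a control is satisfied, verify the
# live setting. fetch_live_setting() is a hypothetical stand-in for whatever
# real query your environment supports (cloud API, CMDB, config export).

def verify_control(ai_claim: dict, fetch_live_setting) -> dict:
    """Compare an AI-mapped control claim against the actual configuration."""
    observed = fetch_live_setting(ai_claim["system"], ai_claim["setting"])
    verified = observed == ai_claim["expected_value"]
    return {
        "control": ai_claim["control_id"],
        "ai_says": "satisfied",
        "observed_value": observed,
        "verified": verified,   # only a live match counts as evidence
    }

claim = {"control_id": "enc-at-rest-01", "system": "orders-db",
         "setting": "encryption_at_rest", "expected_value": "enabled"}
# verify_control(claim, fetch_live_setting) flags the case where the AI saw the
# setting documented but the feature is actually turned off in the system.
```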



