AI Agents: What They Are and How Will We Audit Them?
By Rick Gamache

It's been a year since my last article about artificial intelligence, or AI, appeared in this publication. That article kicked off a year-long journey of discussing AI with auditors, technologists, and industry leaders about how AI is currently being used and assessed across various industries, including government. I encourage all ALGA members to read that article for a basic understanding of AI, including what AI is, what AI is not, and what impacts AI may have on the audit profession.

Generally, auditors are optimistic about AI's potential to shape our industry in many positive ways. By adding these powerful AI tools to our toolboxes, the future auditor will gain insights that highlight the value an auditor brings to the table. However, like any other new technology, AI poses almost as much challenge to our profession as it does promise.

Many of the people I've spoken with about AI have had positive things to say about its potential, but the discussion almost always turns to the topic of auditing AI. People are wondering: How will we audit AI, and what exactly will we be auditing? These are great questions, and ones that we, as a community, need to discuss. At the moment the answers are a bit unclear, but there are clues we can uncover by looking at how AI works.

One of the most fundamental elements of AI is the "agent." Not the James Bond kind, but a computer program, or algorithm, that observes, interacts, and autonomously makes decisions to achieve the best outcome in a given situation. It is the concept of "best outcome" to which we as auditors must first turn our attention.


Agents are expected to act rationally. A rational agent should be able to identify the best outcome in light of uncertainty. An example of this can be found in an autonomous car traveling along a highway. What happens when an oncoming car crosses the yellow line? The rational agent—in this case, the autonomous car—has one objective: to ensure the best outcome. In this scenario, the survival of the driver is paramount. However, what if your teenage daughter were the passenger? It is likely that we, as parents, would sacrifice ourselves to save our child. Surely, that is the best outcome for most of us when faced with such a decision; but for the autonomous car, no such value system exists. We are many years away from rational agents having human values, but that hasn't stopped rational agents from being used to make value-based decisions.

It is unlikely that auditors will be auditing the "rationality" of AI any time soon, if ever. But what about the auditors who have the important responsibility of auditing industrial control systems, where rational agents are likely to be used first? We are at the edge of AI-assisted automation by way of rational agents, particularly in the public utilities sector. Soon, rational agents will be granted the authority to make decisions that were previously made by humans. For example, if a water supply suddenly has high turbidity, and particles that carry bacteria are found to exceed state or federal thresholds, a rational agent can shut off the water to protect citizens and alert water department employees to the turbidity problem so that testing can be performed. The first audits we perform will likely measure what thresholds trigger an agent and how an agent behaves when those thresholds are met. This is also where immediate risk exists.
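As a rough illustration of what such an audit would examine, here is a minimal sketch of a threshold-triggered water agent. The threshold value, sensor field, and action names are all assumptions invented for this example, not drawn from any real utility system.

```python
# Hypothetical sketch of a threshold-triggered water agent.
# TURBIDITY_LIMIT_NTU is an assumed regulatory threshold, not a real one.
TURBIDITY_LIMIT_NTU = 5.0

def water_agent(reading: dict) -> list:
    """Return the actions taken for one sensor reading."""
    actions = []
    if reading["turbidity_ntu"] > TURBIDITY_LIMIT_NTU:
        actions.append("shut_off_supply")  # protect citizens
        actions.append("alert_staff")      # trigger manual testing
    return actions

print(water_agent({"turbidity_ntu": 7.2}))  # ['shut_off_supply', 'alert_staff']
print(water_agent({"turbidity_ntu": 1.1}))  # []
```

An audit of an agent like this would verify both the threshold itself and the actions fired once the threshold is crossed.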


One of the big misconceptions about AI is the idea that AI can solve its own problems or seek out problems to solve; however, that is not the case—at least at the moment. Agents are designed with a function, or multiple defined functions, in mind. At the core of agent design is the development of algorithms that autonomously interact with logical sensors, physical sensors, and actuators. A great example is the Roomba vacuum cleaner. The agent responsible for making a Roomba work is a simple reflex agent that tells the vacuum cleaner to begin vacuuming when dirt is detected on the floor. This condition-action rule may be basic, but humans make hundreds, if not thousands, of similar decisions daily. Even though reflex agents may look like simple algorithms, they encode critical and often complex instructions about how a Roomba interacts with its environment.

Reflex agents are not enough by themselves. In the example of a Roomba, the agent simply tells the vacuum motor to actuate when dirt is detected. However, a Roomba does not have any capability to distinguish a “good” job of vacuuming from a “bad” job of vacuuming. This is where utility agents come in handy.
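A simple reflex agent like the Roomba's condition-action rule can be sketched in a few lines. The function and percept names below are illustrative assumptions, not a real Roomba API.

```python
# Minimal sketch of a simple reflex agent: it acts only on the current
# percept, with no memory or model of the room.
def vacuum_agent(percept: dict) -> str:
    """Condition-action rule: vacuum if dirt is sensed, else keep moving."""
    if percept["dirty"]:
        return "vacuum"        # dirt detected -> actuate the motor
    return "move_forward"      # otherwise keep searching the floor

print(vacuum_agent({"dirty": True}))   # vacuum
print(vacuum_agent({"dirty": False}))  # move_forward
```

Each decision depends solely on what the sensors report right now—exactly why a reflex agent cannot judge how good a job it has done overall.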


To maximize performance, AI requires utility-based agents. "Utility," in the context of AI, is a measure of how well an agent performs as it interacts with its world and makes rational adjustments. These rational adjustments maximize the agent's utility when performing a task. If the same Roomba is equipped with a utility agent, not only will the Roomba clean the room, it can also determine how clean the room is and decide to clean it again until the desired results are achieved.
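A minimal sketch of the difference: a utility-based agent measures how well it is doing and keeps working until a target utility is reached. The cleanliness measure and the "each pass removes one unit of dirt" model below are simplified assumptions for illustration.

```python
# Utility-based agent sketch: repeat cleaning passes until a measured
# utility (fraction of clean floor cells) meets a target.
def cleanliness(room) -> float:
    """Utility measure: fraction of floor cells with no dirt."""
    return sum(1 for cell in room if cell == 0) / len(room)

def utility_vacuum(room, target=0.95, max_passes=10):
    """Keep making cleaning passes until utility meets the target."""
    passes = 0
    while cleanliness(room) < target and passes < max_passes:
        room = [max(0, cell - 1) for cell in room]  # one imperfect pass
        passes += 1
    return room, passes

room, passes = utility_vacuum([2, 1, 0, 3])  # dirt levels per cell
```

Unlike the reflex agent, this agent evaluates its own results and acts on the gap between the current state and the desired one.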

For the future auditor, auditing agents and their performance will be just as commonplace as IT audits are. However, what exactly will the future auditor be measuring? To answer this, we have to look at the benchmarks applied by AI engineers when calculating the usefulness of their solutions: algorithm performance. The performance of any given algorithm can be measured in four ways:

  • Completeness – Can the algorithm find a solution where a solution exists?
  • Optimality – Is the solution the most efficient?
  • Time – How much time is required to find the solution?
  • Space – How much memory is needed to perform the function?
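As a rough sketch of how these four benchmarks might be instrumented, the toy breadth-first search below reports whether a solution was found (completeness), a shortest path (optimality, which BFS guarantees on unweighted graphs), wall-clock time, and a crude count of nodes held in memory (space). The graph and metric names are made up for illustration.

```python
import time
from collections import deque

def bfs(graph, start, goal):
    """Toy search instrumented with the four performance benchmarks."""
    t0 = time.perf_counter()
    frontier, seen = deque([[start]]), {start}
    max_stored = 1  # crude "space" metric: peak nodes tracked
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            # Completeness: a solution was found.
            # Optimality: BFS returns a shortest path on unweighted graphs.
            return {"path": path,
                    "time_s": time.perf_counter() - t0,  # Time
                    "space": max_stored}                 # Space
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
        max_stored = max(max_stored, len(frontier) + len(seen))
    return None  # incomplete: no solution exists in this graph

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
result = bfs(graph, "A", "D")
```

An auditor measuring an agent against these benchmarks would ask the same four questions of production algorithms that this harness asks of a toy one.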


Continuing with our Roomba example, completeness is an obvious metric—is the floor clean? However, completeness is more than a measure of quality. Completeness, as defined in the AI realm, is a measure of an agent's ability to find a solution to a problem. For example, a Roomba moving straight across the floor comes across the family dog lounging in its path. We know that a Roomba's sensors can detect the dog and reroute around it. But what if the dog is moving, or even playing with the Roomba? What does it do then? A well-designed agent will search for an alternative, perhaps disengaging by moving far away to evade the dog, or returning to its home base to live to vacuum another day. Sure, it may take a couple of hours to clean the floors where it normally takes one, but completeness is achieved. In most cases this may be fine, but it is not a great solution if you have the in-laws coming over in 45 minutes!


AI's adoption is directly related to optimality—can the AI do tasks more efficiently than its human counterparts? Even now, we are seeing the benefits of agent optimality. A good example is the popular navigation application "Waze." Not only does Waze's AI agent continually measure the optimal route, it also receives input from hundreds of thousands of drivers reporting accidents, disabled vehicles, traffic conditions, and more, all in the blink of an eye. Waze is able to help make your commute more efficient, getting you home to your dog, and Roomba, much sooner.


Time and space complexities are a delicate balance between speed (time) and resources (memory space). How quickly an agent can find a solution depends on how many resources are available to it. For agents that execute deep-search tasks, this is a critical performance factor. Facial recognition systems are limited in this way. A facial recognition system calculating six facial characteristics (ears, eyes, nose, mouth, cheeks, and hair color) may have to examine 10 million different combinations of facial features to accurately identify a person. On a modern computer system, this takes about a second with a gigabyte of memory.

However, it doesn't take long to reach the maximum utility of facial recognition systems if you ignore the time and space trade-off. If the number of facial characteristics a system has to measure doubles from 6 to 12, the AI agent may have to sift through a trillion combinations to accurately identify an individual, which can take 13 days with a petabyte of memory! This is a dead end for technology like facial recognition, where real-time results and accuracy are needed. However, memory limitations are continuously being redefined. As technologies such as quantum computing are perfected, time and space constraints may become a notion of the past.
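The blow-up described above is driven by exponential growth in the number of feature combinations. A back-of-envelope sketch, assuming ten possible values per characteristic (an assumption chosen only to make the arithmetic visible, not a property of any real system):

```python
# Combinations grow as values_per_feature ** n_features, so doubling
# the number of features squares the size of the search space.
values_per_feature = 10  # assumed discretization per facial characteristic

for n_features in (3, 6, 12):
    combos = values_per_feature ** n_features
    print(f"{n_features:>2} features -> {combos:,} candidate combinations")
```

With this assumption, 12 characteristics already yields a trillion combinations—the order of magnitude cited above—while hardware capacity grows only linearly with added memory.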

Perhaps the biggest challenge awaiting auditors is the dreaded "black box" factor. A black box is exactly what it sounds like: the underlying, invisible lines of code that make AI agents work. Conventional procedures for assessing black box technologies focus on ensuring that those responsible for them understand how the technology works and can verify that its outcomes are consistent with organizational governance, risk, and compliance programs.

With AI becoming more commonplace, stories of bias are beginning to emerge. Anecdotal evidence suggests that some AI systems used in business today operate with human bias. According to IBM, more than 180 human biases have been defined and classified to date. Unfortunately, these biases have found their way into AI solutions, most likely as an afterthought rather than with nefarious purpose. And it is not just about human bias: bias can be found in the data we collect, including racial, gender, and ideological biases that are then imported into our AI solutions. Because of this, bias is particularly problematic for organizations and is the front line where auditors are likely to spend most of their time.

Verifying such bias is not going to be an easy task. It is unlikely that we will have an abundance of resources to determine whether bias exists in our AI solutions; the relationships among the tremendous amounts of data involved will prohibit that kind of manual examination. To close this gap, it is important that strong Service Level Agreements (SLAs) with solution providers include assurances that their AI algorithms are free of human bias. To test for human bias, the use of AI simulators will become ubiquitous. These AI simulators will be powered by AI itself, with one objective: to test all possible outcomes and verify that the AI agents being used produce results that are purposeful, valuable, and repeatable, and that they act consistently with the organization's values.

There is no doubt that the use of AI is going to have a profound impact on our profession. I too share the enthusiasm of those who are excited about the future of AI in our profession. As with the disruptive technologies of the past, such as the personal computer, people who learn the basics today will likely find exceptional opportunities in the future.


Rick Gamache is a Senior Cyber Security Consultant with the accounting firm BerryDunn, headquartered in Portland, Maine. Rick is a veteran of the United States Coast Guard, where he served as a Telecommunications Officer in roles spanning communications planning, cyber security, and intelligence. After leaving military service, Rick performed risk assessments and FISMA audits for the U.S. Navy's Destroyer program as a Fully Qualified Navy Validator. His work includes assessing the security controls of some of the nation's most sensitive computing systems and platforms. As an entrepreneur, Rick helped found Wapack Labs, where he served as Partner and Director of Cyber Threat Intelligence, growing the business from a startup to a six-million-dollar company. An accomplished author, Rick has written several publicly available risk assessment reports with an emphasis on the Nordic nations of Iceland and Sweden. Additional works cover Russia's annexation of Crimea through the use of cyber-attacks, the risks of outsourcing financial supply chains to India, and other topics of strategic importance. Rick is a freelance writer for Security Affairs Magazine and has spoken on cyber security topics in both the United States and Europe.