Improving the speed and power efficiency of AI agents | MIT News

Agentic workflows are artificial intelligence software programs that integrate multiple models and external tools to perform complex tasks, such as analyzing video and answering questions about it.
But the way these widely disparate systems are designed and deployed often creates inefficiencies that can lead to wasted computation, energy, and costs.
To improve efficiency, researchers from MIT and Microsoft have developed an intelligent system that guides the process of designing an agent’s workflow and automatically optimizes how that workflow is executed.
With this new approach, the developer can explain what they want the agent to do in simple language, without needing to specify all the details of their application in advance.
The system automatically determines the best models and tools to use, as well as the appropriate hardware configuration and resource allocation when the workflow is executed by a cloud provider.
It adjusts those settings on the fly based on each user’s priorities, such as reducing costs or increasing speed.
When tested on several agentic workflows, the new system reduced the amount of computing resources required for execution, significantly lowering energy requirements and costs compared to traditional methods without compromising performance.
“Agentic workflows are becoming more complex and are quickly becoming the core of what cloud providers do. Energy consumption is a big concern, so we need to be more mindful of how efficient these workflows are. It is very easy to overallocate resources, which wastes energy and money. Empowering the cloud provider to make these workflows smarter leads to the best outcomes for everyone involved,” says Gohar Chaudhry, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this system.
He is joined on the paper by Adam Belay, an associate professor of EECS and a member of the MIT Computer Science and Artificial Intelligence Laboratory; senior author Ricardo Bianchini, technical fellow and corporate vice president at Microsoft Azure; and others at Microsoft Azure. The paper will be presented at the USENIX Symposium on Operating Systems Design and Implementation.
Configuration conundrum
An agentic workflow is a system composed of several independent AI agents that jointly use various models and tools, such as databases or Python programs, to dynamically complete a multistep task, such as data processing or code generation.
These workflows can act as behind-the-scenes processes that power user-facing applications.
In general, developers have to hard-code all the technical choices in advance. They need to define which AI agents, models, and tools should be used, and how they should be used. They must also specify the hardware that runs the workflow and how to weigh trade-offs such as speed versus cost.
This is particularly challenging because an agentic workflow integrates multiple black-box models and various tools, each with its own configuration options, which may be provided by different companies.
When a new AI model is released that can improve the accuracy or efficiency of an application, the developer will need to start from scratch to implement it.
“Even if you wanted to do all of this by hand, it’s unlikely that you’d be able to set up the workflow properly because the potential configuration space is so large,” Chaudhry said.
In addition, a cloud data center that serves the application to customers cannot see inside the workflow, so it cannot provision its hardware resources in the most efficient way at the time of a user’s request.
With this new system, called Murakkab (an Urdu word for a composition of things), the researchers aim to optimize the agentic workflow end to end.
Dynamic decision-making
First, Murakkab enables developers to create workflows by defining their app’s purpose in high-level terms, rather than explaining how the many parts of that workflow must be integrated.
For example, a developer might define a video Q&A application that extracts keyframes, generates a transcript, and answers user questions about the video.
“There are many ways to do this, and all these different models and tools have implications for how quickly the application can complete the task,” he said.
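To make the idea of a high-level description concrete, here is a minimal Python sketch of what such a declarative specification could look like. It is an illustration only: the article does not show Murakkab's actual interface, so the `Task` and `WorkflowSpec` classes and all field names are hypothetical. The point is that the developer states what each step should achieve and which steps it depends on, without naming a model, tool, or hardware type.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    """One high-level step: what to do, not which model or hardware does it."""
    name: str
    goal: str
    depends_on: tuple[str, ...] = ()


@dataclass
class WorkflowSpec:
    """A declarative workflow description (hypothetical format, not Murakkab's)."""
    description: str
    tasks: list[Task] = field(default_factory=list)

    def task_names(self) -> list[str]:
        return [t.name for t in self.tasks]

    def validate(self) -> None:
        """Check that every dependency refers to a task defined earlier."""
        seen: set[str] = set()
        for task in self.tasks:
            missing = [d for d in task.depends_on if d not in seen]
            if missing:
                raise ValueError(f"{task.name} depends on unknown tasks: {missing}")
            seen.add(task.name)


# The video Q&A example from the article, expressed as goals only.
video_qa = WorkflowSpec(
    description="Answer user questions about a video",
    tasks=[
        Task("extract_keyframes", "pick representative frames from the video"),
        Task("transcribe", "generate a transcript of the audio"),
        Task("answer", "answer the question using the frames and transcript",
             depends_on=("extract_keyframes", "transcribe")),
    ],
)
video_qa.validate()
```

Because nothing in the specification names a concrete model or accelerator, those choices can be made, and later revised, by the platform rather than by the developer.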
Murakkab takes the developer’s high-level specification and automatically identifies the best existing models and tools to integrate into the workflow.
It also determines which components need to run in sequence and which can run in parallel to improve performance.
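One common way to separate sequential steps from parallel ones is to treat the workflow as a dependency graph and group tasks into stages. The sketch below is a generic layered topological sort, not the algorithm from the paper, and the function name `parallel_stages` and the task names are assumptions made for illustration. Tasks in the same stage have no unmet dependencies and could run at the same time; stages run one after another.

```python
def parallel_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into ordered stages; tasks within a stage can run in parallel.

    `deps` maps each task name to the set of tasks it depends on.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    stages: list[list[str]] = []
    while remaining:
        # A task is ready once all of its dependencies have completed.
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        stages.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)
    return stages


# Hypothetical dependency graph for the video Q&A example.
video_qa_deps = {
    "extract_keyframes": set(),
    "transcribe": set(),
    "answer": {"extract_keyframes", "transcribe"},
}
stages = parallel_stages(video_qa_deps)
print(stages)
```

For this graph, keyframe extraction and transcription land in the first stage together, and answering the question forms the second stage.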
“The platform makes configuration decisions later, so if a new model or GPU accelerator comes out tomorrow, the developer doesn’t have to worry about that,” he says.
When a cloud provider deploys that application to a customer, Murakkab optimizes the workflow by configuring its components to meet user constraints, such as prioritizing accuracy while meeting latency requirements.
It dynamically identifies optimal hardware allocations and deployment schedules to maximize efficiency in real time, and generates workflows that are ready for the cloud provider to execute.
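The selection step described above can be pictured as a small constrained optimization: filter the candidate configurations by the user's constraints, then pick the best one for the stated priority. The following Python sketch shows that pattern under loud assumptions. The `Config` fields, the `pick_config` function, and every number in the candidate list are invented for illustration; they are not measurements or interfaces from Murakkab.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    """A candidate model/hardware pairing with estimated characteristics."""
    name: str
    accuracy: float   # fraction of questions answered correctly
    latency_s: float  # end-to-end latency in seconds
    energy_wh: float  # energy per request in watt-hours


def pick_config(candidates: list[Config], max_latency_s: float,
                min_accuracy: float = 0.0,
                objective: str = "accuracy") -> Optional[Config]:
    """Return the best feasible config, or None if no config meets the constraints."""
    feasible = [c for c in candidates
                if c.latency_s <= max_latency_s and c.accuracy >= min_accuracy]
    if not feasible:
        return None
    if objective == "accuracy":
        # Highest accuracy wins; ties go to the lower-energy option.
        return max(feasible, key=lambda c: (c.accuracy, -c.energy_wh))
    if objective == "energy":
        # Lowest energy wins; ties go to the more accurate option.
        return min(feasible, key=lambda c: (c.energy_wh, -c.accuracy))
    raise ValueError(f"unknown objective: {objective}")


# Made-up candidates, for illustration only.
candidates = [
    Config("large-model-gpu", accuracy=0.90, latency_s=4.0, energy_wh=50.0),
    Config("medium-model-gpu", accuracy=0.88, latency_s=2.5, energy_wh=12.0),
    Config("small-model-cpu", accuracy=0.70, latency_s=6.0, energy_wh=3.0),
]

most_accurate = pick_config(candidates, max_latency_s=5.0)
greenest = pick_config(candidates, max_latency_s=5.0, min_accuracy=0.85,
                       objective="energy")
```

With these invented numbers, prioritizing accuracy under a 5-second limit selects the large model, while prioritizing energy with an accuracy floor of 0.85 selects the medium model, giving up a little accuracy for a much lower energy estimate. A real system would also have to estimate these figures and account for shared hardware, which this sketch ignores.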
“Our system gives cloud providers visibility into multiple operations, so a provider can share computing resources more efficiently while satisfying user constraints,” he said.
When tested on agentic workflows for video Q&A and code generation, Murakkab met user requirements while using only about 35 percent of the computation required by other methods. It also consumed only about 27 percent of the energy, at less than 25 percent of the cost.
The flexible nature of Murakkab also allows users to weigh trade-offs. In one instance, the system reduced the energy consumption of an agentic workflow by more than an order of magnitude with only a 2 percent drop in accuracy.
The system was also able to identify unexpected optimal settings for the model that selects video frames, improving the performance of the video Q&A task. This kind of optimization is nearly impossible for an engineer to do by hand, Chaudhry said.
Next, the researchers plan to expand their system to more complex workflows and larger computing clusters while exploring opportunities to develop new agentic applications.
“There are many opportunities to optimize this workflow so that it consumes less energy, but we need to think about this at the scale of large cloud platforms,” said Chaudhry.
This research was supported, in part, by the Semiconductor Research Corporation and the US Defense Advanced Research Projects Agency.



