AI Sparks

MIT researchers teach AI models to interpret charts | MIT News

To speed up and refine decision-making in a fast-paced, global marketplace, businesses may use artificial intelligence generative models to help summarize and interpret the charts that often populate market summaries and financial reports.

But even the latest models of visual language sometimes struggle with this task, as you need a model to integrate visual perception, numbers, and language. A company that invests in a modern model may receive inaccurate or incomplete information.

To fill this performance gap, researchers from MIT and the MIT-IBM Computing Research Lab have developed a multi-faceted tool for AI users specifically designed to teach visual language models (VLMs) how to interpret charts effectively.

They used a new data generation method to create a state-of-the-art dataset that includes over a million different charts. The dataset also includes multiple visual, linguistic, and numerical components for each chart image, allowing the models to make strong inferences about the information on the chart.

The researchers used this dataset, called ChartNet, to train a series of open source VLMs. Many of these smaller models have worked remarkably well for orders of magnitude, trading models for tasks such as data extraction and chart summarization.

By allowing open source models to outperform their commercial counterparts, ChartNet can allow small firms with limited budgets to easily implement AI. The open source dataset can be used to improve the capabilities of AI models for tasks such as business trend analysis and scientific statistical interpretation.

“We developed ChartNet to be a one-stop shop for chart understanding, which includes basically anything an AI model and a clinician training that model might need. We hope our work inspires researchers to achieve modern functionality with small models that don’t require unlimited amounts of computation,” said Jovana Kondic, lead author of the MITECS electrical engineering and computer science paper.

He is joined on the paper by several authors from MIT, the MIT-IBM Computing Research Lab, and IBM Research, including Pengyuan Li, a member of the research staff at IBM Research; Dhiraj Joshi, senior scientist at IBM Research; Isaac Sanchez, software engineer at IBM Research; Aude Oliva, director of strategic industry engagement at the MIT Schwarzman College of Computing, MIT director of the MIT-IBM Computing Research Lab, and senior research scientist at the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Rogerio Feris, principal scientist and manager at the MIT-IBM Computing Research Lab. The research will be presented at the IEEE Computer Vision and Pattern Recognition Conference.

A dataset bottleneck

Researchers have made great strides in developing generative AI models that excel in natural language processing and natural image consulting. But less work focuses on interpreting the complex multimodal data contained within the charts, Kondic said.

Yet for businesses large and small in almost every industry, understanding the chart is an important task.

“The financial industry is evolving with charts. If visual language models can extract information from charts, such as interpretations of trends, that helps a lot of the workflow that happens downstream,” Joshi said.

The lack of high-quality training data is a major obstacle holding back the development of VLMs that can accurately interpret charts. Most data sets consist of limited chart images pulled from the Internet and often lack the necessary scale and additional information to help the model interpret the underlying data.

“A visual language model, unlike our brain, may need to see thousands of examples during training to recognize something like a line chart,” Kondic said.

Researchers want to overcome those errors by generating artificial data. Synthetic data is created artificially by algorithms to mimic the statistical properties of real data.

The ChartNet dataset contains one million high-quality chart images, along with the corresponding code used to generate each chart, a text description, and a table containing its numerical data. In addition, each data point includes question and answer pairs to teach the model how to correctly answer questions with a chart image.

“These additional data streams direct the model to connect and direct the different pieces of information the chart image encodes,” Kondic said.

Data processing

To build ChartNet, the researchers created a two-step, synthetic data processing pipeline.

First, their automated system translates any existing set of chart images into code. Then the system iteratively extends that code to change different aspects of each chart, such as chart type, data values, title, colors, etc.

“We can start from one chart that we use as a seed and come up with hundreds of augmentations of it. This is how we were able to create a dataset with more than a million different images,” explained Kondic.

They have also incorporated an automated quality check process to ensure that the synthetic data is of high quality. This process ensures that the code works and the rendered chart images are accurate and clean.

“We don’t just want to produce various samples. We also want information to be presented in a meaningful way,” he said.

ChartNet also includes a selection of chart data points defined by human experts. This provides access to additional types of charts and supporting data with validation guarantees.

A user can use the annotated data to fine-tune an existing VLM, improving the performance of a particular program, Joshi added..

The researchers tested ChartNet by training IBM’s Granite Vision series of models and several other open source models of various sizes and testing them on various chart interpretation tasks. The dataset improved the accuracy of all models in chart reconstruction, chart data extraction, chart summarization, and chart query answering.

With ChartNet, small open source models have consistently outperformed large commercial models.

“Many previous training datasets only focused on answering simple questions about the chart. We tried to go beyond that with ChartNet by generating data that supports all aspects of robust chart understanding,” Kondic said.

In the future, the researchers plan to continue expanding ChartNet by integrating data with additional levels of complexity. They also want to get feedback from the research community.

This research was funded, in part, by the MIT-IBM Computing Research Lab.

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button