AI Sparks

When it comes to predicting people’s preferences, it pays to consider the “three forces”. MIT News

In his 1927 paper, “A Law of Comparative Judgment,” American psychologist L.L. Thurstone proposed that when people choose one option among many, they pick the one with the highest value to them, even though they cannot assign a specific number to that value.

Thurstone was a pioneer of “psychometrics,” a field built on the premise that mental processes, which we cannot see, can nevertheless be measured and quantified. His 1927 paper laid the foundation for what are now called random utility models, which provide a mathematical framework for explaining people’s preferences. That information can be relied upon, in turn, to make predictions about various hypothetical situations.

Random utility models (RUMs) are so named because they examine the “utility,” or benefit, that can be gained from making certain choices, such as deciding which book to read first among the pile of novels you brought back from the library. “These models are naturally random,” explains Gabriele Farina, an assistant professor in the Department of Electrical Engineering and Computer Science (EECS) and a principal investigator in the Laboratory for Information and Decision Systems (LIDS), “because people are different. For example, a person who usually drinks coffee in the morning, and chooses tea after dinner, may sometimes reverse that order completely.”
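The coffee-or-tea example can be sketched in a few lines of Python. This is only a toy illustration, not the researchers' model: it assumes each option has a fixed average utility plus independent Gaussian noise, and the function names (`choose`, `choice_shares`) and the utility values are hypothetical.

```python
import random


def choose(mean_utility, rng, noise_sd=1.0):
    """Pick the option whose noisy utility is highest on one occasion."""
    # Assumption: utility = fixed mean + independent Gaussian noise.
    noisy = {opt: mu + rng.gauss(0, noise_sd) for opt, mu in mean_utility.items()}
    return max(noisy, key=noisy.get)


def choice_shares(mean_utility, n_trials=100_000, seed=0):
    """Estimate how often each option is chosen over many occasions."""
    rng = random.Random(seed)
    counts = dict.fromkeys(mean_utility, 0)
    for _ in range(n_trials):
        counts[choose(mean_utility, rng)] += 1
    return {opt: c / n_trials for opt, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical morning utilities: coffee is preferred on average.
    morning = {"coffee": 1.0, "tea": 0.0}
    print(choice_shares(morning))
```

Under these assumed numbers, coffee wins most mornings, but tea is still picked a little under a quarter of the time, which is the kind of occasional reversal the quote describes.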

RUMs, of course, are often used in government and industry in situations with greater impact than hot (or cold) beverage options. The models help predict what people will choose to do in so-called counterfactual (“what-if”) scenarios such as: How will they get to work or school if a highway is closed for construction? What routes and modes of transportation will they take? Or, if the city suddenly receives an estimated $20 million in revenue, how should that revenue be spent to maximize the overall benefit?

Given that RUMs have been with us for almost 100 years, growing in complexity over time, one might think that, at this stage, there would be no room for improvement. However, this is not the case.

A paper presented in April at the International Conference on Learning Representations in Rio de Janeiro, Brazil, revealed basic facts that show there is more to be learned from these models than previously thought. The paper was authored by Yeshwanth Cherapanamjeri, a former MIT postdoc now at Nanyang Technological University in Singapore; Farina, who is also affiliated with MIT’s Operations Research Center (ORC); Constantinos Daskalakis, Avanessians Professor of Computer Science at MIT and member of MIT’s Computer Science and Artificial Intelligence Laboratory; and Sobhan Mohammadpour, an MIT PhD student in computer science based at LIDS and EECS.

The group’s findings stem, in part, from a shortcoming in the way RUMs are commonly estimated in practice, one that has persisted since Thurstone’s day. The data from which the models are estimated are mainly derived from so-called pairwise comparisons: In a choice between items A and B (whether it concerns movies on Netflix, competing products on Amazon.com, news posted on Google, and so on), which one would you choose? One of the reasons why this method is so widespread, explains Daskalakis, is that “giving a precise numerical score, like 4.37, to the utility of one item is very difficult.” But therein lies the rub, he adds. “With this method of eliciting people’s preferences, in which we only look at two things at a time, it is impossible to find correlations among the many options.”

The common way of using RUMs assumes that the utilities derived from A and B are independent, but in fact they may be correlated, and that can be important to know. If someone campaigning for office finds out that a potential voter is in favor of gun control, for example, that person is more likely to also favor subsidized child care. Similarly, a fan of independent movies may also be partial to foreign films, but less enthusiastic about Hollywood blockbusters. “If a digital platform is blind to the existence of such correlations, it will not be able to estimate preferences very accurately,” notes Daskalakis. “And if Netflix keeps showing you a variety of movies you don’t care about, you may sign out and cancel your subscription.”

The MIT team proved that it is impossible to get information about correlations from comparisons of only two items. Correlations can be observed, however, when a large number of people rank three items in order. The same information can also be obtained from a combination of best-of-three and best-of-two comparisons. Essentially, Mohammadpour explains, “you can get a crowd of people to rank three things. You can use the method we’ve developed to combine each of those results into one big model that can give us the big picture.”
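A small simulation gives some intuition for why pairs alone can hide correlations. It is a toy sketch under stated assumptions, not the algorithm from the paper: three hypothetical items A, B, and C have equal average utility and Gaussian noise, and the only thing that varies is the correlation `rho` between the noise on A and B. The function name `simulate` is likewise an illustrative choice.

```python
import math
import random


def simulate(rho, n_people=100_000, seed=0):
    """Toy Gaussian random utility model with three equally liked items.

    The noise on A and B has correlation `rho`; C is independent of both.
    Returns (pairwise win rates, best-of-three shares).
    """
    rng = random.Random(seed)
    wins = {"A>B": 0, "A>C": 0, "B>C": 0}
    top = {"A": 0, "B": 0, "C": 0}
    for _ in range(n_people):
        z1, z2, z3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        u = {
            "A": z1,
            "B": rho * z1 + math.sqrt(1 - rho**2) * z2,  # correlated with A
            "C": z3,
        }
        # A pairwise answer depends only on the two utilities involved,
        # so it can be read off the same draw.
        wins["A>B"] += u["A"] > u["B"]
        wins["A>C"] += u["A"] > u["C"]
        wins["B>C"] += u["B"] > u["C"]
        top[max(u, key=u.get)] += 1
    pairwise = {k: v / n_people for k, v in wins.items()}
    best_of_three = {k: v / n_people for k, v in top.items()}
    return pairwise, best_of_three


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        pairwise, best_of_three = simulate(rho)
        print(f"rho={rho}: pairwise={pairwise} best_of_three={best_of_three}")
```

In this setup every pairwise win rate stays near one half whether or not A and B are correlated, so pairwise data cannot tell the two worlds apart. The best-of-three shares do differ: when A and B rise and fall together they split their support, and C comes out on top noticeably more often than one time in three.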

Their research effort, according to Farina, is focused on the computational side of RUMs: devising algorithms that can extract this information and determining how much data is needed to do that or, equivalently, how many comparisons need to be made. The good news, he says, is that efficient algorithms are possible for this purpose. The required number of comparisons does not increase significantly with the number of items in the catalog or database under review.

“This paper provides an important breakthrough,” commented Emma Frejinger, a computer scientist at the University of Montreal. “It proves statistically why traditional data collection fails and shows that asking users for the best of three [choices] unlocks the ability to accurately train these powerful models. These findings provide a more practical guide to collecting better data to develop more accurate models.”

“Building utility models will always be an active area,” Daskalakis emphasized. “Just as RUMs have been important to the Internet economy since the late 1990s, they are, and will continue to be, important to the alignment of AI models going forward.” Moreover, he adds, “RUMs play an important role in the commercialization and implementation of large language models [LLMs].” During training, people are often asked to rank candidate outputs from these LLMs, from which the models can get a better idea about the type of text (in terms of tone, style, and content) that is most preferred.

Since we’re always “surrounded by a sea of options in many different domains,” Daskalakis says, “you can’t ask people to talk about everything they like in every possible situation. So what you can do instead is build a model that predicts what people think about different possible outcomes.”
