Because machine learning models can make incorrect predictions, researchers often give their models the ability to tell users how confident they are in a particular decision. This is especially important in high-stakes situations, such as when a model is used to identify diseases in medical images or filter job applications.
But quantifying a model’s uncertainty is only useful if it is accurate: if a model says there is a 49 percent chance that pleural effusion will be seen in a medical image, then the model should be correct 49 percent of the time.
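A standard, paper-independent way to check that property is expected calibration error, which groups predictions by their stated confidence and compares each group's average confidence with its actual accuracy. A minimal sketch, assuming NumPy arrays of per-prediction confidences and 0/1 correctness flags:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by stated confidence and compare each bin's
    average confidence with its observed accuracy; the weighted gap
    across bins is the expected calibration error (ECE)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of points in bin
    return ece
```

A perfectly calibrated model would score zero: every bin's accuracy would match its confidence.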
Researchers at MIT have introduced a new approach that can improve uncertainty estimation in machine learning models: not only does the technique produce more accurate uncertainty estimates than other techniques, but it does so more efficiently.
What’s more, the technique is scalable, making it applicable to the large-scale deep learning models being increasingly deployed in healthcare and other safety-critical situations.
This technique could give end users without machine learning expertise better information for deciding whether to trust a model’s predictions, and whether the model should be deployed for a particular task at all.
“It’s clear that these models perform extremely well in some scenarios, and it’s easy to assume that they’ll perform equally well in other scenarios. That makes it particularly important to advance this kind of research to better calibrate the uncertainty in these models and make them more consistent with human notions of uncertainty,” says lead author Nathan Ng, a visiting student at MIT and a graduate student at the University of Toronto.
Ng co-authored the paper with Roger Grosse, an assistant professor in the University of Toronto’s Department of Computer Science, and senior author Marzyeh Ghassemi, an associate professor in the Department of Electrical Engineering and Computer Science and a member of the Institute for Medical Engineering and Science and the Laboratory for Information and Decision Systems. The research will be presented at the International Conference on Machine Learning.
Quantifying uncertainty
Uncertainty quantification methods often require complex statistical calculations that do not scale well to machine learning models with millions of parameters, and they require users to make assumptions about the model and the data used to train it.
The MIT researchers took a different approach. They use something called the minimum description length principle (MDL), which doesn’t require assumptions that can hinder the accuracy of other methods. MDL is used to better quantify and accommodate the uncertainty in the test points that a model is asked to label.
The technique the researchers developed, called IF-COMP, makes MDL fast enough to be used with large-scale deep learning models deployed in many real-world environments.
MDL considers all the labels a model could plausibly assign to a test point; if many alternative labels fit the point well, the model's confidence in the label it chose should decrease accordingly.
“One way to understand how much trust a model has is to give it counterfactual information and see how likely it is to believe it,” Ng says.
For example, take a model that says a medical image shows pleural effusion. If a researcher tells the model the image instead shows edema, and the model readily updates its belief, then it should be less confident in its original decision.
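The sketch below illustrates that idea in a deliberately simplified way: nudge a copy of the model toward a counterfactual label with one gradient step and measure how much its confidence in the original answer drops. This is only a stand-in for the paper's approach, which uses influence functions to approximate the effect of such an update without retraining; the model, input, and labels here are hypothetical.

```python
import copy
import torch
import torch.nn.functional as F

def belief_shift(model, x, original_label, counterfactual_label, lr=1e-3):
    """Crude counterfactual probe: take one SGD step toward the
    counterfactual label on a copy of the model, then measure how much
    confidence in the original label drops. A large drop suggests the
    model is easily swayed, i.e., not very certain to begin with."""
    with torch.no_grad():
        before = F.softmax(model(x), dim=-1)[0, original_label].item()

    probe = copy.deepcopy(model)
    opt = torch.optim.SGD(probe.parameters(), lr=lr)
    loss = F.cross_entropy(probe(x), torch.tensor([counterfactual_label]))
    loss.backward()
    opt.step()

    with torch.no_grad():
        after = F.softmax(probe(x), dim=-1)[0, original_label].item()
    return before - after
```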
In MDL, when the model is confident in labeling a data point, it should use a very short code to describe that point. When the decision is uncertain because a point could be labeled with many other labels, it should use a longer code to capture these possibilities.
The amount of code needed to label a data point is called stochastic data complexity. If researchers ask a model how willing it is to update its belief about a data point given contrary evidence, the stochastic data complexity should decrease if the model is confident.
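To make the code-length picture concrete, a Shannon-style code assigns a label a length of -log2 p(label | x) bits under the model's predictive distribution, so confident predictions compress well and uncertain ones do not. This is an illustrative calculation, not the paper's exact estimator of stochastic data complexity:

```python
import numpy as np

def code_length_bits(probs, label):
    """Shannon code length, -log2 p(label | x), for encoding `label`
    under the model's predictive distribution `probs`."""
    return -np.log2(probs[label])

confident = np.array([0.95, 0.03, 0.02])   # model strongly favors label 0
uncertain = np.array([0.40, 0.35, 0.25])   # several labels fit the point

print(code_length_bits(confident, 0))  # ~0.07 bits: short code
print(code_length_bits(uncertain, 0))  # ~1.32 bits: longer code
```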
However, testing each data point using MDL requires a significant amount of computation.
Speeding up the process
With IF-COMP, the researchers developed an approximation technique that can accurately estimate stochastic data complexity using a special function known as an influence function. They also employed a statistical technique called temperature scaling, which improves the calibration of the model’s outputs. Combining influence functions with temperature scaling yields a high-quality approximation of stochastic data complexity.
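Temperature scaling on its own is a standard post-hoc calibration step: a single scalar temperature is fit on held-out data and used to rescale the model's logits before the softmax. A minimal sketch of that piece alone, with illustrative function names (the influence-function approximation that IF-COMP adds on top is not shown):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def nll_at_temperature(T, logits, labels):
    """Average negative log-likelihood of `labels` after dividing
    the logits by temperature T before the softmax."""
    scaled = logits / T
    log_probs = scaled - logsumexp(scaled, axis=1, keepdims=True)
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(val_logits, val_labels):
    """Choose the single temperature that minimizes negative
    log-likelihood on a held-out validation set; the same T is
    then reused to rescale logits at test time."""
    result = minimize_scalar(nll_at_temperature, bounds=(0.05, 10.0),
                             args=(val_logits, val_labels), method="bounded")
    return result.x
```

A temperature above 1 softens overconfident predictions; a temperature below 1 sharpens underconfident ones.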
Ultimately, IF-COMP can efficiently generate a well-calibrated uncertainty quantification that reflects the true reliability of the model. The technique can also determine if the model has mislabeled certain data points and reveal which data points are outliers.
The researchers tested their system on all three tasks and found it to be faster and more accurate than competing approaches.
“Having confidence that models are well calibrated is crucial, and there is a growing need to detect when specific predictions don’t look right. Auditing tools are increasingly necessary for machine learning problems, as we use large amounts of unvalidated data to build models that are then applied to human-facing problems,” Ghassemi says.
Because IF-COMP is model-agnostic, it can provide accurate uncertainty quantification for many types of machine learning models, allowing them to be deployed in a wider range of real-world environments, ultimately empowering more experts to make better decisions.
“People need to understand that these systems are highly fallible and can make up facts on the spot. They may seem very confident, but there are a lot of things they can be talked into believing when given evidence to the contrary,” Ng says.
In the future, the researchers are interested in applying this approach to large-scale language models and exploring other potential use cases for the minimum description length principle.