Anthropic is launching a program to fund the development of new types of benchmarks capable of evaluating the performance and impact of AI models, including generative models like its own Claude.
Unveiled Monday, Anthropic’s program will award grants to third-party organizations that can, as the company explains in a blog post, “effectively measure the advanced capabilities of AI models.” Interested parties can submit applications, which will be evaluated on a rolling basis.
“Our investment in these assessments aims to improve the entire field of AI safety, providing valuable tools that benefit the entire ecosystem,” Anthropic wrote on its official blog. “Developing high-quality, relevant safety assessments remains a challenge, and demand outstrips supply.”
As we’ve already pointed out, AI has a benchmarking problem. The most commonly cited benchmarks today fail to capture how the average person actually uses the systems under test. It’s also questionable whether some benchmarks, especially those published before the advent of modern generative AI, even measure what they’re supposed to measure, given their age.
The solution Anthropic proposes, which is high-level and more difficult than it sounds, is to create ambitious benchmarks focused on AI safety and societal implications, using new tools, infrastructure and methods.
The company is calling for tests that assess a model’s ability to perform tasks such as conducting cyberattacks, “enhancing” weapons of mass destruction (e.g. nuclear weapons), and manipulating or deceiving people (e.g. through deepfakes or disinformation). For AI risks related to national security and defense, Anthropic says it is committed to developing some kind of “early warning system” to identify and assess risks, though it doesn’t reveal in the blog post what such a system might entail.
Anthropic also says its new program aims to support research on “end-to-end” benchmarks and tasks that probe AI’s potential to aid scientific study, converse in multiple languages, mitigate ingrained biases and self-censor toxicity.
To achieve this, Anthropic is considering new platforms that would allow subject matter experts to develop their own assessments and conduct large-scale testing of models involving “thousands” of users. The company says it has hired a full-time coordinator for the program and that it could buy or expand projects that it believes have the potential to grow.
“We offer a range of funding options tailored to the needs and stage of each project,” Anthropic wrote in the post, though an Anthropic spokesperson declined to provide further details on those options. “Teams will have the opportunity to interact directly with Anthropic’s domain experts from the Frontier Red Team, Focus, Trust & Safety, and other relevant teams.”
Anthropic’s efforts to support new AI benchmarks are commendable, provided of course that there are enough funds and personnel to support them. But given the company’s commercial ambitions in the AI race, it might be hard to trust it entirely.
In the blog post, Anthropic is quite transparent about the fact that it wants some of the evaluations it funds to align with the AI safety classifications it developed (with the help of third parties such as the nonprofit AI research organization METR). This is certainly the company’s prerogative. But it could also force program applicants to accept definitions of “safe” or “risky” AI with which they may not fully agree.
Some in the AI community are also likely to take issue with Anthropic’s allusions to “catastrophic” and “deceptive” risks of AI, such as those posed by nuclear weapons. Many experts say there is little evidence to suggest that AI as we know it will acquire world-destroying capabilities and surpass humans in the near future, if ever. Claims of imminent “superintelligence” only serve to distract from pressing AI regulatory issues of the day, such as AI’s hallucinatory tendencies, these experts add.
In the same post, Anthropic writes that it hopes its program will serve as “a catalyst for moving toward a future where comprehensive AI assessment is an industry standard.” It’s a mission that many open, non-corporate efforts to create better AI benchmarks can identify with. But it remains to be seen whether those efforts are willing to partner with an AI vendor whose loyalty ultimately lies with shareholders.