This article is part of our exclusive IEEE Journal Watch series in partnership with IEEE Xplore.
Programmers have spent decades writing code for AI models, and now, in a full-circle moment, AI is being used to write code. But how does an AI code generator compare to a human programmer?
A study published in the June issue of IEEE Transactions on Software Engineering evaluated the code produced by OpenAI’s ChatGPT in terms of functionality, complexity, and security. The results show that ChatGPT has a very wide range of success when it comes to producing functional code—with a success rate ranging from 0.66% to 89%—depending on the difficulty of the task, the programming language, and a number of other factors.
Although in some cases the AI generator can produce better code than humans, the analysis also reveals some security issues with AI-generated code.
Yutian Tang, a professor at the University of Glasgow, contributed to the study. He notes that AI-based code generation could boost productivity and help automate software development tasks, but that it is important to understand the strengths and limitations of these models.
“By performing a comprehensive analysis, we can uncover potential issues and limitations that arise in ChatGPT-based code generation… [and] improve generation techniques,” Tang says.
To explore these limitations in more detail, his team sought to test GPT-3.5’s ability to solve 728 coding problems from the LeetCode testing platform in five programming languages: C, C++, Java, JavaScript, and Python.
“A reasonable hypothesis explaining why ChatGPT can better solve algorithm problems before 2021 is that these problems are frequently observed in the training dataset.” —Yutian Tang, University of Glasgow
Overall, ChatGPT was quite effective at solving problems in the five programming languages, especially when attempting coding problems that existed on LeetCode before 2021. For example, it was able to produce working code for easy, medium, and hard problems with success rates of around 89, 71, and 40%, respectively.
“However, when it comes to algorithm issues after 2021, ChatGPT’s ability to generate functionally correct code is affected. It sometimes fails to understand the meaning of questions even for easy-level problems,” Tang notes.
For example, ChatGPT’s ability to produce working code for “easy” coding problems dropped from 89% to 52% after 2021. And its ability to generate working code for “hard” problems also dropped from 40% to 0.66% after this period.
“A reasonable hypothesis for why ChatGPT can better solve algorithm problems before 2021 is that these problems are frequently observed in the training dataset,” Tang explains.
Essentially, as coding evolves, ChatGPT has not yet been exposed to new problems and solutions. It lacks the critical thinking skills of a human and can only solve problems it has already encountered. This could explain why it is so much more effective at solving older coding problems than newer ones.
“ChatGPT may generate incorrect code because it does not understand the meaning of algorithm problems.” —Yutian Tang, University of Glasgow
Interestingly, ChatGPT is able to generate code with lower runtimes and memory overheads than at least 50% of human solutions to the same LeetCode problems.
The researchers also studied ChatGPT’s ability to correct its own coding errors after receiving feedback from LeetCode. They randomly selected 50 coding scenarios in which ChatGPT initially generated incorrect code, either because it didn’t understand the content or because it couldn’t solve the problem.
While ChatGPT was good at fixing compilation errors, it was generally not good at fixing its own errors.
“ChatGPT can generate incorrect code because it doesn’t understand the meaning of algorithm problems. Therefore, this simple error feedback information is not enough,” Tang explains.
The researchers also found that the code generated by ChatGPT had a number of vulnerabilities, such as a missing null test, but many of them were easily fixable. Their results also show that the generated code in C was the most complex, followed by C++ and then Python, whose generated code was similar in complexity to human-written code.
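To make the “missing null test” concrete, here is a small hypothetical illustration in Python (an example of ours, not one drawn from the study’s data): the unguarded version crashes on empty or missing input, and the fix is a one-line check.

```python
# Hypothetical example of the "missing null test" defect class.

def find_longest_unsafe(words):
    longest = words[0]  # IndexError on an empty list, TypeError on None
    for w in words:
        if len(w) > len(longest):
            longest = w
    return longest

def find_longest_safe(words):
    if not words:  # guard against both None and the empty list
        return None
    longest = words[0]
    for w in words:
        if len(w) > len(longest):
            longest = w
    return longest

print(find_longest_safe(["null", "check", "example"]))  # -> "example"
print(find_longest_safe(None))                          # -> None, no crash
```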
Tang says that, based on these findings, it is important for developers using ChatGPT to provide additional information to help it better understand problems or avoid vulnerabilities.
“For example, when encountering more complex programming problems, developers can provide as much relevant knowledge as possible and tell ChatGPT in the prompt what potential vulnerabilities to be aware of,” Tang says.
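As a sketch of what that advice could look like in practice, a developer might fold both the relevant domain knowledge and the vulnerabilities to watch for directly into the request. The prompt wording and model name below are illustrative assumptions, not details taken from the study; the call uses the OpenAI Python SDK.

```python
# Minimal sketch of Tang's prompting advice using the OpenAI Python SDK.
# The model name and prompt wording are illustrative assumptions,
# not details taken from the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write a Python function that parses a user-supplied date string "
    "in ISO 8601 format and returns a datetime object.\n\n"
    # Extra domain knowledge, per Tang's suggestion:
    "Relevant knowledge: the input comes from an untrusted web form and "
    "may be empty, None, or malformed.\n"
    # Explicitly flag vulnerabilities to avoid:
    "Potential vulnerabilities to be aware of: missing null/empty-input "
    "checks and unhandled parsing exceptions."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # the paper evaluated GPT-3.5
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```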