OpenAI GPT 4o ranked as best AI model for writing Solidity smart contract code by IQ

8 months ago

SolidityBench by IQ has launched as the first leaderboard to evaluate LLMs in Solidity code generation. Available on Hugging Face, it introduces two innovative benchmarks, NaïveJudge and HumanEval for Solidity, designed to assess and rank the proficiency of AI models in generating smart contract code.

Developed by IQ’s BrainDAO as part of its forthcoming IQ Code suite, SolidityBench serves to refine IQ’s own EVMind LLMs and compare them against generalist and community-created models. IQ Code aims to offer AI models tailored for generating and auditing smart contract code, addressing the growing need for secure and efficient blockchain applications.

As IQ told CryptoSlate, NaïveJudge offers a novel approach by tasking LLMs with implementing smart contracts based on detailed specifications derived from audited OpenZeppelin contracts. These contracts provide a gold standard for correctness and efficiency. The generated code is evaluated against a reference implementation using criteria such as functional completeness, adherence to Solidity best practices and security standards, and optimization efficiency.

The evaluation process leverages advanced LLMs, including different versions of OpenAI’s GPT-4 and Claude 3.5 Sonnet, as impartial code reviewers. They assess the code based on rigorous criteria, including implementation of all key functionalities, handling of edge cases, error management, proper syntax usage, and overall code structure and maintainability.

Optimization considerations such as gas efficiency and storage management are also evaluated. Scores range from 0 to 100, providing a comprehensive assessment across functionality, security, and efficiency, mirroring the complexities of professional smart contract development.
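To make the scoring idea concrete, here is a minimal sketch of how per-criterion scores from several LLM reviewers could be combined into a single 0 to 100 score. The article does not describe how NaïveJudge actually aggregates its reviews, so the criterion names, the equal weighting, and the `aggregate_score` function are all illustrative assumptions, not IQ’s method.

```python
from statistics import mean

# Hypothetical criterion names, loosely following the three areas named in the
# article (functionality, security, efficiency). Not taken from SolidityBench.
CRITERIA = ("functionality", "security", "efficiency")


def aggregate_score(reviews: list[dict[str, float]]) -> float:
    """Combine per-criterion scores (0-100) from several reviewers into one score.

    Assumption: every criterion and every reviewer carries equal weight.
    """
    if not reviews:
        raise ValueError("need at least one review")
    per_reviewer = []
    for review in reviews:
        for name in CRITERIA:
            if not 0 <= review[name] <= 100:
                raise ValueError(f"{name} score must be between 0 and 100")
        # Average the criteria for this reviewer.
        per_reviewer.append(mean(review[name] for name in CRITERIA))
    # Average across reviewers and round to two decimals.
    return round(mean(per_reviewer), 2)
```

For example, two reviewers whose criterion scores average 70 and 80 would yield a combined score of 75 under this equal-weight assumption.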

Which AI models are best for Solidity smart contract development?

Benchmarking results showed that OpenAI’s GPT-4o model achieved the highest overall score of 80.05, with a NaïveJudge score of 72.18 and HumanEval for Solidity pass rates of 80% at pass@1 and 92% at pass@3.

Interestingly, newer reasoning models like OpenAI’s o1-preview and o1-mini were beaten to the top spot, scoring 77.61 and 75.08, respectively. Models from Anthropic and xAI, including Claude 3.5 Sonnet and grok-2, demonstrated competitive performance with overall scores hovering around 74. Nvidia’s Llama-3.1-Nemotron-70B scored lowest in the top 10 at 52.54.

SolidityBench scores for LLMs (Hugging Face)

Per IQ, HumanEval for Solidity adapts OpenAI’s original HumanEval benchmark from Python to Solidity, encompassing 25 tasks of varying difficulty. Each task includes corresponding tests compatible with Hardhat, a popular Ethereum development environment, facilitating accurate compilation and testing of generated code. The evaluation metrics, pass@1 and pass@3, measure the model’s success on initial attempts and over multiple tries, offering insights into both precision and problem-solving capabilities.
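For readers unfamiliar with pass@k, the sketch below shows the unbiased estimator commonly used with the original HumanEval benchmark: given n generated samples for a task, of which c pass the tests, it estimates the chance that at least one of k randomly chosen samples passes. Whether SolidityBench computes pass@1 and pass@3 in exactly this way is an assumption; the function names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task: n samples generated, c of them passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, any k samples must include a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k chosen samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; each entry is (n_samples, n_passed)."""
    if not results:
        raise ValueError("need at least one task result")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With three samples per task and one passing sample, this gives a pass@1 of one third and a pass@3 of 1.0, which illustrates why pass@3 is always at least as high as pass@1.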

Goals of using AI models in smart contract development

By introducing these benchmarks, SolidityBench seeks to advance AI-assisted smart contract development. It encourages the creation of more sophisticated and reliable AI models while providing developers and researchers with valuable insights into AI’s current capabilities and limitations in Solidity development.

The benchmarking toolkit aims to advance IQ Code’s EVMind LLMs and also sets new standards for AI-assisted smart contract development across the blockchain ecosystem. The initiative hopes to address a critical need in the industry, where the demand for secure and efficient smart contracts continues to grow.

Developers, researchers, and AI enthusiasts are invited to explore and contribute to SolidityBench, which aims to drive the continuous refinement of AI models, promote best practices, and advance decentralized applications.

Visit the SolidityBench leaderboard on Hugging Face to learn more and begin benchmarking Solidity generation models.

The post OpenAI GPT 4o ranked as best AI model for writing Solidity smart contract code by IQ appeared first on CryptoSlate.
