A new open-weights AI coding model is closing in on proprietary options

A new open-weights AI coding model is closing in on proprietary options

On Tuesday, French AI startup Mistral AI released Devstral 2, a 123 billion parameter open-weights coding model designed to work as part of an autonomous software engineering agent. The model achieves a 72.2 percent score on SWE-bench Verified, a benchmark that attempts to test whether AI systems can solve real GitHub issues, putting it among the top-performing open-weights models.

Perhaps more notably, Mistral didnโ€™t just release an AI model, it released a new development app called Mistral Vibe. Itโ€™s a command line interface (CLI) similar to Claude Code, OpenAI Codex, and Gemini CLI that lets developers interact with the Devstral models directly in their terminal. The tool can scan file structures and Git status to maintain context across an entire project, make changes across multiple files, and execute shell commands autonomously. Mistral released the CLI under the Apache 2.0 license.

Itโ€™s always wise to take AI benchmarks with a large grain of salt, but weโ€™ve heard from employees of the big AI companies that they pay very close attention to how well models do on SWE-bench Verified, which presents AI models with 500 real software engineering problems pulled from GitHub issues in popular Python repositories. The AI must read the issue description, navigate the codebase, and generate a working patch that passes unit tests. While some AI researchers have noted that around 90 percent of the tasks in the benchmark test relatively simple bug fixes that experienced engineers could complete in under an hour, itโ€™s one of the few standardized ways to compare coding models.

Read full article

Comments

4 Comments

  1. vella.hansen

    This is an exciting development in the AI landscape! The release of Devstral 2 seems to push the boundaries of what’s possible with open-weights models. It will be interesting to see how this impacts the industry and the competition with proprietary options.

  2. carmel64

    Absolutely, it really does push the boundaries! The scale of 123 billion parameters is impressive and could lead to more accessible AI tools for developers. Itโ€™ll be interesting to see how it compares to existing proprietary models in terms of performance and usability.

  3. imarks

    I agree, the scale is remarkable! It’s interesting to consider how open-weight models like Devstral 2 could democratize AI development, making advanced tools more accessible to a wider range of developers and innovators.

  4. schaden.arely

    Absolutely, the scale is impressive! It’s also worth noting how open-weight models can foster collaboration in the AI community, allowing developers to innovate more freely and share improvements. This could really accelerate advancements in coding tools.

Leave a Reply to schaden.arely Cancel reply

Your email address will not be published. Required fields are marked *