MiniMax Open Sources M2.7, a Self-Evolving Agent Model

Chinese AI company MiniMax has released the weights for MiniMax M2.7, a 229-billion-parameter Mixture-of-Experts model that participated in its own development cycle – marking what the company calls the first step toward autonomous AI self-evolution.

Originally announced on March 18, MiniMax M2.7 is now freely available on Hugging Face with deployment support for SGLang, vLLM, Transformers, and NVIDIA NIM. The model scores 56.22% on SWE-Pro and 57.0% on Terminal Bench 2, placing it among the strongest open-source LLMs for real-world software engineering tasks.
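For developers who want to try the weights, a minimal vLLM deployment sketch follows. The Hugging Face repo id `MiniMaxAI/MiniMax-M2.7` and the parallelism setting are assumptions for illustration; check the actual model card, and note that a 229B-parameter MoE needs multi-GPU hardware:

```shell
# Serve the open weights with vLLM's OpenAI-compatible server.
# Repo id and --tensor-parallel-size are assumptions; adjust to your setup.
vllm serve MiniMaxAI/MiniMax-M2.7 \
  --tensor-parallel-size 8 \
  --trust-remote-code

# Once the server is up (default port 8000), query it like any OpenAI endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "MiniMaxAI/MiniMax-M2.7", "messages": [{"role": "user", "content": "Hello"}]}'
```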

How the Model Helped Build Itself

The most notable claim about M2.7 is its role in its own iteration. MiniMax tasked an internal version of the model with optimizing a programming scaffold, running it autonomously for over 100 rounds. During that process, M2.7 analyzed failure trajectories, modified scaffold code, ran evaluations, and decided whether to keep or revert each change.
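The keep-or-revert procedure described above amounts to hill climbing over scaffold configurations. The sketch below illustrates the control flow with a toy evaluator standing in for the real benchmark runs; `evaluate` and `propose_change` are hypothetical stand-ins, not MiniMax's actual pipeline:

```python
import random

def evaluate(scaffold: dict) -> float:
    """Stand-in for running the eval suite against a scaffold config.
    In the real pipeline this would execute full benchmark trajectories."""
    # Toy scoring: reward enabled loop detection and a small retry budget.
    score = 50.0
    score += 10.0 if scaffold.get("loop_detection") else 0.0
    score -= abs(scaffold.get("max_retries", 3) - 2)
    return score

def propose_change(scaffold: dict, rng: random.Random) -> dict:
    """Stand-in for the model editing its own scaffold code."""
    candidate = dict(scaffold)
    if rng.random() < 0.5:
        candidate["loop_detection"] = not candidate.get("loop_detection", False)
    else:
        candidate["max_retries"] = rng.randint(1, 5)
    return candidate

def self_improve(rounds: int = 100, seed: int = 0) -> tuple[dict, float]:
    rng = random.Random(seed)
    scaffold = {"loop_detection": False, "max_retries": 5}
    best_score = evaluate(scaffold)
    for _ in range(rounds):
        candidate = propose_change(scaffold, rng)
        score = evaluate(candidate)
        if score > best_score:   # keep the change...
            scaffold, best_score = candidate, score
        # ...otherwise revert, i.e. keep the previous scaffold unchanged.
    return scaffold, best_score

best, score = self_improve()
```

Because a change is kept only when the evaluation improves, the score is monotonically non-decreasing across rounds, which is what makes 100+ unsupervised iterations safe to run.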

The model discovered optimizations on its own: systematically searching for optimal sampling parameters like temperature and frequency penalty, designing workflow guidelines such as automatically checking for identical bug patterns across files after a fix, and adding loop detection to the scaffold’s agent loop. MiniMax reports a 30% performance improvement on internal evaluation sets from this autonomous process.
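The sampling-parameter search is the most mechanical of these discoveries. A minimal sketch, assuming a hypothetical `eval_config` scoring function in place of real benchmark pass rates, shows the exhaustive sweep over temperature and frequency penalty:

```python
import itertools

def eval_config(temperature: float, frequency_penalty: float) -> float:
    """Stand-in for an evaluation run; the real signal would come from
    benchmark pass rates. The peak at (0.7, 0.3) is purely illustrative."""
    return 100.0 - 40 * abs(temperature - 0.7) - 25 * abs(frequency_penalty - 0.3)

temperatures = [0.0, 0.3, 0.7, 1.0]
penalties = [0.0, 0.3, 0.6]

# Exhaustively score every (temperature, penalty) pair and keep the best.
best_params, best_score = None, float("-inf")
for t, p in itertools.product(temperatures, penalties):
    score = eval_config(t, p)
    if score > best_score:
        best_params, best_score = (t, p), score
```

With a 4x3 grid this is only 12 evaluation runs; the expensive part in practice is that each run means replaying an entire benchmark suite.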

Within MiniMax’s reinforcement learning team, M2.7 now handles 30% to 50% of daily workflows end-to-end. Researchers interact only for critical decisions, while the model manages literature review, experiment tracking, data pipelines, debugging, and merge requests.

MiniMax also tested M2.7 on MLE Bench Lite, OpenAI’s suite of 22 machine learning competitions, each of which runs on a single A30 GPU. Across three 24-hour trials, the model’s best run produced 9 gold, 5 silver, and 1 bronze medal. Its average medal rate of 66.6% tied with Gemini 3.1 and trailed only Opus 4.6 (75.7%) and GPT-5.4 (71.2%).

Benchmark Performance Across Engineering and Office Work

On software engineering benchmarks, M2.7 matches or approaches frontier closed-source models. Its 56.22% on SWE-Pro – a benchmark covering log analysis, bug troubleshooting, code security review, and ML workflow debugging across multiple programming languages – matches GPT-5.3-Codex. On VIBE-Pro, a repo-level code generation benchmark, it scored 55.6%, and it registered 76.5 on SWE Multilingual and 52.7 on Multi SWE Bench.

Beyond code generation, MiniMax positioned M2.7 for professional office tasks. On GDPval-AA, which evaluates domain expertise across 45 models, M2.7 achieved an Elo score of 1495 – the highest among open-source models, trailing only Opus 4.6, Sonnet 4.6, and GPT-5.4. On Toolathon, it reached 46.3% accuracy, and it maintained a 97% skill compliance rate across 40 complex skills (each exceeding 2,000 tokens) in MiniMax’s MM Claw evaluation.

The model supports native multi-agent collaboration through what MiniMax calls Agent Teams, where multiple model instances maintain distinct role identities and work together on tasks. This capability targets AI agents for business automation scenarios where stable role boundaries and adversarial reasoning between agents are required.
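MiniMax has not published the Agent Teams protocol, but the idea of multiple instances holding stable role identities can be sketched as a pipeline of role-prompted agents. Everything below – the `Agent` class, the role names, the canned responses – is a hypothetical illustration, not MiniMax's API:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """One model instance with a fixed role identity. In a real system,
    respond() would call the model with the role prompt prepended."""
    role: str
    system_prompt: str
    history: list = field(default_factory=list)

    def respond(self, message: str) -> str:
        self.history.append(message)
        # Toy behavior: each role answers in character, tagged by identity.
        return f"[{self.role}] reviewing: {message}"

team = [
    Agent("planner", "Break the task into steps."),
    Agent("coder", "Implement the current step."),
    Agent("critic", "Adversarially check the coder's output."),
]

# Pass a task through the team in sequence; each agent sees the prior output.
msg = "Add loop detection to the agent scaffold"
transcript = []
for agent in team:
    msg = agent.respond(msg)
    transcript.append(msg)
```

The role tag and per-agent history are what keep identities from bleeding into one another across turns – the "stable role boundaries" the business-automation use case depends on.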

MiniMax built M2.7 on a Mixture-of-Experts architecture, meaning only a subset of its 229 billion total parameters activate during any single inference pass. This makes the model cheaper and faster to serve than a dense model of comparable output quality – an important consideration for developers who want to run models locally or on limited infrastructure.
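The economics come from sparse routing: a small gating network picks a few experts per token, and only those experts' parameters do any work. A toy NumPy sketch of a single MoE layer (dimensions and top-k chosen for illustration, not M2.7's actual configuration):

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Sparse Mixture-of-Experts layer: the router scores all experts,
    but only the top-k run, so most parameters stay inactive per token."""
    logits = x @ gate_w                   # router scores, one per expert
    top = np.argsort(logits)[-top_k:]     # indices of the selected experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the chosen experts only
    # Only the selected experts' weight matrices touch this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 16
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, num_experts))
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]

y = moe_forward(x, gate_w, experts, top_k=2)  # 2 of 16 experts active
```

In this sketch 14 of the 16 expert matrices are never multiplied, which is why an MoE's serving cost tracks its active parameters rather than its 229B total.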

MiniMax also open-sourced OpenRoom, an interactive demo built mostly by AI that places agent interactions inside a web GUI with real-time visual feedback, signaling its interest in extending large language models beyond productivity into interactive entertainment.

The release adds another competitive option to the open-weight agent model landscape, where models from Meta, Alibaba, and DeepSeek have been pushing the boundaries of what is freely available. The self-evolution angle – where a model meaningfully contributes to improving its own successor – remains early-stage, but M2.7 offers the first concrete data points on what that looks like in practice: a 30% internal benchmark gain from more than 100 autonomous optimization rounds, with no human intervention in the loop.

Alex McFarland is an AI journalist and writer exploring the latest developments in artificial intelligence. He has collaborated with numerous AI startups and publications worldwide.