Artificial Intelligence
AI as a Researcher: First Peer-Reviewed Research Paper Written Without Humans

Artificial intelligence has crossed another significant milestone that challenges our understanding of what machines can achieve independently. For the first time in scientific history, an AI system has written a complete research paper that passed peer review at an academic conference without any human assistance in the writing process. This breakthrough could be a fundamental shift in how scientific research might be conducted in the future.
Historic Achievement
A paper produced by The AI Scientist-v2 passed the peer-review process at a workshop in a top international AI conference. The research was submitted to an ICLR 2025 workshop, which is one of the most prestigious venues in machine learning. The paper was generated by an improved version of the original AI Scientist, called The AI Scientist-v2.
The accepted paper, titled “Compositional Regularization: Unexpected Obstacles in Enhancing Neural Network Generalization,” received impressive scores from human reviewers. Of the three papers submitted for review, one received ratings that placed it above the acceptance threshold. This breakthrough is a significant advancement as AI can now participate in the fundamental process of scientific discovery that has been exclusively human for centuries.
The research team from Sakana AI, working with collaborators from the University of British Columbia and the University of Oxford, conducted this experiment. They received institutional review board approval and worked directly with ICLR conference organizers to ensure the experiment followed proper scientific protocols.
How The AI Scientist-v2 Works
The AI Scientist-v2 has achieved this success due to several major advancements over its predecessor. Unlike its predecessor, AI Scientist-v2 eliminates the need for human-authored code templates, can work across diverse machine learning domains, and employs a tree-search methodology to explore multiple research paths simultaneously.
The system operates through an end-to-end process that mirrors how human researchers work. It begins by formulating scientific hypotheses based on the research domain it is assigned to explore. The AI then designs experiments to test these hypotheses, writes the necessary code to conduct the experiments, and executes them automatically.
What makes this system particularly advanced is its use of agentic tree search methodology. This approach allows the AI to explore multiple research directions simultaneously, much like how human researchers might consider various approaches to solving a problem. This involves running experiments via agentic tree search, analyzing results, and generating a paper draft. A dedicated experiment manager agent coordinates this entire process to ensure that the research remains focused and productive.
The system also includes an enhanced AI reviewer component that uses vision-language models to provide feedback on both the content and visual presentation of research findings. This creates an iterative refinement process where the AI can improve its own work based on feedback, similar to how human researchers refine their manuscripts based on colleague input.
What Made This Research Paper Special
The accepted paper focused on a challenging problem in machine learning called compositional generalization. This refers to the ability of neural networks to understand and apply learned concepts in new combinations they have never seen before. The AI Scientist-v2 investigated novel regularization methods that might improve this capability.
Interestingly, the paper also reported negative results. The AI discovered that certain approaches it hypothesized would improve neural network performance actually created unexpected obstacles. In science, negative results are valuable because they prevent other researchers from pursuing unproductive paths and contribute to our understanding of what does not work.
The research followed rigorous scientific standards throughout the process. The AI Scientist-v2 conducted multiple experimental runs to ensure statistical validity, created clear visualizations of its findings, and properly cited relevant previous work. It formatted the entire manuscript according to academic standards and wrote comprehensive discussions of its methodology and findings.
The human researchers who supervised the project conducted their own thorough review of all three generated papers. They found that while the accepted paper was of workshop quality, it contained some technical issues that would prevent acceptance at the main conference track. This honest assessment demonstrates the current limitations while acknowledging the significant progress achieved.
Technical Capabilities and Improvements
The AI Scientist-v2 demonstrates several remarkable technical capabilities that distinguish it from previous automated research systems. The system can work across diverse machine learning domains without requiring pre-written code templates. This flexibility means it can adapt to new research areas and generate original experimental approaches rather than following predetermined patterns.
The tree search methodology is a significant innovation in AI research automation. Rather than pursuing a single research direction, the system can maintain multiple hypotheses simultaneously and allocate computational resources based on the promise each direction shows. This approach mirrors how experienced human researchers often maintain several research threads while focusing most effort on the most promising avenues.
Another crucial improvement is the integration of vision-language models for reviewing and refining the visual elements of research papers. Scientific figures and visualizations are critical for communicating research findings effectively. The AI can now evaluate and improve its own data visualizations iteratively.
The system also demonstrates understanding of scientific writing conventions. It properly structures papers with appropriate sections, maintains consistent terminology throughout manuscripts, and creates logical flow between different parts of the research narrative. The AI shows awareness of how to present methodology, discuss limitations, and contextualize findings within existing literature.
Current Limitations and Challenges
Despite this historic achievement, several important limitations restrict the current capabilities of AI-generated research. The company said that none of its AI-generated studies passed its internal bar for ICLR conference track publication standards. This indicates that while the AI can produce workshop-quality research, reaching the highest tiers of scientific publication remains challenging.
The acceptance rates provide important context for evaluating this achievement. The paper was accepted at a workshop track, which typically has less strict standards than the main conference (60-70% acceptance rate vs. the 20-30% acceptance rates typical of main conference tracks. While this does not diminish the significance of the achievement, it suggests that producing truly groundbreaking research remains beyond current AI capabilities.
The AI Scientist-v2 also demonstrated some weaknesses that human researchers identified during their review process. The system occasionally made citation errors, attributing research findings to incorrect authors or publications. It also struggled with some aspects of experimental design that human experts would have approached differently.
Perhaps most importantly, the AI-generated research focused on incremental improvements rather than paradigm-shifting discoveries. The system appears more capable of conducting thorough investigations within established research frameworks than of proposing entirely new ways of thinking about scientific problems.
The Road Ahead
The successful peer review of AI-generated research is the beginning of a new era in scientific research. As foundation models continue improving, we can expect The AI Scientist and similar systems to produce increasingly sophisticated research that approaches and potentially exceeds human capabilities in many domains.
The research team anticipates that future versions will be capable of producing papers worthy of acceptance at top-tier conferences and journals. The logical progression suggests that AI systems may eventually contribute to breakthrough discoveries in fields ranging from medicine to physics to chemistry.
This development also raises important questions about research ethics and publication standards. The scientific community must develop new norms for handling AI-generated research, including when and how to disclose AI involvement and how to evaluate such work alongside human-generated research.
The transparency demonstrated by the research team in this experiment provides a valuable model for future AI research evaluation. By working openly with conference organizers and subjecting their AI-generated work to the same standards as human research, they have established important precedents for the responsible development of automated research capabilities.
The Bottom Line
The acceptance of an AI-written paper at a leading machine learning workshop is a significant advancement in AI capabilities. While the work is not yet at the level of top-tier conference, it demonstrates a clear trajectory toward AI systems becoming serious contributors to scientific discovery. The challenge now lies not only in advancing technology but also in shaping the ethical and academic frameworks that will govern this new frontier of research.