
OpenAI Launches Codex Security To Find Vulnerabilities in Code

OpenAI released Codex Security on March 6, an AI-powered application security agent that scans codebases for vulnerabilities, validates findings in sandboxed environments, and proposes patches. The tool has already uncovered flaws in OpenSSH, Chromium, and five other widely used open-source projects, earning 14 Common Vulnerabilities and Exposures (CVE) designations.

Codex Security, formerly known as Aardvark, spent roughly a year in private beta before graduating to a research preview available to ChatGPT Pro, Enterprise, Business, and Edu customers. OpenAI is offering complimentary access for the first month.

The agent differs from conventional static analysis tools by building a project-specific threat model before scanning. It analyzes a repository’s architecture to understand what the system does, what it trusts, and where exposure is highest. Teams can edit the threat model to keep findings aligned with their risk posture. When configured with a tailored environment, Codex Security pressure-tests potential vulnerabilities directly against the running system, generating proof-of-concept exploits to confirm real-world impact.

Performance at Scale

Over the past 30 days of beta testing, Codex Security scanned more than 1.2 million commits across external repositories, surfacing 792 critical findings and 10,561 high-severity issues. Critical vulnerabilities appeared in fewer than 0.1% of scanned commits, suggesting the system can process large codebases while keeping noise manageable for reviewers.

OpenAI reports that precision improved substantially during the beta period. In one case, noise dropped by 84% between the initial rollout and the current version. Across all repositories, false positive rates fell by more than 50%, and findings with over-reported severity declined by more than 90%. The agent also incorporates feedback: when users adjust a finding’s criticality, it refines the threat model for subsequent scans.

Those numbers address a persistent complaint from security teams evaluating AI coding tools: high alert volume with low signal. A 2025 analysis of 80 coding tasks across more than 100 large language models found that AI-generated code introduces security vulnerabilities in 45% of cases, making downstream detection tools increasingly important as AI-written code proliferates.

Open-Source Vulnerability Discoveries

OpenAI has been running Codex Security against the open-source repositories it depends on, reporting high-impact findings to maintainers. The disclosed list includes OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, and Chromium. Of the 14 assigned CVEs, two involved dual reporting with other researchers.

In conversations with maintainers, OpenAI said the primary challenge was not a shortage of vulnerability reports but an excess of low-quality ones. Maintainers needed fewer false positives and less triage burden — feedback that shaped Codex Security’s emphasis on high-confidence findings over volume.

The company also announced Codex for OSS, a program providing free ChatGPT Pro and Plus accounts, code review support, and Codex Security access to open-source maintainers. The vLLM project has already used the tool to find and patch issues within its normal workflow. OpenAI plans to expand the program in the coming weeks.

The launch positions OpenAI as a direct participant in application security, a market where incumbents like Snyk, Semgrep, and Veracode have established footholds. Google recently published a detailed security architecture for its own AI agent features in Chrome, signaling that the intersection of AI agents and security tooling is attracting attention from multiple directions.

Several questions remain unanswered. OpenAI has not disclosed pricing after the free trial period, nor has it specified which frontier model powers Codex Security’s reasoning. The tool currently operates through Codex on the web rather than offering API-level integration, potentially limiting adoption by teams with existing security automation pipelines. Whether Codex Security can maintain its precision gains as it scales beyond beta — and whether open-source maintainers adopt the program at meaningful scale — will determine if the agent becomes a lasting fixture in the AI-assisted development stack or remains a research preview.

Alex McFarland is an AI journalist and writer exploring the latest developments in artificial intelligence. He has collaborated with numerous AI startups and publications worldwide.