OpenAI Unveils GPT-5-Codex: Autonomous Coding for Over 7 Hours, Project Review, and Refactoring Capabilities

OpenAI has officially launched GPT-5-Codex, a specialized version of its advanced AI model, meticulously optimized for autonomous software development and coding tasks. Announced early this morning, GPT-5-Codex represents a significant leap forward in AI-assisted programming, capable of not only rapid interactive responses but also extended, independent execution of complex software engineering projects.

The training of GPT-5-Codex has been intensely focused on real-world software engineering challenges. This specialized model excels in code review, identifying critical vulnerabilities before deployment, and can undertake lengthy, intricate tasks with remarkable autonomy.

GPT-5-Codex is now live across all existing Codex use cases, including the Codex CLI, IDE extensions, web interface, mobile devices, and GitHub code review. It serves as the default model for cloud-based tasks and code reviews. Developers can also opt to use it for local tasks via the Codex CLI or IDE plugins. Notably, Codex functionalities are integrated into ChatGPT’s Plus, Pro, Business, Edu, and Enterprise subscription tiers.

Within two and a half hours of the release, OpenAI CEO Sam Altman said that GPT-5-Codex already accounted for approximately 40% of Codex traffic, and that he expected it to make up the majority by the end of the day.

“Since the launch of Codex CLI in April and Codex Web in May, Codex has steadily evolved into a more efficient programming assistant,” stated OpenAI. “Two weeks ago, we unified Codex into a single product experience, integrated with ChatGPT accounts. This allows for seamless switching between local environments and cloud tasks without losing context.”

The initial reception has been overwhelmingly positive, with some users hailing it as “the best thing since sliced bread.”

OpenAI has formally incorporated GPT-5-Codex into the GPT-5 System Card as an addendum.

Link: https://openai.com/index/gpt-5-system-card-addendum-gpt-5-codex/

Deep Dive into GPT-5-Codex Capabilities

GPT-5-Codex has been engineered to excel in agentic software engineering within realistic development scenarios.

Its training encompasses sophisticated tasks such as full project construction, feature development, test writing, debugging, large-scale refactoring, and comprehensive code reviews. Compared to the general GPT-5 model, GPT-5-Codex offers enhanced controllability, adheres more precisely to AGENTS.md instructions, and delivers superior code quality. OpenAI commented, “You simply tell it what you want, without needing to write lengthy style guides.”

The model demonstrates superior accuracy over GPT-5 (high) on both the SWE-bench Verified (software engineering) and Code refactoring tasks benchmarks.

Notably, OpenAI’s SWE-bench Verified results now cover all 500 tasks in the dataset, addressing earlier criticism that it had reported scores on only 477 of them. OpenAI explained that the earlier limitation stemmed from infrastructure issues that have since been resolved. The code refactoring benchmark draws on refactoring tasks from large, mature repositories in languages such as Python, Go, and OCaml. One example is a Gitea pull request that modifies 232 files and 3,541 lines of code to thread a ctx variable through the application logic.
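To see why the 477-versus-500 distinction matters, the short sketch below computes a pass rate over the subset and over the full set. The count of resolved tasks is a hypothetical number chosen for illustration, not a figure reported by OpenAI; only the task counts 477 and 500 come from the article.

```python
def pass_rate(resolved: int, total: int) -> float:
    """Return the percentage of tasks resolved, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * resolved / total, 1)


# Hypothetical number of resolved tasks (an assumption for illustration only).
resolved = 350

# Score as it would be reported on the 477-task subset.
subset_rate = pass_rate(resolved, 477)

# Bounds on the full 500-task score, depending on how the 23
# previously excluded tasks turn out.
excluded = 500 - 477
full_rate_worst = pass_rate(resolved, 500)             # all excluded tasks fail
full_rate_best = pass_rate(resolved + excluded, 500)   # all excluded tasks pass

print(subset_rate, full_rate_worst, full_rate_best)
```

With the same number of resolved tasks, the score over the subset can differ by several points from the score over the full set, which is why results on 477 and on 500 tasks are not directly comparable.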

Beyond enhanced performance, GPT-5-Codex dynamically adjusts its processing time based on task complexity. It seamlessly integrates two core capabilities: interactive sessions that collaborate with developers and persistent, autonomous execution for long-running tasks.

For minor requests or conversations, GPT-5-Codex responds with greater speed. Conversely, for intricate tasks like major refactors, it can sustain operations for extended periods. OpenAI reported, “In testing, we observed GPT-5-Codex independently running for over 7 hours, continuously iterating on implementation, fixing tests, and ultimately delivering usable code.”

OpenAI shared internal usage data illustrating its efficiency:

Image

GPT-5-Codex has also been specifically trained for code review and for finding critical flaws. When reviewing, it navigates the codebase, reasons through dependencies, and runs code and tests to verify correctness. In evaluations on recent commits from popular open-source projects, experienced engineers judged GPT-5-Codex’s review comments to be less likely to be incorrect or unimportant, keeping attention on critical issues.

On frontend tasks, GPT-5-Codex performs reliably. It can generate aesthetically pleasing desktop applications, and it shows significant improvements in human preference evaluations when building mobile websites. In cloud environments, it can take uploaded images or screenshots as input, visually inspect its progress, and return screenshots of the results.

While GPT-5-Codex is deeply optimized for Codex CLI, IDE plugins, cloud environments, and GitHub, and supports various tool calls, OpenAI advises, “Unlike the general GPT-5, we recommend using GPT-5-Codex exclusively within Codex or similar scenarios.”

Codex Updates and Enhancements

In addition to the launch of GPT-5-Codex, OpenAI announced several Codex updates, including a redesigned Codex CLI and new Codex IDE plugins.

Codex CLI

The Codex CLI is open source. OpenAI has rebuilt it based on community feedback from recent months to better support agentic coding workflows, making the model a more robust and dependable partner.

Users can now directly include images, such as screenshots, wireframes, and design mockups, within the CLI. This facilitates shared context, clarifies design decisions, and improves the likelihood of achieving desired outcomes.

For complex tasks, Codex employs a to-do list to track progress and integrates with external systems like web search and MCP, enhancing tool call accuracy.

The terminal interface has also been upgraded for clearer formatting of tool calls and code diffs.

The approval mode has been streamlined into three options:

The CLI also supports compressed conversation states, simplifying the management of extended sessions.
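The article does not describe how the CLI compresses conversation state. As a generic illustration of the idea only, and not Codex's actual mechanism, the sketch below replaces older messages with a single summary message and keeps the most recent ones verbatim. The names `Message`, `compact`, and `naive_summary` are hypothetical, and a real system would presumably produce the summary with a model rather than a placeholder string.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    text: str


def compact(
    history: List[Message],
    keep_recent: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Collapse all but the last `keep_recent` messages into one summary message."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    if len(history) <= keep_recent:
        return list(history)  # nothing to compress
    cut = len(history) - keep_recent
    older, recent = history[:cut], history[cut:]
    summary = Message(role="system", text=summarize(older))
    return [summary] + recent


def naive_summary(messages: List[Message]) -> str:
    """Placeholder summarizer (hypothetical); only records how much was dropped."""
    return f"[summary of {len(messages)} earlier messages]"


history = [Message("user", f"step {i}") for i in range(5)]
compacted = compact(history, keep_recent=2, summarize=naive_summary)
print([m.text for m in compacted])
```

The design choice here is to trade detail in older turns for a shorter context, so a long session stays within a fixed budget while the most recent exchanges remain intact.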

Codex IDE Plugin

Codex is now directly accessible within IDEs. This plugin supports VS Code, Cursor, and other VS Code derivatives, bringing Codex into the editor for seamless preview of local changes and direct code modifications.

OpenAI highlights several advantages of using Codex within an IDE:

Cloud-Based Codex

Beyond the CLI and IDE plugins, a new GitHub integration brings Codex’s cloud-based intelligence closer to developers’ daily workflows, allowing tasks to be assigned to Codex without leaving the editor or GitHub.

OpenAI has also been enhancing cloud performance behind the scenes:

Similar to the CLI and IDE, cloud-based Codex supports image inputs. Developers can upload frontend design specifications or screenshots of UI bugs. Codex will run the generated content within the browser to verify its appearance and attach screenshots to tasks or GitHub PRs.

Code Review Functionality

Codex now includes code review capabilities, designed to identify critical defects. Unlike static analysis tools, Codex performs a comprehensive review by:

This level of scrutiny, typically performed by highly meticulous human engineers, fills a crucial gap, helping teams identify issues earlier, reduce review burdens, and deploy with greater confidence.

When enabled on GitHub:

OpenAI shared, “Internally at OpenAI, Codex has reviewed the vast majority of our PRs, identifying hundreds of issues daily, many of which are caught before human review even begins. This allows teams to move faster while maintaining confidence.”

Ensuring Codex Security

OpenAI has also detailed the security measures implemented during Codex development to protect code and data, along with safeguards against potential misuse.

OpenAI advises, “We always recommend developers review Codex’s work before deployment. Codex provides references, terminal logs, and test results for each task to facilitate manual verification.” The company also emphasizes that Codex should serve as an additional reviewer, not a replacement for human review.

As with GPT-5, OpenAI treats GPT-5-Codex as “High” capability in the biological and chemical domains and has implemented corresponding safeguards to minimize potential risks.

Pricing and Availability

Codex is included in ChatGPT Plus, Pro, Business, Edu, and Enterprise subscriptions.

For developers who use the Codex CLI with an API key, OpenAI has announced that GPT-5-Codex will be available through the API soon.