OpenAI ChatGPT Agent Takes Control: New AI Assistant Handles Complex Tasks Independently

OpenAI has rolled out ChatGPT Agent, which turns the popular chatbot into an autonomous assistant that can complete complex tasks from start to finish.

The new agent does more than chat. It browses the web, runs code, creates presentations, and handles real-world tasks without constant human input. Users can now ask ChatGPT to plan a dinner party, analyze competitors, or update spreadsheets while they focus on other work.

OpenAI Agent Combines Three Previous Tools Into One

The ChatGPT Agent merges capabilities from three separate OpenAI products. It uses Operator’s web browsing skills, Deep Research’s information analysis powers, and ChatGPT’s conversational abilities. This combination creates a single tool that can handle tasks requiring multiple steps and different skill sets.

OpenAI built the agent with a virtual computer that switches between thinking and acting. The system can open websites, download files, run terminal commands, and view results in a browser. All these actions happen within the same task context, so the agent remembers what it did previously and builds on that work.
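To make the "thinking and acting" loop concrete, here is a minimal sketch in Python. It is not OpenAI's implementation: the names `TaskContext`, `think`, `run_task`, `fake_browser`, and `fake_terminal` are hypothetical, the tools only return strings, and the "thinking" step simply walks a fixed plan where a real agent would consult a model. The point it illustrates is the shared task context: every action's result is appended to one history that later steps can read.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str]  # (tool name, argument)


@dataclass
class TaskContext:
    """Shared state for one task, so each step can see earlier results."""
    goal: str
    history: List[Tuple[str, str, str]] = field(default_factory=list)


# Hypothetical stand-ins for a browser and a terminal; they only return strings.
def fake_browser(url: str) -> str:
    return f"page content from {url}"


def fake_terminal(command: str) -> str:
    return f"output of `{command}`"


TOOLS: Dict[str, Callable[[str], str]] = {
    "browser": fake_browser,
    "terminal": fake_terminal,
}


def think(ctx: TaskContext, plan: List[Step]) -> Optional[Step]:
    """Choose the next step. Assumption: a fixed plan replaces the model."""
    done = len(ctx.history)
    return plan[done] if done < len(plan) else None


def run_task(goal: str, plan: List[Step]) -> TaskContext:
    """Alternate between thinking (pick a step) and acting (run a tool)."""
    ctx = TaskContext(goal)
    while (step := think(ctx, plan)) is not None:
        tool, argument = step
        result = TOOLS[tool](argument)
        ctx.history.append((tool, argument, result))  # remembered for later steps
    return ctx


ctx = run_task(
    "summarize a report",
    [("browser", "https://example.com/report"), ("terminal", "wc -l report.txt")],
)
for tool, argument, result in ctx.history:
    print(f"{tool}({argument!r}) -> {result}")
```

Keeping one history per task is what lets a later step build on an earlier one, for example running a terminal command on a file the browser step downloaded.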

The agent works with ChatGPT connectors too. Users can link Gmail, GitHub, and other apps so the agent can access relevant information and take actions across different platforms. When needed, users can take control of the browser themselves to log into accounts or guide specific actions.

Real-World Tasks ChatGPT Agent Can Handle Today

The new agent tackles both personal and professional tasks that previously required multiple steps and tools. For work, it can convert screenshots into editable presentations, reschedule meetings, plan company offsites, and update financial spreadsheets while maintaining formatting.

Personal tasks include planning and booking complete travel itineraries, designing dinner parties, finding medical specialists, and scheduling appointments. The agent can also shop for specific ingredients, research products, and handle form submissions on websites.

OpenAI tested the agent on complex knowledge work tasks typically done by professionals. In roughly half the cases, experts judged the agent’s output as comparable to or better than human work. These tasks included competitive analysis reports, detailed amortization schedules, and technical feasibility studies.

ChatGPT Agent Performance Numbers Beat Previous Models

OpenAI shared benchmark results showing significant improvements over earlier AI models. On Humanity’s Last Exam, which tests AI across expert-level topics, ChatGPT Agent scored 41.6%. That is roughly double the performance of OpenAI’s o3 and o4-mini models on the same test.

On the FrontierMath benchmark of complex math problems, the agent achieved 27.4% accuracy when given access to coding tools. Previous models struggled with these problems, which typically take expert mathematicians hours or days to solve.

The agent also outperformed humans on DSBench, a data science evaluation covering analysis and modeling tasks. On spreadsheet editing tasks in SpreadsheetBench, it scored 45.5%, compared with 20.0% for Microsoft Copilot in Excel.
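For readers who want to see the size of the SpreadsheetBench gap, the short Python sketch below works out the arithmetic from the two reported scores. The helper name `compare` is made up for illustration, and the calculation assumes nothing beyond the figures quoted above.

```python
def compare(score_a: float, score_b: float) -> dict:
    """Return the gap in percentage points and the ratio of two scores."""
    return {"point_gap": score_a - score_b, "ratio": score_a / score_b}


# Reported SpreadsheetBench scores: ChatGPT Agent vs. Copilot in Excel.
result = compare(45.5, 20.0)
print(f"gap: {result['point_gap']:.1f} percentage points")  # gap: 25.5 percentage points
print(f"ratio: {result['ratio']:.1f}x")                     # ratio: 2.3x
```

In other words, the agent's score is a little more than twice Copilot's, although it still falls short on more than half of the benchmark's tasks.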

Investment banking modeling tasks showed similar results. The agent significantly outperformed previous OpenAI models when building financial models, leveraged buyout analyses, and other complex financial documents with proper formatting and citations.

OpenAI Agent Pricing and Availability Details

ChatGPT Agent launched for Pro, Plus, and Team subscribers on July 17, 2025. Pro users get 400 messages per month, while Plus and Team users receive 40 messages per month. Additional usage is available through flexible credit-based options.

Pro subscribers ($200 monthly) gained immediate access, with Plus and Team users getting access over several days. Enterprise and Education customers will receive access in the coming weeks. European Economic Area and Switzerland users must wait longer due to regulatory considerations.

The pricing puts OpenAI’s agent in direct competition with other AI assistant services. However, the company’s integration of multiple capabilities into one tool may justify the premium pricing for users who need complex task automation.

Safety Measures Address New Agent Risks

OpenAI implemented extensive safety controls because the agent can take real actions on the web. The company is treating ChatGPT Agent as having “High Biological and Chemical capabilities” under its Preparedness Framework, which activates the safeguards associated with that level.

The safety stack includes real-time monitoring, prompt injection resistance, and explicit user confirmation for consequential actions. OpenAI trained the agent to refuse high-risk tasks like bank transfers and requires active supervision for critical actions like sending emails.
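The tiers described above (refuse, supervise, confirm) can be pictured as a simple policy gate. The Python sketch below is illustrative only and is not how OpenAI implements its safeguards: the `Decision` enum, the `gate` function, and the action names in each set are hypothetical, chosen to mirror the examples in the text.

```python
from enum import Enum


class Decision(Enum):
    REFUSE = "refuse"
    SUPERVISE = "require active supervision"
    CONFIRM = "ask for confirmation"
    ALLOW = "allow"


# Hypothetical action categories; the real classification is not public here.
HIGH_RISK = {"bank_transfer"}                      # refused outright
CRITICAL = {"send_email"}                          # user must actively watch
CONSEQUENTIAL = {"submit_form", "make_purchase"}   # explicit confirmation first


def gate(action: str) -> Decision:
    """Map an action to the strictest rule that applies to it."""
    if action in HIGH_RISK:
        return Decision.REFUSE
    if action in CRITICAL:
        return Decision.SUPERVISE
    if action in CONSEQUENTIAL:
        return Decision.CONFIRM
    return Decision.ALLOW


for action in ["bank_transfer", "send_email", "submit_form", "read_page"]:
    print(f"{action}: {gate(action).value}")
```

Checking the strictest category first means an action is never allowed through a weaker rule when a stronger one applies.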

Privacy controls let users delete all browsing data with one click and log out of active website sessions immediately. During browser takeover mode, when users interact directly with websites, ChatGPT doesn’t collect or store any data they enter, including passwords.

The company also disabled ChatGPT’s memory feature for the agent to prevent data exfiltration through prompt injection attacks. OpenAI may restore this feature later with additional protections.

ChatGPT Agent Limitations and Future Development

OpenAI acknowledges that the agent is still at an early stage despite its capabilities. The system can make mistakes and sometimes struggles with complex multi-step tasks that require perfect execution.

Slideshow generation currently produces basic formatting that may need manual refinement. OpenAI noted discrepancies between viewer displays and exported PowerPoint files, though the company is training improved versions to address these issues.

Users cannot yet upload existing slideshows as templates, unlike the spreadsheet editing feature. The company plans regular updates to improve efficiency, reduce required oversight, and expand capabilities while maintaining safety standards.

The Operator research preview will remain functional for several weeks before shutdown. Deep Research remains available as a separate option for users who prefer more detailed, in-depth responses that take longer to generate.