OpenAI has rolled out two new open-weight large language models alongside a red teaming challenge with a prize fund of $500,000.

On August 5, at 10am Pacific Time (PT), Sam Altman, OpenAI’s CEO, posted “gpt-oss is out” on his social media.

Gpt-oss, which stands for ‘GPT open source,’ is now available in two versions:

gpt-oss-20b, a medium-sized model that can run on most desktops and laptops with 16 GB of memory

gpt-oss-120b, a large model designed to run in data centers and on high-end desktops and laptops, requiring 80 GB of memory

At the same time, OpenAI launched a red teaming challenge for gpt-oss-20b on Kaggle, a competition platform for data science and artificial intelligence contests.

The objective is to encourage researchers, developers and AI hobbyists to help identify novel safety issues.

GPT OSS Fine-Tuned to Solve Capture the Flag Competitions

According to Altman, gpt-oss-120b “is a state-of-the-art open-weights reasoning model, with strong real-world performance comparable to o4-mini.”

“It’s a big deal, [and] we believe this is the best and most usable open model in the world,” he added.

Both models are available for developers on most AI and cloud platforms, including Azure, Hugging Face, vLLM, Ollama, llama.cpp, LM Studio, AWS, Fireworks, Together AI, Baseten, Databricks, Vercel, Cloudflare and OpenRouter.

Eric Wallace, an OpenAI researcher working on safety, robustness and alignment, said that before releasing the models, OpenAI conducted a "first of its kind safety analysis" to "intentionally maximize their bio and cyber capabilities."

The goal of this analysis was to "estimate a rough 'upper bound' on the possible harms from adversaries."

To do this, the researchers fine-tuned the models with in-domain data to maximize biorisk capabilities and with a coding environment to solve capture the flag (CTF) competitions for cybersecurity.

Wallace said his team found that the "malicious-finetuned gpt-oss underperforms OpenAI o3, a model below Preparedness High capability" and that while it "marginally outperforms open-weight models on bio capabilities," it "does not substantially push the frontier."