OpenAI engineers shared internally in early June 2026 that they had discovered a new optimization able to cut inference costs by more than half, according to reports. The first deployment targeted logged-out, anonymous ChatGPT traffic, where the number of required NVIDIA GPUs reportedly dropped to a few hundred.
Continue reading
The rest of this article is for AI News Blitz readers. Choose an option below to keep reading.
Already purchased? Sign in✓ Signed in — this article isn’t included in your current plan.