OpenAI launches GPT-4o: Elevating ChatGPT with multimodal AI

Next-gen AI model GPT-4o brings seamless voice and video capabilities to users


OpenAI has released GPT-4o, an evolution of the GPT-4 model that powers its flagship product, ChatGPT.

The most recent update "is much faster" and boosts "capabilities across text, vision, and audio," OpenAI CTO Mira Murati said in a livestream announcement on Monday, according to The Verge.

GPT-4o will be free for all users, and paying users will continue to "have up to five times the capacity limits" of the free version, according to Murati.

According to an OpenAI blog post, GPT-4o's features "will be rolled out iteratively (with extended red team access starting today)," but both text and image capabilities will be available in ChatGPT as of today.

Furthermore, OpenAI CEO Sam Altman stated that the model is "natively multimodal", meaning it can understand commands and generate responses in voice, text, or images.
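
For illustration, here is a minimal sketch of what such a multimodal request looks like, assuming the official openai Python SDK and the public "gpt-4o" model identifier; the image URL is a hypothetical placeholder.

```python
# Sketch: mixing text and image input in a single GPT-4o request.
# Assumes the official `openai` Python SDK (pip install openai) and an
# OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                # Hypothetical placeholder; any publicly reachable image URL works.
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```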

“Developers who want to tinker with GPT-4o will have access to the API, which is half the price and twice as fast as GPT-4-turbo,” Altman added on X.
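For developers already using GPT-4 Turbo, the switch is, in this sketch's assumption, a one-line model-name change; both strings below are the public API identifiers.

```python
# Sketch: comparing an existing GPT-4 Turbo call with the same call on
# GPT-4o, assuming the official `openai` Python SDK.
from openai import OpenAI

client = OpenAI()

for model in ("gpt-4-turbo", "gpt-4o"):
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(f"{model}: {reply.choices[0].message.content}")
```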

Speech and video will be available to all users, free or paid, in the coming weeks. The key question is what difference speaking to and showing video to GPT-4o will make compared with the familiar text-based ChatGPT experience.

These changes, according to OpenAI, are aimed at "reducing the friction" between "humans and machines" and "bringing AI to everyone".