OpenAI’s next training set may be the real world

  • OpenAI has revealed its latest development, GPT-4o.
  • Designed to make human-computer interactions more natural, the model can parse images, text, and audio to offer responses.
  • The goal appears to be putting GPT-4o into the real world, where it can hoover up that data for training.

On Monday evening, ahead of Google’s I/O event, OpenAI hosted a press conference to show off the latest development in its GPT artificial intelligence platform.

That development is GPT-4o, the o meaning omni, which OpenAI says is a step towards more natural interactions between humans and computers. The model accepts any combination of audio, text, and image inputs and outputs any combination of text, audio, and images.

“It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models,” writes OpenAI.

The company showed off GPT-4o’s capabilities in a series of videos, and it’s tough not to be impressed. The model is able to translate a conversation in real time, offer advice for a job interview, and more. However, there are also moments that are jarring.

For example, in one clip a human shows GPT-4o a dog and the AI responds as you might expect a human to when confronted with a dog. The bot fusses over the dog, even sighing and asking the dog its name. The behaviour is very human, which is equal parts alarming and impressive.

OpenAI says this latest model is able to produce responses, particularly to vocal inputs, far faster than previous GPT models. This was made possible by training the model on text, vision, and audio within the same neural network, rather than chaining together separately trained models.

“With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations,” OpenAI explains.

It’s clear from several examples showcased by OpenAI that it wants GPT-4o to be used in the real world, be that for translation or for identifying objects by asking the bot about them. That is likely a sign both that the AI model is evolving to be more useful and that training data is harder to come by. There are open questions about how OpenAI trains its AI and whether it knowingly infringes copyright. The internet has also become a mess of AI-generated content, so it’s no longer a viable training source.

“GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. We used these learnings to build out our safety interventions in order to improve the safety of interacting with GPT-4o. We will continue to mitigate new risks as they’re discovered,” OpenAI said.

GPT-4o is available for free right now, but it’s limited to text and image capabilities for the moment. Those subscribed to ChatGPT Plus will get five times the message limit of free users. OpenAI is taking a slow approach to rolling this model out and will drip-feed new capabilities to the public after they have been extensively tested.

“Developers can also now access GPT-4o in the API as a text and vision model. GPT-4o is 2x faster, half the price, and has 5x higher rate limits compared to GPT-4 Turbo. We plan to launch support for GPT-4o’s new audio and video capabilities to a small group of trusted partners in the API in the coming weeks,” the firm added.
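For developers wondering what that text-and-vision access looks like in practice, the sketch below shows one way to call GPT-4o through OpenAI’s Python client. This is an illustrative assumption rather than anything from OpenAI’s announcement: the prompt and image URL are placeholders, and, as the quote above notes, audio and video inputs are not yet available in the API.

```python
# Minimal sketch: asking GPT-4o about an image via the OpenAI Python client.
# The prompt and image URL are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What breed is the dog in this photo?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/dog.jpg"}},
            ],
        }
    ],
)

# The model's text reply
print(response.choices[0].message.content)
```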

This is a big step for OpenAI, though much depends on how well GPT-4o works in the hands of users. That said, Apple and Google are going to have to impress when they announce their AI plans. While Apple has until June to make magic happen, Google has just one day to respond, and we’re not sure it has something as impressive as GPT-4o. We’re always happy to be surprised, though.
