What is OpenAI’s Sora? Understanding the latest AI text-to-video solution

OpenAI just dropped its latest creation, Sora.

Source: Sora/OpenAI

The backstory: Text-to-image artificial intelligence (AI) has been a hot topic in tech for a while now. While text-to-image generators like Midjourney are gaining popularity, companies like Runway and Pika are moving forward with text-to-video models.

OpenAI, already a big player in the AI world, has been making waves, especially with the launch of ChatGPT. The AI tool quickly gained traction, reaching 100 million users in just two months, faster than TikTok or Instagram ever did. Even before ChatGPT, OpenAI had introduced DALL-E, its text-to-image model. In 2022, the company released DALL-E 2, although access was initially limited due to concerns about explicit and biased images. Eventually, OpenAI addressed these issues, making DALL-E 2 accessible to everyone.

More recently: This month, OpenAI slapped some watermarks on images made with DALL-E 3, but the company acknowledged that these could be easily removed. Meanwhile, Meta, the company behind Facebook and Instagram, announced that it would identify and label images created by other companies' AI services on its platforms by detecting the little hidden markers embedded in the files. Meta is also dipping its toes into the AI-generated audio and video pond, aware of the potential and pitfalls that come with it.

The development: OpenAI just dropped its latest creation, Sora (named after the Japanese word for “sky”), which can create videos up to a minute long from short text prompts. Basically, you tell it what you want, and Sora brings your ideas to life on the screen. OpenAI explained how Sora works in a recent blog post, saying it turns these prompts into scenes with characters, actions and backgrounds.

Apart from making videos from scratch, Sora can also animate still images, fill in missing parts of videos and even extend them. OpenAI demonstrated this with examples like scenes from the California gold rush and a virtual train ride in Tokyo. CEO Sam Altman also shared some video clips on X made by Sora in response to user prompts. Right now, OpenAI is only giving access to Sora to researchers, visual artists and filmmakers. They'll test the tool to make sure it follows OpenAI's rules, which say no to extreme violence, sexual content and celebrity lookalikes.

Key comments:

“The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world,” said OpenAI in a blog post.

“Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion and multiple characters with vibrant emotions,” said OpenAI on X.

“One obvious use case is within TV; creating short scenes to support narratives,” said Reece Hayden, a senior analyst at market research firm ABI Research. “The model is still limited though, but it shows the direction of the market.”