What Is OpenAI’s Sora, the AI That Creates Videos from Text?

Published March 17, 2024 · Updated September 15, 2024

On February 15th, US time, OpenAI unveiled ‘Sora’, a new generative AI that can create videos up to one minute long from simple text prompts. However, the release date for this highly anticipated new system from OpenAI has not yet been determined.

The following video was actually generated by Sora.

Introducing Sora, our text-to-video model.

Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W

Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf
— OpenAI (@OpenAI) February 15, 2024

Prompt: “Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. the art style is 3d and realistic, with a focus on lighting and texture. the mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with… pic.twitter.com/aLMgJPI0y6
— OpenAI (@OpenAI) February 15, 2024

With nothing more than a short text prompt, Sora generates video of this quality.

This article gives an overview of Sora, what it can do, and where it still falls short. Please read to the end.

What is OpenAI’s Sora?

Sora is a video generation AI model announced by OpenAI on February 15, 2024.
It is not available to the public at this time (as of February 2024), but it is expected to be opened to general users eventually.

Sora can generate much higher-quality videos than previous video generation AIs, and its videos can be up to one minute long. Videos it has actually generated show that the output has reached a level that is hard to distinguish from live-action footage.

However, because of remaining challenges in simulating physics and the need to prevent the spread of misinformation, the general release of Sora is on hold until appropriate safety measures are in place. OpenAI is working with expert teams to ensure Sora’s safety and plans to develop tools for detecting generated videos.

Sora represents a significant milestone in the advancement of AI technology, marking a step toward the ultimate goal of achieving AGI (Artificial General Intelligence).

Source: OpenAI, ‘Creating video from text’

What can Sora do?

Sora, developed by OpenAI, does more than just generate video from text.

Let’s take a look at its main features.

Text-to-Video

The first feature is Text-to-Video, which lets users generate a video simply by giving instructions in text.
Until now, text-to-video AIs could only generate videos of a few dozen seconds at most. Sora, however, can generate videos up to one minute long, and the quality is high enough that they can be mistaken for live-action footage.

For example, Sora can generate videos like the following:

Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

Although there are some flaws, such as the Japanese text in the background, the result is hard to tell apart from real footage.
If a high-quality one-minute video can be generated from simple text instructions, it could be used to create short videos for TikTok, short advertisements, and similar content.

Image-to-Video

Sora supports not only text input, but also image input. In other words, it can animate images.

For example, images like the following can be turned into animated video.

https://openai.com/research/video-generation-models-as-world-simulators

If an arbitrary image created with ChatGPT’s image generation function could then be animated in this way, the range of applications would likely expand dramatically.

Video-to-Video

As with images, Sora also accepts video as input.
For example, the original video below can be transformed so that it takes place in an underwater world.

Original Video

https://openai.com/research/video-generation-models-as-world-simulators

Converted to an underwater setting

https://openai.com/research/video-generation-models-as-world-simulators

As shown above, Sora allows a variety of edits to be made to an original video.
Normally, editing of this kind would be extremely difficult, and even where it is possible, it would take an enormous amount of time and money. Handing the process over to generative AI could shorten editing considerably and is likely to reduce the workload of people involved in video production.

Image Generation

Sora can also generate high-quality images. It supports resolutions of up to 2048 x 2048 and can even produce portraits of people that look just like photographs, as shown below.

https://openai.com/research/video-generation-models-as-world-simulators

The images it generates are of such high quality that they are hard to distinguish from photographs.

Incidentally, the current paid version of ChatGPT uses an image generation AI called “DALL-E 3.” Since Sora is also developed by OpenAI, the same company that operates ChatGPT, the quality of image generation in ChatGPT may improve further.

Sora’s Challenges

Although Sora is capable of generating high-quality videos, there are some challenges.

For example, the AI does not fully understand physics, so it struggles to depict events such as glass breaking, as shown below.

https://openai.com/research/video-generation-models-as-world-simulators

Another problem is that, in scenes that include a large number of entities such as people or animals, these entities may suddenly appear from unnatural locations.

The ability to generate high-quality video also carries the risk that realistic video will be misused, and OpenAI is continuing its research to address this problem.

Once these issues have been addressed, Sora is expected to become available to the general public.

Summary

In this article, we looked at Sora, a video generation AI announced by OpenAI.

Since the release of ChatGPT at the end of 2022, generative AI has evolved remarkably. With new services being released one after another, the era in which AI is commonplace is just around the corner.

To thrive in the age of AI, let’s keep learning about generative AI and make more use of it in our business and personal lives.
