OpenAI’s new (and first!) video-generating model, Sora, can pull off some truly impressive cinematic feats. But the model is even more capable than OpenAI initially made it out to be, at least judging by a technical paper posted this evening.
The paper, titled “Video generation models as world simulators” and co-authored by several OpenAI researchers, peels back the curtain on key aspects of Sora’s architecture, revealing for example that Sora can generate videos of arbitrary resolution and aspect ratio (up to 1080p). According to the paper, Sora can perform a range of image and video editing tasks, from creating looping videos to extending videos forward or backward in time to changing the background of an existing video.
But most interesting to this author is Sora’s ability to “simulate digital worlds,” as the OpenAI co-authors put it. In an experiment, OpenAI set Sora loose on Minecraft and had it render the world and its dynamics, including physics, while simultaneously controlling the player.
So how can Sora do this? Well, as Nvidia senior researcher Jim Fan observed (via Quartz), Sora is more of a “data-driven physics engine” than a creative tool. It’s not just creating a photo or video, but determining the physics of each object in an environment, and then rendering a photo or video (or an interactive 3D world, as the case may be) based on those calculations.
“These capabilities suggest that continuous scaling of video models is a promising path toward developing highly capable simulators of the physical and digital worlds and the objects, animals, and people that inhabit them,” the authors write.
Now, Sora’s usual limitations apply in the video game realm. The model can’t accurately approximate the physics of basic interactions such as glass shattering. And even with interactions it can model, Sora is often inconsistent: for example, rendering a person eating a burger but showing no bite marks.
However, if I’m reading the paper correctly, it looks like Sora could pave the way for more realistic, maybe even photorealistic, procedurally generated games. That’s exciting and terrifying in equal measure (consider the implications for deepfakes, for one), which is probably why OpenAI has chosen to keep Sora behind a very limited access program for now.
Here’s hoping we learn more sooner rather than later.