Like “Avengers” director Joe Russo, I’m increasingly convinced that movies and TV shows entirely created by artificial intelligence will be possible in our lifetime.
A string of AI unveilings in recent months, most notably OpenAI’s ultra-realistic-sounding text-to-speech engine, has offered glimpses into this brave new frontier. But Meta’s announcement today put the future of AI-generated content into particularly sharp relief — at least for me.
Meta debuted Emu Video this morning, an evolution of the tech giant’s image generation tool, Emu. Given a caption (e.g. “A dog running across grass”), an image, or a photo paired with a description, Emu Video can generate a four-second animated clip.
Emu Video clips can be edited with a complementary AI model called Emu Edit, which was also announced today. Users can describe the modifications they want to make to Emu Edit in natural language — e.g. “the same clip, but in slow motion” — and see the changes reflected in a newly generated video.
Now, video generation technology isn’t new. Meta has experimented with it before, as has Google. Meanwhile, startups like Runway are already building businesses on it.
But Emu Video’s 512×512, 16fps clips are easily some of the best I’ve seen in terms of fidelity — to the point where my untrained eye has trouble telling them apart from the real thing.
Well — at least some of them. Emu Video appears to be most successful at animating simple, mostly static scenes (e.g. waterfalls and city skyline timelapses) that stray from photorealism — that is, in styles like cubism, anime, “papercraft” and steampunk. One clip of the Eiffel Tower at dawn “as a painting,” with the tower reflected in the River Seine below it, reminded me of an e-card you might see on American Greetings.
Even in Emu Video’s best work, however, AI-generated weirdness manages to creep in — like bizarre physics (e.g. skateboards that move parallel to the ground) and freakish appendages (toes that curl behind feet, legs that blend together). Objects often appear and fade from view for no logical reason, too, such as the birds overhead in the aforementioned Eiffel Tower clip.
After spending too long browsing Emu Video’s creations (or at least the examples Meta chose to showcase), I began to notice another obvious tell: the subjects in the clips weren’t… well, doing much. As far as I can tell, Emu Video doesn’t appear to have a strong grasp of action verbs, perhaps a limitation of the model’s architecture.
For example, a cute anthropomorphic raccoon in an Emu Video clip will hold a guitar, but won’t strum it — even if the clip’s caption includes the word “strum.” Or two unicorns will “play” chess, but only in the sense of sitting inquisitively in front of a chessboard without moving the pieces.
So clearly there’s work to be done. But Emu Video’s more basic b-roll wouldn’t be out of place in a movie or TV show today, I’d say — and the moral implications of that frankly terrify me.
Beyond the danger of deepfakes, I fear for the animators and artists whose livelihoods depend on crafting the kinds of scenes that AI like Emu Video can now approximate. Meta and its generative AI rivals will likely argue that Emu Video, which Meta CEO Mark Zuckerberg says integrates with Facebook and Instagram (hopefully with better toxicity filters than Meta’s AI-generated stickers), augments rather than replaces human artists. But I’d say that takes the optimistic, if not disingenuous, view — especially where money is concerned.
Earlier this year, Netflix used AI-generated background images in a three-minute animated short. The company claimed that the technology could help with anime’s alleged labor shortage — but it conveniently glossed over how low pay and often grueling working conditions push artists out of the profession.
In a similar controversy, the studio behind the credits sequence for Marvel’s “Secret Invasion” admitted to using AI, notably the text-to-image tool Midjourney, to create much of the sequence’s artwork. Series director Ali Selim argued that the use of AI fits the show’s paranoid themes, but much of the artist and fan community vehemently disagreed.
Actors could be on the chopping block, too. One of the major sticking points in the recent SAG-AFTRA strike was the use of AI to create digital likenesses. The studios ultimately agreed to pay actors for their AI-generated likenesses. But might they reconsider as the technology improves? I think it’s possible.
Adding insult to injury, AI like Emu Video is typically trained on images and videos produced by artists, photographers and filmmakers — and without notifying or compensating those creators. In a whitepaper accompanying Emu Video’s release, Meta says only that the model was trained on a data set of 34 million “video-text pairs” ranging in length from five to 60 seconds — not where those videos came from, their copyright statuses or whether Meta licensed them.
(After this article was published, a Meta representative told TechCrunch via email that Emu was trained on “data from licensed partners.”)
There have been moves toward industry-wide standards that would allow artists to “opt out” of training or be paid for AI-generated works to which they contributed. But if Emu Video is any indication, the technology — as is so often the case — will soon race far ahead of ethics. Perhaps it already has.