AI startups not named OpenAI are pressing on this week, it seems, sticking to their product roadmaps even as coverage of the OpenAI turmoil dominates the airwaves.
See: Stability AI, which this afternoon announced Stable Video Diffusion, an AI model that generates video by animating existing images. Based on Stability's existing Stable Diffusion text-to-image model, Stable Video Diffusion is one of the few video-generating models available in open source, or commercially, for that matter.
But not to everyone.
Stable Video Diffusion is currently in what Stability describes as a "research preview." Those wishing to run the model must agree to certain terms of use, which outline Stable Video Diffusion's intended applications (e.g., "educational or creative tools," "design and other artistic processes") and non-intended ones ("factual or true representations of people or events").
Given how other AI research previews, including Stability's own, have gone historically, this writer wouldn't be surprised to see the model begin circulating on the dark web in short order. If that happens, I'd be concerned about the ways in which Stable Video Diffusion could be abused, since it doesn't appear to have a built-in content filter. When Stable Diffusion was released, it didn't take long for bad actors to use it to create nonconsensual deepfake porn, and worse.
But I digress.
Stable Video Diffusion actually comes in two models: SVD and SVD-XT. The first, SVD, transforms still images into 576×1024 videos of 14 frames. SVD-XT uses the same architecture but increases the frame count to 24. Both can generate video at between three and 30 frames per second.
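Because a clip's length is simply its frame count divided by its playback rate, the figures above bound how long each model's output can run. The short Python sketch below works through that arithmetic using only the frame counts and frame-rate range stated here; the helper name `clip_duration_seconds` is hypothetical and not part of any Stability tool.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Return the playback length, in seconds, of num_frames shown at fps."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


# Frame counts as reported for each model; 3 and 30 fps are the stated bounds.
FRAME_COUNTS = {"SVD": 14, "SVD-XT": 24}

for model, frames in FRAME_COUNTS.items():
    longest = clip_duration_seconds(frames, 3)    # slowest playback rate
    shortest = clip_duration_seconds(frames, 30)  # fastest playback rate
    print(f"{model}: {shortest:.2f}s at 30 fps, {longest:.2f}s at 3 fps")
```

By this arithmetic, a 14-frame SVD clip plays for roughly half a second at 30 fps and a little under five seconds at 3 fps.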
According to a white paper released alongside Stable Video Diffusion, SVD and SVD-XT were initially trained on a dataset of millions of videos and then "fine-tuned" on a much smaller set of hundreds of thousands to around a million clips. Where those videos came from isn't immediately clear (the paper implies that many came from public research datasets), so it's impossible to say whether any were under copyright. If any were, it could open Stability and Stable Video Diffusion users to legal and ethical challenges around usage rights. Only time will tell.
Whatever the source of the training data, the models (both SVD and SVD-XT) generate fairly high-quality four-second clips. By this writer's estimation, the cherry-picked samples on Stability's blog could go toe-to-toe with outputs from Meta's recent video-generation model as well as AI-produced examples we've seen from Google and AI startups Runway and Pika Labs.
But Stable Video Diffusion has limitations. Stability is transparent about this, writing on the models' Hugging Face pages (the pages from which researchers can apply for access to Stable Video Diffusion) that the models can't generate motionless or slow-motion video, be controlled by text, render text (at least not legibly) or consistently generate faces and people "properly."
Still, while it's early days, Stability notes that the models are quite extensible and can be adapted to use cases such as generating 360-degree views of objects.
So what might Stable Video Diffusion evolve into? Well, Stability says it's planning "a variety" of models that "build on and extend" SVD and SVD-XT, as well as a "text-to-video" tool that will bring text prompting to the models on the web. The ultimate goal appears to be commercialization: Stability notes that Stable Video Diffusion has potential applications in "advertising, education, entertainment and beyond."
Stability is surely eager to cash in, as investors ramp up the pressure on the startup.
In April, Semafor reported that Stability AI was burning through cash, spurring an executive hunt to ramp up sales. According to Forbes, the company has repeatedly delayed or failed to pay wages and payroll taxes, leading AWS, which Stability uses to train its models, to threaten to revoke Stability's access to its GPU instances.
Stability AI recently raised $25 million through a convertible note (that is, debt that converts into equity), bringing its total raised to over $125 million. But it hasn't closed new funding at a higher valuation; the startup was last valued at $1 billion. Stability was said to be seeking to quadruple that within the coming months, despite stubbornly low revenue and a high burn rate.
Stability suffered another blow recently with the departure of Ed Newton-Rex, who had been VP of audio at the startup for just over a year and played a key role in launching Stability's music-generation tool, Stable Audio. In a public letter, Newton-Rex said that he left Stability over a disagreement about copyright and how copyrighted data should, and shouldn't, be used to train AI models.