Pruna AI, a European startup that works on compression algorithms for AI models, is making its optimization framework open source on Thursday.
Pruna AI has built a framework that applies several efficiency methods, such as caching, pruning, quantization, and distillation, to a given AI model.
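For a concrete picture of two of those methods, here is a minimal sketch in plain PyTorch (not Pruna AI's actual API, which the article does not detail): magnitude pruning followed by dynamic int8 quantization of a toy model.

```python
import torch
import torch.nn.utils.prune as prune

# A toy model standing in for a real network.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)

# Pruning: zero out the 30% smallest-magnitude weights in each linear layer.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Quantization: convert the linear layers to int8 for faster CPU inference.
compressed = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```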
“We also standardize saving and loading the compressed models, applying these compression methods, and also evaluating your compressed model after you compress it,” Pruna AI co-founder and CTO John Rachwan told TechCrunch.
In particular, the Pruna AI framework can evaluate whether there is significant quality loss after compressing a model, along with the performance gains you get.
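A hypothetical before-and-after evaluation in the same spirit might time both models and compare their outputs; the helper below is illustrative, not Pruna AI's evaluation code.

```python
import time
import torch

def benchmark(model, inputs, runs=50):
    """Average inference latency in milliseconds (illustrative helper)."""
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(runs):
            model(inputs)
    return (time.perf_counter() - start) / runs * 1000

# Toy baseline and a dynamically quantized version of it.
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU(),
                            torch.nn.Linear(512, 10))
compressed = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

inputs = torch.randn(1, 512)
base_ms, comp_ms = benchmark(model, inputs), benchmark(compressed, inputs)

# Quality check: how far do the compressed model's outputs drift?
with torch.no_grad():
    drift = (model(inputs) - compressed(inputs)).abs().max().item()

print(f"speedup: {base_ms / comp_ms:.2f}x, max output drift: {drift:.4f}")
```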
“If I had to use a metaphor, we are similar to how Hugging Face standardized transformers and diffusers: how to call them, how to save them, load them, etc. We are doing the same, but for efficiency methods,” he added.
Big AI labs have already been using various compression methods. For example, OpenAI relies on distillation to create faster versions of its flagship models.
That is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4. Similarly, the Flux.1-schnell image generation model is a distilled version of the Flux.1 model from Black Forest Labs.
Distillation is a technique used to extract knowledge from a large AI model with a “teacher-student” setup. Developers send requests to a teacher model and record the outputs. Answers are sometimes compared with a dataset to see how accurate they are. These outputs are then used to train the student model, which learns to approximate the teacher's behavior.
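As a rough sketch of that teacher-student idea, here is generic knowledge distillation (the classic recipe, not any specific lab's): the student is trained to match the teacher's softened output distribution.

```python
import torch
import torch.nn.functional as F

# Illustrative teacher (large) and student (small) classifiers.
teacher = torch.nn.Sequential(torch.nn.Linear(64, 256), torch.nn.ReLU(),
                              torch.nn.Linear(256, 10)).eval()
student = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU(),
                              torch.nn.Linear(32, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature softens the teacher's output distribution

for step in range(100):
    x = torch.randn(32, 64)  # stand-in for real training inputs
    with torch.no_grad():
        teacher_logits = teacher(x)  # "record the teacher's answers"
    student_logits = student(x)
    # Train the student to mimic the teacher: KL divergence between
    # the softened output distributions.
    loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```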
“For big companies, what they usually do is that they build this stuff in-house. And what you can find in the open source world is usually based on single methods. For example, let's say a quantization method for LLMs, or a caching method for diffusion models,” Rachwan said. “But you cannot find a tool that aggregates all of them, makes them all easy to use and combine together. And this is the big value that Pruna is bringing right now.”
While Pruna AI supports any kind of model, from large language models to diffusion models, text-to-speech models, and computer vision models, the company is focusing more specifically on image and video generation models right now.
Some of Pruna AI's existing users include Scenario and Flare. In addition to the open source edition, Pruna AI has an enterprise offering with advanced optimization features, including an optimization agent.
“The most exciting feature that we are releasing soon will be a compression agent,” Rachwan said. “Basically, you give it your model, you say: ‘I want more speed but don't drop my accuracy by more than 2%.’ And then, the agent will just do its magic.”
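Conceptually, such an agent amounts to a search over compression configurations under an accuracy budget. The loop below is a hypothetical sketch of that idea, not Pruna AI's implementation; `compress_fn` and `eval_fn` stand in for real compression and benchmarking routines.

```python
# Hypothetical sketch of a compression "agent": try configurations and keep
# the fastest model whose accuracy stays within the user's stated budget.
def compression_agent(model, eval_fn, compress_fn, configs,
                      max_accuracy_drop=0.02):
    baseline_acc, _ = eval_fn(model)
    best_model, best_latency = model, float("inf")
    for config in configs:  # e.g. combinations of quantization/pruning/caching
        candidate = compress_fn(model, config)
        acc, latency = eval_fn(candidate)
        if baseline_acc - acc <= max_accuracy_drop and latency < best_latency:
            best_model, best_latency = candidate, latency
    return best_model
```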
Pruna AI charges by the hour for its pro version. “It's similar to how you would think of a GPU when you rent a GPU on AWS or any cloud service,” Rachwan said.
And if your model is a critical part of your AI infrastructure, you will end up saving a lot of money on inference with the optimized model. For example, Pruna AI has made a Llama model eight times smaller without too much loss using its compression framework. Pruna AI hopes its customers will think of its compression framework as an investment that pays for itself.
Pruna AI raised a $6.5 million seed funding round a few months ago. Investors in the startup include EQT Ventures, Daphni, Motier Ventures, and Kima Ventures.