Cohere For AI, AI startup Cohere’s nonprofit research lab, this week released a multimodal “open” AI model, Aya Vision, that the lab claims is best-in-class.
Aya Vision can perform tasks such as writing image captions, answering questions about photos, translating text, and generating summaries in 23 major languages. Cohere, which is also making Aya Vision available for free through WhatsApp, called it “a significant step towards making technical breakthroughs accessible to researchers worldwide.”
“While AI has made significant progress, there is still a big gap in how well models perform across different languages - one that becomes even more noticeable in multimodal tasks that involve both text and images,” Cohere wrote in a blog post. “Aya Vision aims to explicitly help close that gap.”
Aya Vision comes in two flavors: Aya Vision 32B and Aya Vision 8B. The more sophisticated of the two, Aya Vision 32B, sets a “new frontier,” Cohere said, outperforming models 2x its size, including Meta’s Llama-3.2 90B Vision, on certain visual understanding benchmarks. Meanwhile, Aya Vision 8B scores better on some evaluations than models 10x its size, according to Cohere.
Both models are available from the AI dev platform Hugging Face under a Creative Commons 4.0 license with Cohere’s acceptable use addendum. They can’t be used for commercial applications.
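For readers who want to experiment, here is a minimal sketch of loading Aya Vision 8B through Hugging Face’s transformers library. The repository ID, the AutoModelForImageTextToText class, and the example image URL are assumptions based on standard Hugging Face conventions, not instructions from Cohere; check the model card for the actual usage details.

```python
# Minimal sketch: loading Aya Vision 8B via Hugging Face transformers.
# The repo ID and model class are assumptions; consult the model card.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "CohereForAI/aya-vision-8b"  # assumed repository name

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

# A chat-style prompt pairing an image with a question about it.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this photo in Portuguese."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate a response and decode only the newly produced tokens.
output = model.generate(**inputs, max_new_tokens=200)
print(processor.tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```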
Cohere said Aya Vision was trained using a “diverse pool” of English datasets, which the lab translated and used to create synthetic annotations. Annotations, also known as tags or labels, help models understand and interpret data during the training process. For example, annotations to train an image recognition model might take the form of markings around objects, or captions referring to each person, place, or object depicted in an image.
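As a purely illustrative sketch of what one such annotation record might look like (the field names and structure here are hypothetical, not Cohere’s actual training format):

```python
# Hypothetical annotation record for one training image. Field names and
# structure are illustrative only, not Cohere's actual data format.
annotation = {
    "image_id": "img_00042",
    # A caption naming what the image depicts; per Cohere, English captions
    # like this were translated and used to generate synthetic annotations.
    "caption": "A street vendor selling fruit at an outdoor market.",
    "objects": [
        # Markings around objects, here as [x_min, y_min, x_max, y_max]
        # bounding boxes in pixel coordinates.
        {"label": "person", "bbox": [120, 45, 310, 400]},
        {"label": "fruit stand", "bbox": [290, 150, 630, 420]},
    ],
}
```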
Cohere’s use of synthetic annotations (that is, annotations generated by AI) is on trend. Despite its potential downsides, rivals including OpenAI are increasingly tapping synthetic data to train models as the well of real-world data dries up. Research firm Gartner estimates that 60% of the data used for AI and analytics projects last year was synthetically created.
According to Cohere, training Aya Vision on synthetic annotations allowed the lab to use fewer resources while achieving competitive performance.
“This showcases our critical focus on efficiency and [doing] more with less compute,” Cohere wrote in its blog post. “This also enables greater support for the research community, which often has more limited access to compute resources.”
Along with Aya Vision, Cohere also released a new benchmark suite, AyaVisionBench, designed to probe a model’s skills in “vision-language” tasks, such as identifying differences between two images and converting screenshots to code.
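Assuming the benchmark is distributed on Hugging Face like the models themselves, inspecting it might look like the sketch below. The dataset ID is a guess based on Cohere For AI’s usual naming conventions and should be verified against the lab’s Hugging Face page.

```python
# Sketch of pulling AyaVisionBench from Hugging Face. The dataset ID is an
# assumption; verify it on Cohere For AI's Hugging Face page before use.
from datasets import load_dataset

bench = load_dataset("CohereForAI/AyaVisionBench")
print(bench)  # shows the available splits and column names

# Each entry presumably pairs one or more images with a multilingual prompt;
# print one example to see the actual schema.
first_split = next(iter(bench.values()))
print(first_split[0])
```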
The AI industry is in the midst of what some have called an “evaluation crisis,” a consequence of the proliferation of benchmarks that give aggregate scores correlating poorly with proficiency on the tasks most AI users care about. Cohere claims that AyaVisionBench is a step toward fixing this, providing a “broad and challenging” framework for assessing a model’s cross-lingual and multimodal understanding.
With any luck, that’s indeed the case.
“[T]he dataset serves as a robust benchmark for evaluating vision-language models in multilingual and real-world settings,” Cohere researchers wrote in a post on Hugging Face. “We make this evaluation set available to the research community to push forward multilingual multimodal evaluations.”