A pair of undergrads, neither with extensive AI know-how, say they have created an openly available AI model that can generate podcast-style clips similar to Google's NotebookLM.
The market for synthetic speech tools is large and growing. ElevenLabs is one of the biggest players, but there is no shortage of challengers (see PlayAI, Sesame, etc.). Investors believe these tools have enormous potential: according to PitchBook, startups developing AI voice tech raised over $398 million in VC funding last year.
Toby Kim, one of the co-founders of Korea-based Nari Labs, the team behind the recently released model, said he and his co-founder began learning about AI speech three months ago. Inspired by NotebookLM, they wanted to create a model that offered more control over the generated voices and "freedom in the script."
Kim says they used Google's TPU Research Cloud program, which gives researchers free access to the company's TPU AI chips, to train Nari's model, Dia. Weighing in at 1.6 billion parameters, Dia can generate dialogue from a script, letting users customize speakers' tones and insert disfluencies, coughs, laughs, and other nonverbal cues.
Parameters are the internal variables a model uses to make predictions. In general, models with more parameters perform better.
Available from the AI dev platform Hugging Face and from GitHub, Dia can run on most modern PCs with at least 10GB of VRAM. It generates a random voice unless prompted with a description of an intended style, but it can also clone a person's voice.
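To make the script-driven workflow concrete, here is a minimal sketch of how a two-speaker script with nonverbal cues might be assembled. The `[S1]`/`[S2]` speaker tags and parenthesized cues like `(laughs)` follow the convention described in Dia's documentation, but treat the exact syntax as an assumption and check Nari Labs' Hugging Face model card before relying on it; the `build_script` helper itself is purely illustrative.

```python
# Sketch: assembling a tagged dialogue script for a two-speaker TTS model.
# The [S1]/[S2] tags and parenthesized nonverbal cues are assumptions based
# on Dia's documented script format; verify against the official model card.

def build_script(turns):
    """Join (speaker_number, text) pairs into a single tagged script string."""
    return " ".join(f"[S{speaker}] {text}" for speaker, text in turns)

script = build_script([
    (1, "Did you hear about the new open voice model?"),
    (2, "I did! (laughs) The cloning demo is surprisingly easy to use."),
])
print(script)
```

A string like this would then be passed to the model's generation call, which synthesizes the full back-and-forth in one pass rather than stitching together separate single-speaker clips.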
In TechCrunch's brief testing of Dia via Nari's web demo, the model worked quite well, uncomplainingly generating two-way chats on any subject. The quality of the voices seems competitive with other tools out there, and the voice cloning feature is among the easiest this journalist has tried.
Here is a sample:
Like many voice generators, however, Dia offers few safeguards. It would be trivial to craft disinformation or a scam recording. On Dia's project pages, Nari discourages abuse of the model to impersonate, deceive, or otherwise engage in illicit campaigns, but the team says it "isn't responsible" for misuse.
Nari also has not revealed which data it used to train Dia. It is possible Dia was developed using copyrighted content; a commenter on Hacker News notes that one sample sounds like the hosts of NPR's podcast "Planet Money." Training models on copyrighted content is a widespread but legally dubious practice. Some AI companies argue that fair use shields them from liability, while rightsholders contend that fair use doesn't apply to training.
In any case, Kim says Nari's plan is to build a synthetic voice platform with a "social aspect" on top of Dia and larger future models. Nari also intends to release a technical report for Dia and to extend the model's support to languages beyond English.
