TechTost

EleutherAI releases massive AI training dataset of licensed and open-domain text

By techtost.com | 9 June 2025 | 3 Mins Read

EleutherAI, an AI research organization, has released what it claims is one of the largest collections of licensed and open-domain text for training AI models.

The dataset, called the Common Pile v0.1, took about two years to complete in collaboration with AI startups Poolside, Hugging Face and others, along with several academic institutions. Weighing in at 8 terabytes, the Common Pile v0.1 was used to train two new AI models from EleutherAI, Comma v0.1-1T and Comma v0.1-2T, which EleutherAI claims perform on par with models developed using unlicensed, copyrighted data.

AI companies, including OpenAI, are embroiled in lawsuits over their AI training practices, which rely on scraping the web (including copyrighted material such as books and research journals) to build model training datasets. While some AI companies have licensing arrangements with certain content providers, most argue that the U.S. legal doctrine of fair use shields them from liability in cases where they trained on copyrighted work without permission.

EleutherAI argues that these lawsuits have “drastically reduced” transparency from AI companies, which the organization says has harmed the broader AI research field by making it harder to understand how models work and what their flaws might be.

“[Copyright] lawsuits have not meaningfully changed data sourcing practices in [model] training, but they have drastically reduced the transparency companies engage in,” wrote Stella Biderman, EleutherAI’s executive director, in a blog post on Hugging Face early on Friday. “Researchers at some companies we have spoken to have also specifically cited lawsuits as the reason why they have been unable to release the research they are doing in highly data-centric areas.”

The Common Pile v0.1, which can be downloaded from Hugging Face’s AI dev platform and from GitHub, was created in consultation with legal experts and draws on sources including 300,000 public domain books digitized by the Library of Congress and the Internet Archive. EleutherAI also used Whisper, OpenAI’s speech-to-text model, to transcribe audio content.

EleutherAI claims that Comma v0.1-1T and Comma v0.1-2T are evidence that the Common Pile v0.1 was curated carefully enough to let developers build models competitive with proprietary alternatives. According to EleutherAI, the models, which are both 7 billion parameters in size and were trained on only a fraction of the Common Pile v0.1, rival models such as Meta’s first Llama AI model on benchmarks for coding, image understanding and math.

Parameters, sometimes referred to as weights, are the internal components of an AI model that guide its behavior and answers.
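
To make the idea of parameters concrete, here is a minimal Python sketch of a toy model with a single layer. It is purely illustrative: the function names `dense_layer` and `count_parameters` and all of the numbers are hypothetical and have nothing to do with the Comma models, which are said to have 7 billion such values.

```python
# Hypothetical toy model: one dense layer mapping 3 inputs to 2 outputs.
# The weights and biases are made-up numbers, not taken from any real model.

def dense_layer(inputs, weights, biases):
    """Compute outputs[j] = sum_i(inputs[i] * weights[j][i]) + biases[j]."""
    return [
        sum(x * w for x, w in zip(inputs, row)) + b
        for row, b in zip(weights, biases)
    ]

def count_parameters(weights, biases):
    """Count every learned number: each weight plus each bias."""
    return sum(len(row) for row in weights) + len(biases)

weights = [[0.5, -1.0, 2.0],
           [1.5, 0.0, -0.5]]
biases = [0.1, -0.2]

outputs = dense_layer([1.0, 2.0, 3.0], weights, biases)
n_params = count_parameters(weights, biases)
print(outputs, n_params)
```

This toy layer has 8 parameters (6 weights and 2 biases). Changing any one of them changes the outputs, which is the sense in which parameters guide a model’s behavior.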

“In general, we think that the common idea that unlicensed text drives performance is unjustified,” Biderman wrote in her post. “As the amount of accessible openly licensed and public domain data grows, we can expect the quality of models trained on openly licensed content to improve.”

The Common Pile v0.1 appears to be in part an effort to right EleutherAI’s historical wrongs. Years ago, the company released The Pile, an open collection of training text that includes copyrighted material. AI companies have come under fire, and legal pressure, for using The Pile to train models.

EleutherAI is committing to release open datasets more frequently going forward, in collaboration with its research and infrastructure partners.

Updated 9:48 a.m. Pacific: Biderman clarified in a post on X that EleutherAI contributed to the release of the datasets and models, but that their development involved many partners, including the University of Toronto, which helped lead the research.
