TechTost

EleutherAI releases massive AI training dataset of licensed and open-domain text

By techtost.com, 9 June 2025, 3 Mins Read

EleutherAI, an AI research organization, has released what it claims is one of the largest collections of licensed and open-domain text for training AI models.

The dataset, called the Common Pile v0.1, took about two years to complete in collaboration with AI startups Poolside, Hugging Face, and others, along with several academic institutions. Weighing in at 8 terabytes, the Common Pile v0.1 was used to train two new AI models from EleutherAI, Comma v0.1-1T and Comma v0.1-2T, which EleutherAI claims perform on par with models developed using unlicensed, copyrighted data.

AI companies, including OpenAI, are embroiled in lawsuits over their AI training practices, which rely on scraping the web, including copyrighted material such as books and research journals, to build model training datasets. While some AI companies have licensing arrangements with certain content providers, they argue that the U.S. legal doctrine of fair use shields them from liability in cases where they trained on copyrighted work.

EleutherAI argues that these lawsuits have “drastically reduced” transparency from AI companies, which the organization says has harmed the broader AI research field by making it more difficult to understand how models work and what their flaws might be.

“[Copyright] lawsuits have not meaningfully changed data sourcing practices in [model] training, but they have drastically reduced the transparency companies engage in,” wrote Stella Biderman, EleutherAI’s executive director, in a blog post on Hugging Face early on Friday. “Researchers at some companies we have spoken to have also specifically cited lawsuits as the reason why they have been unable to release the research they are doing in highly data-centric areas.”

The Common Pile v0.1, which can be downloaded from Hugging Face’s AI dev platform and GitHub, was created in consultation with legal experts and draws on sources including 300,000 public domain books digitized by the Library of Congress and the Internet Archive. EleutherAI also used Whisper, OpenAI’s speech-to-text model, to transcribe audio content.

EleutherAI claims that Comma v0.1-1T and Comma v0.1-2T are evidence that the Common Pile v0.1 was curated carefully enough to enable developers to build models competitive with proprietary alternatives. According to EleutherAI, the models, both of which are 7 billion parameters in size and were trained on only a fraction of the Common Pile v0.1, rival models such as Meta’s first Llama AI model on benchmarks for coding, image understanding, and math.

Parameters, sometimes referred to as weights, are the internal components of an AI model that guide its behavior and answers.
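To make the idea of a parameter count concrete, here is a minimal sketch in Python. It assumes a toy network of fully connected layers, which is an illustration only and not the architecture of the Comma models; the function name `count_parameters` and the layer sizes are hypothetical.

```python
# Illustrative sketch: count the weights and biases in a toy stack of
# fully connected layers. The layer sizes below are made up for the example
# and do not describe any real model.

def count_parameters(layer_shapes):
    """Return the total number of parameters for (inputs, outputs) layer pairs."""
    total = 0
    for n_in, n_out in layer_shapes:
        total += n_in * n_out  # one weight per input-output connection
        total += n_out         # one bias per output unit
    return total


# Hypothetical two-layer network: 4 inputs -> 8 hidden units -> 2 outputs.
toy_model = [(4, 8), (8, 2)]
toy_total = count_parameters(toy_model)
print(toy_total)
```

A model described as "7 billion parameters in size" has on the order of seven billion such learned numbers, spread across many much larger layers.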

“In general, we think that the common idea that unlicensed text drives performance is unjustified,” Biderman wrote in her post. “As the amount of accessible openly licensed and public domain data grows, we can expect the quality of models trained on openly licensed content to improve.”

The Common Pile v0.1 appears to be in part an effort to right EleutherAI’s historical wrongs. Years ago, the company released The Pile, an open collection of training text that includes copyrighted material. AI companies have come under fire, and legal pressure, for using The Pile to train models.

EleutherAI is committing to releasing open datasets more frequently in collaboration with its research and infrastructure partners.

Updated 9:48 a.m. Pacific: Biderman clarified in a post on X that EleutherAI contributed to the release of the datasets and models, but that their development involved many partners, including the University of Toronto, which helped lead the research.

© 2025 TechTost. All Rights Reserved