EleutherAI releases massive AI training dataset of licensed and open-domain text

By techtost.com | 9 June 2025 | 3 Mins Read

EleutherAI, an AI research organization, has released what it claims is one of the largest collections of licensed and open-domain text for training AI models.

The dataset, called the Common Pile v0.1, took about two years to complete in collaboration with AI startups Poolside, Hugging Face, and others, along with several academic institutions. Weighing in at 8 terabytes, the Common Pile v0.1 was used to train two new AI models from EleutherAI, Comma v0.1-1T and Comma v0.1-2T, that EleutherAI claims perform on par with models developed using unlicensed, copyrighted data.

AI companies, including OpenAI, are embroiled in lawsuits over their AI training practices, which rely on scraping the web (including copyrighted material such as books and research journals) to build model training datasets. While some AI companies have licensing arrangements with certain content providers, they argue that the U.S. legal doctrine of fair use shields them from liability in cases where they trained on copyrighted work.

EleutherAI argues that these lawsuits have “drastically decreased” transparency from AI companies, which the organization says has harmed the broader AI research field by making it more difficult to understand how models work and what their flaws might be.

“[Copyright] lawsuits have not meaningfully changed data sourcing practices in [model] training, but they have drastically decreased the transparency companies engage in,” Stella Biderman, EleutherAI’s executive director, wrote in a blog post on Hugging Face early on Friday. “Researchers at some companies we have spoken to have also specifically cited lawsuits as the reason why they have been unable to release the research they are doing in highly data-centric areas.”

The Common Pile v0.1, which can be downloaded from Hugging Face’s AI dev platform and from GitHub, was created in consultation with legal experts. It draws on sources including 300,000 public domain books digitized by the Library of Congress and the Internet Archive. EleutherAI also used Whisper, OpenAI’s speech-to-text model, to transcribe audio content.

EleutherAI claims that Comma v0.1-1T and Comma v0.1-2T are evidence that the Common Pile v0.1 was curated carefully enough to let developers build models competitive with proprietary alternatives. According to EleutherAI, the models, which are 7 billion parameters in size and were trained on only a fraction of the Common Pile v0.1, rival models such as Meta’s first Llama AI model on benchmarks for coding, image understanding, and math.

Parameters, sometimes referred to as weights, are the internal components of an AI model that guide its behavior and answers.
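
To make the idea of parameters concrete, here is a minimal Python sketch that counts the weights and biases in a toy fully connected network. It is an illustration only and is not taken from EleutherAI's code: the function name `count_parameters` and the layer sizes are hypothetical, chosen to keep the arithmetic easy to check by hand.

```python
# Hypothetical toy example: count the parameters (weights and biases)
# of a small fully connected network. The layer sizes are made up.

def count_parameters(layer_sizes):
    """Return the number of weights plus biases in a dense network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input-to-output connection
        total += n_out         # one bias per output unit
    return total

toy_network = [4, 8, 2]  # 4 inputs, 8 hidden units, 2 outputs
print(count_parameters(toy_network))
```

The Comma models described above hold roughly 7 billion such values, which are adjusted during training.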

“In general, we think that the common idea that unlicensed text drives performance is unjustified,” Biderman wrote in her post. “As the amount of accessible openly licensed and public domain data grows, we can expect the quality of models trained on openly licensed content to improve.”

The Common Pile v0.1 appears to be partly an effort to right EleutherAI’s historical wrongs. Years ago, the organization released The Pile, an open collection of training text that includes copyrighted material. AI companies have come under fire, and legal pressure, for using The Pile to train models.

EleutherAI is committing to releasing open datasets more frequently in collaboration with its research and infrastructure partners.

Updated 9:48 a.m. Pacific: Biderman clarified in a post on X that EleutherAI contributed to the release of the datasets and models, but that their development involved many partners, including the University of Toronto, which helped lead the research.
