A new AI coding challenge has just published its first results, and they are not pretty

By techtost.com | 24 July 2025 | 3 Mins Read

A new AI coding challenge has revealed its first winner, and set a new bar for AI software engineers.

On Wednesday at 5 p.m. PST, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks co-founder Andy Konwinski. The winner was a Brazilian engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

“We’re glad we built a benchmark that is actually hard,” Konwinski said. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point.”

Konwinski has pledged $1 million to the first open source model that can score higher than 90% on the test.

Similar to the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub as a measure of how well models can deal with real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For the first round, models were due by March 12. The K Prize organizers then built the test using only GitHub issues flagged after that date.
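The date-cutoff idea can be illustrated with a minimal Python sketch. It is not the K Prize organizers' actual pipeline: the `Issue` record, the `build_test_set` helper, and the sample repositories are hypothetical, and the year 2025 for the March 12 deadline is an assumption based on the article's publication date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for a flagged GitHub issue."""
    repo: str
    number: int
    flagged_on: date


def build_test_set(issues, submission_deadline):
    """Keep only issues flagged strictly after the submission deadline.

    A model submitted by the deadline cannot have been trained on
    issues that were flagged later, which is the contamination guard.
    """
    return [i for i in issues if i.flagged_on > submission_deadline]


# Illustrative data only; these repositories and issue numbers are made up.
DEADLINE = date(2025, 3, 12)  # year assumed, not stated in the article
candidates = [
    Issue("example/alpha", 101, date(2025, 2, 27)),  # before the deadline
    Issue("example/beta", 202, date(2025, 3, 12)),   # on the deadline
    Issue("example/gamma", 303, date(2025, 3, 20)),  # after the deadline
]

test_set = build_test_set(candidates, DEADLINE)
print([(i.repo, i.number) for i in test_set])  # [('example/gamma', 303)]
```

The strict comparison excludes issues flagged on the deadline day itself, a conservative choice made here for illustration; the article does not say how the organizers treat that boundary.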

The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier “Verified” test and 34% on its harder “Full” test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

“As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”

It might seem like an odd place to fall short, given the wide range of AI coding tools that are already available to the public. But with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

“I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

For Konwinski, it’s not just a better benchmark, but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”
