OpenAI on Monday launched a new family of models called GPT-4.1. Yes, “4.1,” as if the company’s naming conventions weren’t confusing enough already.
There’s GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, all of which OpenAI says “excel” at coding and instruction following. Available through OpenAI’s API but not ChatGPT, the multimodal models have a 1-million-token context window, meaning they can take in roughly 750,000 words in one go (longer than “War and Peace”).
GPT-4.1 arrives as OpenAI rivals such as Google and Anthropic ratchet up efforts to build sophisticated programming models. Google’s recently released Gemini 2.5 Pro, which also has a 1-million-token context window, ranks highly on popular coding benchmarks. So do Anthropic’s Claude 3.7 Sonnet and Chinese AI startup DeepSeek’s upgraded V3.
It is the goal of many tech giants, including OpenAI, to train AI coding models capable of performing complex software engineering tasks. OpenAI’s grand ambition is to create a “software engineer,” as CFO Sarah Friar put it during a tech summit in London last month. The company claims its future models will be able to program entire apps end to end, handling aspects such as quality assurance, bug testing, and documentation.
GPT-4.1 is one step in this direction.
“We have optimized GPT-4.1 for real-world use based on direct feedback to improve in areas developers care about most: frontend coding, making fewer extraneous edits, following formats reliably, adhering to response structure and ordering, and consistent tool usage, among others,” OpenAI said. “These improvements enable developers to build agents that are significantly better at real-world software engineering tasks.”
OpenAI claims the full GPT-4.1 model outperforms its GPT-4o and GPT-4o mini models on coding benchmarks, including SWE-bench. GPT-4.1 mini and nano are said to be more efficient and faster at the cost of some accuracy, with OpenAI saying GPT-4.1 nano is its fastest, and cheapest, model ever.
GPT-4.1 costs $2 per million input tokens and $8 per million output tokens. GPT-4.1 mini is $0.40 per million input tokens and $1.60 per million output tokens, and GPT-4.1 nano is $0.10 per million input tokens and $0.40 per million output tokens.
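Based on the per-million-token rates above, the cost of a single API request is straightforward to estimate. Here is a minimal sketch; the model-name strings and the example token counts are illustrative assumptions, not part of the announcement:

```python
# Per-million-token rates (USD) as reported in the announcement.
PRICES = {
    "gpt-4.1":      {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request for the given model."""
    rates = PRICES[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# Hypothetical request: 10,000 input tokens, 1,000 output tokens.
print(f"{request_cost('gpt-4.1', 10_000, 1_000):.4f}")       # 0.0280
print(f"{request_cost('gpt-4.1-nano', 10_000, 1_000):.4f}")  # 0.0014
```

At these rates, filling the entire 1-million-token context window of the full GPT-4.1 model would cost $2 in input tokens alone, which is why the cheaper mini and nano tiers matter for long-context workloads.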
According to OpenAI’s internal testing, GPT-4.1, which can generate more tokens at once than GPT-4o (32,768 versus 16,384), scored between 52% and 54.6% on SWE-bench Verified. (OpenAI noted in a blog post that some solutions to SWE-bench Verified problems couldn’t run on its infrastructure, hence the score range.) Those figures are slightly under the scores Google and Anthropic reported for Gemini 2.5 Pro (63.8%) and Claude 3.7 Sonnet, respectively, on the same benchmark.
In a separate evaluation, OpenAI probed GPT-4.1 using Video-MME, a benchmark designed to measure a model’s ability to “understand” content in videos. GPT-4.1 reached 72% accuracy on the “long, no subtitles” video category, OpenAI claims.
While GPT-4.1 scores reasonably well on benchmarks and has a more recent knowledge cutoff, giving it a better frame of reference for current events (up to June 2024), it’s important to keep in mind that even some of the best models today struggle with tasks that wouldn’t trip up experts. For example, many studies have shown that code-generating models often fail to fix, and can even introduce, security vulnerabilities and bugs.
OpenAI also acknowledges that GPT-4.1 becomes less reliable (that is, more likely to make mistakes) the more input tokens it has to handle. On one of the company’s own tests, OpenAI-MRCR, the model’s accuracy dropped from around 84% with 8,000 tokens to 50% with 1 million tokens. GPT-4.1 also tends to be more “literal” than GPT-4o, the company says, sometimes requiring more specific, explicit prompts.