Math Masti Reasoning Coding Decoding

A New AI Math Startup Just Cracked 4 Previously Unsolved Problems

Five years ago, mathematicians Dawei Chen and Quentin Gendron were trying to untangle a difficult area of algebraic geometry involving differentials, elements of calculus used to measure distance ...

EurekAlert!

MathEval: a comprehensive benchmark for evaluating large language models on mathematical reasoning capabilities

This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...

VentureBeat

Beyond math and coding: New RL framework helps train LLM agents for complex, real-world tasks

Researchers at the University of Science and Technology of China have developed a new reinforcement learning (RL) framework that helps train large language models (LLMs) for complex agentic tasks ...

SiliconANGLE

Harmonic AI raises $120M at $1.45B valuation to advance mathematical reasoning

Harmonic AI Inc., a startup building artificial intelligence for formal mathematical reasoning, announced today that it has raised $120 million in new funding at a $1.45 billion valuation. The funding is intended ...

Business Wire

Harmonic Builds Momentum Towards Mathematical Superintelligence with $120 Million Series C

Ribbit Capital Leads Round at $1.45B Valuation of Math-Based AI Venture; Emerson Collective Joins Existing Backers Including Sequoia & Kleiner Perkins. PALO ALTO, Calif.--(BUSINESS WIRE)--Harmonic, the ...

Android Authority

Gemini 3 is here: Google's most advanced model promises better reasoning, coding, and more

Gemini 3 is Google’s latest AI model, offering improvements in reasoning, coding, and multimodal analysis. New features include the Gemini Agent tool and generative interfaces, such as visual layout ...

Scientific American

Can Writing Math Proofs Teach AI to Reason Like Humans?

A few months before the 2025 International Mathematical Olympiad (IMO) in July, a three-person team at OpenAI made a long bet that they could use the competition’s brutally tough problems to train an ...

The Droid Guy

Grok 4 Shows Early Strengths in Coding, Reasoning, and Visual Tasks While Struggling With Images and Memory

Grok 4 and its reasoning-focused counterpart, Grok 4 Heavy, arrived with an immediate sense of ambition, offering multimodal AI designed to handle coding, logic, and perception tasks. In the initial ...

Bleeping Computer

New ChatGPT o3-alpha model hints at coding upgrade

ChatGPT's o3 is OpenAI's best model to date because it features reasoning, and it might get even better in the next update. As spotted on X, OpenAI is testing a new "Alpha" variant of the o3 model, ...

Bleeping Computer

Grok 4 benchmark results: Tops math, ranks second in coding

Grok 4 is a huge leap from Grok 3, but how good is it compared to other models in the market, such as Gemini 2.5 Pro? We now have answers, thanks to new independent benchmarks. LMArena.ai, which is an ...
