
AIBench: Benchmarking 7 LLMs on Real-World Code Generation
We evaluated seven frontier and budget LLMs on 10 graded code-generation tasks across React and Rust. Quality still costs money, but the gap is narrowing faster than the price sheet suggests.

