InferenceMAX™: Open Source Inference Benchmarking

newsletter.semianalysis.com

NVIDIA GB200 NVL72, AMD MI355X, Throughput (Tokens per GPU), Latency (Tok/s/user), Perf per Dollar, Tokens per Provisioned Megawatt, DeepSeek R1 670B, GPTOSS 120B, Llama3 70B

3 pages link to this URL

Jensen Huang – TPU competition, why we should sell chips to China, & Nvidia’s supply chain moat

Dwarkesh Podcast · Dwarkesh Patel · Apr 15, 2026

“If our next several years are a trillion dollars in scale, we have the supply chain to do it”

9 inbound links · article · en

[feature suggestion] instead of using random datasets, it should use real datasets · Issue #359 · SemiAnalysisAI/InferenceX

GitHub · SemiAnalysisAI · Dec 21, 2025

Almost all benchmark configurations set max_model_len (or, for TensorRT, --max_seq_len), which controls the maximum supported request length (including both the prompt and any generated output). It is...

1 inbound link · object · en · issue:3750796719

Per-query energy consumption of LLMs

Muxup · Jan 7, 2026

Can we reasonably use the InferenceMAX benchmark dataset to get a Wh per query figure?

4 inbound links · article · en