Jensen Huang – TPU competition, why we should sell chips to China, & Nvidia’s supply chain moat
“If our next several years are a trillion dollars in scale, we have the supply chain to do it”
NVIDIA GB200 NVL72, AMD MI355X, Throughput Token per GPU, Latency Tok/s/user, Perf per Dollar, Tokens per Provisioned Megawatt, DeepSeek R1 670B, GPTOSS 120B, Llama3 70B
Almost all benchmark configurations set max_model_len (or, for TensorRT, --max_seq_len), which controls the maximum supported length of a request (inclusive of the prompt and any generated output). It is...
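As a rough illustration of the limit described above, the sketch below checks whether a request fits when the prompt and the generated output share one length budget. The function names and the token counts are hypothetical and are not taken from any benchmark configuration or serving framework; only the "prompt plus output must not exceed the maximum" rule comes from the text.

```python
def fits_context(prompt_tokens: int, max_new_tokens: int, max_model_len: int) -> bool:
    """Return True if prompt plus requested output stays within the limit.

    Assumption: the limit counts prompt and generated tokens together,
    as the snippet above states. Names here are illustrative only.
    """
    if prompt_tokens < 0 or max_new_tokens < 0 or max_model_len <= 0:
        raise ValueError("token counts must be non-negative and the limit positive")
    return prompt_tokens + max_new_tokens <= max_model_len


def remaining_output_budget(prompt_tokens: int, max_model_len: int) -> int:
    """Tokens left for generation once the prompt is accounted for (never negative)."""
    return max(0, max_model_len - prompt_tokens)


if __name__ == "__main__":
    # Hypothetical workload: 1024-token prompt, 1024-token output, limit of 2048.
    print(fits_context(1024, 1024, 2048))
    print(remaining_output_budget(1024, 2048))
```

A check like this is mainly useful for sanity-testing a workload shape against a chosen limit before running it.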
Can we reasonably use the InferenceMAX benchmark dataset to get a Wh per query figure?
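One way to frame the question is as a unit conversion. The sketch below assumes that the "Tokens per Provisioned Megawatt" metric can be read as a sustained throughput in tokens per second per megawatt, and that a query consumes a known number of tokens. Both are assumptions, the function name is hypothetical, and the input values are placeholders rather than benchmark results.

```python
def wh_per_query(tokens_per_second_per_mw: float, tokens_per_query: float) -> float:
    """Convert a throughput-per-megawatt figure into watt-hours per query.

    Assumptions (not confirmed by the source):
      - the metric is tokens per second per provisioned megawatt
      - provisioned power is fully drawn while serving
      - tokens_per_query is the token count attributed to one query
    """
    if tokens_per_second_per_mw <= 0:
        raise ValueError("throughput per megawatt must be positive")
    if tokens_per_query < 0:
        raise ValueError("tokens per query must be non-negative")
    # 1 MW = 1,000,000 J/s, so energy per token in joules is:
    joules_per_token = 1_000_000.0 / tokens_per_second_per_mw
    # 1 Wh = 3600 J
    return joules_per_token * tokens_per_query / 3600.0


if __name__ == "__main__":
    # Placeholder inputs chosen only to make the arithmetic easy to follow.
    print(wh_per_query(tokens_per_second_per_mw=1_000_000, tokens_per_query=3600))
```

Any figure obtained this way inherits the assumptions above: provisioned power is an upper bound on actual draw, and real traffic rarely matches a benchmark's fixed input and output lengths.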