Minipost: Additional figures for per-query energy consumption of LLMs
Per-query energy consumption figures based on recent Lambda benchmarks
Almost all benchmark configurations set max_model_len (or, for TensorRT, --max_seq_len), which controls the maximum supported length of a request (inclusive of the prompt and any generated output). It is...
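As a rough illustration of what "inclusive of the prompt and any generated output" means in practice, the sketch below checks whether a request fits under such a limit and how many output tokens remain after the prompt. The function names and token counts are hypothetical, chosen only for this example; a real serving stack may enforce the limit differently (for instance by rejecting or truncating the request).

```python
def remaining_output_budget(prompt_tokens: int, max_model_len: int) -> int:
    """Tokens left for generation once the prompt is counted against the limit."""
    if prompt_tokens < 0 or max_model_len <= 0:
        raise ValueError("prompt_tokens must be >= 0 and max_model_len must be > 0")
    # The limit covers prompt + output, so the prompt eats into the output budget.
    return max(0, max_model_len - prompt_tokens)


def request_fits(prompt_tokens: int, output_tokens: int, max_model_len: int) -> bool:
    """True if the prompt plus the requested output stays within the limit."""
    return prompt_tokens + output_tokens <= max_model_len


if __name__ == "__main__":
    limit = 2048  # hypothetical value for max_model_len / --max_seq_len
    print(request_fits(1000, 1000, limit))       # prompt + output = 2000
    print(request_fits(2000, 100, limit))        # prompt + output = 2100
    print(remaining_output_budget(2000, limit))  # tokens left after the prompt
```

The point is only that a long prompt shrinks the room left for generated tokens, which matters when reading per-query figures across configurations.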