Preprint
Article

This version is not peer-reviewed.

Multi-Objective Scheduling for Large Language Model Inference with Prompt-Level Cost Prediction and SLO Awareness

Submitted: 18 April 2026

Posted: 20 April 2026


Abstract
Large language model (LLM) inference in multi-tenant clouds is becoming an increasingly important contributor to data-center carbon emissions, yet existing carbon-aware scheduling techniques target long-running training jobs and are ill-suited for the short, bursty, SLO-sensitive nature of online serving. We propose CAPS (Carbon–Aware Prompt Scheduling), an online bi-objective scheduler that jointly optimizes goodput and per-request carbon cost for multi-tenant LLM inference. CAPS first employs a lightweight prompt complexity predictor to estimate token generation cost and latency risk for each incoming request. It then combines real-time grid carbon intensity, GPU energy profiles, and per-tenant SLO tiers to route each request to one of three execution pools: a low-latency pool, a low-carbon pool, or a delay-tolerant batch pool. A composite reward function balances goodput, carbon emissions, and SLO violation rate. In trace-driven simulations using public conversation traces and regional carbon intensity data, CAPS reduces average carbon emissions per 1K generated tokens by 26.8% compared to round-robin scheduling while achieving an SLO attainment rate that matches or exceeds a dedicated SLO-aware baseline.
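The routing step described in the abstract can be sketched in code. The following is a minimal, hypothetical illustration of CAPS-style pool selection, not the paper's actual implementation: the pool energy profiles, SLO-tier weights, and the objective weights `alpha` and `beta` are all assumed values chosen for the example, and the composite score here is a simple weighted sum of estimated carbon cost and an SLO-risk penalty.

```python
from dataclasses import dataclass

@dataclass
class Request:
    predicted_tokens: int   # output of the prompt complexity predictor
    latency_risk: float     # predicted risk (0..1) of missing the SLO
    slo_tier: str           # "strict", "standard", or "besteffort"

# Hypothetical per-pool profiles: energy per generated token and
# expected queueing delay (values are illustrative, not measured).
POOLS = {
    "low_latency": {"j_per_token": 1.2, "delay_s": 0.05},
    "low_carbon":  {"j_per_token": 0.8, "delay_s": 0.50},
    "batch":       {"j_per_token": 0.6, "delay_s": 5.00},
}

# Assumed penalty weights per SLO tier.
SLO_WEIGHT = {"strict": 10.0, "standard": 3.0, "besteffort": 0.5}

def carbon_cost(req, pool, grid_gco2_per_kwh):
    """Estimated gCO2 to serve this request in the given pool."""
    joules = req.predicted_tokens * POOLS[pool]["j_per_token"]
    kwh = joules / 3.6e6  # 1 kWh = 3.6 MJ
    return kwh * grid_gco2_per_kwh

def route(req, grid_gco2_per_kwh, alpha=100.0, beta=2.0):
    """Pick the pool minimizing a weighted sum of estimated
    carbon cost and SLO-violation risk (illustrative objective)."""
    def score(pool):
        slo_penalty = (SLO_WEIGHT[req.slo_tier]
                       * req.latency_risk
                       * POOLS[pool]["delay_s"])
        return alpha * carbon_cost(req, pool, grid_gco2_per_kwh) \
               + beta * slo_penalty
    return min(POOLS, key=score)

# A strict-tier, high-risk request should land in the low-latency pool;
# a best-effort request can be deferred to the batch pool.
print(route(Request(512, 0.9, "strict"), grid_gco2_per_kwh=400))      # low_latency
print(route(Request(512, 0.1, "besteffort"), grid_gco2_per_kwh=400))  # batch
```

With this objective, high grid carbon intensity pushes marginal requests toward the low-carbon and batch pools, while a strict SLO tier or high predicted latency risk pulls them back to the low-latency pool, mirroring the trade-off the scheduler is designed to balance.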
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
