InfraLLM: A Generic Large Language Model Framework for Production-Grade Microservice Auto-Scaling in Cloud Infrastructure
DOI: https://doi.org/10.38124/ijsrmt.v4i11.1023

Abstract
Current microservice auto-scaling solutions operate in isolation, focusing on individual service metrics without considering global cloud resource availability, cross-datacenter performance, or mission-critical application priorities. This paper presents InfraLLM, a novel framework leveraging large language models to orchestrate intelligent, context-aware auto-scaling decisions across entire cloud infrastructures. Our approach integrates three key components: a distributed Collection Service for comprehensive metric aggregation, an LLM Service for predictive resource allocation, and an Execution Service for policy enforcement. Evaluation across large-scale Kubernetes deployments demonstrates up to a 57.2% reduction in CPU overutilization, a 51.1% improvement in resource allocation efficiency, a 48% reduction in average response time, and a 16× reduction in SLO violations compared to traditional per-service auto-scaling approaches. InfraLLM represents a paradigm shift from reactive, service-level scaling to proactive, infrastructure-wide resource orchestration.
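To make the three-component architecture concrete, the following is a minimal sketch of the control loop the abstract describes: a Collection Service aggregates an infrastructure-wide metric snapshot, an LLM Service turns that snapshot into a scaling plan, and an Execution Service enforces the plan under policy bounds. All class names, fields, and thresholds here are illustrative assumptions rather than the paper's actual API, and the LLM call is stubbed with a simple heuristic so the sketch runs offline.

```python
# Hypothetical sketch of InfraLLM's three-component control loop.
# Every name and threshold below is an illustrative assumption,
# not the paper's actual interface.
from dataclasses import dataclass, field


@dataclass
class ServiceMetrics:
    name: str
    cpu_utilization: float   # fraction of requested CPU in use (0.0-1.0+)
    p99_latency_ms: float    # tail latency over the observation window
    replicas: int            # current replica count


@dataclass
class InfrastructureSnapshot:
    """Global view the Collection Service aggregates across clusters."""
    cluster_free_cpu_cores: float
    services: list[ServiceMetrics] = field(default_factory=list)


class CollectionService:
    """Aggregates per-service metrics into one infrastructure-wide view."""

    def collect(self) -> InfrastructureSnapshot:
        # A real deployment would scrape metrics across datacenters
        # (e.g. from Prometheus); hard-coded here so the sketch runs.
        return InfrastructureSnapshot(
            cluster_free_cpu_cores=24.0,
            services=[
                ServiceMetrics("checkout", 0.92, 410.0, replicas=4),
                ServiceMetrics("catalog", 0.35, 80.0, replicas=6),
            ],
        )


class LLMService:
    """Turns the global snapshot into a scaling plan.

    The model call is replaced by a heuristic: the framework would
    instead prompt an LLM with the serialized snapshot and parse the
    returned plan.
    """

    def plan(self, snap: InfrastructureSnapshot) -> dict[str, int]:
        plan: dict[str, int] = {}
        for svc in snap.services:
            if svc.cpu_utilization > 0.80 and snap.cluster_free_cpu_cores > 0:
                plan[svc.name] = svc.replicas + 1  # scale out the hot service
            elif svc.cpu_utilization < 0.40 and svc.replicas > 1:
                plan[svc.name] = svc.replicas - 1  # reclaim idle capacity
        return plan


class ExecutionService:
    """Enforces the plan, clamping targets to per-service policy bounds."""

    def __init__(self, min_replicas: int = 1, max_replicas: int = 20):
        self.min_replicas = min_replicas
        self.max_replicas = max_replicas

    def apply(self, plan: dict[str, int]) -> None:
        for name, target in plan.items():
            target = max(self.min_replicas, min(self.max_replicas, target))
            # A real executor would patch the Kubernetes Deployment here.
            print(f"scale {name} -> {target} replicas")


if __name__ == "__main__":
    snapshot = CollectionService().collect()
    ExecutionService().apply(LLMService().plan(snapshot))
```

The key design point the sketch illustrates is that scaling decisions consume the global snapshot (including free cluster capacity), not just a single service's metrics, which is what distinguishes the framework from per-service autoscalers.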
License
Copyright (c) 2025 International Journal of Scientific Research and Modern Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.