InfraLLM: A Generic Large Language Model Framework for Production-Grade Microservice Auto-Scaling in Cloud Infrastructure

Authors

  • Muhamed Ramees Cheriya Mukkolakkal

DOI:

https://doi.org/10.38124/ijsrmt.v4i11.1023

Abstract

Current microservice auto-scaling solutions operate in isolation, focusing on individual service metrics without considering global cloud resource availability, cross-datacenter performance, or mission-critical application priorities. This paper presents InfraLLM, a novel framework leveraging large language models to orchestrate intelligent, context-aware auto-scaling decisions across entire cloud infrastructures. Our approach integrates three key components: a distributed Collection Service for comprehensive metric aggregation, an LLM Service for predictive resource allocation, and an Execution Service for policy enforcement. Evaluation across large-scale Kubernetes deployments demonstrates up to 57.2% reduction in CPU overutilization, 51.1% improvement in resource allocation efficiency, 48% reduction in average response time, and 16× reduction in SLO violations compared to traditional per-service auto-scaling approaches. InfraLLM represents a paradigm shift from reactive, service-level scaling to proactive, infrastructure-wide resource orchestration.
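To make the three-component pipeline concrete, below is a minimal sketch of the Collection → LLM → Execution control loop the abstract describes. All names here (CollectionService, LLMService, ExecutionService, ScalingDecision) are hypothetical illustrations, not the paper's actual API, and the LLM decision step is stood in by a simple threshold rule, since the abstract does not specify the model's prompting or inference details.

    # Hypothetical sketch of InfraLLM's three-service loop (names invented
    # for illustration; the real framework's interfaces are not shown here).
    from dataclasses import dataclass

    @dataclass
    class ScalingDecision:
        service: str
        replicas: int
        reason: str

    class CollectionService:
        """Aggregates per-service metrics into one infrastructure-wide view."""
        def snapshot(self) -> dict:
            # A real deployment would pull these from cluster telemetry;
            # hard-coded values stand in for a live metrics feed.
            return {"checkout": {"cpu": 0.91, "p95_ms": 480, "replicas": 3},
                    "catalog":  {"cpu": 0.35, "p95_ms": 120, "replicas": 5}}

    class LLMService:
        """Turns the global snapshot into proactive scaling decisions.
        A threshold rule substitutes for the LLM call in this sketch."""
        def decide(self, snapshot: dict) -> list[ScalingDecision]:
            decisions = []
            for name, m in snapshot.items():
                if m["cpu"] > 0.80:  # overutilized: scale out
                    decisions.append(ScalingDecision(
                        name, m["replicas"] + 2, "high CPU and latency"))
                elif m["cpu"] < 0.40 and m["replicas"] > 2:  # reclaim slack
                    decisions.append(ScalingDecision(
                        name, m["replicas"] - 1, "low utilization"))
            return decisions

    class ExecutionService:
        """Enforces decisions against the cluster (stubbed as a print)."""
        def apply(self, decision: ScalingDecision) -> None:
            print(f"scale {decision.service} -> {decision.replicas} "
                  f"({decision.reason})")

    if __name__ == "__main__":
        collector = CollectionService()
        planner = LLMService()
        executor = ExecutionService()
        for d in planner.decide(collector.snapshot()):
            executor.apply(d)

The point of the sketch is the data flow: decisions are made from a global snapshot of all services at once, rather than each service reacting to its own metrics in isolation.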


Published

2025-12-08

How to Cite

Cheriya Mukkolakkal, M. R. (2025). InfraLLM: A Generic Large Language Model Framework for Production-Grade Microservice Auto-Scaling in Cloud Infrastructure. International Journal of Scientific Research and Modern Technology, 4(11), 113–123. https://doi.org/10.38124/ijsrmt.v4i11.1023
