A Survey on Integrated Training-Inference Architectures for Large Language Models on Multi-GPU Stream Processors
DOI: https://doi.org/10.62306/7ds08r28

Keywords: Large Language Models, Multi-GPU Parallelism, Tensor Parallelism, Hardware-Software Co-Design, Compilation Frameworks

Abstract
Large language models (LLMs) have revolutionized artificial intelligence, achieving remarkable performance in natural language understanding, generation, and multimodal tasks. However, their unprecedented scale—often comprising billions to trillions of parameters—imposes severe computational demands, particularly in training and inference phases, necessitating advanced parallel processing architectures on multi-GPU arrays. This survey provides a comprehensive overview of integrated training-inference (train-infer) architectures for LLMs on large-scale GPU stream processors, emphasizing multi-GPU stream processing, hypercube tensor parallelism, and hardware-software co-designed compilation frameworks. We trace the evolution of parallelism strategies, including data parallelism, pipeline parallelism, and tensor parallelism, highlighting innovations such as cross-cluster pipeline execution, adaptive NIC selection, and spatiotemporal tensor partitioning to mitigate communication overheads and memory bottlenecks in heterogeneous environments [1, 7, 8, 17]. Key challenges, including scalability in non-homogeneous networks and efficient compilation for diverse hardware, are analyzed alongside state-of-the-art solutions like MLIR-based frameworks and RISC-V accelerators [28, 33]. By synthesizing recent advancements, this survey identifies promising directions for scalable, energy-efficient LLM systems, paving the way for broader deployment in edge computing and high-performance clusters.
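To make the tensor-parallelism idea referenced above concrete, the following is a minimal illustrative sketch (not taken from any surveyed system) of Megatron-style column/row-parallel partitioning of a transformer MLP block, with NumPy arrays standing in for per-GPU shards and a summation standing in for the all-reduce; all sizes and names are hypothetical.

```python
# Minimal NumPy sketch of tensor parallelism for one transformer MLP block,
# Y = GeLU(X @ A) @ B, simulating GPU shards in a single process.
import numpy as np

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
n_shards, d_model, d_ff = 4, 64, 256             # hypothetical sizes
X = rng.standard_normal((8, d_model))            # activations (batch x d_model)
A = rng.standard_normal((d_model, d_ff))         # first MLP weight
B = rng.standard_normal((d_ff, d_model))         # second MLP weight

# Column-parallel split of A and matching row-parallel split of B: each "GPU"
# holds one slice, so the GeLU output never needs to be gathered between matmuls.
A_shards = np.split(A, n_shards, axis=1)
B_shards = np.split(B, n_shards, axis=0)

# Each shard computes a partial output; the only communication required is a
# final all-reduce (sum) over the partial results.
partials = [gelu(X @ A_i) @ B_i for A_i, B_i in zip(A_shards, B_shards)]
Y_parallel = np.sum(partials, axis=0)            # stands in for the all-reduce

# Check against the unpartitioned computation.
Y_reference = gelu(X @ A) @ B
assert np.allclose(Y_parallel, Y_reference, atol=1e-6)
print("tensor-parallel output matches reference:", Y_parallel.shape)
```

In real deployments the summation step would be a collective operation (e.g., an NCCL all-reduce) across GPUs, and it is exactly this communication step that the partitioning and interconnect-aware techniques surveyed here aim to minimize.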