Pfeife: Automatic Pipeline Parallelism for PyTorch

Ho Young Jhoo, Chung-Kil Hur, Nuno P. Lopes

 

Abstract:

The memory requirements of machine learning (ML) models have been growing quickly, while the memory capacity of GPUs has not kept pace. Despite significant research on reducing the memory usage of ML models, many large models still do not fit in a single device. A popular solution to this memory capacity issue is to use multiple devices in parallel. In this paper, we focus on a particular form of parallelism called pipelining, as it offers a good balance between cost and performance for many ML models. We present Pfeife, the first tool that integrates with PyTorch to provide automatic pipelining of ML models. Pfeife intercepts the execution of models and parallelizes them transparently, requiring no manual work. We show that Pfeife can execute large models that would otherwise fail to run because they do not fit in a single device. Moreover, Pfeife can pipeline non-sequential models, such as Stable Diffusion, which are not supported by existing pipeline-parallelism tools. Pfeife outperforms state-of-the-art tools by up to 22%.
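For intuition, the sketch below shows the kind of hand-written pipeline schedule that Pfeife automates: a model split manually into two stages on two GPUs, with micro-batches fed through so the devices overlap work. The two-stage split, layer sizes, and micro-batch count are illustrative assumptions; this is not Pfeife's API, which requires no such manual partitioning.

```python
# Manual pipeline parallelism (forward pass only), for contrast with
# Pfeife's automatic approach. Assumes two CUDA devices are available.
import torch
import torch.nn as nn

# Hypothetical hand-made split of a sequential model into two stages.
stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
stage1 = nn.Sequential(nn.Linear(4096, 1024), nn.ReLU()).to("cuda:1")

def pipelined_forward(batch: torch.Tensor, n_micro: int = 4) -> torch.Tensor:
    """Run micro-batches through the two stages so they overlap.

    Because CUDA kernel launches are asynchronous, stage 0 can start on
    the next micro-batch while stage 1 is still processing the previous
    one. A real schedule would also interleave backward passes.
    """
    micro_batches = batch.chunk(n_micro)
    outputs = []
    inflight = None  # activation waiting to be consumed by stage 1
    for mb in micro_batches:
        act = stage0(mb.to("cuda:0"))         # stage 0 computes ...
        if inflight is not None:
            outputs.append(stage1(inflight))  # ... while stage 1 drains
        inflight = act.to("cuda:1", non_blocking=True)
    outputs.append(stage1(inflight))          # drain the last micro-batch
    return torch.cat(outputs)
```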

 

Published:

H. Y. Jhoo, C.-K. Hur, N. P. Lopes. Pfeife: Automatic Pipeline Parallelism for PyTorch. In Proc. of the 42nd International Conference on Machine Learning (ICML), July 2025.

 

Bibtex:

@inproceedings{pfeife-icml25,
  title =	{Pfeife: Automatic Pipeline Parallelism for {PyTorch}},
  author =	{Ho Young Jhoo and Chung-Kil Hur and Nuno P. Lopes},
  booktitle =	{Proc. of the 42nd International Conference on Machine Learning (ICML)},
  month =	jul,
  year =	2025
}
