ZeRO, ZeRO-Offload, ZeRO-Infinity

Paper reading of ZeRO.

Jun-07-2025 · 19 min · 9049 words · WITHER

DeepSpeed Ulysses

Paper reading of DeepSpeed Ulysses.

Oct-21-2024 · 2 min · 678 words · WITHER

Efficient Large-Scale Language Model Training on GPU

Paper reading about Efficient Large-Scale Language Model Training on GPU Clusters.

Oct-05-2024 · 9 min · 4182 words · WITHER

Megatron-LM

Paper reading about Megatron-LM.

Oct-02-2024 · 4 min · 1866 words · WITHER

Ring Attention Principle

This is a brief introduction to the Ring Attention Principle.

Sep-26-2024 · 6 min · 2551 words · WITHER

Comparison of Parallelism Methods in ViT

Paper reading of

Nov-13-2023 · 15 min · 7045 words · WITHER