Course Outline

Introduction

  • Overview of challenges in scaling deep learning
  • Overview of DeepSpeed and its key features
  • DeepSpeed compared to other distributed deep learning libraries

Getting Started

  • Setting up the development environment
  • Installing PyTorch and DeepSpeed
  • Configuring DeepSpeed for distributed training

DeepSpeed Optimisation Features

  • DeepSpeed training pipeline
  • ZeRO (memory optimisation)
  • Activation checkpointing (also known as gradient checkpointing)
  • Pipeline parallelism

Scaling Models with DeepSpeed

  • Basic scaling techniques using DeepSpeed
  • Advanced scaling methodologies
  • Performance considerations and best practices
  • Debugging and troubleshooting techniques

Advanced DeepSpeed Topics

  • Advanced optimisation techniques
  • Using DeepSpeed with mixed precision training
  • Deploying DeepSpeed across different hardware (e.g., NVIDIA and AMD GPUs)
  • DeepSpeed with multiple training nodes

Integrating DeepSpeed with PyTorch

  • Integrating DeepSpeed into PyTorch workflows
  • Using DeepSpeed with PyTorch Lightning

Troubleshooting

  • Debugging common DeepSpeed issues
  • Monitoring and logging

Summary and Next Steps

  • Recap of key concepts and features
  • Best practices for deploying DeepSpeed in production
  • Further resources to deepen your understanding of DeepSpeed

Requirements

  • Intermediate understanding of deep learning principles
  • Experience with PyTorch or similar deep learning frameworks
  • Familiarity with Python programming

Audience

  • Data scientists
  • Machine learning engineers
  • Developers

Duration

  • 21 hours