gitmyhub

nvidia-resiliency-ext

Python ★ 299 updated 5d ago

NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to failures and interruptions.

No plain-English explanation yet — one is being written right now. Check back in a minute.