Distributed training frameworks: Horovod and RaySGD
As deep learning models grow exponentially in size, it has become difficult to achieve practical training times on a single machine. GPT-2, a well-known language model, has about 1.5B parameters and 8 million...