Memory optimization techniques for very large models
As the number of parameters in deep learning models has grown dramatically, so has the memory required for training. OpenAI's GPT-2 has 1.5B parameters, Google's mT5 scales up to 13B, and OpenAI's GPT-3 goes further still, reaching 175B parameters.
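To make these figures concrete, here is a rough back-of-the-envelope sketch (an illustration, not from the original text). It assumes fp32 training with the Adam optimizer, where each parameter carries its weight, its gradient, and Adam's two moment buffers at 4 bytes each, or about 16 bytes per parameter, and it ignores activation memory and framework overhead entirely:

```python
def training_memory_gb(num_params: float, bytes_per_param: int = 16) -> float:
    """Rough training-memory estimate.

    Assumes fp32 weights, gradients, and Adam's two moment buffers
    (4 bytes each, ~16 bytes per parameter). Activation memory and
    framework overhead are not counted, so real usage is higher.
    """
    return num_params * bytes_per_param / 1024**3


# Parameter counts cited in the text above.
for name, params in [("GPT-2", 1.5e9), ("mT5", 13e9), ("GPT-3", 175e9)]:
    print(f"{name}: ~{training_memory_gb(params):,.0f} GB of training state")
```

Even under this simplified accounting, GPT-2 already needs roughly 22 GB of optimizer and gradient state, and GPT-3 runs into the terabytes, far beyond a single accelerator, which is what motivates the memory optimization techniques this article covers.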