The training cost of GPT-3, which has become a byword for the super-scale language model and has shown that nearly any natural language task can be handled with few-shot learning alone, is estimated at around 4 billion KRW (roughly 40 million KRW for GPT-2). However reusable the resulting model may be, that is a difficult amount for most companies to invest in R&D. (I just hope someone builds and releases a KoGPT-3^^)
On the image side, a similar attempt is being made with Google's Big Transfer (BiT), even if it does not reach the scale of the language models. Models used for transfer learning have typically been trained on a subset of ImageNet (about 1 million images), but BiT trains on ImageNet-21k (14 million images) and JFT (300 million images), aiming to cover a wide range of downstream tasks through transfer learning from that single model. The models trained on JFT are not released, but the ResNet models trained on ImageNet-21k are open and can be used by anyone. (Personally, I wonder why they used Microsoft's ResNet rather than Google's own Inception.)
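As a rough illustration of how the open ImageNet-21k models can be reused, here is a minimal fine-tuning sketch in Keras. It assumes the BiT-M R50x1 feature-vector module is available on TensorFlow Hub at the path shown (worth double-checking against the current tfhub.dev listing), and the 10-class target task is hypothetical.

```python
# Minimal sketch: attach a new classification head to an open BiT-M
# (ImageNet-21k pretrained) ResNet and fine-tune it on a downstream task.
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 10  # hypothetical downstream label set

# BiT-M R50x1 feature extractor (assumed TF Hub path); trainable=True so the
# backbone weights are updated during fine-tuning, not just the new head.
backbone = hub.KerasLayer("https://tfhub.dev/google/bit/m-r50x1/1",
                          trainable=True)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    backbone,  # outputs a 2048-d feature vector per image
    tf.keras.layers.Dense(NUM_CLASSES, kernel_initializer="zeros"),
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=3e-3, momentum=0.9),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# model.fit(train_ds, epochs=...)  # fine-tune on the target dataset
```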
Transfer learning from a single model trained on super-scale data, rather than training a separate model on data curated for each task, is likely to become a necessity rather than an option in the future. This is not necessarily a bad thing: even companies without a large training infrastructure now have a path to top-tier models through GPT-3 + few-shot learning or Big Transfer + fine-tuning. (Of course, Big Transfer still needs to grow much bigger…)
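For contrast with the fine-tuning path above, the "GPT-3 + few-shot learning" path involves no weight updates at all: the task is specified entirely in the prompt with a handful of examples. The sentiment-labeling task below is a hypothetical illustration of that idea.

```python
# Sketch of a few-shot prompt: a few labeled examples followed by the query.
# Sent as-is to a hosted GPT-3 endpoint, the model is expected to complete
# the last line (e.g. with "Positive") without any task-specific training.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It broke after a week and support never replied.
Sentiment: Negative

Review: Setup took five minutes and everything just worked.
Sentiment:"""

print(few_shot_prompt)
```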
The link below is an article related to the above, with a title I really like.
Everyone can use deep learning now