{"id":61023,"date":"2021-10-24T14:24:59","date_gmt":"2021-10-24T05:24:59","guid":{"rendered":"https:\/\/smilegate.ai\/?p=61023"},"modified":"2021-10-24T14:26:29","modified_gmt":"2021-10-24T05:26:29","slug":"deep-learning-optimized-learning","status":"publish","type":"post","link":"https:\/\/smilegate.ai\/cn\/2021\/10\/24\/deep-learning-optimized-learning\/","title":{"rendered":"Deep learning? Optimized learning!"},"content":{"rendered":"
[Advanced Research Team, \u91d1\u6210yun]<\/p>\n\n\n\n
As the pre-trained language model (PLM) strategy has proven highly successful in natural language processing, building ever-larger PLMs on ever more data has become an established trend.
Recently, NVIDIA unveiled a model with 530B parameters, roughly three times the size of GPT-3 (175B parameters).
The model combines the earlier Megatron-LM and Turing-NLG models and has been named “Megatron-Turing NLG” (MT-NLG).
It was reportedly trained on 560 DGX A100 80GB servers tied together into a single cluster. It really is a model that nobody but NVIDIA could even experiment with!<\/p>\n\n\n\n
It consists of 105 transformer layers in total and is reported to achieve state-of-the-art performance on zero-, one-, and few-shot learning tasks.<\/p>\n\n\n\n
Training a model this large takes more than just a lot of money, a lot of data, and a lot of GPUs.
There are two reasons. (1) GPU memory is limited, and a single GPU is nowhere near enough to hold all of the parameters of such a huge model during training. (2) Unless training algorithm optimization, data processing methods, and software-hardware optimization are all taken into account, training can take an unrealistically long time.<\/p>\n\n\n\n
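To get a feel for point (1), here is a rough back-of-the-envelope sketch. The parameter count and the 80GB GPU come from the post. The bytes-per-parameter figures are my own assumptions (16-bit weights, and a typical mixed-precision Adam setup), not MT-NLG's reported configuration, and activations are ignored entirely.<\/p>\n\n\n\n

```python
# Rough memory estimate for a 530B-parameter model.
# The byte counts below are assumptions for illustration, not MT-NLG's actual setup.
PARAMS = 530 * 10**9            # parameter count stated in the post
GPU_MEMORY_BYTES = 80 * 10**9   # one A100 80GB (decimal gigabytes for simplicity)

BYTES_FP16_WEIGHTS = 2          # assumed: weights stored as 16-bit floats
# Assumed mixed-precision Adam state per parameter:
# fp16 weight (2) + fp16 gradient (2) + fp32 master weight (4)
# + fp32 momentum (4) + fp32 variance (4) = 16 bytes
BYTES_TRAINING_STATE = 16


def gpus_needed(bytes_per_param):
    # Minimum number of GPUs whose combined memory holds the state (ceiling division).
    total_bytes = PARAMS * bytes_per_param
    return -(-total_bytes // GPU_MEMORY_BYTES)


print('fp16 weights only:', gpus_needed(BYTES_FP16_WEIGHTS), 'GPUs')     # 14
print('full training state:', gpus_needed(BYTES_TRAINING_STATE), 'GPUs')  # 106
```

Under these assumptions, the weights alone do not fit on one GPU, and the full training state needs the memory of more than a hundred GPUs before any activations are counted. That is why the model has to be split across many devices.<\/p>\n\n\n\n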
In the case of the newly released MT-NLG, Microsoft and NVIDIA reportedly collaborated to reach unprecedented training efficiency, which is what made the model possible \ud83d\ude42
In other words, efficient training is only possible when you understand the system architecture of both the hardware and the software, right?
For more details, see (link<\/a>).<\/p>\n\n\n\n
\n\n\n\n