{"id":64374,"date":"2023-12-15T12:54:09","date_gmt":"2023-12-15T03:54:09","guid":{"rendered":"https:\/\/smilegate.ai\/?p=64374"},"modified":"2023-12-15T13:03:10","modified_gmt":"2023-12-15T04:03:10","slug":"mixtral-8x7b-%ec%9d%b8%ea%b3%b5%ec%a7%80%eb%8a%a5%eb%8f%84-%ed%98%91%ec%97%85%ec%9d%b4-%eb%8c%80%ec%84%b8","status":"publish","type":"post","link":"https:\/\/smilegate.ai\/cn\/2023\/12\/15\/mixtral-8x7b-%ec%9d%b8%ea%b3%b5%ec%a7%80%eb%8a%a5%eb%8f%84-%ed%98%91%ec%97%85%ec%9d%b4-%eb%8c%80%ec%84%b8\/","title":{"rendered":"Mixtral 8x7B: Collaboration Is the Trend in AI, Too!"},"content":{"rendered":"
\n
\"\"
Source: OUTRAGEOUSLY LARGE NEURAL NETWORKS: THE SPARSELY-GATED MIXTURE-OF-EXPERTS LAYER
https:\/\/arxiv.org\/pdf\/1701.06538.pdf<\/figcaption><\/figure><\/div>\n\n\n

[Advanced AI Technology Team, Jeon Dong-jun]<\/p>\n\n\n\n

Mistral AI, a startup, released the Mixtral 8x7B model as open source on December 8. Built on the Mistral 7B model released in September, it uses the “MoE” approach that GPT-4, currently the strongest model for language generation, is reported to use. Mistral AI says that Mixtral outperforms Llama 2 70B, which has more parameters, and GPT3.5 on natural-language benchmarks, and that its inference is faster as well. The model is released under the Apache 2.0<\/a> license as “open weights”, which is a big help to the open-source ecosystem!<\/p>\n\n\n\n

Mixtral<\/h2>\n\n\n\n

According to the announcement, Mixtral has the following features.<\/p>\n\n\n\n

    \n
  • Handles a context of 32k (32,000) tokens<\/li>\n\n\n\n
  • Supports multiple languages, including English, French, Italian, German, and Spanish<\/li>\n\n\n\n
  • Strong performance in code generation<\/li>\n\n\n\n
  • With instruction tuning, scores 8.3 on MT-Bench, a benchmark that measures language-model performance<\/li>\n<\/ul>\n\n\n
    \n
    \"\"
    Ranking by score on MT-Bench, a multi-turn conversation benchmark judged by GPT-4
    https:\/\/huggingface.co\/spaces\/lmsys\/chatbot-arena-leaderboard<\/figcaption><\/figure><\/div>\n\n\n

    Mixtral is a sparse mixture-of-experts network. It is a decoder-only model whose feedforward block draws on 8 distinct groups of parameters; the next section looks at this in more detail. Mixtral has about 47 billion parameters in total, but with this approach it behaves as if it used only about 13 billion of them for each token. As a result, inference is fast and the model is efficient to run.<\/p>\n\n\n\n

    MoE(Mixture of Experts)<\/h2>\n\n\n\n

    AI models generally perform better as they grow larger, that is, as they gain parameters. MoE was proposed as a way to scale up model size under a limited compute budget. In the transformer architecture, the MoE layer is the blue part of the figure below. Each FFN (feed-forward network) is an “expert”, and a router decides which expert each input is sent to. It is like a clinic where a patient's symptoms are assessed on arrival and the patient is then referred to the specialist who can treat them.<\/p>\n\n\n\n
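The routing idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the experts are toy functions rather than real FFNs, the router scores are given by hand rather than produced by a learned layer, and the choice of two experts per token (top_k=2) follows Mistral AI's announcement rather than anything stated in this article. The names moe_layer, gate_logits, and experts are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, gate_logits, experts, top_k=2):
    # x           : token representation (list of floats)
    # gate_logits : one router score per expert (hand-written here)
    # experts     : list of callables standing in for the FFNs
    # Pick the indices of the top_k largest router scores.
    chosen = sorted(range(len(experts)),
                    key=lambda i: gate_logits[i], reverse=True)[:top_k]
    # Normalise the scores of the chosen experts only.
    weights = softmax([gate_logits[i] for i in chosen])
    # Weighted sum of the chosen experts' outputs; the others are never run.
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen

# Toy experts: expert i scales its input by (i + 1).
experts = [lambda v, s=i + 1: [s * vi for vi in v] for i in range(8)]
gate_logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]
out, chosen = moe_layer([1.0, -2.0], gate_logits, experts)
```

Only two of the eight toy experts are evaluated for this token, which is where the inference savings come from; the other six still have to exist in memory.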

    MoE models train efficiently and run inference faster than dense models with a similar number of parameters, but they are said to have trouble generalizing during fine-tuning. An MoE model holds many experts (FFN parameters), yet inference computes only the selected FFNs, which is why it is fast. All of the model's parameters must still be loaded into memory, however, so the memory requirement is high. Mixtral 8x7B is a case in point: it has 47 billion parameters, and holding them takes a lot of VRAM. The name suggests 8 x 7B = 56B parameters, but the total is 47B because only the FFN layers are replicated once per expert, while the remaining parameters are shared.<\/p>\n\n\n
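As a back-of-the-envelope check on the 47B versus 56B point, the two figures quoted in this article (about 47B total, about 13B used per token) are enough to estimate how the parameters split between shared layers and experts. The sketch below assumes two active experts per token (from Mistral AI's announcement, not this article), treats everything outside the experts as shared, and assumes 16-bit weights for the memory estimate; split_params is a hypothetical helper and the results are rough.

```python
def split_params(total_b, active_b, n_experts=8, top_k=2):
    # Solve the two equations
    #   total  = shared + n_experts * expert
    #   active = shared + top_k     * expert
    # for the shared and per-expert parameter counts (in billions).
    expert = (total_b - active_b) / (n_experts - top_k)
    shared = active_b - top_k * expert
    return shared, expert

shared_b, expert_b = split_params(total_b=47.0, active_b=13.0)

# Rough weight memory: every parameter must be resident, active or not.
bytes_per_param = 2  # assumption: 16-bit weights
weight_gb = 47.0e9 * bytes_per_param / 1e9
```

Under these assumptions each expert comes to a little under 6B parameters and the shared part to under 2B, so eight experts plus the shared layers land near 47B rather than 56B, and the weights alone need on the order of 94 GB at 16-bit precision.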

    \n
    \"\"
    Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (https:\/\/arxiv.org\/pdf\/2101.03961.pdf)<\/figcaption><\/figure><\/div>\n\n\n

    Performance<\/h2>\n\n\n\n

    The following compares Mixtral with Llama 2 and GPT3.5 on natural-language benchmarks. Mixtral comes out ahead on most of them, not only against the larger Llama 2 70B but also against GPT3.5.<\/p>\n\n\n

    \n
    \"\"
    https:\/\/mistral.ai\/news\/mixtral-of-experts\/<\/figcaption><\/figure><\/div>\n\n\n

    Mixtral also outperforms Llama 2 70B while incurring a lower inference cost.<\/p>\n\n\n

    \n
    \"\"<\/figure><\/div>\n\n\n

    <\/p>\n\n\n\n

    Mistral AI also released an instruct version of Mixtral, optimized to follow conversational instructions. It was trained with supervised fine-tuning on instructions, followed by direct preference optimisation (DPO), which trains the model to raise the probability of preferred responses. At the time of writing it is the highest-performing open-source model. To strengthen safety, the model was reportedly also tuned on the safety prompt quoted in the safe_mode example below. Mistral's safe_mode parameter makes it easy to attach this prompt, and it reflects an effort to address hallucination and biased answers in generative models.<\/p>\n\n\n\n
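To make the DPO step more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It takes the sequence log-probabilities of the preferred and rejected answers under the model being trained and under a frozen reference model. Mistral AI has not published its training details, so beta=0.1 and all the numbers below are illustrative assumptions, and dpo_loss is a hypothetical helper rather than anyone's actual training code.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    # How much more the policy favours each answer than the reference does.
    chosen_gain = policy_chosen_lp - ref_chosen_lp
    rejected_gain = policy_rejected_lp - ref_rejected_lp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)), written as softplus(-margin) for stability.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Policy identical to the reference: the loss is log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# Policy has shifted probability toward the preferred answer: the loss drops.
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)
```

The loss falls as the policy assigns relatively more probability to the preferred answer than the reference model does, which is the sense in which DPO raises the probability of preferred responses.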

    # Assumes the mistralai Python client (0.x) and an API key in the MISTRAL_API_KEY environment variable.\nimport os\n\nfrom mistralai.client import MistralClient\nfrom mistralai.models.chat_completion import ChatMessage\n\nclient = MistralClient(api_key=os.environ[\"MISTRAL_API_KEY\"])\n\nchat_response = client.chat(\n    model=\"mistral-tiny\",\n    messages=[ChatMessage(role=\"user\", content=\"What is the best French cheese?\")],  # messages must be a list\n    safe_mode=True,\n)\n\n# With safe_mode=True, the following prompt is reportedly prepended:\n# Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.\n\n<\/code><\/pre>\n\n\n\n
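What the safe_mode flag amounts to can be pictured as follows. This is a sketch of the idea, not Mistral's implementation: with_safety_prompt is a hypothetical helper, and sending the prompt as a leading system message is an assumption about how the prepending works.

```python
SAFETY_PROMPT = (
    'Always assist with care, respect, and truth. '
    'Respond with utmost utility yet securely. '
    'Avoid harmful, unethical, prejudiced, or negative content. '
    'Ensure replies promote fairness and positivity.'
)

def with_safety_prompt(messages, safe_mode=False):
    # Return a new message list; the caller's list is left untouched.
    if not safe_mode:
        return list(messages)
    # Assumption: the safety prompt goes first, as a system message.
    return [{'role': 'system', 'content': SAFETY_PROMPT}] + list(messages)

messages = [{'role': 'user', 'content': 'What is the best French cheese?'}]
guarded = with_safety_prompt(messages, safe_mode=True)
```

Because the guardrail is just extra text ahead of the conversation, the same effect can be had with any chat model by adding the prompt by hand.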

    By delivering strong performance and fast inference together, Mixtral catches two rabbits at once and seems to make LLMs easier to use in real applications. It remains an open question how competitive it is against the LLMs that big tech companies such as OpenAI and Google offer through APIs, but it appears to bring welcome energy to the LLM ecosystem. (Mistral AI is a startup too, of course, and may yet change its business model…)<\/p>\n\n\n\n

    References<\/strong><\/p>\n\n\n\n

      \n
    • https:\/\/mistral.ai\/news\/mixtral-of-experts\/<\/li>\n\n\n\n
    • https:\/\/huggingface.co\/blog\/moe<\/li>\n\n\n\n
    • https:\/\/www.aitimes.com\/news\/articleView.html?idxno=155775<\/li>\n<\/ul>\n\n\n\n

      <\/p>\n

      <\/span><\/div>","protected":false},"excerpt":{"rendered":"

      [Advanced AI Technology Team, Jeon Dong-jun]
      \nMistral AI, a startup, released the Mixtral 8x7B model as open source on December 8. Built on the Mistral 7B model released in September, it uses the “MoE” approach that GPT-4, currently the strongest model for language generation, is reported to use. Mistral AI says that Mixtral outperforms Llama 2 70B, which has more parameters, and GPT3.5 on natural-language benchmarks, and that its inference is faster as well.<\/p>\n

      <\/span><\/div>","protected":false},"author":1,"featured_media":64384,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"rank_math_lock_modified_date":false,"footnotes":""},"categories":[532,19],"tags":[205,531,667,720,721,722],"class_list":["post-64374","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-nlp","category-tech04","tag-generative","tag-nlp","tag-llm","tag-moe","tag-mixtral","tag-mistral","category-532","category-19","description-off"],"_links":{"self":[{"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/posts\/64374","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/comments?post=64374"}],"version-history":[{"count":4,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/posts\/64374\/revisions"}],"predecessor-version":[{"id":64385,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/posts\/64374\/revisions\/64385"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/media\/64384"}],"wp:attachment":[{"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/media?parent=64374"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/categories?post=64374"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/tags?post=64374"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}