{"id":60394,"date":"2021-06-25T12:03:48","date_gmt":"2021-06-25T03:03:48","guid":{"rendered":"https:\/\/smilegate.ai\/?p=60394"},"modified":"2021-06-25T12:10:14","modified_gmt":"2021-06-25T03:10:14","slug":"handling-imbalanced-datasets","status":"publish","type":"post","link":"https:\/\/smilegate.ai\/cn\/2021\/06\/25\/handling-imbalanced-datasets\/","title":{"rendered":"Handling Imbalanced Datasets"},"content":{"rendered":"
[Service Development Team, Joonsun Hwang]<\/p>\n\n\n\n
When a machine learning model is trained with supervised learning on a dataset whose labels have very different sample counts, the model learns the samples of the minority labels poorly. If a label simply has too few samples, it will naturally be learned poorly, and even when there are enough samples to learn from, a severe difference in proportions biases the model. This is especially common in tasks such as classifying anomalous data, where one label far outnumbers the other. In such cases even a state-of-the-art model struggles to reach good performance. There are four main ways to address the problem.<\/p>\n\n\n\n
Choosing the right evaluation metric is not itself a technique for fixing an imbalanced dataset. Rather, it is the first step toward correctly interpreting the model you have trained and toward applying the solutions described below. For example, suppose we have a binary classification problem with labels 0 and 1, where 99% of the samples in the dataset belong to label 0 and 1% belong to label 1. If the trained model classifies every sample as 0, its accuracy is 99%. Accuracy is not a wrong metric, but does this 99% figure really describe the performance of the model? In data like this, we usually care about correctly classifying label 1, not label 0, and in that case the accuracy figure tells us nothing useful. It is therefore better to use evaluation metrics such as precision and recall [1], which show aspects of performance that accuracy alone hides.<\/p>\n\n\n\n Measured this way, a model like the one above shows very low precision and recall for the minority label. In other words, it has not been trained well for that label.<\/p>\n\n\n\n 
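The 99% versus 1% example above can be checked with a few lines of plain Python. This is only a minimal sketch: the 1,000-sample dataset, the always-predict-0 model, and the helper names accuracy and precision_recall are assumptions made for illustration, not code from the original post.<\/p>\n\n\n\n

```python
# Hypothetical data matching the example in the text: 99% label 0, 1% label 1.
y_true = [0] * 990 + [1] * 10
# A degenerate model that predicts label 0 for every sample.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one label (0.0 when the denominator is 0)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


acc = accuracy(y_true, y_pred)
p0, r0 = precision_recall(y_true, y_pred, label=0)
p1, r1 = precision_recall(y_true, y_pred, label=1)

print(f'accuracy = {acc:.2f}')
print(f'label 0: precision = {p0:.2f}, recall = {r0:.2f}')
print(f'label 1: precision = {p1:.2f}, recall = {r1:.2f}')
```

Accuracy comes out at 0.99 while both precision and recall for label 1 are 0.00, which is exactly the gap that accuracy alone hides.<\/p>\n\n\n\n 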
If the evaluation metrics above show that the model has not been trained well, sampling is the first strategy you can apply, and it is a simple one. You can address the imbalance by evening out the ratio between labels, either by undersampling the majority label [2] or by oversampling the minority label [3].<\/p>\n\n\n\n
An appropriate combination of the two methods [4] gives a better sampler. An Imbalanced Dataset Sampler [5] has also been released for PyTorch, a widely used deep learning framework, and is worth a look. However, if the label you want to classify has so few samples that there is not enough to learn from, sampling alone may not solve the problem.<\/p>\n\n\n\n
Data augmentation can be used when the amount of data is extremely small. Whether and how to apply it, however, varies by task and depends on the domain.<\/p>\n\n\n\n
Data augmentation is widely applied to image and text data. To improve performance, augment the samples of the low-ratio label, not those of the high-ratio label, and use them for training.<\/p>\n\n\n\n
Finally, use a loss function suited to imbalanced data. One of the most popular loss functions for imbalanced data is Focal Loss [6]. Various other loss functions exist as well, including LADE Loss [7], introduced at CVPR 2021.<\/p>\n\n\n\n
LADE Loss: addresses the imbalance by aligning the label distribution of the imbalanced training data with the label distribution of the target data.<\/p>\n\n\n\n
A more fundamental solution than any of the methods described here is to obtain more data. However, collecting suitable data and labeling it is very expensive. If the amount of data for the problem you are trying to solve is not enough to train a machine learning model, none of the methods above may help. If there is enough data but it is highly imbalanced, the methods above can be applied to improve performance. I will close this post by sharing code for the last method, the loss function approach.<\/p>\n\n\n\n
https:\/\/github.com\/Joonsun-Hwang\/imbalance-loss-test\/blob\/main\/Loss%20Test.ipynb<\/a><\/p>\n\n\n\n
[1] https:\/\/en.wikipedia.org\/wiki\/Precision_and_recall<\/a>
[2] https:\/\/imbalanced-learn.org\/stable\/under_sampling.html<\/a>
[3] https:\/\/imbalanced-learn.org\/stable\/over_sampling.html<\/a>
[4] https:\/\/imbalanced-learn.org\/stable\/combine.html<\/a>
[5] https:\/\/github.com\/ufoym\/imbalanced-dataset-sampler<\/a>
[6] https:\/\/arxiv.org\/abs\/1708.02002<\/a>
[7] https:\/\/github.com\/hyperconnect\/LADE<\/a><\/p>\n