Benchmarks for LLMs

Understand the role and limitations of benchmarks in evaluating LLM performance, and explore techniques for building robust LLMs.
Large language models (LLMs) have surged in popularity in recent years. You have seen it yourself. Their strong ability to understand natural-language instructions has made them a natural fit for business integration: they support critical workflows and automate tasks to improve efficiency, and they can do far more than the average user realizes. As reliance on LLMs grows, the measures that ensure their accuracy and reliability deserve closer attention. This is a challenge for entire organizations, but several benchmarks exist for evaluating LLM performance across domains. They test capabilities such as comprehension, logical reasoning, and mathematics, and the results help determine whether an LLM is ready for business deployment.
This article gathers a comprehensive list of the most popular benchmarks for LLM evaluation. We discuss each benchmark in detail and look at how different LLMs fare against the evaluation criteria. First, though, let's look more closely at LLM evaluation itself.
What Is LLM Evaluation?
Like other AI models, LLMs need to be evaluated against specific benchmarks that assess different aspects of a language model's performance: knowledge, accuracy, reliability, and consistency. The evaluation typically covers the following:
- Understanding user queries: Assessing the model's ability to accurately comprehend and interpret a wide range of user inputs.
- Output verification: Checking AI-generated responses against a trusted knowledge base to confirm that they are accurate and relevant.
- Robustness: Measuring how well the model performs with ambiguous, incomplete, or noisy inputs.
LLM evaluation lets developers identify and address limitations efficiently, which improves the overall user experience. A thoroughly evaluated LLM is more likely to be accurate and robust enough to handle varied real-world applications, including those with ambiguous or unexpected inputs.
Benchmarks
LLMs are among the most complex technologies to date and can power even the most demanding applications. The evaluation process is correspondingly complex, testing both the model's reasoning process and its technical accuracy.
A benchmark uses specific datasets, metrics, and evaluation tasks to test LLM performance. Because benchmarks allow different LLMs to be compared and their accuracy measured, they drive progress in the industry through improved performance.
Here are some of the most typical aspects of LLM performance:
- Knowledge: The model's knowledge needs to be tested across many domains, which is what knowledge benchmarks are for. They evaluate how effectively the model can recall information from fields such as physics, programming, and geography.
- Logical reasoning: This means testing the model's ability to "think" step by step and reach a logical conclusion. These benchmarks typically involve scenarios in which the model must pick the most plausible continuation or explanation based on everyday knowledge and logical reasoning.
- Reading comprehension: Models have to interpret natural language well and generate responses accordingly. The test resembles a school reading test: the model answers questions about a passage, which gauges comprehension, inference, and retention of detail.
- Code understanding: This measures the model's proficiency in understanding, writing, and debugging code. These benchmarks give the model coding tasks or problems that it has to solve accurately, often covering a range of programming languages and paradigms.
- World knowledge: This evaluates the model's grasp of general knowledge about the world. These datasets contain questions that require broad, encyclopedic knowledge to answer correctly, which sets them apart from the more specific and specialized knowledge benchmarks.
"Knowledge" Benchmarks
MMLU (Massive Multitask Language Understanding)
This benchmark was created to test an LLM's grasp of factual knowledge across topics such as the humanities, social sciences, history, computer science, and even law. It spans 57 subjects and roughly 15,000 multiple-choice questions, all aimed at checking that the model has strong reasoning capabilities. This makes MMLU a good tool for assessing an LLM's factual knowledge and reasoning across a wide range of topics.
It has recently become a key benchmark for evaluating LLMs in the areas mentioned above. Developers always want to optimize their models to outperform others on it, which has made it a de facto standard for evaluating advanced reasoning and knowledge in LLMs. Large enterprise-grade models post impressive scores on this benchmark: GPT-4-omni at 88.7%, Claude 3 Opus at 86.8%, Gemini 1.5 Pro at 85.9%, and Llama-3 70B at 82%. Small models typically do not perform as well and usually do not exceed 60-65%, although the recent 75.3% from Phi-3-Small-7b is noteworthy.
However, MMLU is not without flaws. It has known issues such as ambiguous questions, incorrect answers, and missing context. Many also think that some of its tasks are too easy for a proper LLM evaluation.
I would like to make clear that benchmarks like MMLU do not fully depict real-world scenarios. A high score does not necessarily mean that an LLM has become a subject-matter expert. Benchmarks are quite limited in scope and often rely on multiple-choice questions, which can never fully capture the complexity and context of real-world interactions. True understanding requires knowing facts and applying that knowledge dynamically, which involves critical thinking, problem solving, and contextual understanding. For these reasons, LLMs need continual refinement and updating so that benchmark results remain relevant and meaningful.
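GPQA (Graduate-Level Google-Proof Q&A Benchmark)
This benchmark assesses LLMs' logical reasoning using a dataset of just 448 questions. Domain experts wrote the questions, which cover topics in biology, physics, and chemistry.
Each question goes through the following validation process:
- An expert in the same topic answers the question and provides detailed feedback.
- The question writer revises the question based on this feedback.
- A second expert answers the revised question.
This process helps ensure that the questions are objective, accurate, and challenging for a language model. Even experienced PhD holders reach only 65% accuracy on these questions, while GPT-4-omni reaches only 53.6%, which highlights the gap between human and machine intelligence.
Because of the strict qualification requirements, the dataset is quite small. This somewhat limits its statistical power for comparing accuracy, so only large effect sizes can be detected. The experts who wrote and validated the questions were recruited through Upwork, so their particular expertise and the topics they covered may have introduced bias.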
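To make the statistical-power point concrete, the sketch below computes a normal-approximation 95% confidence interval for an accuracy measured on 448 questions. The function name and the choice of the normal approximation are illustrative assumptions, not something the benchmark itself prescribes.

```python
import math

def accuracy_confidence_interval(accuracy, n_questions, z=1.96):
    """Normal-approximation confidence interval for a benchmark accuracy.

    accuracy is a fraction in [0, 1]; z=1.96 corresponds to roughly 95%.
    (Illustrative helper, not part of any benchmark's official tooling.)
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if n_questions <= 0:
        raise ValueError("n_questions must be positive")
    std_err = math.sqrt(accuracy * (1.0 - accuracy) / n_questions)
    margin = z * std_err
    return max(0.0, accuracy - margin), min(1.0, accuracy + margin)

# The 53.6% score quoted above, measured on 448 questions.
low, high = accuracy_confidence_interval(0.536, 448)
print(f"{low:.3f} to {high:.3f}")  # roughly 0.490 to 0.582
```

With a margin of about 4.6 percentage points on either side, two models whose scores differ by only a few points cannot be reliably separated on a dataset of this size.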
Code Benchmarks
HumanEval
HumanEval consists of 164 programming problems and is designed to test the basic coding abilities of large language models. It uses the pass@k metric to judge the functional correctness of generated code: the probability that at least one of k LLM-generated code samples passes the test cases.
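As a sketch of how pass@k can be computed in practice, the snippet below uses the commonly cited unbiased estimator: generate n samples per problem, count the c samples that pass all tests, and estimate the chance that a random subset of k samples contains at least one passing sample. The parameter names n, c, and k are chosen here for illustration.

```python
from math import comb

def pass_at_k(n, c, k):
    """Estimate pass@k for one problem.

    n: total samples generated, c: samples that passed all tests,
    k: number of samples the metric is allowed to draw.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 samples, 3 of which pass: pass@1 equals the raw pass rate.
print(round(pass_at_k(n=10, c=3, k=1), 3))  # 0.3
print(round(pass_at_k(n=10, c=3, k=5), 3))
```

The benchmark-level score is then the mean of this value over all problems.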
Although the HumanEval dataset includes function signatures, docstrings, code bodies, and several unit tests, it does not cover the full range of real-world coding problems, so it cannot adequately test a model's ability to write correct code for diverse scenarios.
MBPP (Mostly Basic Python Programming)
The MBPP benchmark consists of about 1,000 crowd-sourced Python programming problems. These are entry-level problems that focus on fundamental programming skills. Model performance is evaluated with few-shot and fine-tuning approaches, and larger models typically perform better on this dataset. However, because the dataset contains mainly entry-level programs, it does not fully represent the complexity and challenges of real-world applications.
Math Benchmarks
Most LLMs are quite good at composing standard responses, but mathematical reasoning is a much bigger problem for them. Why? Because it requires understanding the question, taking a step-by-step logical approach with mathematical reasoning, and deriving the correct answer.
The "Chain of Thought" (CoT) method is used when evaluating LLMs on math benchmarks: the model is asked to explain its reasoning step by step while solving a problem. This has several benefits. It makes the reasoning process more transparent, helps identify flaws in the model's logic, and allows a more granular assessment of problem-solving skills. By breaking complex problems into a series of simpler steps, CoT can improve a model's performance on math benchmarks and provide deeper insight into its reasoning capabilities.
GSM8K: A Popular Math Benchmark
One of the best-known benchmarks for evaluating mathematical ability in LLMs is the GSM8K dataset. GSM8K consists of 8.5k grade-school math problems that take a few steps to solve, and the solutions mainly involve performing a sequence of elementary calculations. Larger models, and models trained specifically for mathematical reasoning, tend to perform better on this benchmark. For example, GPT-4 models boast a score of 96.5%, while DeepSeekMATH-RL-7B lags slightly behind at 88.2%.
GSM8K is useful for assessing a model's ability to handle grade-school-level math problems, but it may not fully capture a model's capacity to solve more advanced or diverse mathematical challenges, which limits its value as a comprehensive measure of math ability.
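Scoring a chain-of-thought answer usually comes down to pulling the final number out of free-form text and comparing it with the reference. The sketch below assumes a simple "last number wins" heuristic; the function names are illustrative, and the heuristic can misread text that ends with a range such as "steps 1-2". GSM8K reference solutions end with a line of the form "#### 7", which the same extractor handles.

```python
import re

# A number with optional sign, thousands separators, and decimals.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")

def extract_final_number(text):
    """Return the last number that appears in text, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))

def is_correct(model_output, reference, tol=1e-6):
    """Compare the final number of a step-by-step answer with the reference."""
    predicted = extract_final_number(model_output)
    expected = extract_final_number(reference)
    if predicted is None or expected is None:
        return False
    return abs(predicted - expected) <= tol

output = "She starts with 3 apples and buys 4 more. 3 + 4 = 7. The answer is 7."
print(is_correct(output, "3 + 4 = 7\n#### 7"))
```

Exact-match scoring like this checks only the final answer; judging whether the intermediate steps are sound still requires inspecting the reasoning itself.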
The MATH Dataset: A Comprehensive Alternative
The MATH dataset addresses the shortcomings of benchmarks like GSM8K. It is more extensive, covering everything from elementary arithmetic to high-school and even college-level problems. It has also been compared against humans: a computer science PhD student who does not like math achieved 40% accuracy, and a gold medalist achieved 90%.
It provides a more well-rounded assessment of an LLM's mathematical capabilities, showing whether the model is proficient in basic arithmetic as well as in more complex areas such as algebra, geometry, and calculus. However, the increased complexity and diversity of the problems can make it hard for models to achieve high accuracy, especially those not explicitly trained on a wide range of mathematical concepts. The varied problem formats in the MATH dataset can also lead to inconsistent model performance, which makes it much harder to draw firm conclusions about a model's overall mathematical ability.
Combining the MATH dataset with the Chain of Thought method can improve the quality of the evaluation, because it reveals an LLM's step-by-step reasoning across a wide range of mathematical challenges. This combined approach allows a more robust and detailed assessment of an LLM's true mathematical capabilities.
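Reading Comprehension Benchmarks
A reading comprehension assessment evaluates a model's ability to understand and process complex text, which is especially important for applications such as customer support, content generation, and information retrieval. Several benchmarks are designed to assess this skill, each with its own attributes that contribute to a comprehensive evaluation of a model's capabilities.
RACE (Reading Comprehension Dataset from Examinations)
The RACE benchmark contains nearly 28,000 passages and 100,000 questions collected from English exams for Chinese middle and high school students aged 12 to 18. It does not restrict questions and answers to text extracted from the given passage, which makes the task more challenging.
It covers a broad range of topics and question types, which makes for a thorough assessment, and it includes questions at different difficulty levels. The questions in RACE were also designed specifically to test human reading skills and were written by domain experts.
However, the benchmark has drawbacks. Because it was developed from Chinese educational materials, it is prone to cultural biases that do not reflect a global context. In addition, the high difficulty of some questions is not representative of typical real-world tasks, so performance evaluations may not be very accurate.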
DROP (Discrete Reasoning Over Paragraphs)
Another significant approach is DROP (Discrete Reasoning Over Paragraphs), which challenges models to perform discrete reasoning over paragraphs. It has 96,000 questions for testing the reasoning capabilities of LLMs; the passages are drawn from Wikipedia and the questions were crowdsourced through Amazon Mechanical Turk. DROP questions often require the model to perform mathematical operations such as addition, subtraction, and comparison based on information scattered across a passage.
The questions are challenging. The LLM has to locate multiple numbers in the passage and then add or subtract them to reach the final answer. Large models such as GPT-4 and PaLM achieve 80% and 85% respectively, while humans achieve 96% on the DROP dataset.
Common Sense Benchmarks
Testing common sense in language models is not only interesting but also important, because it evaluates a model's ability to make judgments and inferences that align with human reasoning. Unlike humans, who build a comprehensive world model through practical experience, language models are trained on huge datasets without inherently understanding context. As a result, they struggle with tasks that require an intuitive grasp of everyday situations, logical reasoning, and practical knowledge, all of which matter a great deal for robust and reliable AI applications.
HellaSwag (Harder Endings, Longer Contexts and Low-Shot Activities for Situations with Adversarial Generations)
HellaSwag was developed by Rowan Zellers and colleagues at the University of Washington and the Allen Institute for Artificial Intelligence. It is designed to test a model's ability to predict the most plausible continuation of a given scenario. The benchmark is built with Adversarial Filtering (AF), in which a series of discriminators iteratively select adversarial machine-generated wrong answers. This produces a dataset whose examples are trivial for humans but hard for models, a "Goldilocks" zone of difficulty.
HellaSwag was difficult for earlier models, but state-of-the-art models such as GPT-4 now achieve performance close to human accuracy, which marks significant progress in the field. These results also suggest that benchmarks need to keep evolving to keep pace with advances in AI capabilities.
OpenBook
The OpenBook dataset consists of 5,957 elementary-level science multiple-choice questions. The questions were gathered from open-book exams and developed to assess human understanding of the subject.
The OpenBook benchmark requires reasoning that goes beyond information retrieval. GPT-4 currently achieves the highest accuracy, at 95.9%.
OpenBookQA is modeled on open-book exams. Its 5,957 multiple-choice questions are designed to probe the understanding of 1,326 core science facts and their application to novel situations.
As with HellaSwag, earlier models found OpenBookQA difficult, but modern models such as GPT-4 have reached near-human performance. This progress underscores the importance of developing even more complex and nuanced benchmarks that keep pushing the limits of AI understanding.
Are Benchmarks Enough for LLM Performance Evaluation?
Benchmarks provide a standardized approach to evaluating LLM performance, but they can also be misleading. The Large Model Systems Organization says that a good LLM benchmark should be scalable, should be able to evaluate new models with a relatively small number of trials, and should provide a unique rank order for all models. Even so, there are reasons why benchmarks alone may not be enough. Here are some of them.
Benchmark Leakage
This is a common problem. It happens when training data overlaps with test data, which produces a misleading evaluation. If a model has already encountered some of the test questions during training, its results may not accurately reflect its true capabilities. An ideal benchmark should minimize memorization and reflect real-world scenarios.
Evaluation Bias
LLM benchmark leaderboards are used to compare LLM performance on various tasks. However, relying on those leaderboards for model comparison can be misleading. Simple changes to a benchmark test, such as altering the order of the questions, can shift a model's ranking by up to eight positions. LLMs may also perform differently depending on the scoring method, which highlights the importance of accounting for evaluation bias.
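One way to probe this kind of sensitivity is to re-run a multiple-choice evaluation under small perturbations and look at how much the score moves. The sketch below uses a related perturbation, shuffling the order of the answer options, and is only an illustration: `choose` is a hypothetical stand-in for a model call, and the two toy questions are made up for the example.

```python
import random

def accuracy_under_shuffles(items, choose, n_shuffles=5, seed=0):
    """Measure accuracy across several random orderings of the answer options.

    items: list of (question, options, correct_option_text) tuples.
    choose: callable(question, options) -> index of the chosen option
            (hypothetical stand-in for querying a model).
    Returns one accuracy value per shuffle.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_shuffles):
        correct = 0
        for question, options, answer in items:
            shuffled = options[:]
            rng.shuffle(shuffled)
            picked = shuffled[choose(question, shuffled)]
            correct += picked == answer
        accuracies.append(correct / len(items))
    return accuracies

def always_first(question, options):
    """A toy 'model' with pure position bias: it always picks option 0."""
    return 0

items = [
    ("2 + 2 = ?", ["4", "3", "5", "22"], "4"),
    ("Capital of France?", ["Paris", "Rome", "Oslo", "Bern"], "Paris"),
]
scores = accuracy_under_shuffles(items, always_first)
print(min(scores), max(scores))
```

In the original option order the position-biased toy model would score 100%, yet its score under shuffling depends entirely on chance. A wide spread between the minimum and maximum is a sign that a leaderboard position reflects the test format as much as the model.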
Open-Endedness
Real-world LLM interaction involves designing prompts that generate the desired AI output. LLM output depends on the effectiveness of the prompt, and benchmarks are designed to test an LLM's context awareness. That does not always translate directly into real-world performance. For example, a model that scores 100% on a benchmark dataset such as the LSAT is not guaranteed the same level of accuracy in practical applications. This underscores the importance of considering the open-ended nature of real-world tasks in LLM evaluation.
Effective Evaluation for Robust LLMs
By now you know that benchmarks are not always the best option, because they cannot always generalize across all problems. But there are other ways.
Custom Benchmarks
These are well suited to testing specific behaviors and functionality in task-specific scenarios. For example, if an LLM is designed for medical professionals, datasets collected from medical settings will represent real-world scenarios effectively. Custom benchmarks can focus on domain-specific language understanding, performance, and unique contextual requirements. Aligning benchmarks with real-world scenarios helps confirm that the LLM performs well not only in general but also on the specific tasks it is intended for. It also helps identify and address gaps or weaknesses in the model's capabilities early on.
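A custom benchmark can start out as little more than a list of domain-specific cases and a loop that scores them by category. The sketch below is a minimal illustration under stated assumptions: the case format, the substring-based `is_match` default, and the canned `toy_model` are all hypothetical and stand in for a real dataset and a real model call.

```python
from collections import defaultdict

def run_custom_benchmark(cases, model, is_match=None):
    """Score a model on task-specific cases and report accuracy per category.

    cases: iterable of dicts with 'prompt', 'expected', and 'category' keys.
    model: callable(prompt) -> str (hypothetical stand-in for an LLM call).
    is_match: callable(output, expected) -> bool; defaults to a
              case-insensitive substring check.
    """
    if is_match is None:
        is_match = lambda output, expected: expected.lower() in output.lower()
    totals = defaultdict(int)
    passed = defaultdict(int)
    for case in cases:
        totals[case["category"]] += 1
        if is_match(model(case["prompt"]), case["expected"]):
            passed[case["category"]] += 1
    return {category: passed[category] / totals[category] for category in totals}

# Made-up cases in the spirit of a benchmark for clinical note handling.
cases = [
    {"prompt": "Expand 'BP' in a clinical note.", "expected": "blood pressure",
     "category": "abbreviations"},
    {"prompt": "Expand 'HR' in a clinical note.", "expected": "heart rate",
     "category": "abbreviations"},
    {"prompt": "How many milligrams are in one gram?", "expected": "1000",
     "category": "units"},
]

canned = {
    cases[0]["prompt"]: "BP stands for blood pressure.",
    cases[1]["prompt"]: "HR stands for heart rhythm.",
    cases[2]["prompt"]: "There are 1000 mg in a gram.",
}
toy_model = canned.get
print(run_custom_benchmark(cases, toy_model))
```

Reporting results per category rather than as a single number makes it easier to see which part of the domain the model is weak in.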
Data Leakage Detection Pipeline
If you want your evaluations to demonstrate integrity, a benchmark pipeline free of data leakage is essential. Data leakage happens when benchmark data is included in the model's pretraining corpus, which results in artificially high performance scores. To avoid it, benchmarks should be cross-referenced against the pretraining data, and steps should be taken to avoid any previously seen information. This can involve using proprietary or newly curated datasets that are kept separate from the model's training pipeline. Doing so helps ensure that the resulting performance metrics reflect the model's ability to generalize.
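The cross-referencing step can be approximated with an n-gram overlap check between benchmark items and training documents. The sketch below is a simplified assumption of how such a check might look: it uses whitespace tokenization, 8-word n-grams, and a 50% overlap threshold, all of which are illustrative choices rather than an established standard.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_index(corpus_docs, n):
    """Collect every n-gram that occurs anywhere in the training corpus."""
    index = set()
    for doc in corpus_docs:
        index |= ngrams(doc, n)
    return index

def overlap_ratio(item, index, n):
    """Fraction of the item's n-grams that also occur in the corpus index."""
    grams = ngrams(item, n)
    if not grams:
        return 0.0
    return len(grams & index) / len(grams)

def flag_contaminated(benchmark_items, corpus_docs, n=8, threshold=0.5):
    """Return benchmark items whose n-gram overlap meets the threshold."""
    index = build_index(corpus_docs, n)
    return [item for item in benchmark_items
            if overlap_ratio(item, index, n) >= threshold]

corpus = ["the quick brown fox jumps over the lazy dog near the river bank today"]
benchmark = [
    "The quick brown fox jumps over the lazy dog near the river",
    "What is the boiling point of water at sea level in degrees Celsius",
]
print(flag_contaminated(benchmark, corpus))
```

Flagged items can then be removed or replaced before scoring. A real pretraining corpus is far too large for an in-memory set, so a production pipeline would need hashing or an external index; this sketch only shows the comparison logic.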
人éã®è©äŸ¡
èªååãããææšã ãã§ã¯ãã¢ãã«ã®ããã©ãŒãã³ã¹ã®å šå®¹ãæããããšã¯ã§ããŸãããç¹ã«ãèšèªçè§£ãšçæãšããéåžžã«åŸ®åŠã§äž»èгçãªåŽé¢ã«é¢ããŠã¯ãªãããã§ãã人éã«ããè©äŸ¡ã®æ¹ããã¯ããã«åªããè©äŸ¡ãšãªããŸãã
- å°éå®¶ã®æ¡çš ç¹ã«å°éåéã«ãããŠã詳现ãã€ä¿¡é Œæ§ã®é«ãè©äŸ¡ãæäŸã§ããŸãã
- ã¯ã©ãŠããœãŒã·ã³ã°! Amazon Mechanical Turk ã®ãããªãã©ãããã©ãŒã ã䜿çšãããšã人éã®å€æ§ãªå€æãè¿ éãã€äœã³ã¹ãã§åéã§ããŸãã
- ã³ãã¥ããã£ã®ãã£ãŒãããã¯: ãŠãŒã¶ãŒãæç¥šããŠã¢ãã«ãæ¯èŒã§ãã LMSYS ãªãŒããŒããŒã ã¢ãªãŒããªã©ã®ãã©ãããã©ãŒã ã䜿çšãããšãæŽå¯åãããã«é«ãŸããŸããããšãã°ãLMSYS Chatbot Arena Hard ã¯ããŠãŒã¶ãŒãšã®çŽæ¥çãªããåããæç¥šãéããŠãããã ã¢ãã«éã®åŸ®åŠãªéããæµ®ã圫ãã«ããã®ã«ç¹ã«å¹æçã§ãã
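Pairwise votes like these have to be turned into a ranking somehow. The sketch below shows one simple way to do that, an Elo-style rating update per vote. It is an assumption made for illustration, not a description of how any particular platform computes its leaderboard, and the model names are placeholders.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(ratings, winner, loser, k=32.0, initial=1000.0):
    """Apply one pairwise vote to the ratings dict and return it."""
    r_winner = ratings.get(winner, initial)
    r_loser = ratings.get(loser, initial)
    gain = k * (1.0 - expected_score(r_winner, r_loser))
    ratings[winner] = r_winner + gain
    ratings[loser] = r_loser - gain
    return ratings

# Each tuple is one human vote: (preferred model, other model).
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
ratings = {}
for winner, loser in votes:
    update_elo(ratings, winner, loser)

for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(name, round(rating, 1))
```

Sequential updates like this depend on the order in which votes arrive, so with many votes it is common to fit all of them at once with a statistical model instead.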
Conclusion
Without evaluation and benchmarking, we would have no way of knowing whether an LLM's ability to handle real-world tasks is as accurate and applicable as we think it is. But as discussed above, benchmarks are not a foolproof way to check this, and they can leave gaps in our picture of LLM performance. That can also slow the development of LLMs that are truly fit for the job.
In an ideal world, an LLM understands user queries, identifies errors in prompts, completes tasks as instructed, and generates reliable outputs. The results so far are already impressive, but they are not ideal. This is where task-specific benchmarks prove very helpful, as do human evaluation and benchmark leakage detection. Using them gives us a chance to build LLMs that are actually robust.