GPT-3: Few-Shot Learning for Language Models?

Over the past few years, the AI and ML industry has seen rapid growth in the development and application of NLP systems, as researchers have found increasingly flexible, task-agnostic ways to transfer NLP methods to downstream tasks.
Initially, single-layer representations based on word vectors were fed into task-specific architectures. Next came RNN architectures, which used multi-layer representations and contextual state to form better representations. Most recently, transfer language models, or pre-trained recurrent models, have removed the need for task-specific architectures entirely by fine-tuning these networks.
Transfer language models have proved to be a major turning point for the NLP industry, bringing substantial progress on difficult tasks such as question answering, reading comprehension, and textual entailment.
Despite these advantages, transfer language models have a major limitation: achieving the desired performance on a task requires task-specific fine-tuning or a task-specific dataset. Fine-tuning also typically requires a dataset of up to hundreds of thousands of examples specific to that task.
Removing the need for task-specific datasets and task-specific fine-tuning would therefore be highly desirable, and would benefit the NLP industry for several reasons.
Issues with Existing Pre-Trained Transfer Language Models or Recurrent Models
- Limited Practicality and Applicability
First and foremost, requiring a large labeled dataset for every task limits the applicability and practicality of language models. Language models can be applied to a wide range of tasks, from writing short stories to correcting grammatical errors to generating examples of a concept. Collecting a large supervised dataset of labeled data can be difficult, especially when the process has to be repeated for every individual task.
- Exploiting Spurious Correlations in Training Data
A narrow training distribution combined with a highly expressive model fundamentally increases the potential to exploit spurious correlations in the training data. This can cause problems in the pre-training plus fine-tuning paradigm, because transfer language models are designed to absorb a large amount of information during pre-training and are then fine-tuned on much narrower task distributions.
Moreover, work on earlier models has shown that larger models do not necessarily generalize better out of distribution. There is also evidence that the generalization achieved under this paradigm can be poor, mainly because the model becomes highly specific to its training data and does not perform well outside that data's scope.
- Comparison with Human Learning
Finally, unlike transfer language models, humans do not need large training datasets to learn most language tasks. In most cases, a brief instruction in natural language or a small number of demonstrations is enough for a person to understand and perform a language task with a reasonable level of competence.
This adaptability has many practical advantages: it lets people switch between different skill sets, or mix them together, for example to perform better during a dialogue, which is beyond the capabilities of current NLP systems.
Tackling the Issues with Meta-Learning and GPT-3
One possible solution to these challenges is meta-learning, a concept in modern ML in which a model develops a broad set of skills and pattern-recognition abilities during training, and then uses those abilities at inference time to rapidly adapt to or recognize the desired task.
In language models, meta-learning is implemented through a technique called "in-context learning", which uses the text input of a pre-trained language model as a form of task specification. The model is conditioned on a natural language instruction, and possibly a few demonstrations, and is then expected to complete the rest of the task by predicting what comes next.
The main problem with meta-learning is that, although it has shown promise, it still falls short of the fine-tuning approach in natural language architectures, and needs substantial improvement before it becomes a practical method for solving language tasks.
Alongside meta-learning, another approach that is gaining popularity is increasing the capacity of transformer language models. Over the past few years, the capacity of transfer models has grown substantially: from the RNSS18 model with 100 million parameters, to DCLT18 with 300 million, RWC19 with 1.5 billion, SSP19 with 8 billion, RSR19 with 11 billion, and TUR20 with 17 billion parameters.
Historically, increasing model capacity, that is, the number of parameters, has improved text synthesis, and there is evidence that log loss, which correlates with downstream task performance, also improves smoothly with scale.
This brings us to GPT-3, which has 175 billion parameters and was, at the time of its release, the highest-capacity transfer language model. Let's now take a closer look at the GPT-3 model.
An Introduction to the GPT-3 Model
GPT-3 is an autoregressive language model with 175 billion parameters, released by OpenAI in 2020. It is classed as a large language model and, like its predecessor GPT-2, is a decoder-only deep learning transformer model that uses an attention-based architecture to generate text.
The evaluation of GPT-3 focuses on measuring its in-context learning ability: the model is evaluated on over two dozen NLP datasets as well as several novel tasks. For each task, GPT-3 is evaluated under three conditions:
- Few-Shot Learning or In-Context Learning: the model is given as many demonstrations as will fit in its context window.
- One-Shot Learning: the model is given only one demonstration.
- Zero-Shot Learning: the model is given no demonstrations, only a natural language instruction.
Broadly speaking, GPT-3 achieves promising performance in the zero-shot and one-shot settings, and in the few-shot setting it is sometimes competitive with, and occasionally surpasses, state-of-the-art transfer models. GPT-3 also performs well in the one-shot and few-shot settings on natural language tasks designed to test on-the-fly reasoning or rapid adaptation, such as using a novel word in a sentence, unscrambling words, or performing arithmetic. In addition, in the few-shot setting, GPT-3 can generate synthetic news articles that human evaluators find hard to distinguish from articles written by humans.
GPT-3 Model: Approach
GPT-3 uses a conventional pre-training approach, comprising the model, the data, and the training procedure, that resembles the pre-training process of the RWC-19 transfer language model. Compared with that work, GPT-3 scales up the model size, the dataset size, the diversity of the dataset, and the length of training.
The model also uses an in-context learning approach that again resembles that of RWC-19, with one adjustment: it systematically explores different settings for learning within the context.
Let's start by exploring these settings and how the GPT-3 model is evaluated in each of them.
Fine-Tuning
Fine-tuning has been the conventional approach for transfer language models. It involves updating the weights of a pre-trained model by training on a supervised dataset specific to the desired task, often using hundreds of thousands of labeled examples.
The advantage of fine-tuning is strong performance across many benchmarks. Its main limitations are that it requires a new, large dataset for every task, can exploit spurious features of the training dataset, can lead to unfair comparisons with human performance, and tends to generalize poorly out of distribution.
GPT-3 is not fine-tuned in the current scope of the work, because the focus is on task-agnostic performance, although fine-tuning could be applied to GPT-3 in the future.
Few-Shot
Few-shot refers to the setting in which the model is given a few demonstrations of the task at inference time as conditioning, but its weights are not updated. Each demonstration typically consists of a context and a desired completion (for example, an English sentence and its French translation). In the few-shot setting, the model is given K examples of context and completion, followed by one final context, and is expected to provide the completion.
The main advantage of the few-shot setting is that it greatly reduces the need for task-specific data, and reduces the risk of learning an overly narrow distribution from a large but narrow fine-tuning dataset. Its main disadvantage is that results in this setting have so far been considerably worse than those of state-of-the-art fine-tuned models.
One-Shot
In the one-shot setting, the model is given only a single demonstration; otherwise it is the same as the few-shot setting. One-shot is worth distinguishing because, of the three settings, it most closely resembles the way tasks are communicated to humans. For most tasks it is common to give one demonstration, since without it the context of the task can be hard to understand.
Zero-Shot
In the zero-shot setting, there are no demonstrations, and the model is given only a natural language instruction describing the task. Zero-shot offers maximum convenience, is potentially robust, and avoids spurious correlations, but it is also the most challenging of the three settings, because in some cases even humans find it difficult to understand a task without first seeing a demonstration.
Nevertheless, for some tasks the zero-shot setting is the closest to how humans perform natural language tasks.
The figure above compares the few-shot, one-shot, and zero-shot settings for the natural language task of translating an English sentence into French.
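The three settings differ only in how many demonstrations are placed in the prompt. The sketch below illustrates this for the translation task; the `build_prompt` helper, the `=>` separator, and the example word pairs are assumptions made for illustration, not a format prescribed by the article.

```python
def build_prompt(instruction, demonstrations, query, separator="\n\n"):
    """Join an instruction, K demonstrations, and a final context into one prompt.

    K = 0 gives a zero-shot prompt, K = 1 one-shot, and K > 1 few-shot.
    """
    parts = [instruction]
    for context, completion in demonstrations:
        parts.append(f"{context} => {completion}")
    parts.append(f"{query} =>")  # the model is expected to continue from here
    return separator.join(parts)


# Illustrative English-to-French demonstrations (assumed, not from the article).
demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
instruction = "Translate English to French:"

zero_shot = build_prompt(instruction, [], "peppermint")
one_shot = build_prompt(instruction, demos[:1], "peppermint")
few_shot = build_prompt(instruction, demos, "peppermint")
print(few_shot)
```

In all three cases the model weights stay fixed; only the text of the prompt changes.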
GPT-3: Model Architecture
GPT-3 uses the same architecture as GPT-2, including the modified initialization, pre-normalization, and reversible tokenization used in that model, with one exception: it uses alternating dense and locally banded sparse attention patterns in the layers of the transformer, similar to the Sparse Transformer.
To study how performance depends on model size, the developers trained 8 models of different sizes, spanning three orders of magnitude from 125 million to 175 billion parameters; the largest of these is the model called GPT-3. Prior work on LLMs suggests that, with enough training data, validation loss should scale approximately as a smooth power law as a function of model size. Training models of many different sizes allows this hypothesis to be tested both for validation loss and for downstream language tasks.
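A power law of the form loss = a * size^(-alpha) becomes a straight line in log-log space, so the hypothesis can be checked with an ordinary least-squares fit. The following is a minimal sketch on synthetic data; the function name `fit_power_law` and the constants `true_a` and `true_alpha` are assumptions chosen for illustration and are not measured GPT-3 values.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size ** (-alpha) by least squares in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, alpha)


# Synthetic losses generated from an assumed power law.
sizes = [1.25e8, 1.3e9, 1.3e10, 1.75e11]
true_a, true_alpha = 20.0, 0.08
losses = [true_a * n ** (-true_alpha) for n in sizes]

a, alpha = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

With real validation losses, systematic curvature in the log-log plot would indicate a departure from the power law.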
The figure above compares the sizes and architectures of the 8 models trained in developing GPT-3. Here, n(params) is the total number of trainable parameters, n(layers) is the total number of layers, d(model) is the number of units in each bottleneck layer, and d(head) is the dimension of each attention head. All models use the same context window of 2048 tokens.
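These quantities are related. A common rule of thumb estimates the non-embedding parameter count of a transformer as roughly 12 * n(layers) * d(model)^2. The sketch below applies it to the largest model; the values 96 layers, d(model) = 12288, and d(head) = 128 are taken from the GPT-3 paper's architecture table rather than from this article, so treat them as assumptions here.

```python
def approx_params(n_layers, d_model):
    """Rule-of-thumb non-embedding parameter count: 12 * n_layers * d_model^2."""
    return 12 * n_layers * d_model ** 2


def attention_heads(d_model, d_head):
    """Number of attention heads when d_model is split evenly across heads."""
    return d_model // d_head


# Assumed configuration of the largest model (not stated in the article).
largest = approx_params(n_layers=96, d_model=12288)
heads = attention_heads(d_model=12288, d_head=128)
print(f"{largest / 1e9:.1f}B parameters, {heads} heads")
```

The estimate lands close to the quoted 175 billion; the remainder is accounted for by embeddings and other terms the rule of thumb ignores.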
In addition, each model is partitioned across GPUs along both the depth and width dimensions in order to minimize data transfer between nodes. The architectural parameters of each model were chosen for computational efficiency and load balancing in the layout of the model across GPUs.
Training Datasets
Datasets for large language models have expanded rapidly, culminating in the Common Crawl dataset, which contains nearly a trillion words. This is large enough to train the GPT-3 models without ever updating on the same sequence twice. However, studies and performance analyses show that unfiltered or lightly filtered versions of Common Crawl tend to be of lower quality than more curated datasets.
To address the average quality of the dataset, the developers took three steps:
- They downloaded a version of Common Crawl and filtered it based on similarity to a range of high-quality reference corpora.
- They performed fuzzy deduplication at the document level across the dataset, to prevent redundancy and to preserve the integrity of the held-out validation set as an accurate measure of overfitting.
- They added high-quality reference corpora to the training data to augment Common Crawl and further increase the diversity of the dataset.
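As a rough illustration of the deduplication step, near-duplicate documents can be detected by comparing their sets of word n-grams (shingles) with Jaccard similarity. The sketch below is an assumption-laden toy: the function names, the shingle size of 3, and the 0.8 threshold are all hypothetical, and the exact pairwise comparison would be replaced by an approximate technique such as MinHash at web scale.

```python
def shingles(text, n=3):
    """Return the set of word n-grams in a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(docs, threshold=0.8):
    """Keep a document only if it is not too similar to one already kept."""
    kept, kept_shingles = [], []
    for doc in docs:
        current = shingles(doc)
        if all(jaccard(current, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(current)
    return kept


docs = [
    "the quick brown fox jumps over the lazy dog today",
    "the quick brown fox jumps over the lazy dog today !",
    "completely different text about language models",
]
unique_docs = deduplicate(docs)
print(len(unique_docs))
```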
The following figure shows the final proportions, or mixture, of the datasets used to train GPT-3. The Common Crawl data consisted of more than 45 TB of plaintext before filtering and 570 GB after filtering, roughly equivalent to 400 billion byte-pair-encoded tokens. Note that during training, datasets are not sampled in proportion to their size: datasets regarded as higher quality are sampled more frequently. As a result, datasets such as Books2 and Common Crawl are sampled less than once during training, while the other datasets are sampled multiple times. The model thus accepts a small amount of overfitting in exchange for training on higher-quality data.
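The effect of sampling by weight rather than by size can be seen by computing how many passes (epochs) over each dataset a fixed token budget implies. The dataset names, sizes, mixture weights, and token budget below are hypothetical numbers chosen only to show the arithmetic; they are not the actual GPT-3 mixture.

```python
def epochs_seen(mix_weights, dataset_tokens, total_training_tokens):
    """Number of passes over each dataset implied by its sampling weight."""
    return {
        name: mix_weights[name] * total_training_tokens / dataset_tokens[name]
        for name in mix_weights
    }


# Hypothetical mixture: a large web crawl and a small curated corpus.
dataset_tokens = {"large_web_crawl": 400e9, "small_curated": 3e9}
mix_weights = {"large_web_crawl": 0.97, "small_curated": 0.03}

epochs = epochs_seen(mix_weights, dataset_tokens, total_training_tokens=300e9)
for name, count in epochs.items():
    print(f"{name}: {count:.2f} epochs")
```

Even with a small weight, the curated corpus is repeated several times, while the large crawl is never seen in full.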
A major concern with large language models that are pre-trained on vast amounts of internet data, and that have the capacity to memorize large amounts of content, is contamination of downstream tasks: their development or test sets may have been seen during pre-training. To reduce this risk, the developers searched for overlaps with the development and test sets of all benchmarks studied for GPT-3 and attempted to remove them.
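One simple way to look for such overlaps is to check whether any word n-gram of a benchmark example also occurs in the training data. The sketch below assumes this n-gram approach; the function names, the choice of n, and the sample texts are hypothetical, and a real pipeline would need normalization and far more scalable data structures.

```python
def ngrams(text, n):
    """Return the set of word n-grams in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def is_contaminated(benchmark_example, training_ngrams, n):
    """True if the example shares at least one n-gram with the training data."""
    return not ngrams(benchmark_example, n).isdisjoint(training_ngrams)


# Toy training corpus and an assumed n-gram length of 5 for brevity.
N = 5
training_docs = ["in a hole in the ground there lived a hobbit not a nasty dirty wet hole"]
training_ngrams = set().union(*(ngrams(doc, N) for doc in training_docs))

seen = is_contaminated("the story begins: in a hole in the ground there lived a hobbit", training_ngrams, N)
unseen = is_contaminated("an unrelated question about french grammar rules", training_ngrams, N)
print(seen, unseen)
```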
The image above shows the total compute used to train the GPT-3 models. Following the scaling laws for neural language models, much larger models are trained on fewer tokens than is typical. As a result, the roughly 3-billion-parameter GPT-3 model and RoBERTa-Large, which is almost 10 times smaller, both required about 50 petaflop/s-days of compute during pre-training.
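Training compute can be estimated with the widely used approximation of about 6 floating-point operations per parameter per training token. The sketch below converts that estimate to petaflop/s-days; the 300 billion training tokens and the 2.65 billion parameter count for the smaller model are assumptions drawn from the GPT-3 paper, not figures given in this article.

```python
SECONDS_PER_DAY = 86_400
PETAFLOP = 1e15


def training_compute_pf_days(n_params, n_tokens):
    """Approximate training compute as 6 * params * tokens, in petaflop/s-days."""
    flops = 6 * n_params * n_tokens
    return flops / (PETAFLOP * SECONDS_PER_DAY)


# Assumed parameter counts and token budget (see the note above).
gpt3_175b = training_compute_pf_days(n_params=175e9, n_tokens=300e9)
gpt3_3b = training_compute_pf_days(n_params=2.65e9, n_tokens=300e9)
print(f"175B model: {gpt3_175b:.0f} PF-days, ~3B model: {gpt3_3b:.0f} PF-days")
```

Under these assumptions the smaller model comes out in the neighbourhood of 50 petaflop/s-days, while the full 175-billion-parameter model needs several thousand.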
Evaluation
For few-shot learning, each example in the evaluation set is evaluated by randomly drawing K examples from that task's training set as conditioning, delimited by 1 or 2 newlines depending on the task. For StoryCloze and LAMBADA, no supervised training set is available, so conditioning examples are drawn from the development set and the model is evaluated on the test set. For Winograd, only one dataset exists, so conditioning examples are drawn directly from it.
K can be any value from 0 up to the maximum allowed by the model's context window, which is n_ctx = 2048 for all models and typically fits about 10 to 100 examples. Larger values of K usually, but not always, give better results, so when separate development and test sets are available, the developers experiment with a few values of K on the development set and then run the best value on the test set.
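The selection of K amounts to a small search over candidate values using the development set only. A minimal sketch, assuming a hypothetical table of development-set scores (the numbers below are invented for illustration):

```python
def select_k(candidate_ks, dev_score):
    """Return the K with the highest development-set score.

    Candidates are tried in ascending order, so the smallest K wins ties.
    """
    return max(sorted(candidate_ks), key=dev_score)


# Hypothetical development-set accuracies for a few values of K.
dev_scores = {0: 0.52, 1: 0.58, 16: 0.66, 64: 0.65}

best_k = select_k(list(dev_scores), dev_scores.get)
print(best_k)  # this K would then be used once on the test set
```

Keeping the test set out of this loop is what prevents the choice of K from inflating the reported result.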
For tasks that involve choosing the correct completion from several options, the developers provide K examples of context plus correct completion, followed by one example of context only, and compare the LM likelihood of each candidate completion. For tasks that involve binary classification, the options are often given more semantically meaningful names and the task is treated as multiple choice; in some cases the task is instead framed in a way similar to what is done in the RSR model and architecture.
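Comparing completions by LM likelihood can be sketched as follows. The per-token log-probabilities would come from the language model; here they are hypothetical values, and averaging per token (so that longer options are not penalized simply for their length) is an assumed design choice rather than something the article specifies.

```python
import math


def score_completion(token_log_probs):
    """Average log-probability per token of a completion given its context."""
    return sum(token_log_probs) / len(token_log_probs)


def pick_answer(options):
    """Choose the option whose completion the model finds most likely.

    `options` maps each candidate completion to its per-token log-probabilities.
    """
    return max(options, key=lambda name: score_completion(options[name]))


# Hypothetical per-token probabilities for two candidate completions.
options = {
    "Paris": [math.log(0.6)],
    "the city of Lyon": [math.log(0.5), math.log(0.2), math.log(0.3), math.log(0.1)],
}
answer = pick_answer(options)
print(answer)
```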
For tasks with free-form completion, the model uses beam search with the same parameters as the RSR framework: a beam width of 4 and a length penalty of 0.6. The output is then scored using F1 similarity, exact match, or BLEU, depending on what is standard for the dataset.
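Two of these metrics are simple enough to sketch directly. The version below uses whitespace tokenization and lowercasing, which is a simplifying assumption; benchmark-specific scorers usually apply additional normalization such as stripping punctuation and articles.

```python
from collections import Counter


def exact_match(prediction, reference):
    """True if the strings match after trimming whitespace and lowercasing."""
    return prediction.strip().lower() == reference.strip().lower()


def token_f1(prediction, reference):
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


f1 = token_f1("the eiffel tower", "eiffel tower")
em = exact_match(" Paris", "paris")
print(round(f1, 2), em)
```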
Results
The figure above shows the training curves for the 8 models described in the previous section. Consistent with the results of the KMH language model study, GPT-3's performance follows a power law when training compute is used efficiently. Only small deviations from this law appear when the trend is extended by two more orders of magnitude. One might suspect that these improvements in cross-entropy loss simply come from modeling spurious details of the training corpus. However, the improvements in cross-entropy loss lead to consistent gains in overall performance across a broad range of NLP tasks.
Before the 8 models are evaluated across a wide range of datasets, the datasets are grouped into 8 categories that represent broadly similar tasks:
- Traditional language modeling tasks and tasks similar to language modeling, such as Cloze tasks and sentence or paragraph completion tasks.
- "Closed-book" question answering tasks.
- Translation between languages (especially in the one-shot and few-shot settings).
- Winograd Schema-like tasks.
- Datasets involving commonsense reasoning or question answering.
- Reading comprehension tasks.
- The SuperGLUE benchmark suite.
- NLI.
Language Modeling, Completion, and Cloze Tasks
This section evaluates GPT-3's performance on traditional language modeling tasks, as well as on tasks that involve predicting a single word of interest, completing a sentence or paragraph, or completing a piece of text. Let's look at each of them briefly.
Language Modeling
GPT-3's zero-shot perplexity is calculated on the PTB (Penn Tree Bank) dataset. Wikipedia-related tasks are omitted because Wikipedia is already included in the model's training data, and the One Billion Word benchmark is omitted because a large fraction of that dataset is contained in the training data. PTB avoids these issues because it predates the modern internet. The largest model in the GPT-3 family sets a new SOTA on PTB by a substantial margin of 15 points, achieving a perplexity of 20.50.
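Perplexity is the exponential of the average negative log-likelihood per token, so a perplexity of 20.50 means the model is, on average, about as uncertain as if it were choosing uniformly among 20.5 tokens at each step. A minimal sketch, using hypothetical per-token log-probabilities in place of real model outputs:

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-mean log-probability per token), natural log base."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical case: every token is assigned probability 1 / 20.5.
uniform_log_probs = [math.log(1 / 20.5)] * 10
ppl = perplexity(uniform_log_probs)
print(round(ppl, 2))
```

Lower perplexity therefore corresponds directly to lower cross-entropy loss on the evaluation text.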
LAMBADA
The LAMBADA dataset tests a model's ability to model long-range dependencies in text: the model is asked to predict the last word of a sentence after reading a paragraph of context. It has been suggested that continued scaling of language models yields diminishing returns on this benchmark.
GPT-3 achieves 76% accuracy on LAMBADA, a gain of 8% over the previous best model. LAMBADA also demonstrates the flexibility of few-shot learning, which offers a way to address a classic problem with this dataset. The completion in LAMBADA is always the last word of a sentence, but a standard language model has no way of knowing this, so it assigns probability not only to the correct ending but also to other valid continuations of the paragraph.
When the examples given to GPT-3 are instead framed in a fill-in-the-blank format, the model reaches over 86% accuracy in the few-shot setting, an increase of over 18% over previous models. The results also show that few-shot performance improves strongly with model size: this format reduces the accuracy of the smallest GPT-3 model by almost 20%, but improves the accuracy of the full GPT-3 model with 175 billion parameters by 10%.
Closed-Book Question Answering
Closed-book question answering measures GPT-3's ability to answer questions about broad factual knowledge. Because the number of possible queries is immense, this task is normally tackled by combining an information retrieval system, which finds relevant text, with a model that learns to generate an answer given the question and the retrieved text. In the closed-book setting, the model must answer without access to any retrieved text.
The image above compares GPT-3's results with those of other models on several datasets. On TriviaQA, the model achieves an accuracy of 64.3% in the zero-shot setting, and 68.0% and 71.2% in the one-shot and few-shot settings respectively.
The zero-shot GPT-3 model clearly outperforms the fine-tuned T5-11B model, by more than 14%.
The figure above shows that GPT-3's performance improves smoothly as model size increases, which suggests that language models continue to absorb knowledge as their capacity grows.
Final Thoughts
It is fair to say that GPT-3 was a revolutionary step for the LLM industry, as it helped push the limits of what a language model can do. The progress made and the obstacles overcome by GPT-3 paved the way for GPT-4, the most advanced and accurate large language model to date.