Attacking Natural Language Processing Systems With Adversarial Examples

Researchers from the UK and Canada have devised a series of black-box adversarial attacks against Natural Language Processing (NLP) systems that are effective against a wide range of popular language-processing frameworks, including systems widely deployed by Google, Facebook, IBM and Microsoft.
The attacks could be used to cripple machine-learning translation systems, either by forcing them to produce nonsense or by actually changing the nature of the translation. They could also be used to bottleneck the training of NLP models; to misclassify toxic content; to poison search engine results by causing faulty indexing; to stop search engines from identifying malicious or negative content that remains perfectly readable to a person; and even to mount Denial-of-Service (DoS) attacks against NLP frameworks.
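Although the authors have disclosed the vulnerabilities proposed in the paper to a number of unnamed parties whose products feature in the research, they consider that the NLP industry has been slow to protect itself against adversarial attacks. The paper states: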
These attacks exploit language-coding features, such as invisible characters and homoglyphs. Although they have occasionally been seen in the past in spam and phishing scams, the designers of the many NLP systems now being deployed at scale appear to have ignored them completely.
Several of the attacks were carried out in as 'black box' an environment as possible: via API calls to MLaaS systems, rather than against locally installed FOSS versions of the NLP frameworks. Of the systems' combined efficacy, the authors write:
All experiments were performed in a black-box setting in which unlimited model evaluations are permitted, but access to the assessed model's weights or state is not. This represents one of the strongest threat models, under which attacks are possible in nearly all settings, including against commercial Machine Learning-as-a-Service (MLaaS) offerings. Every model examined was vulnerable to imperceptible perturbation attacks.
We believe that the applicability of these attacks should, in theory, generalize to any text-based NLP model without adequate defenses in place.
The paper is titled Bad Characters: Imperceptible NLP Attacks, and comes from researchers at the University of Cambridge and the University of Edinburgh, together with the University of Toronto.
The title is exemplary: it is packed with 'imperceptible' Unicode characters, which form the basis of one of the four principal attack methods adopted by the researchers.

Even the paper's title holds hidden mysteries.
Method
The paper proposes three primary effective attack methods: invisible characters; homoglyphs; and reorderings. These are the 'universal' methods that the researchers found to have wide reach against NLP frameworks in black-box scenarios. A further method, involving a deletion character, was found by the researchers to be suitable only for unusual NLP pipelines that make use of the operating system clipboard.
1: Invisible Characters
This attack uses encoded characters that do not map to a visible glyph in the font. The Unicode system was designed to standardize electronic text, and now covers 143,859 characters across multiple languages and symbol groups. Many of these mappings will not correspond to any visible character in a given font (which, naturally, cannot include characters for every possible entry in Unicode).

From the paper, a hypothetical example of an attack using invisible characters, which splits the input words into segments that either mean nothing to a Natural Language Processing system or, if carefully crafted, can prevent an accurate translation. To the casual reader, the original text is correct in both cases. Source: https://arxiv.org/pdf/2106.09898.pdf
Typically, you cannot simply use one of these non-characters to create a zero-width space, since most systems will render a 'placeholder' symbol (such as a square, or a question mark in an angled box) to represent the unrecognized character.
However, as the paper observes, only a small handful of fonts dominate the current computing scene, and, unsurprisingly, they tend to adhere to the Unicode standard.
The researchers therefore chose GNU's Unifont glyphs for their experiments, partly because of its 'robust coverage' of Unicode, but also because it looks similar to many of the other 'standard' fonts likely to be fed to NLP systems. Although the invisible characters produced from Unifont do not render, the NLP systems tested still count them as visible characters.
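To make the 'rendered invisibly but still counted' behaviour concrete, here is a minimal Python sketch. It assumes U+200B ZERO WIDTH SPACE as the invisible character (the article does not name the specific characters the researchers used), and inject_invisible is a hypothetical helper written for illustration, not code from the paper.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE: an encoded character that usually renders as nothing

def inject_invisible(text: str, positions: set[int], char: str = ZWSP) -> str:
    """Insert `char` before each index in `positions` (indices into the original text)."""
    out = []
    for i, ch in enumerate(text):
        if i in positions:
            out.append(char)
        out.append(ch)
    return "".join(out)

clean = "Bad Characters"
perturbed = inject_invisible(clean, {1, 5})

print(clean == perturbed)              # False: the code point sequences differ
print(len(clean), len(perturbed))      # 14 16
print(unicodedata.name(perturbed[1]))  # ZERO WIDTH SPACE
```

On screen both strings typically look the same, yet any system that compares or counts code points sees two extra characters.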
Applications
Returning to the 'crafted' title of the paper itself, we can see that running a Google search on the selected text does not return the expected results.
This is a client-side effect, but the server-side ramifications are a little more serious. The paper observes:
'Even though a perturbed document may be crawled by a search engine's crawler, the terms used to index it will be affected by the perturbations, making it less likely to appear in a search on unperturbed terms. It is thus possible to hide documents from search engines "in plain sight."'
'As an example application, a dishonest company could mask negative information in its financial filings so that the specialist search engines used by stock analysts fail to pick it up.'
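The indexing problem described in these quotes can be illustrated with a toy inverted index. This is a hypothetical sketch, not how any real search engine tokenizes text, and it again assumes a zero-width space as the perturbation.

```python
from collections import defaultdict

ZWSP = "\u200b"  # assumed invisible perturbation character

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Toy inverted index: lower-cased, whitespace-separated terms -> document ids."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        # U+200B is not whitespace, so it stays inside the indexed term.
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)

def search(index: dict[str, set[str]], term: str) -> set[str]:
    return index.get(term.lower(), set())

docs = {
    "filing-a": "quarterly loss widened",
    # Renders like the line above, but "loss" hides a zero-width space.
    "filing-b": f"quarterly lo{ZWSP}ss widened",
}
index = build_index(docs)
print(sorted(search(index, "loss")))  # ['filing-a']
```

A search for the unperturbed term only finds the clean document, even though a human reader sees the same word in both.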
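The only scenarios in which the 'invisible characters' attack proved less effective were against toxic content, Named Entity Recognition (NER) and sentiment analysis models. The authors postulate that this is either because the models were trained on data that also contained invisible characters, or because the model's tokenizer (which breaks raw language input down into modular components) was already configured to ignore them.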
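The second hypothesis, a tokenizer that already discards invisible characters, could look something like the sketch below. Filtering on the Unicode general category Cf (format characters) is an assumption chosen for illustration: it catches zero-width spaces and joiners as well as the Bidi controls, but not every character that happens to render invisibly, and it also strips legitimate uses such as the zero-width joiner in emoji sequences.

```python
import unicodedata

def strip_format_chars(text: str) -> str:
    """Drop characters in Unicode general category 'Cf' (format characters)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

def tokenize(text: str) -> list[str]:
    # Deliberately simple whitespace tokenizer, for illustration only.
    return strip_format_chars(text).lower().split()

print(tokenize("This is t\u200boxic"))  # ['this', 'is', 'toxic']
```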
2: Homoglyphs
A homoglyph is a character that looks like another character, a semantic weakness that was exploited in 2000 to create a scam replica of the PayPal payment-processing domain.

In this hypothetical example from the paper, a homoglyph attack changes the meaning of a translation by substituting visually indistinguishable homoglyphs (outlined in red) for common Latin characters.
The authors comment*:
'We have found that machine-learning models that process user-supplied text, such as neural machine-translation systems, are particularly vulnerable to this style of attack. Consider, for example, the market-leading service Google Translate. At the time of writing, entering the string "paypal" in the English-to-Russian model correctly outputs "PayPal", but replacing the Latin character a in the input with the Cyrillic character а incorrectly outputs "папа" ("father" in English).'
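The Google Translate example hinges on two strings that look alike but differ at the code-point level. The following sketch reproduces only that string-level difference; it does not call any translation service.

```python
import unicodedata

latin = "paypal"
# Swap the first Latin 'a' (U+0061) for CYRILLIC SMALL LETTER A (U+0430).
spoofed = latin.replace("a", "\u0430", 1)

print(latin == spoofed)  # False, although most fonts draw both strings identically
for ch in spoofed:
    print(f"{ch!r}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Listing the code points is a quick way to expose a homoglyph that the eye cannot.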
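The researchers observe that although many NLP pipelines will replace characters outside their language-specific dictionary with an <unk> ('unknown') token, the software processes that summon the poisoned text into the pipeline may propagate unknown words for evaluation before this safety measure can kick in. The authors state that this 'opens a surprisingly large attack surface'.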
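A toy vocabulary lookup shows why the <unk> safeguard does not restore meaning: the spoofed word is simply discarded as unknown, while any step that runs before the lookup still handles the raw spoofed string. The vocabulary and function names here are hypothetical.

```python
VOCAB = {"please", "pay", "via", "paypal"}  # hypothetical toy vocabulary
UNK = "<unk>"

def to_tokens(text: str) -> list[str]:
    """Replace out-of-vocabulary words with <unk>."""
    return [w if w in VOCAB else UNK for w in text.lower().split()]

clean = "please pay via paypal"
# Cyrillic а (U+0430) in place of the Latin a in "paypal".
spoofed = clean.replace("paypal", "p\u0430ypal")

print(to_tokens(clean))    # ['please', 'pay', 'via', 'paypal']
print(to_tokens(spoofed))  # ['please', 'pay', 'via', '<unk>']
```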
3: Reorderings
Unicode supports languages written both left-to-right and right-to-left, with ordering handled by Unicode's Bidirectional (BIDI) algorithm. Mixing right-to-left and left-to-right characters in a single string can therefore be confusing, and Unicode allows for this by permitting the BIDI algorithm to be overridden with special control characters. These make almost arbitrary rendering possible for a fixed encoding order.
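Here is a minimal sketch of a reordering perturbation, assuming plain single-line left-to-right text. It uses the RIGHT-TO-LEFT OVERRIDE (U+202E) and POP DIRECTIONAL FORMATTING (U+202C) controls: a Bidi-aware renderer displays the overridden run in reverse, so code points stored backwards appear as the original word, while a model reads them in stored order. bidi_disguise is a hypothetical name, and the paper's attacks may combine controls differently.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

def bidi_disguise(visible: str) -> str:
    """Store `visible` reversed inside an RLO...PDF override, so that a
    Bidi-aware renderer displays it as `visible` again."""
    return RLO + visible[::-1] + PDF

encoded = bidi_disguise("hello")
# What a model sees once the invisible controls are ignored: the stored order.
model_view = encoded.replace(RLO, "").replace(PDF, "")
print(model_view)  # olleh
```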

In another theoretical example from the paper, the translation mechanism places all the letters of the translated text in the wrong order, because it is obeying the wrong right-to-left/left-to-right encoding, as dictated by an adversarial source text (circled).
The authors state that, at the time of writing, the method was effective against the Unicode implementation in the Chromium web browser, which is the upstream source for Google's Chrome browser, Microsoft's Edge browser and a large number of other forks.
Also: Deletions
Included here so that the results graphs that follow are clearer, the deletions attack uses a character that represents a backspace or some other text-affecting control/command, which the language-reading system effectively carries out in a manner similar to a text macro.
The authors observe:
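'A small number of control characters in Unicode can cause neighbouring text to be removed. The simplest examples are the backspace (BS) and delete (DEL) characters. There is also the carriage return (CR), which causes the text-rendering algorithm to return to the beginning of the line and overwrite its contents.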
'For example, encoded text which represents "Hello CRGoodbye World" will be rendered as "Goodbye World".'
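The quoted CR example can be reproduced with a crude, terminal-style renderer. This sketch handles only BS (U+0008) and CR (U+000D); DEL is left out because renderers treat it inconsistently, and real text-rendering stacks are far more complex than this hypothetical render_line.

```python
def render_line(encoded: str) -> str:
    """Render one line roughly as a simple terminal might.

    BS moves the cursor back one position and removes the character there;
    CR returns the cursor to the start of the line, so later characters
    overwrite earlier ones.
    """
    buf: list[str] = []
    cursor = 0
    for ch in encoded:
        if ch == "\b":
            if cursor > 0:
                cursor -= 1
                del buf[cursor]
        elif ch == "\r":
            cursor = 0
        elif cursor < len(buf):
            buf[cursor] = ch  # overwrite text already on the line (after a CR)
            cursor += 1
        else:
            buf.append(ch)
            cursor += 1
    return "".join(buf)

print(render_line("Hello \rGoodbye World"))  # Goodbye World
print(render_line("cat\b\bow"))              # cow
```

An NLP system reading the raw encoded string still receives "Hello", which is exactly the gap between what is stored and what is shown that the attack relies on.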
As stated earlier, this attack effectively requires an improbable level of access in order to work, and would only be fully effective with text copied and pasted via a clipboard rather than ingested programmatically, which makes for an unusual NLP ingestion pipeline.
The researchers tested it anyway, and it performed comparably to its stablemates. However, attacks using the first three methods can be implemented simply by uploading documents or web pages (in the case of an attack against search engines and/or web-scraping NLP pipelines).

In a deletions attack, the crafted characters effectively erase what precedes them, or else force single-line text into a new paragraph, in neither case making this obvious to the casual reader.
Effectiveness Against Current NLP Systems
The researchers performed a range of untargeted and targeted attacks across several popular closed-source and open-source models from Facebook, IBM, Microsoft, Google and HuggingFace.
圌ãã¯ãŸããã¹ãããŸãã ãã¹ãã³ãžãæ»æ ã¢ãã«ã«å¯ŸããŠãã¹ãã³ãžæ»æã¯ãNLPã·ã¹ãã ã«å¯Ÿããäºå®äžã®DoSæ»æã§ãããå ¥åããã¹ãããèšç®äžèœããšãªãããã¬ãŒãã³ã°ã®é床ãèããäœäžããŸããããã¯éåžžãããŒã¿ã®ååŠçã«ãã£ãŠäžå¯èœã«ãªãã¯ãã®ããã»ã¹ã§ãã
The five NLP tasks evaluated were machine translation, toxic content detection, textual entailment classification, named entity recognition and sentiment analysis.
The tests were run on an unspecified number of Tesla P100 GPUs with Intel Xeon Silver 4110 CPUs, running Ubuntu. To avoid violating terms of service when making API calls, the experiments were uniformly repeated with perturbation budgets ranging from zero (source text unaffected) up to the maximum budget tested (greatest disruption). The researchers contend that the results they obtained could be exceeded if a larger number of iterations were allowed.
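A perturbation budget can be read as the maximum number of injected characters per input. The sketch below pairs that idea with a plain random search against a black-box scoring function. It is a stand-in, not the authors' optimizer, and both the invisible-character alphabet and the toy oov_score objective are assumptions for illustration. A budget of zero returns the source text untouched, matching the 'unaffected' baseline.

```python
import random
from typing import Callable

# Assumed perturbation alphabet: ZERO WIDTH SPACE, NON-JOINER and JOINER.
INVISIBLE = ["\u200b", "\u200c", "\u200d"]

def perturb(text: str, budget: int, rng: random.Random) -> str:
    """Insert `budget` invisible characters at random positions."""
    chars = list(text)
    for _ in range(budget):
        chars.insert(rng.randrange(len(chars) + 1), rng.choice(INVISIBLE))
    return "".join(chars)

def random_search_attack(text: str, score: Callable[[str], float], budget: int,
                         queries: int = 50, seed: int = 0) -> tuple[str, float]:
    """Query `score` as a black box and keep the highest-scoring perturbation."""
    best, best_score = text, score(text)
    if budget == 0:
        return best, best_score
    rng = random.Random(seed)
    for _ in range(queries):
        candidate = perturb(text, budget, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

# Hypothetical objective: how many words a fixed toy vocabulary no longer recognizes.
TOY_VOCAB = {"the", "cat", "sat"}

def oov_score(text: str) -> float:
    return float(sum(word not in TOY_VOCAB for word in text.split()))

for budget in range(3):
    adversarial, s = random_search_attack("the cat sat", oov_score, budget)
    print(budget, s)
```

Because the search only ever sees the scores returned for each query, it fits the black-box threat model described earlier; more queries per budget give it more chances to find a damaging perturbation.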

Facebook's Fairseq EN-FR model.

Results of attacks against IBM's toxic content classifier and Google's Perspective API.

Two attacks against Facebook's Fairseq: 'untargeted' aims to disrupt, whereas 'targeted' aims to change the meaning of the translated language.
The researchers further tested their system against prior frameworks that were unable to generate 'human-readable' perturbing text in the same way, and found their system broadly on par with these, and often notably better, while retaining the great advantage of stealth.
The average effectiveness across all methods, attack vectors and targets hovers around 80%, with very few iterations run.
Commenting on the results, the researchers say:
'Perhaps the most disturbing aspect of our imperceptible perturbation attacks is their broad applicability: all text-based NLP systems we tested are susceptible. Indeed, any machine learning model which ingests user-supplied text as input is theoretically vulnerable to this attack.
'The adversarial implications may vary from one application to another and from one model to another, but all text-based models are based on encoded text, and all text is subject to adversarial encoding unless the coding is suitably constrained.'
Universal Optical Character Recognition?
These attacks depend on what are effectively 'vulnerabilities' in Unicode, and would be obviated in an NLP pipeline that rasterized all incoming text and used Optical Character Recognition (OCR) as a sanitization measure. In that case, the same benign semantic meaning visible to people reading these perturbed texts would be passed on to the NLP system.
However, when the researchers implemented an OCR pipeline to test this theory, they found that it reduced baseline accuracy, as measured by BLEU (Bilingual Evaluation Understudy) score, by 6.2%, suggesting that improved OCR technologies would probably be needed to remedy this.
They further suggest that BIDI control characters should be stripped from input by default, that unusual homoglyphs should be mapped and indexed (which they describe as 'a daunting task'), and that tokenizers and other ingestion mechanisms should be armed against invisible characters.
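The three proposed defenses can be combined into a single pre-processing pass. This is a minimal sketch under stated assumptions: it applies NFKC normalization, removes Bidi controls and other invisible format characters by filtering Unicode category Cf, and folds a tiny, hand-picked homoglyph table that is purely illustrative. A production system would derive the mapping from Unicode's confusables data (Unicode Technical Standard #39) rather than hard-coding a few Cyrillic letters.

```python
import unicodedata

# Tiny, hypothetical excerpt of a homoglyph table (Cyrillic -> Latin look-alikes).
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}

def sanitize(text: str) -> str:
    """Normalize, drop invisible/Bidi format characters, and fold known homoglyphs."""
    text = unicodedata.normalize("NFKC", text)  # e.g. fullwidth letters -> ASCII
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            continue  # ZWSP/ZWNJ/ZWJ and Bidi controls such as U+202E are all 'Cf'
        out.append(HOMOGLYPHS.get(ch, ch))
    return "".join(out)

print(sanitize("p\u0430yp\u0430l"))   # paypal
print(sanitize("t\u200boxic"))        # toxic
print(sanitize("\u202eolleh\u202c"))  # olleh
```

Stripping the override does not undo a reordering attack: the last example comes out as "olleh", so what a human reviewer now sees matches what the model receives, which is the point of removing the controls.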
In closing, the research group urges the NLP sector to become more alert to the possibilities of adversarial attack, currently a field of great interest in computer vision research.
'We recommend that all firms building and deploying text-based NLP systems implement such defenses if they want their applications to be robust against malicious actors.'
* Inline citations converted to hyperlinks.
Update, 2021 (18:08 on the 14th): Removed a duplicate mention of IBM, and moved an automatic internal link out of a quote. – MA