Artificial Intelligence
How Text-to-3D AI Generation Works: Meta 3D Gen, OpenAI Shap-E, and More

By
Aayush Mittal
The ability to generate 3D digital assets from text prompts is one of the most exciting recent developments in AI and computer graphics. With the 3D digital asset market projected to grow from $28.3 billion in 2024 to $51.8 billion by 2029, text-to-3D AI models are poised to play a major role in revolutionizing content creation across industries such as gaming, film, and e-commerce. But how exactly do these AI systems work? In this article, we take a deep dive into the technical details behind text-to-3D generation.
The Challenge of 3D Generation
Generating 3D assets from text is a significantly more complex task than generating 2D images. A 2D image is essentially a grid of pixels, whereas a 3D asset must represent geometry, textures, materials, and often animation in three-dimensional space. This added dimensionality and complexity makes the generation task much harder.
The main challenges in text-to-3D generation are:
- Representing 3D geometry and structure
- Generating consistent textures and materials across the entire 3D surface
- Ensuring physical plausibility and coherence from multiple viewpoints
- Capturing fine detail and global structure at the same time
- Producing assets that can be easily rendered or 3D printed
To address these challenges, text-to-3D models draw on several key technologies and techniques.
Key Components of Text-to-3D Systems
Most state-of-the-art text-to-3D generation systems share a few core components:
- Text encoding: converting the input text prompt into a numerical representation
- 3D representation: a method for representing 3D shape and appearance
- Generative model: the core AI model that generates the 3D asset
- Rendering: converting the 3D representation into 2D images for visualization
Let's look at each of these in more detail.
Text Encoding
The first step is to convert the input text prompt into a numerical representation that the AI model can process. This is typically done with a large language model such as BERT or GPT.
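To make the idea of "text in, vector out" concrete, here is a minimal sketch. It is not BERT or GPT: as a stand-in, each token is hashed to a fixed pseudo-random vector and the vectors are mean-pooled. The names `token_vector`, `encode_text`, and `EMBED_DIM` are illustrative assumptions, not part of any library.

```python
import hashlib

import numpy as np

EMBED_DIM = 16  # toy size; real encoders use hundreds of dimensions


def token_vector(token: str, dim: int = EMBED_DIM) -> np.ndarray:
    """Map a token to a fixed pseudo-random vector (stand-in for a learned embedding)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "little")
    return np.random.default_rng(seed).standard_normal(dim)


def encode_text(prompt: str, dim: int = EMBED_DIM) -> np.ndarray:
    """Lowercase, split on whitespace, and mean-pool the token vectors."""
    tokens = prompt.lower().split()
    if not tokens:
        return np.zeros(dim)
    return np.mean([token_vector(tok, dim) for tok in tokens], axis=0)


embedding = encode_text("A red sports car")
print(embedding.shape)
```

A real encoder learns its embeddings so that related prompts land close together; the hashed vectors here only illustrate the interface: a string goes in, and a fixed-length vector comes out.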
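3D Representation
There are several common ways to represent 3D geometry in AI models:
- Voxel grids: 3D arrays of values representing occupancy or features
- Point clouds: sets of 3D points
- Meshes: vertices and faces that define a surface
- Implicit functions: continuous functions that define a surface (for example, signed distance functions)
- Neural radiance fields (NeRFs): neural networks that represent density and color in 3D space
Each has trade-offs in resolution, memory usage, and ease of generation. Many recent models use implicit functions or NeRFs because they allow high-quality results with reasonable computational requirements.
For example, a simple sphere can be expressed as a signed distance function: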
    import numpy as np

    def sphere_sdf(x, y, z, radius=1.0):
        return np.sqrt(x**2 + y**2 + z**2) - radius

    # Evaluate SDF at a 3D point
    point = [0.5, 0.5, 0.5]
    distance = sphere_sdf(*point)
    print(f"Distance to sphere surface: {distance}")
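The trade-offs between representations become easier to see when the same shape is stored in more than one form. The sketch below, which assumes a unit sphere and arbitrarily chosen grid and sample sizes, converts a sphere SDF into a voxel occupancy grid and into a surface point cloud. The helper names are illustrative only.

```python
import numpy as np


def sphere_sdf(points: np.ndarray, radius: float = 1.0) -> np.ndarray:
    """Signed distance from each (x, y, z) row to a sphere centered at the origin."""
    return np.linalg.norm(points, axis=-1) - radius


def sdf_to_voxels(resolution: int = 32, extent: float = 1.5) -> np.ndarray:
    """Sample the SDF on a regular grid; a voxel is occupied where the SDF is <= 0."""
    axis = np.linspace(-extent, extent, resolution)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return sphere_sdf(grid) <= 0.0


def sphere_point_cloud(num_points: int = 2048, seed: int = 0) -> np.ndarray:
    """Normalize random directions so every point lies on the unit sphere surface."""
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((num_points, 3))
    return directions / np.linalg.norm(directions, axis=1, keepdims=True)


voxels = sdf_to_voxels()
points = sphere_point_cloud()
print(voxels.shape, points.shape)
print(f"voxel grid stores {voxels.size} values, point cloud stores {points.size}")
```

The voxel grid grows with the cube of its resolution whether or not the space is occupied, while the point cloud only stores surface samples but carries no connectivity. The SDF itself is just a function and can be queried at any resolution.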
Generative Model
The core of a text-to-3D system is the generative model that produces a 3D representation from the text embedding. Most state-of-the-art models use some variation of a diffusion model, similar to those used in 2D image generation.
Diffusion models work by gradually adding noise to data and then learning to reverse that process. For 3D generation, this process takes place in the space of the chosen 3D representation.
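The "gradually adding noise" half of this process has a simple closed form in the common DDPM formulation: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. The sketch below assumes a linear beta schedule and uses a random point set as a stand-in for a 3D sample. The schedule values and helper names are assumptions for illustration.

```python
import numpy as np


def make_alpha_bars(num_timesteps: int = 1000,
                    beta_start: float = 1e-4,
                    beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_timesteps)
    return np.cumprod(1.0 - betas)


def add_noise(x_0: np.ndarray, noise: np.ndarray, t: int,
              alpha_bars: np.ndarray) -> np.ndarray:
    """Jump straight to step t: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * noise."""
    return np.sqrt(alpha_bars[t]) * x_0 + np.sqrt(1.0 - alpha_bars[t]) * noise


rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
x_0 = rng.uniform(-1.0, 1.0, size=(1024, 3))  # stand-in for a clean 3D sample
noise = rng.standard_normal(x_0.shape)

correlations = {}
for t in (0, 500, 999):
    x_t = add_noise(x_0, noise, t, alpha_bars)
    # Correlation with the clean sample drops as t grows
    correlations[t] = np.corrcoef(x_0.ravel(), x_t.ravel())[0, 1]
    print(f"t={t:4d}  correlation with x_0 = {correlations[t]:.3f}")
```

At early timesteps the sample is almost unchanged. By the final timestep it is close to pure Gaussian noise, which is why generation can start from noise.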
Simplified pseudocode for a diffusion model training step might look like this:
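    import torch
    import torch.nn.functional as F

    NUM_TIMESTEPS = 1000  # assumed length of the noise schedule

    def diffusion_training_step(model, x_0, text_embedding):
        # Sample one random timestep per example in the batch
        t = torch.randint(0, NUM_TIMESTEPS, (x_0.shape[0],), device=x_0.device)
        # Add noise to the input (add_noise is an assumed helper that applies the schedule)
        noise = torch.randn_like(x_0)
        x_t = add_noise(x_0, noise, t)
        # Predict the noise from the noisy sample, the timestep, and the text condition
        predicted_noise = model(x_t, t, text_embedding)
        # Mean squared error between the predicted and the true noise
        return F.mse_loss(predicted_noise, noise)

    # Training loop (model, optimizer, dataloader, and encode_text are assumed to exist)
    for x_0, text in dataloader:
        text_embedding = encode_text(text)
        optimizer.zero_grad()  # clear gradients left over from the previous step
        loss = diffusion_training_step(model, x_0, text_embedding)
        loss.backward()
        optimizer.step()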
During generation, we start from pure noise and iteratively denoise it, conditioned on the text embedding.
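The denoising loop can be sketched with the standard DDPM ancestral sampler. To keep the example runnable without a trained network, it assumes an "oracle" denoiser: each prompt maps to one known point set, so the exact noise can be computed in closed form. The prompt table, schedule values, and function names are all hypothetical.

```python
import numpy as np

NUM_TIMESTEPS = 200
BETAS = np.linspace(1e-4, 0.05, NUM_TIMESTEPS)  # assumed linear schedule
ALPHAS = 1.0 - BETAS
ALPHA_BARS = np.cumprod(ALPHAS)

# Hypothetical "dataset": each prompt corresponds to exactly one 3D point set.
TARGETS = {
    "unit cube corners": np.array(
        [[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
    ),
    "origin": np.zeros((8, 3)),
}


def oracle_denoiser(x_t, t, prompt):
    """Stand-in for a trained network: the exact noise, given the known clean sample."""
    x_0 = TARGETS[prompt]
    return (x_t - np.sqrt(ALPHA_BARS[t]) * x_0) / np.sqrt(1.0 - ALPHA_BARS[t])


def sample(prompt, shape=(8, 3), seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start from pure noise
    for t in reversed(range(NUM_TIMESTEPS)):
        eps = oracle_denoiser(x, t, prompt)
        # Mean of x_{t-1} given x_t and the predicted noise
        mean = (x - BETAS[t] / np.sqrt(1.0 - ALPHA_BARS[t]) * eps) / np.sqrt(ALPHAS[t])
        # Fresh noise is added on every step except the last one
        z = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(BETAS[t]) * z
    return x


generated = sample("unit cube corners")
print(np.abs(generated - TARGETS["unit cube corners"]).max())
```

In a real system the oracle is replaced by a neural network trained on many samples, and the prompt enters through a text embedding rather than a lookup table. The loop structure stays the same.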
Rendering
To visualize results and to compute losses during training, the 3D representation has to be rendered into 2D images. This is typically done with differentiable rendering techniques, which allow gradients to flow back through the rendering process.
For mesh-based representations, a rasterization-based renderer can be used:
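    import torch
    from pytorch3d.structures import Meshes
    from pytorch3d.renderer import (
        FoVPerspectiveCameras, MeshRasterizer, MeshRenderer, PointLights,
        RasterizationSettings, SoftPhongShader, TexturesVertex,
    )

    def render_mesh(vertices, faces, image_size=256):
        # Wrap the raw tensors in a Meshes object; white per-vertex colors
        # give the shader a texture to sample
        textures = TexturesVertex(verts_features=torch.ones_like(vertices))
        meshes = Meshes(verts=vertices, faces=faces, textures=textures)
        # Set up camera
        cameras = FoVPerspectiveCameras()
        # Create a renderer
        raster_settings = RasterizationSettings(image_size=image_size)
        renderer = MeshRenderer(
            rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings),
            shader=SoftPhongShader(cameras=cameras, lights=PointLights()),
        )
        # Render a batch of RGBA images
        return renderer(meshes)

    # Example usage
    vertices = torch.rand(1, 100, 3)  # Random vertices
    faces = torch.randint(0, 100, (1, 200, 3))  # Random faces
    rendered_images = render_mesh(vertices, faces)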
For implicit representations such as NeRFs, views are typically rendered with ray marching techniques.
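As a rough illustration of ray marching, the sketch below steps along a single camera ray, queries density and color at each sample, and alpha-composites the results front to back. The radiance field is a hypothetical hard-coded red sphere rather than a trained network, and the sample count and density value are arbitrary assumptions.

```python
import numpy as np


def toy_field(points):
    """Hypothetical radiance field: a dense red sphere of radius 1 at the origin."""
    inside = np.linalg.norm(points, axis=-1) < 1.0
    density = np.where(inside, 50.0, 0.0)
    color = np.tile([1.0, 0.0, 0.0], (len(points), 1))
    return density, color


def render_ray(origin, direction, near=0.0, far=6.0, num_samples=128):
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    # Evenly spaced sample positions along the ray
    t_vals = np.linspace(near, far, num_samples)
    step = t_vals[1] - t_vals[0]
    points = origin + t_vals[:, None] * direction
    density, color = toy_field(points)
    # Opacity contributed by each sample
    alpha = 1.0 - np.exp(-density * step)
    # Fraction of light that reaches each sample without being absorbed earlier
    transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    weights = transmittance * alpha
    pixel = (weights[:, None] * color).sum(axis=0)
    return pixel, weights.sum()


hit_color, hit_opacity = render_ray([0.0, 0.0, -3.0], [0.0, 0.0, 1.0])
miss_color, miss_opacity = render_ray([0.0, 2.0, -3.0], [0.0, 0.0, 1.0])
print(hit_color, hit_opacity)
print(miss_color, miss_opacity)
```

Every operation here is differentiable with respect to density and color. That is what lets a NeRF-style model be trained from 2D image losses alone.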
Putting It All Together: The Text-to-3D Pipeline
Now that we have covered the main components, let's see how they fit together in a typical text-to-3D generation pipeline:
- Text encoding: the input prompt is encoded into a dense vector representation using a language model.
- Initial generation: a diffusion model conditioned on the text embedding generates an initial 3D representation (such as a NeRF or an implicit function).
- Multi-view consistency: the model renders multiple views of the generated 3D asset and enforces consistency across viewpoints.
- Refinement: additional networks may refine the geometry, add textures, or enhance details.
- Final output: the 3D representation is converted into the required format (such as a textured mesh) for use in downstream applications.
Here is a simplified example of how this might look in code:
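    import torch.nn as nn
    from transformers import BertModel, BertTokenizer

    class TextTo3D(nn.Module):
        def __init__(self):
            super().__init__()
            self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
            self.text_encoder = BertModel.from_pretrained('bert-base-uncased')
            # DiffusionModel, RefinerNetwork, and DifferentiableRenderer are
            # placeholders for components that must be defined elsewhere
            self.diffusion_model = DiffusionModel()
            self.refiner = RefinerNetwork()
            self.renderer = DifferentiableRenderer()

        def forward(self, text_prompt):
            # Encode text: tokenize, then mean-pool BERT's token embeddings
            tokens = self.tokenizer(text_prompt, return_tensors='pt')
            text_embedding = self.text_encoder(**tokens).last_hidden_state.mean(dim=1)
            # Generate initial 3D representation
            initial_3d = self.diffusion_model(text_embedding)
            # Render multiple views
            views = self.renderer(initial_3d, num_views=4)
            # Refine based on multi-view consistency
            refined_3d = self.refiner(initial_3d, views)
            return refined_3d

    # Usage
    model = TextTo3D()
    text_prompt = "A red sports car"
    generated_3d = model(text_prompt)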
Top Text-to-3D Asset Models Available
3DGen (Meta)
3DGen is designed to tackle the problem of generating 3D content, such as characters, props, and scenes, from text descriptions.

Large language and text-to-3D models: 3DGen
3DGen supports physically based rendering (PBR), which is essential for relighting 3D assets in real-world applications. It also enables generative retexturing of previously generated or artist-created 3D shapes using new text inputs. The pipeline integrates two core components, Meta 3D AssetGen and Meta 3D TextureGen, which handle text-to-3D and text-to-texture generation, respectively.
Meta 3D AssetGen
Meta 3D AssetGen (Siddiqui et al., 2024) is responsible for the initial generation of 3D assets from text prompts. This component produces a 3D mesh with textures and PBR material maps in about 30 seconds.
Meta 3D TextureGen
Meta 3D TextureGen (Bensadoun et al., 2024) refines the textures generated by AssetGen. It can also be used to generate new textures for existing 3D meshes based on additional text descriptions. This stage takes about 20 seconds.
Point-E (OpenAI)
Point-E, developed by OpenAI, is another notable text-to-3D generation model. Unlike DreamFusion, which produces NeRF representations, Point-E generates 3D point clouds.
Key features of Point-E:
a) Two-stage pipeline: Point-E first generates a synthetic 2D view using a text-to-image diffusion model, then uses that image to condition a second diffusion model that produces the 3D point cloud.
b) Efficiency: Point-E is designed to be computationally efficient, producing a 3D point cloud in roughly one to two minutes on a single GPU.
c) Color information: the model can generate colored point clouds, preserving both geometric and appearance information.
Limitations:
- Lower fidelity than mesh-based or NeRF-based approaches
- Point clouds require additional processing for many downstream applications
Shap-E (OpenAI):
Building on Point-E, OpenAI introduced Shap-E, which generates 3D meshes instead of point clouds. This addresses some of Point-E's limitations while maintaining computational efficiency.
Key features of Shap-E:
a) Implicit representation: Shap-E learns to generate implicit representations (signed distance functions) of 3D objects.
b) Mesh extraction: the model uses a differentiable implementation of the marching cubes algorithm to convert the implicit representation into a polygon mesh.
c) Texture generation: Shap-E can also generate textures for the 3D meshes, resulting in more visually appealing output.
Advantages:
- Fast generation times (seconds to minutes)
- Direct mesh output suitable for rendering and downstream applications
- Ability to generate both geometry and texture
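The mesh extraction step can be illustrated with the operation at the heart of marching cubes: finding grid edges where the signed distance changes sign and linearly interpolating the crossing point. The sketch below is not Shap-E's implementation and is not a full marching cubes algorithm. It only locates surface points along x-aligned edges of an assumed sphere SDF and does not build triangles. Function names and grid size are illustrative.

```python
import numpy as np


def sphere_sdf(points, radius=1.0):
    """Signed distance to a sphere centered at the origin."""
    return np.linalg.norm(points, axis=-1) - radius


def surface_crossings_along_x(resolution=16, extent=1.5):
    """Locate points where the surface cuts the x-aligned edges of a regular grid."""
    axis = np.linspace(-extent, extent, resolution)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    values = sphere_sdf(grid)
    a, b = values[:-1], values[1:]  # SDF at the two ends of every x-edge
    crossing = (a * b) < 0.0        # opposite signs: the surface cuts this edge
    # Linear interpolation: fraction along the edge where the SDF reaches zero
    frac = a[crossing] / (a[crossing] - b[crossing])
    start, end = grid[:-1][crossing], grid[1:][crossing]
    return start + frac[:, None] * (end - start)


surface_points = surface_crossings_along_x()
print(surface_points.shape)
print(np.abs(sphere_sdf(surface_points)).max())
```

Full marching cubes repeats this along all three axes and then connects the crossing points of each grid cell into triangles using a lookup table. Because the interpolation is a smooth function of the SDF values, the step can be made differentiable.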
GET3D (NVIDIA):
GET3D, developed by NVIDIA researchers, is another powerful text-to-3D generation model that focuses on producing high-quality textured 3D meshes.
Key features of GET3D:
a) Explicit surface representation: unlike DreamFusion or Shap-E, GET3D directly generates an explicit surface representation (a mesh) without an intermediate implicit representation.
b) Texture generation: the model includes a differentiable rendering technique for learning and generating high-quality textures for the 3D meshes.
c) GAN-based architecture: GET3D uses a generative adversarial network (GAN) approach, which allows fast generation once the model has been trained.
Advantages:
- High-quality geometry and textures
- Fast inference times
- Direct integration with 3D rendering engines
Limitations:
- Requires 3D training data, which can be scarce for some object categories
Conclusion
Text-to-3D AI generation represents a fundamental shift in how 3D content is created and manipulated. By leveraging advanced deep learning techniques, these models can produce complex, high-quality 3D assets from simple text descriptions. As the technology continues to evolve, we can expect increasingly sophisticated and capable text-to-3D systems that will transform industries from gaming and film to product design and architecture.