AI Tools 101
Setting Up LLM Training, Fine-Tuning, and Inference with NVIDIA GPUs and CUDA

The field of artificial intelligence (AI) has made remarkable strides in recent years, and at its heart lies the powerful combination of graphics processing units (GPUs) and parallel computing platforms.
Models such as GPT and BERT, and more recently LLaMA and Mistral, can understand and generate human-like text with unprecedented fluency and coherence. Training these models, however, demands enormous amounts of data and compute, which makes GPUs and CUDA indispensable tools for the job.
This comprehensive guide walks through setting up an NVIDIA GPU on Ubuntu step by step, covering the installation of key software components such as the NVIDIA driver, the CUDA Toolkit, cuDNN, and PyTorch.
The Rise of CUDA-Accelerated AI Frameworks
GPU-accelerated deep learning has been propelled by the development of popular AI frameworks that rely on CUDA for efficient computation. Frameworks such as TensorFlow, PyTorch, and MXNet ship with built-in CUDA support, making it seamless to integrate GPU acceleration into deep learning pipelines.
According to the NVIDIA Data Center Deep Learning Product Performance survey, CUDA-accelerated deep learning models can achieve up to 100x faster performance than CPU-based implementations.
NVIDIA's Multi-Instance GPU (MIG) technology, introduced with the Ampere architecture, allows a single GPU to be partitioned into multiple secure instances, each with its own dedicated resources. This makes it possible to share GPU resources efficiently among multiple users or workloads, maximizing utilization and reducing overall cost.
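As a rough illustration, MIG is managed through nvidia-smi on supported data-center GPUs; the commands below are a hedged sketch, and the available instance profiles depend on your GPU model:

# Enable MIG mode on GPU 0 (requires a supported GPU such as an A100)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles the hardware offers
sudo nvidia-smi mig -lgip

# Create two GPU instances plus compute instances (profile ID 19 is 1g.5gb on an A100; IDs vary by GPU)
sudo nvidia-smi mig -cgi 19,19 -C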
Accelerating LLM Inference with NVIDIA TensorRT
While GPUs have been instrumental in training LLMs, efficient inference matters just as much when deploying these models to production. NVIDIA TensorRT, a high-performance deep learning inference optimizer and runtime, plays a key role in accelerating LLM inference on CUDA-enabled GPUs.
According to NVIDIA's benchmarks, TensorRT can deliver up to 5x faster inference performance and a 3x lower total cost of ownership than CPU-based inference for large language models such as GPT-3.
NVIDIA's commitment to open-source initiatives has been a driving force behind CUDA's broad adoption in the AI research community. Libraries such as cuDNN, cuBLAS, and NCCL are freely available, allowing researchers and developers to exploit the full potential of CUDA in their deep learning work.
Installation
When setting up for AI development, the latest drivers and libraries are not always the best choice. For example, the newest NVIDIA driver (545.xx) supports CUDA 12.3, but PyTorch and other libraries may not yet support that version. We will therefore use driver version 535.146.02, which pairs with CUDA 12.2, to ensure compatibility.
Installation Steps
1. Install the NVIDIA driver
First, identify your GPU model. This guide assumes an NVIDIA GPU. Visit the NVIDIA Driver Downloads page, select the driver appropriate for your GPU, and note its version.
To list the prebuilt GPU driver packages available on Ubuntu, run:
sudo ubuntu-drivers list --gpgpu
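Then install the chosen driver series; assuming the 535 series selected above, the invocation would look like:

sudo ubuntu-drivers install --gpgpu nvidia:535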
Reboot the machine and verify the installation:
nvidia-smi
2. Install the CUDA Toolkit
The CUDA Toolkit provides the development environment for building high-performance, GPU-accelerated applications.
For setups not aimed at LLM/deep learning work, you can simply use:
sudo apt install nvidia-cuda-toolkit

However, to ensure compatibility with BitsAndBytes, we will follow these steps:

git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes/
bash install_cuda.sh 122 ~/local 1
Verify the installation:
~/local/cuda-12.2/bin/nvcc --version
Set the environment variables:
export CUDA_HOME=/home/roguser/local/cuda-12.2/
export LD_LIBRARY_PATH=/home/roguser/local/cuda-12.2/lib64
export BNB_CUDA_VERSION=122
export CUDA_VERSION=122
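Note that these exports last only for the current shell session. Assuming a Bash shell, appending the same lines to ~/.bashrc makes them persistent across logins:

echo 'export CUDA_HOME=/home/roguser/local/cuda-12.2/' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/home/roguser/local/cuda-12.2/lib64' >> ~/.bashrc
echo 'export BNB_CUDA_VERSION=122' >> ~/.bashrc
echo 'export CUDA_VERSION=122' >> ~/.bashrc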
3. Install cuDNN
Download the cuDNN package from the NVIDIA Developer website, then install it with:
sudo apt install ./cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb
Add the keyring:
sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.7.29/cudnn-local-08A7D361-keyring.gpg /usr/share/keyrings/
Install the cuDNN libraries:
sudo apt update
sudo apt install libcudnn8 libcudnn8-dev libcudnn8-samples
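As a quick check (an extra step, not strictly required), you can confirm the packages are registered with dpkg:

dpkg -l | grep cudnn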
4. Set up a Python virtual environment
Ubuntu 22.04 ships with Python 3.10. Install pip and venv:
sudo apt-get install python3-pip
sudo apt install python3.10-venv
Create and activate the virtual environment:
cd
mkdir test-gpu
cd test-gpu
python3 -m venv venv
source venv/bin/activate
5. Install BitsAndBytes from source
Move into the BitsAndBytes directory and build it from source:
cd ~/bitsandbytes
CUDA_HOME=/home/roguser/local/cuda-12.2/ \
LD_LIBRARY_PATH=/home/roguser/local/cuda-12.2/lib64 \
BNB_CUDA_VERSION=122 \
CUDA_VERSION=122 \
make cuda12x

CUDA_HOME=/home/roguser/local/cuda-12.2/ \
LD_LIBRARY_PATH=/home/roguser/local/cuda-12.2/lib64 \
BNB_CUDA_VERSION=122 \
CUDA_VERSION=122 \
python setup.py install
6. Install PyTorch
Install PyTorch with:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
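A quick sanity check (a minimal sketch, not part of the original steps) confirms that PyTorch can see the GPU and reports the CUDA build it shipped with:

import torch

print(torch.cuda.is_available())      # True if driver, runtime, and wheel line up
print(torch.version.cuda)             # CUDA version of the wheel (12.1 for cu121)
print(torch.cuda.get_device_name(0))  # name of the first visible GPU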
7. Install Hugging Face Transformers and Accelerate
Install the Transformers and Accelerate libraries:
pip install transformers
pip install accelerate
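As an end-to-end smoke test (a minimal sketch, not from the original steps), you can run a small model such as gpt2 on the GPU through the pipeline API:

from transformers import pipeline

# device=0 places the model on the first CUDA GPU
generator = pipeline("text-generation", model="gpt2", device=0)
print(generator("CUDA makes deep learning", max_new_tokens=20)[0]["generated_text"])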
The Power of Parallel Processing
GPUs are inherently highly parallel processors, designed to handle thousands of concurrent threads efficiently. This architecture makes them ideal for the compute-intensive work of training deep learning models, including LLMs. The CUDA platform, developed by NVIDIA, provides the software environment that lets developers exploit these GPUs to the fullest, enabling them to write code that harnesses the hardware's parallel processing capabilities.
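To make that parallelism concrete, here is a small timing sketch (illustrative only; the exact numbers depend on your hardware) that runs the same large matrix multiplication on the CPU and the GPU:

import time
import torch

def timed_matmul(device):
    # Two 4096x4096 matrices: matrix multiplication is the archetypal GPU workload
    a = torch.randn(4096, 4096, device=device)
    b = torch.randn(4096, 4096, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for pending GPU work before timing
    start = time.perf_counter()
    c = a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # ensure the kernel has actually finished
    return time.perf_counter() - start

print(f"CPU: {timed_matmul('cpu'):.4f}s")
if torch.cuda.is_available():
    print(f"GPU: {timed_matmul('cuda'):.4f}s")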
Accelerating LLM Training with GPUs and CUDA
Training a large language model is computationally demanding: it requires processing vast amounts of text data and performing an enormous number of matrix operations. GPUs, with their thousands of cores and high memory bandwidth, are ideally suited to the task. By leveraging CUDA, developers can optimize their code to exploit the GPU's parallelism and dramatically cut the time needed to train LLMs.
The training of GPT-3, for example, one of the largest language models ever created, was made possible by running CUDA-optimized code across thousands of NVIDIA GPUs. This allowed the model to be trained on an unprecedented amount of data and to deliver strong performance on natural language tasks.
import torch
import torch.optim as optim
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained GPT-2 model and tokenizer
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.train()

# Define training data and hyperparameters
train_data = [...]  # Your training data: a list of text strings
batch_size = 32
num_epochs = 10
learning_rate = 5e-5

# Define optimizer; the model computes its own cross-entropy loss when labels are passed
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training loop
for epoch in range(num_epochs):
    for i in range(0, len(train_data), batch_size):
        # Tokenize a batch of texts; for causal LM training the labels are the inputs
        batch = train_data[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True).to(device)

        # Forward pass (the library shifts the labels internally)
        outputs = model(**inputs, labels=inputs["input_ids"])
        loss = outputs.loss

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f'Epoch {epoch + 1}/{num_epochs}, Loss: {loss.item()}')
In this code snippet, we load a GPT-2 language model with PyTorch and move it to a CUDA-enabled GPU when one is available. The training loop then exploits the GPU's parallelism to run efficient forward and backward passes, accelerating the training process.
CUDA-Accelerated Libraries for Deep Learning
Beyond the CUDA platform itself, NVIDIA and the open-source community have developed a range of CUDA-accelerated libraries that enable efficient implementations of deep learning models, including LLMs. These libraries provide optimized implementations of common operations such as matrix multiplication, convolution, and activation functions, freeing developers to focus on model architecture and the training process rather than low-level optimization.
One such library is cuDNN (the CUDA Deep Neural Network library), which supplies highly tuned implementations of the standard routines used in deep neural networks. By leveraging cuDNN, developers can significantly accelerate training and inference, achieving performance gains of up to several orders of magnitude over CPU-based implementations.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.cuda.amp import autocast

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Projection shortcut when the input and output shapes differ
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels))

    def forward(self, x):
        # autocast runs eligible ops in half precision on CUDA (mixed precision)
        with autocast():
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            out += self.shortcut(x)
            out = F.relu(out)
        return out
This code snippet defines a residual block for a convolutional neural network (CNN) in PyTorch. The autocast context manager from PyTorch's Automatic Mixed Precision (AMP) enables mixed-precision training, which can deliver substantial speedups on CUDA-enabled GPUs while maintaining high accuracy. The F.relu function is optimized by cuDNN, ensuring efficient execution on the GPU.
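For completeness, a block like this can be exercised on the GPU as follows (a usage sketch; the shapes are arbitrary):

block = ResidualBlock(in_channels=64, out_channels=128, stride=2).cuda()
x = torch.randn(8, 64, 56, 56, device="cuda")  # a batch of 8 feature maps
out = block(x)
print(out.shape)  # torch.Size([8, 128, 28, 28])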
Multi-GPU and Distributed Training for Scalability
As LLMs and deep learning models grow in size and complexity, so do the computational requirements for training them. To meet this challenge, researchers and developers have turned to multi-GPU and distributed training techniques that harness the combined processing power of multiple GPUs across multiple machines.
CUDA and related libraries such as NCCL (the NVIDIA Collective Communications Library) provide efficient communication primitives that enable seamless data transfer and synchronization across GPUs, making distributed training possible at unprecedented scale.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Initialize distributed training
dist.init_process_group(backend='nccl', init_method='...')
local_rank = dist.get_rank()  # equals the local rank when all processes run on one node
torch.cuda.set_device(local_rank)

# Create model and move to GPU
model = MyModel().cuda()

# Wrap model with DDP
model = DDP(model, device_ids=[local_rank])

# Training loop (distributed)
for epoch in range(num_epochs):
    for data in train_loader:
        inputs, targets = data
        inputs = inputs.cuda(non_blocking=True)
        targets = targets.cuda(non_blocking=True)

        outputs = model(inputs)
        loss = criterion(outputs, targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
This example demonstrates distributed training with PyTorch's DistributedDataParallel (DDP) module. The model is wrapped in DDP, which uses NCCL to handle data parallelism, gradient synchronization, and communication across GPUs automatically. This approach enables efficient scaling of the training process across multiple machines, allowing researchers and developers to train larger, more complex models in reasonable time.
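In practice, a script like this is launched with one process per GPU. Assuming a single node with four GPUs and a hypothetical script name of train_ddp.py, the standard launcher invocation would be:

torchrun --nproc_per_node=4 train_ddp.py

torchrun sets the RANK, LOCAL_RANK, and WORLD_SIZE environment variables, which init_process_group picks up automatically when init_method is left at its env:// default.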
Deploying Deep Learning Models with CUDA
While GPUs and CUDA are used primarily for training deep learning models, they are just as essential for efficient deployment and inference. As deep learning models grow ever more complex and resource-hungry, GPU acceleration becomes critical for achieving real-time performance in production environments.
NVIDIA's TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low-latency, high-throughput inference on CUDA-enabled GPUs. TensorRT can optimize and accelerate models trained in frameworks such as TensorFlow, PyTorch, and MXNet, enabling efficient deployment on platforms ranging from embedded systems to data centers.
import tensorrt as trt

# Load pre-trained model (placeholder helper from the original sketch)
model = load_model(...)

# Create TensorRT engine
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network()
parser = trt.OnnxParser(network, logger)

# Parse and optimize model
success = parser.parse_from_file(model_path)
engine = builder.build_cuda_engine(network)  # legacy API; newer TensorRT releases build serialized engines instead

# Run inference on GPU
context = engine.create_execution_context()
inputs, outputs, bindings, stream = allocate_buffers(engine)  # placeholder helper

# Set input data and run inference
set_input_data(inputs, input_data)  # placeholder helper
context.execute_async_v2(bindings=bindings, stream_handle=stream.ptr)

# Process output
# ...
This example shows how to deploy a pre-trained deep learning model on a CUDA-enabled GPU with TensorRT. TensorRT first parses and optimizes the model, generating an inference engine highly tuned for the specific model and hardware. That engine then runs efficient inference on the GPU, with CUDA accelerating the computation.
Conclusion
The combination of GPUs and CUDA has been instrumental in driving advances in large language models, computer vision, speech recognition, and many other areas of deep learning. By harnessing the parallel processing capabilities of GPUs and the optimized libraries CUDA provides, researchers and developers can train and deploy increasingly complex models with high efficiency.
As the AI field continues to evolve, the importance of GPUs and CUDA will only grow. With ever more powerful hardware and software optimizations, we can expect further advances in the development and deployment of AI systems, pushing the boundaries of what is possible.