CUDA Programming Model II

Previously, we discussed the basic CUDA programming model: memory and threads. In this note, we dive into the hardware implementation. Understanding the hardware clarifies the philosophy of CUDA programming and how to accelerate computation. The GPU architecture is built around a scalable array of multithreaded Streaming Multiprocessors (SMs). A GPU usually contains many SMs, and each SM can host hundreds of threads. When a CUDA program on the host CPU invokes a kernel grid, the blocks of the grid are enumerated and distributed to SMs with available execution capacity. ...
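A minimal sketch of this block-to-SM mapping (the kernel name `scale` and the launch parameters are illustrative assumptions, not from the post): the host launches a grid of independent blocks, and the hardware scheduler assigns each block to whichever SM has free execution capacity, so the same code scales across GPUs with different SM counts.

```cuda
#include <cstdio>

// Hypothetical kernel: each thread scales one element of x.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) x[i] *= a;                           // guard the tail block
}

int main() {
    const int n = 1 << 20;
    float *x;
    cudaMallocManaged(&x, n * sizeof(float));       // unified memory for simplicity
    for (int i = 0; i < n; ++i) x[i] = 1.0f;

    int threads = 256;                               // threads per block
    int blocks  = (n + threads - 1) / threads;       // enough blocks to cover n
    scale<<<blocks, threads>>>(x, 2.0f, n);          // grid enumerated; blocks
                                                     // distributed to available SMs
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);
    cudaFree(x);
    return 0;
}
```

Because blocks are scheduled independently, a GPU with more SMs simply runs more of them concurrently; the program itself does not change.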

November 25, 2024 · 2 min · 380 words · Ethan Lyu

CUDA Programming Model I

The programming model gives us a high-level idea of how to write CUDA programs. We also need to know how to debug programs using the toolchain. There are several key points to be aware of in GPU programming: kernel functions, memory management, thread management, and streams. A typical environment encompasses one or more CPUs and GPUs, which communicate with each other over PCIe. Memory is strictly isolated between CPU and GPU (though they can share a unified address space). A complete CUDA program executes as follows: host code is interleaved with parallel kernel code, and a kernel launch returns control to the host thread immediately. In other words, while the first kernel is running, the subsequent host code is likely to run simultaneously. ...
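This launch asynchrony can be sketched as follows (the kernel name `busyKernel` is a hypothetical placeholder): the `<<<...>>>` launch returns immediately, so the host print may run while the device is still working, and only `cudaDeviceSynchronize` forces the host to wait.

```cuda
#include <cstdio>

// Hypothetical kernel standing in for any long-running device work.
__global__ void busyKernel() {
    // ... device-side work executes here while the host continues ...
}

int main() {
    busyKernel<<<64, 256>>>();   // launch returns to the host immediately
    printf("host code runs while the kernel may still be executing\n");
    cudaDeviceSynchronize();     // block the host until the device finishes
    printf("kernel has finished\n");
    return 0;
}
```

The same mechanism is what makes streams useful: launches issued to different streams can overlap with each other and with host work.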

November 24, 2024 · 7 min · 1402 words · Ethan Lyu

CUDA Programming Intro

There are numerous online tutorials teaching how to use CUDA C++ for parallel programming, and we will draw on some of them. The best way to learn is to read the official documentation. This tutorial, however, will start from writing code at the very beginning, and we will illustrate the architecture throughout the code. We will take a few days to go over the principles of CUDA programming. Afterwards, we will read through ML code to see how the operations are implemented in CUDA kernels. There are no strict prerequisites (beyond basic programming), but a little knowledge of operating systems and CPUs would help. ...

November 1, 2024 · 3 min · 454 words · Ethan Lyu