<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Notes on Ethan's Blog</title><link>https://blog.ethanlyu.top/tags/notes/</link><description>Recent content in Notes on Ethan's Blog</description><image><title>Ethan's Blog</title><url>https://blog.ethanlyu.top/%3Clink%20or%20path%20of%20image%20for%20opengraph,%20twitter-cards%3E</url><link>https://blog.ethanlyu.top/%3Clink%20or%20path%20of%20image%20for%20opengraph,%20twitter-cards%3E</link></image><generator>Hugo -- 0.152.2</generator><language>en</language><lastBuildDate>Tue, 21 Oct 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.ethanlyu.top/tags/notes/index.xml" rel="self" type="application/rss+xml"/><item><title>Deep Generative Model I</title><link>https://blog.ethanlyu.top/posts/deep-generative-model-i/</link><pubDate>Tue, 21 Oct 2025 00:00:00 +0000</pubDate><guid>https://blog.ethanlyu.top/posts/deep-generative-model-i/</guid><description>&lt;p&gt;The rise of large models has changed how we move toward intelligence. For the past decades, people worked on &amp;ldquo;discriminative&amp;rdquo; intelligence: given a dog image, we predict the label &amp;ldquo;dog&amp;rdquo;. To do so, we train on a labeled dataset spanning different categories, with the goal that every output matches its corresponding class. The great success of LLMs has brought us into a new era: GenAI. Unlike a discriminative model, which maps an input to a label, we now generate a &amp;ldquo;dog&amp;rdquo; image from an input prompt (text, image, &amp;hellip;). This is called a &lt;em&gt;generative model&lt;/em&gt;.
&lt;img loading="lazy" src="https://blog.ethanlyu.top/attachment/40665896e722d64746aa9972abd6199d.png"&gt;&lt;/p&gt;</description></item><item><title>CUDA Programming Model II</title><link>https://blog.ethanlyu.top/posts/day-2-programming-model-ii/</link><pubDate>Mon, 25 Nov 2024 00:00:00 +0000</pubDate><guid>https://blog.ethanlyu.top/posts/day-2-programming-model-ii/</guid><description>&lt;p&gt;Previously, we discussed the basic programming model of CUDA: memory and threads. In this note, we will dive into the hardware implementation. Understanding it clarifies the design philosophy of CUDA programming and how to use it to accelerate computation.&lt;/p&gt;
&lt;p&gt;The GPU architecture is built around a scalable array of multithreaded Streaming Multiprocessors (SMs). A GPU usually contains many SMs, and each SM can host hundreds to thousands of threads. When a CUDA program on the host CPU invokes a kernel grid, the blocks of the grid are enumerated and distributed to SMs with available execution capacity.&lt;/p&gt;</description></item><item><title>CUDA Programming Model I</title><link>https://blog.ethanlyu.top/posts/day-1-programming-model-i/</link><pubDate>Sun, 24 Nov 2024 00:00:00 +0000</pubDate><guid>https://blog.ethanlyu.top/posts/day-1-programming-model-i/</guid><description>&lt;p&gt;The programming model gives us a high-level idea of how to write CUDA programs. We also need to know how to debug our programs using the toolchain.&lt;/p&gt;
&lt;p&gt;There are several key points that we need to be aware of in GPU programming:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Kernel Function&lt;/li&gt;
&lt;li&gt;Memory Management&lt;/li&gt;
&lt;li&gt;Thread Management&lt;/li&gt;
&lt;li&gt;Streams&lt;/li&gt;
&lt;/ol&gt;
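&lt;p&gt;As a minimal sketch of how the first three points show up in even the smallest program (the kernel name and sizes here are illustrative, and streams are left out for now):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#include &amp;lt;cstdlib&amp;gt;

// 1. Kernel function: runs on the GPU, one instance per thread
__global__ void add(const float *a, const float *b, float *c, int n) {
    // 3. Thread management: each thread computes its own global index
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i &amp;lt; n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    // 2. Memory management: device buffers are separate from host memory
    float *h = (float *)malloc(bytes);
    for (int i = 0; i &amp;lt; n; ++i) h[i] = 1.0f;
    float *d_a, *d_b, *d_c;
    cudaMalloc(&amp;amp;d_a, bytes); cudaMalloc(&amp;amp;d_b, bytes); cudaMalloc(&amp;amp;d_c, bytes);
    cudaMemcpy(d_a, h, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h, bytes, cudaMemcpyHostToDevice);
    // Launch a grid of blocks, 256 threads per block
    add&amp;lt;&amp;lt;&amp;lt;(n + 255) / 256, 256&amp;gt;&amp;gt;&amp;gt;(d_a, d_b, d_c, n);
    cudaMemcpy(h, d_c, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c); free(h);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;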
&lt;p&gt;A typical environment comprises several CPUs and GPUs, which communicate with each other over PCIe. &lt;strong&gt;Memory is strictly isolated between the CPU and the GPU&lt;/strong&gt; (though they can share a unified address space). A complete CUDA program is executed in the following way:
&lt;img loading="lazy" src="https://blog.ethanlyu.top/attachment/1e2e4a3828300afff6c90c9d4942f57c.png"&gt;
Host code and parallel (device) code alternate. A kernel launch returns control to the host thread immediately, so while the first piece of parallel code is running on the device, the following host code is likely to run simultaneously.&lt;/p&gt;</description></item><item><title>CUDA Programming Intro</title><link>https://blog.ethanlyu.top/posts/day-0-hello-world----cuda/</link><pubDate>Fri, 01 Nov 2024 00:00:00 +0000</pubDate><guid>https://blog.ethanlyu.top/posts/day-0-hello-world----cuda/</guid><description>&lt;p&gt;There are numerous tutorials online that teach CUDA C++ parallel programming, and we will adopt some of them. The best way to learn is to read the official documentation. This tutorial, however, starts with writing code from the very beginning, and we will illustrate the architecture throughout the code.&lt;/p&gt;
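&lt;p&gt;As a first taste, here is the classic &amp;ldquo;hello world&amp;rdquo; in CUDA C++ (a standard sketch; every part of it will be explained later):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#include &amp;lt;cstdio&amp;gt;

// A kernel: a function executed on the GPU by many threads in parallel
__global__ void hello() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello&amp;lt;&amp;lt;&amp;lt;1, 4&amp;gt;&amp;gt;&amp;gt;();  // launch 1 block of 4 threads
    cudaDeviceSynchronize();            // wait for the kernel to finish
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Save it as &lt;code&gt;hello.cu&lt;/code&gt; and compile with &lt;code&gt;nvcc hello.cu&lt;/code&gt;.&lt;/p&gt;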
&lt;p&gt;We will take a few days to go over the principles of CUDA programming. Afterwards, we will read through ML code to see how its operations are implemented in CUDA kernels. There are no strict prerequisites (beyond basic programming), but it helps to know a little about operating systems and CPUs.&lt;/p&gt;</description></item></channel></rss>