CUDA Kernel

developer0hye 2022. 9. 20. 00:26

Kernel

A function that runs on the GPU.

It is marked with the __global__ keyword in front of its declaration.
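Here is a minimal sketch of a kernel. It assumes a CUDA-capable GPU and nvcc, and leaves out error checking for brevity. The names write_answer_kernel and run_write_answer are hypothetical. A __global__ function must return void, so any result has to be written to device memory and then copied back to the CPU.

```cuda
#include <cuda_runtime.h>

// A kernel: declared with __global__, returns void, runs on the GPU.
// It hands its result back by writing to device memory.
__global__ void write_answer_kernel(int *out) {
  *out = 42;
}

// Hypothetical host helper: launches the kernel with 1 block of 1 thread
// and copies the value it wrote back to the CPU.
int run_write_answer() {
  int *d_out = nullptr;
  cudaMalloc(&d_out, sizeof(int));
  write_answer_kernel<<<1, 1>>>(d_out);
  int h_out = 0;
  // cudaMemcpy waits for the kernel to finish before copying.
  cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  return h_out;
}
```

run_write_answer() should return 42.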

A kernel is launched with the following syntax:

KernelFunction<<<A, B>>>(Params)

The <<< >>> is called triple angle brackets, or the triple chevron syntax. The values inside it are the kernel's execution configuration.

A is the number of blocks the kernel runs with,

and B is the number of threads assigned to each block.

*This explanation assumes A and B are int values. I haven't tried passing dim3 values yet, so I'm not sure how that case works.

In other words, the kernel is executed in parallel by A x B threads.

Params are just ordinary function arguments, so I'll skip them here.
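A quick way to check the A x B claim is to let every thread increment one shared counter. This is a sketch assuming a CUDA-capable GPU, with error checking omitted; count_threads_kernel and count_launched_threads are hypothetical names, and the number of threads per block must stay within the per-block limit (1024 on current GPUs). The helper takes dim3 parameters, but a plain int can be passed too: it converts to dim3 with y and z set to 1, so <<<A, B>>> with ints behaves like a grid of A x 1 x 1 blocks of B x 1 x 1 threads.

```cuda
#include <cuda_runtime.h>

// Every thread that executes this kernel adds 1 to a single counter.
__global__ void count_threads_kernel(int *counter) {
  atomicAdd(counter, 1);  // atomic, so simultaneous increments are not lost
}

// Hypothetical helper: launches the kernel with the given grid and block
// shape and returns how many threads ran the kernel body.
int count_launched_threads(dim3 grid, dim3 block) {
  int *d_counter = nullptr;
  cudaMalloc(&d_counter, sizeof(int));
  cudaMemset(d_counter, 0, sizeof(int));
  count_threads_kernel<<<grid, block>>>(d_counter);
  int h_counter = 0;
  // cudaMemcpy waits for the kernel to finish before copying.
  cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_counter);
  return h_counter;
}
```

Both count_launched_threads(3, 4) and count_launched_threads(dim3(3), dim3(4)) should return 12. A plain counter += 1 from many threads at once could lose updates, which is why atomicAdd is used.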

So what is a block, and what does the number of threads per block mean?

To understand this, you need to know a little about how the GPU organizes its work.

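The thread is the smallest unit of execution.

Several threads are grouped together, and that group is a block.

There are usually many blocks, too.

Those blocks are in turn gathered into one larger group, and that is the grid.

Are grids grouped together as well? I'm not sure yet. This is another point to check later.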
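To see the thread/block/grid grouping directly, each thread below adds 1 to a counter that belongs to its own block, selected by blockIdx.x. This is a sketch assuming a CUDA-capable GPU, with error checking omitted; threads_per_block_kernel and threads_per_block are hypothetical names. Launched as one grid of A blocks with B threads each, every one of the A counters should end up equal to B.

```cuda
#include <cuda_runtime.h>

// Each thread adds 1 to the slot of the block it belongs to, so
// per_block[b] ends up as the number of threads in block b.
__global__ void threads_per_block_kernel(int *per_block) {
  atomicAdd(&per_block[blockIdx.x], 1);
}

// Hypothetical helper: launches a grid of A blocks with B threads each
// and copies the per-block counts into out[0..A-1].
void threads_per_block(int A, int B, int *out) {
  int *d_per_block = nullptr;
  cudaMalloc(&d_per_block, A * sizeof(int));
  cudaMemset(d_per_block, 0, A * sizeof(int));
  threads_per_block_kernel<<<A, B>>>(d_per_block);
  // cudaMemcpy waits for the kernel to finish before copying.
  cudaMemcpy(out, d_per_block, A * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_per_block);
}
```

For example, threads_per_block(5, 32, counts) should fill all five entries of counts with 32.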

https://developer.nvidia.com/blog/cuda-refresher-cuda-programming-model/

CUDA Refresher: The CUDA Programming Model | NVIDIA Technical Blog


https://docs.nvidia.com/cuda/cuda-c-programming-guide/#execution-configuration

Programming Guide :: CUDA Toolkit Documentation


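When a kernel is launched, A x B threads execute it in parallel.

Each of these threads can get its own unique ID, and that ID can be used to index data.

To compute the unique ID, you need to know the following built-in variables.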

blockIdx.x, blockIdx.y, blockIdx.z

If A is passed as an int, blockIdx.x takes integer values in [0, A-1], and blockIdx.y and blockIdx.z are always 0.

blockDim.x, blockDim.y, blockDim.z

If B is passed as an int, blockDim.x is B, and blockDim.y and blockDim.z are 1.

threadIdx.x, threadIdx.y, threadIdx.z

If B is passed as an int, threadIdx.x takes integer values in [0, B-1], and threadIdx.y and threadIdx.z are always 0.

These variables can be used inside a kernel without being declared explicitly.
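Combining these variables gives the unique ID. For a one-dimensional launch, the usual formula is id = blockIdx.x * blockDim.x + threadIdx.x. With B = 4, for example, thread 1 of block 2 gets 2 * 4 + 1 = 9. The sketch below assumes a CUDA-capable GPU and omits error checking; global_id_kernel and global_ids are hypothetical names. Each thread writes its ID into out[id], so after the launch out should hold 0, 1, ..., A x B - 1.

```cuda
#include <cuda_runtime.h>

// Each thread computes its unique (global) ID from the built-in variables
// and writes that ID into the matching slot of out.
__global__ void global_id_kernel(int *out) {
  int id = blockIdx.x * blockDim.x + threadIdx.x;
  out[id] = id;
}

// Hypothetical helper: launches A blocks of B threads.
// out must have room for A * B ints.
void global_ids(int A, int B, int *out) {
  int n = A * B;
  int *d_out = nullptr;
  cudaMalloc(&d_out, n * sizeof(int));
  global_id_kernel<<<A, B>>>(d_out);
  // cudaMemcpy waits for the kernel to finish before copying.
  cudaMemcpy(out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
}
```

Because every slot is written by exactly one thread, getting out[i] == i for all i confirms that the IDs cover 0 to A x B - 1 with no gaps or duplicates.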

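#include <cstdio>

// Kernel: each thread prints its own index within the block.
__global__ void print_hello_parallel_cuda_world_kernel(){
  int thread_id = threadIdx.x;
  printf("Hello CUDA World! %d\n", thread_id);
}

// Host function: launch 1 block of num_threads threads, then wait for the GPU.
void print_hello_parallel_cuda_world(int num_threads){
  print_hello_parallel_cuda_world_kernel<<<1, num_threads>>>();
  cudaDeviceSynchronize();  // wait for the kernel so its printf output is flushed
}

int main(){
  print_hello_parallel_cuda_world(4);
  return 0;
}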

http://users.wfu.edu/choss/CUDA/docs/Lecture%205.pdf
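The example above only uses threadIdx.x inside a single block. A more typical pattern uses the unique ID to index an array and launches enough blocks to cover every element. Below is a sketch of element-wise vector addition, assuming a CUDA-capable GPU and skipping error checking; vector_add_kernel and vector_add are hypothetical names, not taken from the lecture notes above. The block count is rounded up, so the kernel needs a bounds check for the leftover threads in the last block.

```cuda
#include <cuda_runtime.h>

// Element-wise c = a + b, one element per thread.
// The bounds check matters when n is not a multiple of the block size.
__global__ void vector_add_kernel(const float *a, const float *b, float *c, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical host helper: copies a and b to the GPU, launches enough
// blocks of `threads` threads to cover n elements, and copies c back.
void vector_add(const float *a, const float *b, float *c, int n, int threads) {
  size_t bytes = n * sizeof(float);
  float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
  cudaMalloc(&d_a, bytes);
  cudaMalloc(&d_b, bytes);
  cudaMalloc(&d_c, bytes);
  cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
  int blocks = (n + threads - 1) / threads;  // round up: ceil(n / threads)
  vector_add_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
  // cudaMemcpy waits for the kernel to finish before copying.
  cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_a);
  cudaFree(d_b);
  cudaFree(d_c);
}
```

With n = 10 and threads = 4, the helper launches 3 blocks (12 threads), and the if (i < n) check keeps the last 2 threads from writing past the end of the arrays.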
