This post implements a CUDA Hello World, introduces nvcc's compile-time options for specifying the virtual-architecture and real-architecture compute capabilities, and finally builds the CUDA program with xmake.
// src/main.cu
#include <cstdio>

__global__ void hello_from_gpu() {
    printf("Hello World from the GPU!\n");
}

int main() {
    hello_from_gpu<<<1, 1>>>();  // launch 1 block containing 1 thread
    cudaDeviceSynchronize();     // wait for the kernel so its printf is flushed
    return 0;
}
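To compile and run it directly with nvcc (a minimal sketch, assuming nvcc is on your PATH; the output name hello is arbitrary):

nvcc src/main.cu -o hello
./hello    # prints: Hello World from the GPU!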
For example, here we can choose a virtual architecture compute capability of 3.5 and a real architecture compute capability of 6.0:
-arch=compute_35 -code=sm_60
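Applied to the Hello World above, the full nvcc invocation would look like this (a sketch; very old compute capabilities such as 3.5 have been dropped from recent CUDA toolkits, so substitute values your toolkit still supports):

nvcc -arch=compute_35 -code=sm_60 src/main.cu -o hello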
However, the virtual architecture compute capability may not exceed the real architecture compute capability, so the following is not allowed:
-arch=compute_60 -code=sm_35
If we instead specify the compute capability options as:
-arch=compute_35 -code=sm_35
the resulting executable can only run on Kepler-architecture GPUs. If you want the executable to run on a wider range of GPUs, you can specify several sets of compute capabilities at once, one compile option of the following form per set:
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_60,code=sm_60
-gencode arch=compute_70,code=sm_70
This way, the matching binary version is selected automatically when the program runs on different GPUs; a complete nvcc command combining these options is sketched below.
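For instance (a sketch mirroring the option list above; drop any compute capability your CUDA toolkit no longer supports):

nvcc src/main.cu -o hello \
    -gencode arch=compute_35,code=sm_35 \
    -gencode arch=compute_50,code=sm_50 \
    -gencode arch=compute_60,code=sm_60 \
    -gencode arch=compute_70,code=sm_70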
xmake build script
add_rules("mode.debug", "mode.release")

target("LearningCUDA")
    set_kind("binary")
    add_files("src/*.cu")
    -- generate relocatable device code for device linker of dependents.
    -- if __device__ or __global__ functions will be called cross file,
    -- or dynamic parallelism will be used,
    -- this instruction should be opted in.
    -- add_cuflags("-rdc=true")
    -- generate SASS code for SM architecture of current host
    add_cugencodes("native")
    -- generate PTX code for the virtual architecture to guarantee compatibility
    add_cugencodes("compute_35")
    -- -- generate SASS code for each SM architecture
    -- add_cugencodes("sm_35", "sm_37", "sm_50", "sm_52", "sm_60", "sm_61", "sm_70", "sm_75")
    -- -- generate PTX code from the highest SM architecture to guarantee forward-compatibility
    -- add_cugencodes("compute_75")
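With this xmake.lua in the project root, building and running the target takes two commands (assuming xmake and the CUDA toolkit are installed):

xmake                   # configure and build
xmake run LearningCUDA  # run the built binary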