Category: cuda

I want to use cudaMemset for non-integer types to tidy the code as below, where I used the glm::vec2 type just as an example. However, cudaMemset appears to work only for integers. So, are there any possible equivalents? #include <glm/glm.hpp> #include <cuda.h> int main(){ glm::vec2* ptr1; int num = 10; cudaMallocManaged(&ptr1, num*sizeof(glm::vec2)); cudaMemset(ptr1, {0, 0}, ..
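One possible workaround, since cudaMemset only writes a repeated byte value: keep cudaMemset for zero-filling (an all-zero byte pattern is a valid 0.0f in each component) and use a small fill kernel for anything else. A minimal sketch, assuming glm::vec2 as in the question; fill_kernel and the 256-thread block size are illustrative choices, not from the original code:

#include <glm/glm.hpp>
#include <cuda_runtime.h>

// Sets every element of a trivially copyable type to the same value.
// cudaMemset cannot do this for anything other than a repeated byte pattern.
template <typename T>
__global__ void fill_kernel(T* data, T value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

int main() {
    glm::vec2* ptr1;
    int num = 10;
    cudaMallocManaged(&ptr1, num * sizeof(glm::vec2));

    // Zero-filling still works with cudaMemset: all-zero bytes read back as 0.0f.
    cudaMemset(ptr1, 0, num * sizeof(glm::vec2));

    // For any non-zero value, fill with a kernel instead.
    fill_kernel<<<(num + 255) / 256, 256>>>(ptr1, glm::vec2(1.0f, 2.0f), num);
    cudaDeviceSynchronize();

    cudaFree(ptr1);
    return 0;
}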

Read more

I was trying to use shared pointers in CUDA by using CUDA’s version of the standard library, so I tried to include <cuda/std/detail/libcxx/include/memory>. I got some errors; one of them is "cannot open source file" for <__config> and <__functional_base>. Those files are clearly in the directory, but it’s as if Visual Studio acts like they don’t exist ..
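If the goal is simply shared ownership of device allocations, one common host-side alternative (a sketch, not necessarily what libcu++ intends; make_device_array is a hypothetical helper name) is to wrap cudaMalloc/cudaFree in an ordinary std::shared_ptr with a custom deleter, rather than including libcu++'s internal detail headers directly:

#include <cuda_runtime.h>
#include <cstddef>
#include <memory>

// Allocates n elements on the device and returns a ref-counted handle.
template <typename T>
std::shared_ptr<T> make_device_array(std::size_t n) {
    T* raw = nullptr;
    cudaMalloc(&raw, n * sizeof(T));
    // The deleter runs on the host when the last shared_ptr goes away.
    return std::shared_ptr<T>(raw, [](T* p) { cudaFree(p); });
}

int main() {
    auto buf = make_device_array<float>(1024);  // ref-counted device buffer
    // buf.get() can be passed to kernels as a plain float*.
    return 0;                                   // cudaFree runs automatically
}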

Read more

So a minimal operational example of my code (class definition): #ifdef __CUDA_ARCH__ #define CF __device__ #else #define CF #endif template<typename T> struct coord { T *d; long unsigned int dim; CF coord(initializer_list<T> l): dim{l.size()} { //printf("init %p\n", this); //d=(T *)calloc(dim, sizeof(T)); d=new T[dim]; memcpy(d, l.begin(), sizeof(T)*dim); //std::copy(l.begin(), l.end(), d); } CF ~coord() { #ifdef __NVCC__ ..
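One way to sidestep the __CUDA_ARCH__ guard (which changes the declaration between the host and device compilation passes) and the per-element new/memcpy is to mark the members __host__ __device__ and keep the storage inside the object. A sketch under the assumption that the dimension can be a compile-time parameter; the Dim parameter and aggregate initialization are illustrative, not taken from the original class:

#include <cuda_runtime.h>
#include <cstdio>

// Fixed-dimension coordinate usable from both host and device code,
// with no dynamic allocation.
template <typename T, unsigned Dim>
struct coord {
    T d[Dim];                                  // storage lives inside the object

    __host__ __device__ T&       operator[](unsigned i)       { return d[i]; }
    __host__ __device__ const T& operator[](unsigned i) const { return d[i]; }
};

__global__ void use_coord() {
    coord<float, 2> c{{1.0f, 2.0f}};           // aggregate init, no new/memcpy
    printf("device coord: %f %f\n", c[0], c[1]);
}

int main() {
    use_coord<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}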

Read more

(The grammar mistake in the title was forced by SO’s validation message: "Title cannot contain 'Why does std::sin work in CUDA kernel?'.") The following code compiles (with nvcc test.cu -o test) and runs without error, meaning that std::sin does work on the device: #include <cmath> #include <vector> #include <cassert> #include <numeric> __global__ void map_sin(double* in, double* ..
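For comparison, a version that drops the std:: qualifier and calls the CUDA math functions sin()/sinf() directly is also valid device code (a minimal sketch; the flat indexing and parameter names are assumptions):

#include <cuda_runtime.h>

__global__ void map_sin(const double* in, double* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = sin(in[i]);   // CUDA's double-precision device sin()
}

__global__ void map_sinf(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = sinf(in[i]);  // single-precision variant
}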

Read more

The following code compiles (with nvcc test.cu -o test) and runs without error, meaning that std::sin() does work on the device: #include <cmath> #include <vector> #include <cassert> #include <numeric> __global__ void map_sin(double* in, double* out, int n) { const int i = blockIdx.x * 512 + threadIdx.x; if (i < n) { out[i] = std::sin(in[i]); ..
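To make the snippet self-contained, here is one way the host side might look (a sketch; the 512-thread blocks match the indexing in the excerpt, everything else is assumed):

#include <cmath>
#include <cassert>
#include <cuda_runtime.h>

__global__ void map_sin(const double* in, double* out, int n) {
    const int i = blockIdx.x * 512 + threadIdx.x;
    if (i < n) out[i] = std::sin(in[i]);
}

int main() {
    const int n = 1 << 16;
    double *in, *out;
    cudaMallocManaged(&in,  n * sizeof(double));
    cudaMallocManaged(&out, n * sizeof(double));
    for (int i = 0; i < n; ++i) in[i] = 0.001 * i;

    map_sin<<<(n + 511) / 512, 512>>>(in, out, n);
    cudaDeviceSynchronize();

    // Check the device results against the host library.
    for (int i = 0; i < n; ++i) assert(std::abs(out[i] - std::sin(in[i])) < 1e-12);

    cudaFree(in);
    cudaFree(out);
    return 0;
}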

Read more

I’m trying to write a simple CUDA program which adds two 2D matrices. Below is my code: #include "cuda_runtime.h" #include "device_launch_parameters.h" #include <stdio.h> #include <iostream> #include <cmath> #define N 100 #define T 1024 __global__ void MatrixAdd(float* A, float* B, float* C) { int col = blockDim.x * blockIdx.x + threadIdx.x; int row = blockDim.y * ..
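Since the excerpt computes both a row and a col index, the launch configuration has to be two-dimensional as well; if the kernel were launched with a one-dimensional block, row would stay 0. A sketch of a matching 2-D grid/block launch (the 16x16 block size and managed memory are assumptions, not from the original code):

#include <cuda_runtime.h>
#include <cstdio>

#define N 100

__global__ void MatrixAdd(const float* A, const float* B, float* C) {
    int col = blockDim.x * blockIdx.x + threadIdx.x;
    int row = blockDim.y * blockIdx.y + threadIdx.y;
    if (row < N && col < N)
        C[row * N + col] = A[row * N + col] + B[row * N + col];
}

int main() {
    float *A, *B, *C;
    size_t bytes = N * N * sizeof(float);
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < N * N; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    dim3 block(16, 16);                               // 256 threads per block
    dim3 grid((N + block.x - 1) / block.x,            // enough blocks to cover N x N
              (N + block.y - 1) / block.y);
    MatrixAdd<<<grid, block>>>(A, B, C);
    cudaDeviceSynchronize();

    printf("C[0] = %f\n", C[0]);                      // expect 3.0
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}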

Read more

I am currently trying to build OpenCV with CUDA on my Windows machine. I already configured and generated in CMake with version=14.28.29910. After going into VS2019, changing Debug to Release, and building the ALL_BUILD project, I got everything to build successfully except one project. Below is the error message: C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30037\include\xutility(1309): error : ..

Read more

I have this bash script: #!/usr/bin/env bash PYTHON_CMD=${PYTHON_CMD:=python} CUDA_PATH="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.3" CUDA_INCLUDE_DIR=CUDA_PATH="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.3/include" GENCODE="-gencode arch=compute_61,code=sm_61 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52" NVCCOPT="-std=c++11 -x cu --expt-extended-lambda -O3 -Xcompiler -fPIC" ROOTDIR=$PWD echo "========= Build BatchNorm2dSync =========" if [ -z "$1" ]; then TORCH=$($PYTHON_CMD -c "import os; import torch; print(os.path.dirname(torch.__file__))"); else TORCH="$1"; fi cd modules/functional/_syncbn/src ..

Read more