I’m trying to check my GPUs from Windows PowerShell with nvidia-smi, but I can’t get it to work. I already checked this post, but I don’t see a folder that starts with nvdm in my C:\Windows\System32\DriverStore\FileRepository directory. I have two versions of CUDA installed, v8.0 and v11.2, but my System Variables (CUDA_HOME, CUDA_PATH, CUDA_PATH_v11_2) all ..
My laptop always worked great, but I didn’t use it for a month. Recently I tried to install some Windows updates on it and got a blue screen. After that I could update as usual, but I realised my GPU was not enabled. I checked the Device Manager and could only find ..
I recently made an OpenCL wrapper in C++ that creates and manages resources automatically (more resources can be added, shared, or interchanged with a simple function). Does my wrapper, or any wrapper, cause unwanted overhead on the GPU compared with using CL/cl.h directly? Link goes to GitHub source code. Source: Windows Que..
I have a profile on VTune and it shows something running on the GPU (the line highlighted with the pale blue dot in the attached screenshot). How can I debug what in my codebase is running on there? To clarify: when that highlighted line is expanded, it’s the nvoglv64.dll process eating up all that time, ..
I have been trying to set up CUDA computing under Julia for my RTX 2070 GPU and, so far, I did not get any errors related to failed CUDA initialization when executing CUDA-parallelized code. However, the parallelized computations seem surprisingly slow, so I launched Pkg.test("CUDA") from Julia to get some more insight into why that ..
I have just installed the NVIDIA CUDA toolkit on my fresh Ubuntu 20.04 installation. nvcc compiles CUDA programs, and they run without crashing. However, none of the results are correct. Here is the output of the test script (deviceQuery) that NVIDIA provides: ./deviceQuery Starting… CUDA Device Query (Runtime API) version (CUDART static linking) Detected 1 ..
I managed to sort different arrays with a CPU or GPU (without using shared memory) implementation of the odd-even sort algorithm, but I’m having issues with using shared memory in CUDA. This is my invocation of the kernel -> main.cu: //shared memory size sizeVSM = nThreadPerBlock.x * sizeof(float); for (int i = 0; i < array_size; i++) ..
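For readers comparing results against a known-good baseline, here is a minimal CPU sketch of odd-even transposition sort in C++. It is not the poster's code: the function name `odd_even_sort` and the use of `std::vector<float>` are assumptions made for illustration, and it does not touch CUDA or shared memory. It only shows the alternating even/odd compare-and-swap phases that a GPU version would need to reproduce.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU reference (not from the original question).
// Odd-even transposition sort: run n phases, alternating between
// pairs starting at even indices and pairs starting at odd indices.
void odd_even_sort(std::vector<float>& a) {
    const std::size_t n = a.size();
    for (std::size_t phase = 0; phase < n; ++phase) {
        // Even phases compare (0,1), (2,3), ...; odd phases compare (1,2), (3,4), ...
        for (std::size_t i = phase % 2; i + 1 < n; i += 2) {
            if (a[i] > a[i + 1]) {
                std::swap(a[i], a[i + 1]);
            }
        }
    }
}
```

Within one phase the compared pairs do not overlap, which is why each pair can be handled by a separate thread in a parallel version. Sorting a copy of the input with this function gives a reference to diff against the GPU output.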
I have a piece of code that was tested on various Ubuntu 18 and Ubuntu 20 servers, where it worked fine. But when deploying the same code on a new laptop with a GeForce GTX 1650 SUPER, we get the following exception: Opened terminate called after throwing an instance of ‘cv::Exception’ what(): OpenCV(4.5.1-dev) /home/user/opencv_build/opencv_contrib/modules/cudaimgproc/src/canny.cpp:140: error: (-215:Assertion failed) ..
I have a design in C++ where I can generate function calls for templated functions at compile time based on a std::integer_sequence. This works on the CPU, but I would like to extend it to the GPU using CUDA. However, I am not allowed to put a __global__ function into a struct. Is there another way to achieve ..
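The CPU-side pattern described here might look roughly like the following sketch. The question's actual code is not shown, so every name is hypothetical: `scaled` stands in for the templated function, and `call_all` / `call_all_impl` for the dispatcher. It only illustrates expanding a `std::index_sequence` into one call per index at compile time, and does not address the `__global__` restriction.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical templated function; the template argument selects the variant.
template <std::size_t I>
int scaled(int x) {
    return x * static_cast<int>(I + 1);
}

// Expands the index pack into one call per index: scaled<0>(x), scaled<1>(x), ...
template <std::size_t... Is>
std::array<int, sizeof...(Is)> call_all_impl(int x, std::index_sequence<Is...>) {
    return {scaled<Is>(x)...};
}

// Entry point: builds the sequence 0..N-1 and forwards it to the expansion helper.
template <std::size_t N>
std::array<int, N> call_all(int x) {
    return call_all_impl(x, std::make_index_sequence<N>{});
}
```

For example, `call_all<3>(5)` instantiates `scaled<0>`, `scaled<1>`, and `scaled<2>` and returns their results in order.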
I have ported an algorithm, which exhibits good parallel efficiency through OpenMP on the CPU, to the GPU through the OpenMP target directive (targeting nvptx). The performance, however, is lacking: I only get a speedup of less than 2.0 compared to single-core performance. I have tried to minimize data movement and optimize for memory coalescing. I ..