I’m writing an OpenCL kernel and need a particular define. Through the kernel’s compiler arguments I set a define called VECTOR_SIZE, which is just a number, e.g. 2, 8 or 16. In the kernel I then need a define that expands to float2, float8 or float16. I tried this: #define ..
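One common way to build a type name such as float8 from a numeric define is two-level token pasting. The sketch below is host-side C++ so that it can be compiled on its own; the structs are stand-ins for the OpenCL C built-in vector types, and the names CONCAT_INNER, CONCAT, FLOATN and lanes are hypothetical, not taken from the question. The three macro lines themselves use only standard preprocessor features, so they should carry over to a kernel unchanged.

```cpp
#include <cassert>
#include <type_traits>

// Host-side stand-ins for the OpenCL C vector types (assumption: inside a
// real kernel float2/float8/float16 are built in and these structs are dropped).
struct float2  { float s[2];  };
struct float8  { float s[8];  };
struct float16 { float s[16]; };

#ifndef VECTOR_SIZE
#define VECTOR_SIZE 8   // normally passed in the build options, e.g. -DVECTOR_SIZE=8
#endif

// Two levels are needed: the outer macro lets VECTOR_SIZE expand to its value,
// and only then does the inner macro paste the two tokens together.
#define CONCAT_INNER(a, b) a##b
#define CONCAT(a, b) CONCAT_INNER(a, b)
#define FLOATN CONCAT(float, VECTOR_SIZE)

// Number of float lanes in the selected vector type.
constexpr unsigned lanes() { return sizeof(FLOATN) / sizeof(float); }

// Usage check: with VECTOR_SIZE == 8, FLOATN names float8.
inline void checkFloatN() {
    static_assert(std::is_same<FLOATN, float8>::value, "FLOATN should be float8");
    assert(lanes() == VECTOR_SIZE);
}
```

Pasting directly with float##VECTOR_SIZE in a single macro would produce the identifier floatVECTOR_SIZE, because operands of ## are not macro-expanded before pasting; the extra level of indirection avoids that.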
I tried to write a parallel BFS in OpenCL, but I don’t have much experience with C++. So this is probably a memory error, but I really don’t know how to fix it. I also can’t find what the error value -51 means. As a result I get "Unhandled exception at 0x00007FFCFB06A549 (amdocl64.dll) in my project.exe: 0xC0000005: ..
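For reference, the Khronos cl.h header defines -51 as CL_INVALID_ARG_SIZE, which clSetKernelArg returns when the size passed for an argument does not match what the kernel expects. A small lookup like the sketch below can make such codes readable in logs. The helper clErrorName is hypothetical (it is not part of the OpenCL API) and covers only a handful of codes; the numeric values are written out so the snippet compiles without the OpenCL headers.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not an OpenCL API call: maps a few cl_int status
// values to the names they have in the Khronos cl.h header.
inline std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -38: return "CL_INVALID_MEM_OBJECT";
        case -48: return "CL_INVALID_KERNEL";
        case -49: return "CL_INVALID_ARG_INDEX";
        case -50: return "CL_INVALID_ARG_VALUE";
        case -51: return "CL_INVALID_ARG_SIZE";
        case -52: return "CL_INVALID_KERNEL_ARGS";
        case -54: return "CL_INVALID_WORK_GROUP_SIZE";
        default:  return "unknown (" + std::to_string(code) + ")";
    }
}

// Usage check for the code mentioned in the question.
inline void checkErrorName() {
    assert(clErrorName(-51) == "CL_INVALID_ARG_SIZE");
}
```

Checking the return value of every clSetKernelArg call and printing the name alongside the argument index is usually enough to find which argument has the mismatched size.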
I have an AMD RX 570 4G. OpenCL tells me that I can use a maximum of 256 work-groups and 256 work-items per group. Let’s say I use all 256 work-groups with 256 work-items in each of them. What is the maximum size of private memory per work-item? Source: Windows Que..
This is the path tracer program by Kevin Beason. I need to make it work with OpenCL. I am having difficulty working out which function in the code below is best declared as a kernel and what it should look like. Do I need to include any other library in order to use OpenCL? I ..
Running the example below, a simple OpenCL vector addition, it seems that an i7 CPU beats a GTX 960 GPU in an execution-time benchmark. That’s mostly because the host-to-device and device-to-host data transfers are too expensive. So my question is: is it worth investing in a GPU framework like OpenCL or ..
I produce a log file (with OpenCL on OSX) by doing string buildlog = program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(device); ofstream file("./error.txt"); file << buildlog << endl; file.close(); But the resulting file seems to treat the code as a single VERY long line. For example, an error is located at <program source>:1:10604:, which is not very convenient for debugging. Do you ..
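A location such as 1:10604 usually means the compiler received the whole program as one line, which can happen when the source file is read word by word or line by line and the newlines are dropped before the string is handed to the program object. That is an assumption about this case, since the loading code is not shown. A minimal sketch of a loader that keeps every newline follows; loadKernelSource and countLines are hypothetical helper names.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Reads the whole file into a string, byte for byte, so that every newline
// in the kernel source reaches the OpenCL compiler unchanged.
inline std::string loadKernelSource(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Counts newline characters; a result of 0 for a multi-line kernel file
// suggests that the line breaks were lost while loading.
inline int countLines(const std::string& source) {
    int lines = 0;
    for (char c : source) {
        if (c == '\n') ++lines;
    }
    return lines;
}

// Usage check on an in-memory string.
inline void checkCountLines() {
    assert(countLines("__kernel void k() {\n}\n") == 2);
}
```

If the loaded string reports the expected number of lines, the build log should refer to real line numbers instead of a single long line.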
I have written an OpenCL kernel for a simple image processing task: an opening (erosion + dilation), followed by a closing (dilation + erosion) on the result of that opening. The problem is the output I get, as shown in the following image: I tried doing only the opening operation, and it ..
I’ve been having an issue with a certain function call, dphaseWeighted = af::convolve(dphaseWeighted, m_slowTimeFilter);, which seems to produce nothing but NaNs. The background is that we recently switched from AF OpenCL to AF CUDA, and the problem we are seeing happens in this call: dphaseWeighted = af::convolve(dphaseWeighted, m_slowTimeFilter); This seems ..
I am trying to use the GPU to parallelize my code in Visual Studio C++. Currently I use OpenMP for CPU parallelization, but I am thinking of moving to GPU parallelization because I think it would be faster with larger arrays in the calculations. Below is the code that I am working ..
This is a problem I encountered in practice. It concerns CMake and involves OpenCL and CUDA. I want to use CMake to build a project with two parts of code, OpenCL and CUDA. But in the end only the OpenCL part was built, and the CUDA part did not come out. Through my own debugging, I can ..