How can you use managed (unified) memory for an image?

  c++, cuda, opencv

I spent all day yesterday reading about how to use managed (unified) memory arrays in a CUDA program (using the book Professional CUDA C Programming) and practiced with some of the sample code, although I still have doubts about the profiler output. I am now ready to apply it to my program, which uses both a CUDA kernel and some OpenCV functions.

I have several questions, but let me start with the first one here.

I have

cv::Mat h_image;
h_image = cv::imread(dirname+image_filenames[ni], cv::IMREAD_GRAYSCALE);

cv::cuda::GpuMat d_image;
// 2. Upload the image
d_image.upload(h_image);

So I read an image with imread and upload it to device memory.
How can I use unified memory for this instead?

In theory, to use unified memory with float arrays I can write

float *A;
cudaMallocManaged((void **)&A, nBytes);

or even (and I prefer this)

__device__ __managed__ float A[67108864];
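
For context, here is a minimal sketch of how I understand the first variant is meant to be used. The kernel `scale` and the helper `scaled_first_element` are hypothetical names made up for this illustration, and the array size is arbitrary. The point is that the host and the kernel use the same pointer, and the host waits with `cudaDeviceSynchronize()` before reading the result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: multiply every element by `factor`.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: fill a managed array with 1.0f on the host, scale it
// on the device, and return the first element as read back by the host.
// Returns -1.0f if the allocation or the kernel fails.
float scaled_first_element(int n, float factor)
{
    float *A = nullptr;
    const size_t nBytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMallocManaged((void **)&A, nBytes) != cudaSuccess) return -1.0f;

    // The host writes through the same pointer the kernel will use.
    for (int i = 0; i < n; ++i) A[i] = 1.0f;

    const int block = 256;
    scale<<<(n + block - 1) / block, block>>>(A, n, factor);

    // Wait for the kernel to finish before the host touches A again.
    const bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    const float first = ok ? A[0] : -1.0f;

    cudaFree(A);
    return first;
}

int main()
{
    std::printf("A[0] after scaling: %f\n", scaled_first_element(1 << 20, 2.0f));
    return 0;
}
```

There is no explicit `cudaMemcpy` in either direction; the only synchronization point is the call to `cudaDeviceSynchronize()`.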

Is there a way to do something similar with Mats and GpuMats?
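
To make the question concrete, this is a sketch of the kind of thing I have in mind, not something I have confirmed is the recommended approach. It relies on the `cv::Mat` and `cv::cuda::GpuMat` constructors that wrap user-allocated data without taking ownership. The struct `ManagedImage` and the helper `makeManaged` are hypothetical names. To stay self-contained the example uses a synthetic image in place of the `imread` result, and it assumes OpenCV was built with CUDA support.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>

// Hypothetical holder: one managed buffer viewed through two headers.
struct ManagedImage {
    void *buf = nullptr;       // allocated with cudaMallocManaged
    cv::Mat host;              // host-side header over buf (does not own it)
    cv::cuda::GpuMat device;   // device-side header over the same buf
};

// Hypothetical helper: allocate managed memory sized for `src`, wrap it in
// both headers, and copy the pixels in once on the host.
bool makeManaged(const cv::Mat &src, ManagedImage &out)
{
    if (src.empty()) return false;
    const size_t step = src.cols * src.elemSize();   // tightly packed rows
    if (cudaMallocManaged(&out.buf, step * src.rows) != cudaSuccess) return false;

    out.host   = cv::Mat(src.rows, src.cols, src.type(), out.buf, step);
    out.device = cv::cuda::GpuMat(src.rows, src.cols, src.type(), out.buf, step);

    // Size and type already match, so copyTo writes into buf without reallocating.
    src.copyTo(out.host);
    return true;
}

int main()
{
    // Stand-in for the result of cv::imread(..., cv::IMREAD_GRAYSCALE).
    cv::Mat h_src(480, 640, CV_8UC1, cv::Scalar(7));

    ManagedImage img;
    if (!makeManaged(h_src, img)) return 1;

    // Illustrative device-side write through the GpuMat header.
    img.device.setTo(cv::Scalar(255));
    cudaDeviceSynchronize();   // finish GPU work before reading on the host

    std::printf("same pointer: %d, pixel(0,0) = %d\n",
                img.host.data == img.device.data,
                static_cast<int>(img.host.at<uchar>(0, 0)));

    // The headers do not own buf, so drop them and free it explicitly.
    img.host.release();
    img.device.release();
    cudaFree(img.buf);
    return 0;
}
```

In this sketch there is no `upload` or `download` call, but the host still has to synchronize before touching the buffer after GPU work, and the buffer has to be freed with `cudaFree` once both headers are no longer in use.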
