Is there a faster argmin/argmax implementation in OpenACC?

  argmax, c++, openacc, optimization

Is there a faster way to compute the argmin in OpenACC than splitting the work into a min-reduction loop followed by a second loop that finds the index of the minimum?

This looks very wasteful:

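    float minVal = std::numeric_limits<float>::max();
    // Pass 1: reduce the whole array to its minimum value.
    #pragma acc parallel loop reduction(min: minVal)
    for(int i = 0; i < arraySize; ++i) {
        minVal = fmin(minVal, array[i]);
    }
    // Pass 2: scan again for an index holding that value (with ties, any match may win).
    int minIndex = -1;
    #pragma acc parallel loop copy(minIndex)
    for(int i = 0; i < arraySize; ++i) {
        if(array[i] == minVal){
            minIndex = i;
        }
    }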

In fact, this became a bottleneck for my current project.
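
For reference, here is a rough sketch of a single-pass alternative. I have not benchmarked it. The idea is to pack each value and its index into one 64-bit integer, with the value in the high 32 bits, so that a plain `min` reduction returns both at once. The names `orderedKey` and `argminPacked` are hypothetical helpers, not part of OpenACC. The sketch assumes the array contains no NaNs, that `arraySize` fits in 32 bits, and that the compiler accepts `memcpy` inside an `acc routine` (this may depend on the toolchain).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: map a float to a 32-bit key whose unsigned ordering
// matches the float ordering (assumes no NaNs; -0.0f sorts just below +0.0f).
#pragma acc routine seq
inline std::uint32_t orderedKey(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    // Negative floats: flip every bit. Non-negative floats: set the sign bit.
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

// Hypothetical helper: single-pass argmin using one min reduction.
// Returns -1 for an empty array; ties resolve to the lowest index.
int argminPacked(const float* array, int arraySize) {
    std::uint64_t best = ~std::uint64_t(0);  // larger than any packed entry
    #pragma acc parallel loop reduction(min: best)
    for (int i = 0; i < arraySize; ++i) {
        // High 32 bits: ordered value key. Low 32 bits: index.
        std::uint64_t packed =
            (std::uint64_t(orderedKey(array[i])) << 32) | std::uint32_t(i);
        best = packed < best ? packed : best;
    }
    return arraySize > 0 ? int(best & 0xFFFFFFFFu) : -1;
}
```

The minimum value itself can then be read back as `array[argminPacked(array, arraySize)]`. Whether this actually beats the two-loop version presumably depends on how well the compiler handles 64-bit integer reductions on the target device.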

Source: Windows Questions C++