Is there a faster way to compute the argmin in OpenACC than splitting the work into a minimum-reduction loop and a second loop that finds the index of the minimum?

This looks very wasteful:

```
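float minVal = std::numeric_limits<float>::max();
int minIndex = -1;  // not declared in the original snippet; assumed to be an int

// Pass 1: reduce to the minimum value.
#pragma acc parallel loop reduction(min: minVal)
for (int i = 0; i < arraySize; ++i) {
    minVal = fmin(minVal, array[i]);
}

// Pass 2: find an index holding that value. If the minimum occurs more
// than once, which matching index ends up in minIndex is unspecified.
#pragma acc parallel loop
for (int i = 0; i < arraySize; ++i) {
    if (array[i] == minVal) {
        minIndex = i;
    }
}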
```

In fact, this became a bottleneck for my current project.
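
One single-pass idea I have seen, sketched here under assumptions rather than as a verified OpenACC solution: pack each value and its index into one 64-bit key and run a single `min` reduction over the keys. The helper `orderedBits` and the function `argminPacked` are hypothetical names, not part of OpenACC. The sketch assumes the compiler accepts a `min` reduction on a 64-bit unsigned integer and can compile `std::memcpy` for the device, that the array contains no NaNs, and that indices fit in 32 bits.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: map a float to a uint32_t whose unsigned ordering
// matches the float ordering (NaNs are assumed absent).
#pragma acc routine seq
inline std::uint32_t orderedBits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    // Negative floats: flip all bits. Non-negative floats: set the sign bit.
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

// Hypothetical single-pass argmin: value bits in the upper 32 bits of the key,
// index in the lower 32 bits, so the smallest key is the smallest value and,
// among equal values, the smallest index. Returns -1 for an empty array.
int argminPacked(const float* array, int arraySize) {
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();

    // Assumption: reduction(min: ...) is supported for this integer type.
    #pragma acc parallel loop reduction(min: best) copyin(array[0:arraySize])
    for (int i = 0; i < arraySize; ++i) {
        std::uint64_t key =
            (static_cast<std::uint64_t>(orderedBits(array[i])) << 32) |
            static_cast<std::uint32_t>(i);
        best = key < best ? key : best;
    }

    return arraySize > 0 ? static_cast<int>(best & 0xFFFFFFFFu) : -1;
}
```

With this packing the array is read once instead of twice, and ties resolve deterministically to the lowest index. Note that `-0.0f` sorts before `+0.0f` under this bit ordering, which differs from the `==` comparison in the two-loop version. Whether this is actually faster would need to be measured.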
