The indexing issues are mostly gone now.
What is left seems to be something I need to understand CUDA and the GPU better than I currently do to figure out.
Here is the situation.
I have a 1D array of size 150 that corresponds to a 6x5x5 grid, so I launch my kernel with a block size of dim3(6,5,5). All I am trying to do is write 10.0f to every element of the array. With this block size, however, every element of the array comes back as a random float.
If instead I have dim3(6,5,4), it writes 10.0f to the corresponding spots.
Why is this? 6x5x5 is only 150 threads, which is well under the limit of 512 threads per block, so I don't understand why this is happening.
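For reference, here is a minimal sketch of the setup described above with error checking added after the launch. It is not my actual code: `fill_kernel` and `fill_and_check` are hypothetical names, and the flattening order (x fastest, then y, then z) is an assumption. The point of the sketch is that a kernel launch can fail without any visible sign (one possible cause is the block requesting more resources, such as registers, than the device allows), in which case the output array is never written, so checking `cudaGetLastError()` after the launch should at least show whether that is what happens here.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes 10.0f to one slot of a 1D array.
__global__ void fill_kernel(float* out, int n)
{
    // Flatten the 3D thread index (assumed order: x varies fastest).
    int idx = threadIdx.x
            + threadIdx.y * blockDim.x
            + threadIdx.z * blockDim.x * blockDim.y;
    if (idx < n) {
        out[idx] = 10.0f;
    }
}

// Hypothetical helper: launches a single block of the given shape and
// returns how many of the n elements equal 10.0f, or -1 on any CUDA error.
int fill_and_check(dim3 block, int n)
{
    float* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return -1;
    }

    // Zero the buffer so unwritten slots read back as 0 rather than garbage.
    err = cudaMemset(d_out, 0, n * sizeof(float));
    if (err == cudaSuccess) {
        fill_kernel<<<1, block>>>(d_out, n);
        err = cudaGetLastError();          // reports launch failures
    }
    if (err == cudaSuccess) {
        // Reports failures during execution. Very old toolkits used
        // cudaThreadSynchronize() for this instead.
        err = cudaDeviceSynchronize();
    }

    std::vector<float> h_out(n);
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int written = 0;
    for (float v : h_out) {
        if (v == 10.0f) ++written;
    }
    return written;
}
```

Calling `fill_and_check(dim3(6,5,5), 150)` should return 150 if the launch succeeds. A return of -1 together with the printed error string would point at the launch itself rather than at the indexing.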
Any suggestion is appreciated, especially since Google has just failed me.