So, I finally got the density advection to work in the GPU. Time to try to get the velocity advection in as well.
I gave up on 3D blocks and set for a 1D block and just figuring out the 3D indexing from it. This works much better when there are more than 1 block.
Also, I found out that a lot of my problems consisted on using more resources than what I had. So, everybody, error checking is GOOD. On the bright side, I was right about needing more registers when I first saw those weird things happening, so go me!
Here is a 5x5x5 grid running with the density advection (I have run this on 100x50x50 grids and it works but forgot to get a screenshot and it would just take too much time to do it again).