So I timed the different parts of the simulation on both the CPU and the GPU. This is what I found:
The y-axis shows time in milliseconds (Excel doesn't like me much).
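For anyone who wants to reproduce this kind of per-stage breakdown, here is a minimal sketch of a timing harness. It assumes each stage of a simulation step can be called as a plain function; the stage names in the usage note (`advect_velocity`, `project`) are hypothetical placeholders, not functions from this project. One caveat: GPU work is usually launched asynchronously, so a stage that runs on the GPU only gets timed correctly if it waits for the device to finish before returning (in CUDA, for example, with `cudaDeviceSynchronize`).

```python
import time
from collections import defaultdict


def time_stages(stages, steps):
    """Run every (name, fn) stage once per simulation step.

    Returns the mean wall-clock time per stage in milliseconds.
    """
    totals = defaultdict(float)
    for _ in range(steps):
        for name, fn in stages:
            start = time.perf_counter()
            fn()  # for GPU stages, fn must block until the device is done
            totals[name] += (time.perf_counter() - start) * 1000.0
    return {name: totals[name] / steps for name, _ in stages}
```

Usage would look like `time_stages([("advect", advect_velocity), ("project", project)], steps=100)`, where both functions are hypothetical stand-ins for the real simulation stages. Averaging over many steps smooths out one-off costs such as the first kernel launch.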
Two things are worth noting:
1. The project function (this is where the conjugate gradient (CG) solve is done) is a lot slower on the GPU than on the CPU.
2. Advecting velocities is also very slow, which makes it a good candidate to move to the GPU.
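To make the structure of the projection step concrete, here is a minimal sketch of a matrix-free conjugate gradient solve for a pressure Poisson equation. It assumes a square n x n grid, unit grid spacing, and zero pressure outside the domain; the real solver may use a different discretization, boundary handling, or preconditioner. The point it shows is the work done per CG iteration: one stencil application plus two dot products. On a GPU each dot product is a global reduction, which typically needs synchronization, so this is worth keeping in mind when comparing CPU and GPU timings of the project step.

```python
def apply_neg_laplacian(p, n):
    """Matrix-free 5-point negative Laplacian on an n x n grid.

    Values outside the grid are treated as zero (Dirichlet boundary),
    which keeps the operator symmetric positive definite, as CG requires.
    """
    out = [0.0] * (n * n)
    for j in range(n):
        for i in range(n):
            k = j * n + i
            s = 4.0 * p[k]
            if i > 0:
                s -= p[k - 1]
            if i < n - 1:
                s -= p[k + 1]
            if j > 0:
                s -= p[k - n]
            if j < n - 1:
                s -= p[k + n]
            out[k] = s
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(b, n, tol=1e-10, max_iter=1000):
    """Solve A x = b, with A the negative Laplacian, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, equal to b because x starts at 0
    d = list(r)  # search direction
    rs_old = dot(r, r)
    iterations = 0
    while iterations < max_iter and rs_old ** 0.5 > tol:
        ad = apply_neg_laplacian(d, n)   # stencil application
        alpha = rs_old / dot(d, ad)      # first dot product
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, ad)]
        rs_new = dot(r, r)               # second dot product
        beta = rs_new / rs_old
        d = [ri + beta * di for ri, di in zip(r, d)]
        rs_old = rs_new
        iterations += 1
    return x, iterations
```

In a fluid solver, `b` would be built from the divergence of the advected velocity field, and the resulting pressure gradient is then subtracted from the velocity to make it divergence-free.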