Monday, December 19, 2011

The year ends

The semester has come to an end.

So here is my video:

I am satisfied with what I accomplished and I got a lot more familiar with choosing number of threads and debugging CUDA.

There are a few things I would like to do once I get a good computer like:

- Get the velocity advection in the GPU
- Get the whole simulation in the GPU so there is no data going back and forth between the CPU and GPU
- Make it look good (i.e. better renderer)

Saturday, December 17, 2011


So, I finally got the density advection to work in the GPU. Time to try to get the velocity advection in as well.

I gave up on 3D blocks and set for a 1D block and just figuring out the 3D indexing from it. This works much better when there are more than 1 block.
Also, I found out that a lot of my problems consisted on using more resources than what I had. So, everybody, error checking is GOOD. On the bright side, I was right about needing more registers when I first saw those weird things happening, so go me!

Here is a 5x5x5 grid running with the density advection (I have run this on 100x50x50 grids and it works but forgot to get a screenshot and it would just take too much time to do it again).

Friday, December 16, 2011

More issues but with size of blocks

So, indexing issues are not present anymore, sort of.

Now, it is something that I feel I need to understand CUDA and the GPU more than I do currently to figure it out.

Here is the deal.
I have a 1D array of size 150 corresponding to a grid of size 6x5x5. So, in my kernel, the block size is dim3(6,5,5). I am only trying to write 10.0f in every spot in the array. In this case, all the numbers are random floats.

If instead I have dim3(6,5,4), it writes 10.0f to the corresponding spots.

Why is this? The number of threads is not even greater than 512, so I don't understand why this is happening.

Any suggestion is appreciated, especially since Google has just failed me.

Wednesday, December 7, 2011

Gone for the week

I am still trying to get the density to work in the GPU. At least I fixed the indexing problem.

This will be the only update for the week because I am going away until mid next week.

Friday, December 2, 2011


This week I decided to put off on trying to make CG faster.
Instead, I am starting to put other parts of the simulation into the GPU.

I started by trying to get velocity advection to work but later switched gears to getting density advection to work. It seemed to me like a good thing to do because it involves less functions where things can go wrong. If it works I can be sure that the common functions to all advections are working, so if the other advections don't work, then I already have a bunch of possible places for error ruled out.

But I have no further developments in this area. I am slowly checking that I haven't missed any details in the implementation of the functions in the GPU also checking the accessing of the data and such.

Let's see how this goes this weekend.

Friday, November 25, 2011

So Slow

I still haven't managed to make it run any faster. It might be as fast as it is going to get with cublas and cusparse, in which case it means I need to find better libraries (I am convinced writing it all myself is not going to be any faster, plus people have already dealt with these problems, so I shouldn't need to reinvent the wheel, but suggestions are welcome).

The slow down is considerable:

For a 10x10x10 grid, the GPU takes around 33ms in total (~7 in the analysis phase, ~24ms in total for the iterations, ~2ms doing other things).
For the same grid, the CPU does it in 7ms (no anlaysis phase, ~5ms in total for the iterations, ~2ms doing other things).
And the GPU just takes longer and longer time as the grid fills up, with as much as 60ms being spent (I didn't wait more to get higher numbers).

I was looking into a library, CUSP, but I couldn't get it to compile with my project, so if there are any CUSP users, please let me know how to do that =)

And that is it for now.

Sunday, November 20, 2011


So I timed the different parts of the simulation both in the CPU and GPU. This is what I found.

The y-axis is the amount of milliseconds (Excel doesn't like me much).

Two things are worth noting:
1. The GPU project function (this is where the CG is done) is a lot slower in the GPU.
2. Advecting velocities is very slow as well, which makes for a good candidate to be moved to the GPU.