Friday, November 25, 2011

So Slow

I still haven't managed to make it run any faster. It might be as fast as it is going to get with CUBLAS and cuSPARSE, in which case I need to find better libraries. I am convinced that writing it all myself is not going to be any faster, and people have already dealt with these problems, so I shouldn't need to reinvent the wheel, but suggestions are welcome.

The slowdown is considerable:

For a 10x10x10 grid, the GPU takes around 33ms in total (~7ms in the analysis phase, ~24ms for the iterations, ~2ms doing other things).
For the same grid, the CPU does it in 7ms (no analysis phase, ~5ms for the iterations, ~2ms doing other things).
And the GPU just takes longer and longer as the grid fills up, with as much as 60ms being spent (I didn't wait longer to get higher numbers).

I was looking into a library, CUSP, but I couldn't get it to compile with my project, so if there are any CUSP users, please let me know how to do that =)

And that is it for now.

Sunday, November 20, 2011

Timing

So I timed the different parts of the simulation on both the CPU and the GPU. This is what I found.


The y-axis is the time in milliseconds (Excel doesn't like me much).

Two things are worth noting:
1. The project function (this is where the CG is done) is a lot slower on the GPU.
2. Advecting velocities is very slow as well, which makes it a good candidate to be moved to the GPU.
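
For anyone repeating this kind of measurement, here is a minimal sketch of a wall-clock timing helper in C++. It is not the code used for the chart above; the name `averageMs` and the idea of averaging over several runs are assumptions made for illustration. The one GPU-specific point is that kernel launches return before the work finishes, so the timed function has to wait for the device (for example with `cudaDeviceSynchronize()`) or the numbers will look too good.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` `repeats` times and returns the average
// wall-clock time per run in milliseconds.
// When timing GPU code, `work` must block until the device is done
// (e.g. finish with cudaDeviceSynchronize()), because kernel launches
// are asynchronous with respect to the host.
double averageMs(const std::function<void()>& work, int repeats) {
    assert(repeats > 0);
    using Clock = std::chrono::steady_clock;  // monotonic, safe for intervals
    const auto start = Clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const auto stop = Clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

A call such as `averageMs([&] { sim.project(); }, 20)` would then give one bar of the chart, where `sim.project()` stands in for whatever step is being measured.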

Saturday, November 19, 2011

GPU Time!

I finally got everything down to the GPU, back, and rendering!
I also got it to work for 2D as well as 3D grids.

Here is a graph of the current frame rate in frames per second (y-axis) vs. the size of the grid (x-axis). The CPU is way faster than the GPU, which means it is time to do a lot of optimization, try a new approach, or both.


But having everything finally working in the GPU is a good thing.

And here is a screenshot of smoking on the GPU.


I am going to make a movie later and post it.


Plans for the week:

-CHANGE THE COLOR OF THE SMOKE
-Make improvements to the GPU code
-Make movies for large grids (I am sorry computers)

Friday, November 11, 2011

Presentation, presentation, presentation

Who would have thought that working on a presentation would take so much time?

My presentation is (apparently) going to be on Conjugate Gradient on the GPU.

My references are:

For an introduction to the Conjugate Gradient Method:
 
For more in depth GPU CG:

For samples on CG with CUDA (CUBLAS, cuSPARSE):

For fluids simulation:

And I was going to read more about fluids simulation on the GPU but decided it was time to update my blog.

So what have I learnt?

- The samples from CUDA are helpful for understanding what CG does behind the scenes. Unfortunately, they aren't optimized (to keep the code easy to follow), but I think I should be able to use them as a starting point.
- In CH 38, the code was implemented in the fragment shaders using Cg. They also moved every step to the GPU (advection, diffusion, etc.).
- The dominant operation in CG is the matrix-vector multiplication, so this is where most of the optimization effort should be centered.
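
To make the last point concrete, here is a minimal CPU-side sketch of unpreconditioned CG in plain C++ over a matrix stored in CSR form. It is not the CUDA sample code or the framework's solver; the function names (`spmv`, `dot`, `conjugateGradient`) and the relative-residual stopping rule are assumptions made for illustration. Each iteration does one sparse matrix-vector product, two dot products and three vector updates, and the product is the only step that touches the whole matrix.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// y = A * x, with A given as CSR arrays (rowPtr, colInd, val).
void spmv(const std::vector<int>& rowPtr, const std::vector<int>& colInd,
          const Vec& val, const Vec& x, Vec& y) {
    const std::size_t n = rowPtr.size() - 1;
    for (std::size_t r = 0; r < n; ++r) {
        double sum = 0.0;
        for (int e = rowPtr[r]; e < rowPtr[r + 1]; ++e) {
            sum += val[e] * x[colInd[e]];
        }
        y[r] = sum;
    }
}

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solves A x = b for a symmetric positive definite A.
// `x` holds the initial guess on entry and the solution on exit.
// Stops when ||r|| <= tol * ||b|| or after maxIter iterations.
// Returns the number of iterations performed.
int conjugateGradient(const std::vector<int>& rowPtr,
                      const std::vector<int>& colInd, const Vec& val,
                      const Vec& b, Vec& x, double tol, int maxIter) {
    const std::size_t n = b.size();
    assert(rowPtr.size() == n + 1 && x.size() == n);

    Vec r(n), p(n), Ap(n);
    spmv(rowPtr, colInd, val, x, Ap);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];  // r = b - A x
    p = r;                                                    // first direction

    double rr = dot(r, r);
    const double stop = tol * tol * dot(b, b);  // compare squared norms
    int it = 0;
    while (it < maxIter && rr > stop) {
        spmv(rowPtr, colInd, val, p, Ap);       // the dominant cost
        const double alpha = rr / dot(p, Ap);   // step length along p
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rrNew = dot(r, r);
        const double beta = rrNew / rr;         // keeps directions A-conjugate
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rrNew;
        ++it;
    }
    return it;
}
```

On the GPU the same loop maps onto library calls: the `spmv` line becomes the sparse matrix-vector routine, and the dot products and vector updates become BLAS level 1 calls.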

This has given me a good idea of what to read next, if not for the presentation, then for the implementation of the CG solver later on, after ditching CUBLAS and cuSPARSE. Hopefully, one day I'll be done with the reading and get on with the code.

I have also started playing around with the code. I think I now understand how the A matrix is built in the framework so that I can translate it to the data structures asked for in the methods used from the libraries to implement the CG solver. What is holding me back is a crash when I try to print out... My guess: some memory leak or plain bad luck. 
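
As an illustration of what that translation can look like, here is a minimal C++ sketch that builds a pressure matrix directly in CSR (compressed sparse row) form, one of the sparse formats cuSPARSE accepts. It does not reproduce how the framework stores A. The `CsrMatrix` struct and `buildPoissonCsr` function are hypothetical, and the sketch assumes every cell is fluid and the grid is surrounded by open air, so each row has 6 on the diagonal and -1 for each neighbour inside the grid.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CSR container: three flat arrays, as sparse libraries expect.
struct CsrMatrix {
    int n = 0;                 // number of rows (= number of grid cells)
    std::vector<int> rowPtr;   // size n + 1; row r is [rowPtr[r], rowPtr[r+1])
    std::vector<int> colInd;   // column index of each stored entry
    std::vector<double> val;   // value of each stored entry
};

// Builds the 7-point Laplacian for an nx * ny * nz grid.
// Assumption: all cells are fluid and cells outside the grid are air.
CsrMatrix buildPoissonCsr(int nx, int ny, int nz) {
    assert(nx > 0 && ny > 0 && nz > 0);
    CsrMatrix A;
    A.n = nx * ny * nz;
    A.rowPtr.reserve(A.n + 1);
    A.rowPtr.push_back(0);

    // Flatten (i, j, k) to a row/column index, x varying fastest.
    auto idx = [nx, ny](int i, int j, int k) { return i + nx * (j + ny * k); };
    auto add = [&A](int col, double v) {
        A.colInd.push_back(col);
        A.val.push_back(v);
    };

    for (int k = 0; k < nz; ++k) {
        for (int j = 0; j < ny; ++j) {
            for (int i = 0; i < nx; ++i) {
                // Entries are emitted in increasing column order.
                if (k > 0)      add(idx(i, j, k - 1), -1.0);
                if (j > 0)      add(idx(i, j - 1, k), -1.0);
                if (i > 0)      add(idx(i - 1, j, k), -1.0);
                add(idx(i, j, k), 6.0);  // diagonal
                if (i < nx - 1) add(idx(i + 1, j, k), -1.0);
                if (j < ny - 1) add(idx(i, j + 1, k), -1.0);
                if (k < nz - 1) add(idx(i, j, k + 1), -1.0);
                // Close the row: next row starts after the last entry added.
                A.rowPtr.push_back(static_cast<int>(A.colInd.size()));
            }
        }
    }
    return A;
}
```

For a 10x10x10 grid this gives 1000 rows and 6400 stored entries, out of a million in the dense matrix, which is why a sparse format is worth the trouble.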

Next steps:
  • Finish the presentation (which means start the presentation in PowerPoint)
  • Find out why the code is breaking.
  • Tackle the feedback received from the checkpoint presentation (probably for the final video):
    • Get running times (FPS in current implementation in the CPU)
    • Try larger grids; report sizes and FPS 
    • Explain the A matrix and the semi-Lagrangian method better

Friday, November 4, 2011

And I finally got the framework! So now I have to make it work =) So no real updates yet, possibly by Monday.