GPU Programming Final Project: 2011

Monday, December 19, 2011

The year ends

The semester has come to an end.

So here is my video:

I am satisfied with what I accomplished, and I got a lot more familiar with choosing the number of threads and debugging CUDA.

There are a few things I would like to do once I get a good computer, such as:

- Get the velocity advection on the GPU
- Get the whole simulation on the GPU so there is no data going back and forth between the CPU and GPU
- Make it look good (i.e., a better renderer)

Saturday, December 17, 2011

Progress!!

So, I finally got the density advection to work on the GPU. Time to try to get the velocity advection in as well.

I gave up on 3D blocks and settled for 1D blocks, computing the 3D indices from the 1D thread index. This works much better when there is more than one block.
Also, I found out that a lot of my problems came from using more resources than I had. So, everybody, error checking is GOOD. On the bright side, I was right about needing more registers when I first saw those weird things happening, so go me!
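The following is a minimal CPU-side C++ sketch of what "figuring out the 3D indexing" from a 1D block involves. It assumes the grid is stored with x varying fastest (`i + nx * (j + ny * k)`). That layout is an assumption and not necessarily the framework's, and the helper names are hypothetical.

```cpp
#include <cassert>

// Hypothetical (i, j, k) cell coordinate.
struct Index3 {
    int i, j, k;
};

// Flatten (i, j, k) into the 1D array offset, x varying fastest.
inline int linearIndex(int i, int j, int k, int nx, int ny) {
    return i + nx * (j + ny * k);
}

// Recover (i, j, k) from a 1D index. A kernel launched with 1D blocks would do
// the same arithmetic on blockIdx.x * blockDim.x + threadIdx.x.
inline Index3 unflatten(int idx, int nx, int ny) {
    assert(nx > 0 && ny > 0 && idx >= 0);
    Index3 c;
    c.i = idx % nx;
    c.j = (idx / nx) % ny;
    c.k = idx / (nx * ny);
    return c;
}

// Number of 1D blocks needed to cover n cells (rounded up). Threads in the last
// block whose index is >= n must return early, because n is rarely a multiple
// of the block size.
inline int blocksNeeded(int n, int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}
```

Inside a kernel, the same `unflatten` arithmetic would run on the global thread index, with an early `return` for indices past `nx * ny * nz`. The launch would then use `blocksNeeded(nx * ny * nz, threadsPerBlock)` blocks. For the 6x5x5 grid below, the last cell (5, 4, 4) maps to offset 149.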

Here is a 5x5x5 grid running with the density advection (I have also run it on 100x50x50 grids and it works, but I forgot to take a screenshot and redoing it would take too much time).

Friday, December 16, 2011

More issues, this time with block sizes

So, indexing issues are not present anymore, sort of.

Now the problem is one I feel I can only figure out once I understand CUDA and the GPU better than I currently do.

Here is the deal.
I have a 1D array of size 150 corresponding to a grid of size 6x5x5, so in my kernel the block size is dim3(6,5,5). I am only trying to write 10.0f to every element of the array. In this case, instead of 10.0f, the array ends up full of random floats.

If instead I have dim3(6,5,4), it writes 10.0f to the corresponding spots.

Why is this? The number of threads (150) is well below the 512-threads-per-block limit, so I don't understand why this is happening.

Any suggestion is appreciated, especially since Google has just failed me.
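One plausible explanation is the per-block register budget rather than the 512-thread limit. That fits what the December 17 post later found about running out of registers. On compute capability 1.0/1.1 devices, a multiprocessor has 8192 32-bit registers, and a block cannot launch unless the registers for all of its threads fit. If the launch fails, the kernel writes nothing, so the array keeps whatever was already in memory, which would look like random floats.

The sketch below does that arithmetic in a simplified model that ignores register allocation granularity. The 60 registers per thread used in the usage example is a hypothetical figure; `nvcc --ptxas-options=-v` prints the real count for a kernel.

```cpp
#include <cassert>

// Simplified model: a block launches only if threads * registers-per-thread
// fits in the registers available to it. Real hardware rounds allocations up
// to a granularity, so this estimate is optimistic.
inline bool fitsRegisterBudget(int threadsPerBlock, int regsPerThread, int regsAvailable) {
    assert(threadsPerBlock > 0 && regsPerThread > 0);
    return threadsPerBlock * regsPerThread <= regsAvailable;
}

// Largest block size the register budget allows, capped at the hardware
// threads-per-block limit.
inline int maxThreadsForRegisters(int regsPerThread, int regsAvailable, int hwThreadLimit) {
    assert(regsPerThread > 0);
    int byRegisters = regsAvailable / regsPerThread;
    return byRegisters < hwThreadLimit ? byRegisters : hwThreadLimit;
}
```

With the hypothetical 60 registers per thread, a 6x5x4 block (120 threads, 7200 registers) fits and a 6x5x5 block (150 threads, 9000 registers) does not. Checking `cudaGetLastError()` right after the launch would report this case as `cudaErrorLaunchOutOfResources` instead of letting it fail silently.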

Wednesday, December 7, 2011

Gone for the week

I am still trying to get the density advection to work on the GPU. At least I fixed the indexing problem.

This will be the only update for the week because I am going away until the middle of next week.

Friday, December 2, 2011

Updates

This week I decided to put off trying to make CG faster.

Instead, I am starting to move other parts of the simulation onto the GPU.

I started by trying to get velocity advection to work but later switched gears to density advection. It seemed like a good thing to do because it involves fewer functions where things can go wrong. If it works, I can be sure that the functions common to all advections are working. Then, if the other advections don't work, I will already have ruled out a bunch of possible sources of error.
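The density advection step is presumably the semi-Lagrangian scheme the checkpoint feedback asked to have explained. Each cell traces its position backwards through the velocity field for one time step and takes the interpolated old density from that point.

The sketch below is a CPU reference under simplifying assumptions: cell-centered velocities, a uniform spacing `dx`, and positions clamped at the borders. A MAC-grid framework would instead average face velocities to the cell center. All names are hypothetical. Every cell is computed independently, which is what makes this step a natural one-thread-per-cell kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical cell-centered grid: nx*ny*nz cells, x varying fastest,
// uniform spacing dx. Velocities are also assumed to be cell-centered.
struct Grid {
    int nx, ny, nz;
    float dx;
    int idx(int i, int j, int k) const { return i + nx * (j + ny * k); }
};

static float clampf(float v, float lo, float hi) { return std::max(lo, std::min(v, hi)); }

// Trilinear interpolation of a cell-centered field at (x, y, z) in grid units,
// where cell (i, j, k) sits at (i, j, k). Positions are clamped to the grid.
float sampleTrilinear(const Grid& g, const std::vector<float>& f, float x, float y, float z) {
    x = clampf(x, 0.0f, float(g.nx - 1));
    y = clampf(y, 0.0f, float(g.ny - 1));
    z = clampf(z, 0.0f, float(g.nz - 1));
    const int i0 = int(std::floor(x)), j0 = int(std::floor(y)), k0 = int(std::floor(z));
    const int i1 = std::min(i0 + 1, g.nx - 1);
    const int j1 = std::min(j0 + 1, g.ny - 1);
    const int k1 = std::min(k0 + 1, g.nz - 1);
    const float fx = x - i0, fy = y - j0, fz = z - k0;
    auto mix = [](float a, float b, float t) { return a + t * (b - a); };
    const float c00 = mix(f[g.idx(i0, j0, k0)], f[g.idx(i1, j0, k0)], fx);
    const float c10 = mix(f[g.idx(i0, j1, k0)], f[g.idx(i1, j1, k0)], fx);
    const float c01 = mix(f[g.idx(i0, j0, k1)], f[g.idx(i1, j0, k1)], fx);
    const float c11 = mix(f[g.idx(i0, j1, k1)], f[g.idx(i1, j1, k1)], fx);
    return mix(mix(c00, c10, fy), mix(c01, c11, fy), fz);
}

// One semi-Lagrangian step: each cell traces back along its velocity for dt
// and samples the old density there. Cells are independent (one GPU thread each).
std::vector<float> advectDensity(const Grid& g, const std::vector<float>& density,
                                 const std::vector<float>& u, const std::vector<float>& v,
                                 const std::vector<float>& w, float dt) {
    assert(density.size() == std::size_t(g.nx) * g.ny * g.nz);
    std::vector<float> out(density.size());
    for (int k = 0; k < g.nz; ++k)
        for (int j = 0; j < g.ny; ++j)
            for (int i = 0; i < g.nx; ++i) {
                const int c = g.idx(i, j, k);
                // Displacement in cells = velocity * dt / dx.
                out[c] = sampleTrilinear(g, density, i - u[c] * dt / g.dx,
                                         j - v[c] * dt / g.dx, k - w[c] * dt / g.dx);
            }
    return out;
}
```

Two quick checks follow from the scheme. With zero velocity, the output equals the input. With u = dx/dt, a density blob moves exactly one cell in +x.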

But I have no further developments in this area yet. I am slowly checking that I haven't missed any details when implementing the functions on the GPU, and I am also checking how the data is accessed.

Let's see how this goes this weekend.

Friday, November 25, 2011

So Slow

I still haven't managed to make it run any faster. It might be as fast as it is going to get with cuBLAS and cuSPARSE, in which case I need to find better libraries. I am convinced that writing it all myself is not going to be any faster, and people have already dealt with these problems, so I shouldn't need to reinvent the wheel. Suggestions are welcome, though.

The slowdown is considerable:

For a 10x10x10 grid, the GPU takes around 33ms in total (~7ms in the analysis phase, ~24ms in total for the iterations, ~2ms doing other things).

For the same grid, the CPU does it in 7ms (no analysis phase, ~5ms in total for the iterations, ~2ms doing other things).

And the GPU takes longer and longer as the grid fills up, with as much as 60ms being spent (I didn't wait around for higher numbers).

I was looking into a library, CUSP, but I couldn't get it to compile with my project, so if there are any CUSP users, please let me know how to do that =)

And that is it for now.

Sunday, November 20, 2011

Timing

So I timed the different parts of the simulation on both the CPU and the GPU. This is what I found.

The y-axis is time in milliseconds (Excel doesn't like me much).

Two things are worth noting:

1. The project function (this is where the CG is done) is a lot slower on the GPU.

2. Advecting velocities is very slow as well, which makes for a good candidate to be moved to the GPU.
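Per-step timing of this kind can be sketched with `std::chrono`. The class and step names below are hypothetical.

One caveat matters for the GPU measurements. Kernel launches return before the GPU has finished, so a timed GPU step should end with `cudaDeviceSynchronize()` or be measured with CUDA events. Otherwise the timer mostly captures launch overhead.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Accumulates wall-clock milliseconds per named simulation step.
class StepTimer {
public:
    using Clock = std::chrono::steady_clock;

    // Runs fn() and adds its elapsed time to the total for `name`.
    template <typename Fn>
    void time(const std::string& name, Fn&& fn) {
        assert(!name.empty());
        const auto start = Clock::now();
        fn();
        const auto end = Clock::now();
        totalsMs_[name] += std::chrono::duration<double, std::milli>(end - start).count();
        ++calls_[name];
    }

    // Average milliseconds per call, or 0 if the step was never timed.
    double averageMs(const std::string& name) const {
        auto it = calls_.find(name);
        if (it == calls_.end()) return 0.0;
        return totalsMs_.at(name) / it->second;
    }

    // How many times the step was timed.
    int calls(const std::string& name) const {
        auto it = calls_.find(name);
        return it == calls_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, double> totalsMs_;
    std::map<std::string, int> calls_;
};
```

In a frame loop, each stage would be wrapped in a call such as `timer.time("project", [&] { /* solver call */ });`, and the averages would then be plotted per step.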

Saturday, November 19, 2011

GPU Time!

I finally got it all working: down to the GPU, back, and rendering!

I also got it to work for 2D as well as 3D grids.

Here is a graph of the current frame rate in frames per second (y-axis) vs. the size of the grid (x-axis). The CPU is way faster than the GPU, which means it is time for a lot of optimizations or a new approach (or both).

But having everything finally working in the GPU is a good thing.

And here is a screenshot of smoke running on the GPU.

I am going to make a movie later and post it.

Plans for the week:

- CHANGE THE COLOR OF THE SMOKE

- Make improvements to the GPU code

- Make movies for large grids (I am sorry, computers)

Friday, November 11, 2011

Presentation, presentation, presentation

Who would have thought that working on a presentation would take so much time?

My presentation is (apparently) going to be on Conjugate Gradient on the GPU.

My references are:

For an introduction to the Conjugate Gradient Method:

An Introduction to the Conjugate Gradient Method Without the Agonizing Pain

For more in depth GPU CG:

Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid

Accelerating the Conjugate Gradient Method with CUDA

For samples of CG with CUDA (cuBLAS, cuSPARSE):

Preconditioned CG

For fluids simulation:

Chapter 38. Fast Fluid Dynamics Simulation on the GPU

And I was going to read more about fluids simulation on the GPU but decided it was time to update my blog.

So what have I learnt?

- The CUDA samples are helpful for understanding CG behind the scenes. Unfortunately, they aren't optimized (to keep the code easy to understand), but I think I should be able to use them as a starting point.

- In Chapter 38, the code was implemented in fragment shaders using Cg. They also moved every step to the GPU (e.g., advection, diffusion, etc.).

- The dominant operation in CG is the matrix-vector multiplication, so this is where most of the optimization effort should be focused.
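To make that matrix-vector product concrete, here is a minimal CPU sketch in compressed sparse row (CSR) format, the format cuSPARSE's CSR routines take. The struct and function names are hypothetical.

For a 3D pressure matrix, each row has at most 7 nonzeros. The product therefore does very little arithmetic per value loaded, which is why its memory access pattern is where GPU optimization tends to focus.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed Sparse Row storage: the nonzeros of row r are
// values[rowPtr[r] .. rowPtr[r + 1]) with matching column indices in colInd.
struct CsrMatrix {
    int rows;
    std::vector<int> rowPtr;    // size rows + 1
    std::vector<int> colInd;    // size nnz
    std::vector<float> values;  // size nnz
};

// y = A * x. The simplest GPU mapping gives each row to one thread
// ("scalar CSR"); the inner loop below is that thread's work.
std::vector<float> spmv(const CsrMatrix& A, const std::vector<float>& x) {
    assert(A.rowPtr.size() == std::size_t(A.rows) + 1);
    std::vector<float> y(A.rows, 0.0f);
    for (int r = 0; r < A.rows; ++r) {
        float sum = 0.0f;
        for (int p = A.rowPtr[r]; p < A.rowPtr[r + 1]; ++p)
            sum += A.values[p] * x[A.colInd[p]];
        y[r] = sum;
    }
    return y;
}
```

As a usage example, applying the 1D Laplacian `[[2,-1,0],[-1,2,-1],[0,-1,2]]` to `(1,2,3)` gives `(0,0,4)`.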

This has given me a good idea of what to read next, if not for the presentation, then for implementing the CG solver later on, after ditching cuBLAS and cuSPARSE. Hopefully, one day I'll be done with the reading and get on with the code.

I have also started playing around with the code. I think I now understand how the A matrix is built in the framework, so I can translate it into the data structures expected by the library routines I use to implement the CG solver. What is holding me back is a crash when I try to print out... My guess: some memory leak or plain bad luck.
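Translating the framework's A matrix into what cuSPARSE expects means building CSR arrays. The framework's actual storage isn't described here, so the sketch assumes a common layout for a 7-point pressure matrix. Each cell has a diagonal `Adiag` plus couplings `Aplusi`, `Aplusj` and `Aplusk` to its +x, +y and +z neighbors, and the -x/-y/-z entries are recovered by symmetry. The array names, the function and this layout are all hypothetical.

```cpp
#include <cassert>
#include <vector>

// Same CSR layout cuSPARSE's CSR routines use: rowPtr has rows + 1 entries.
struct CsrMatrix {
    int rows;
    std::vector<int> rowPtr, colInd;
    std::vector<float> values;
};

// Assumed (hypothetical) stencil storage, cell c = i + nx * (j + ny * k):
// Adiag[c] is the diagonal; Aplusi/Aplusj/Aplusk[c] couple c to its +x/+y/+z
// neighbor. Symmetry supplies the -x/-y/-z entries. Zero entries are skipped.
CsrMatrix stencilToCsr(int nx, int ny, int nz,
                       const std::vector<float>& Adiag, const std::vector<float>& Aplusi,
                       const std::vector<float>& Aplusj, const std::vector<float>& Aplusk) {
    const int n = nx * ny * nz;
    assert(int(Adiag.size()) == n && int(Aplusi.size()) == n);
    assert(int(Aplusj.size()) == n && int(Aplusk.size()) == n);
    CsrMatrix A{n, {0}, {}, {}};
    auto idx = [&](int i, int j, int k) { return i + nx * (j + ny * k); };
    auto push = [&](int col, float v) {
        if (v != 0.0f) { A.colInd.push_back(col); A.values.push_back(v); }
    };
    for (int k = 0; k < nz; ++k)
        for (int j = 0; j < ny; ++j)
            for (int i = 0; i < nx; ++i) {
                const int c = idx(i, j, k);
                // Columns are pushed in increasing order within each row.
                if (k > 0) push(idx(i, j, k - 1), Aplusk[idx(i, j, k - 1)]);
                if (j > 0) push(idx(i, j - 1, k), Aplusj[idx(i, j - 1, k)]);
                if (i > 0) push(idx(i - 1, j, k), Aplusi[idx(i - 1, j, k)]);
                push(c, Adiag[c]);
                if (i + 1 < nx) push(idx(i + 1, j, k), Aplusi[c]);
                if (j + 1 < ny) push(idx(i, j + 1, k), Aplusj[c]);
                if (k + 1 < nz) push(idx(i, j, k + 1), Aplusk[c]);
                A.rowPtr.push_back(int(A.colInd.size()));
            }
    return A;
}
```

Because zero entries are skipped, a cell with a zero diagonal produces an empty row. That would happen for a solid or empty cell, if the framework stores those. Such rows would have to be dropped, or given a 1 on the diagonal, before the matrix is handed to a solver.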

Next steps:

- Finish the presentation (which means start the presentation in PowerPoint)
- Find out why the code is breaking
- Tackle the feedback received from the checkpoint presentation (probably for the final video):
  - Get running times (FPS of the current CPU implementation)
  - Try larger grids; report sizes and FPS
  - Explain the A matrix and the semi-Lagrangian method better

Sunday, November 6, 2011

Alpha Presentation Video

Friday, November 4, 2011

And I finally got the framework! So now I have to make it work =) So no real updates yet, possibly by Monday.

Tuesday, October 18, 2011

Smoke Simulation in the GPU - Pitch

Smoke simulation on the CPU is very slow and not suited for real-time projects, even on small grids. Each frame takes many seconds to render, depending on the size of the grid, and the most realistic simulations usually require grids bigger than 10 X 1 cells. I would like to make it more efficient by moving the costly calculations that slow down the simulation to the GPU, so that smoke can be simulated on bigger grids. I will also use this time to get a good-looking rendering of smoke.

The first step would be to move the conjugate gradient solver to the GPU, since this is the slowest step in the simulation. After that, I could focus on improving the looks by implementing blackbody rendering and/or keep moving more steps to the GPU to increase performance further.
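For reference, here is the unpreconditioned conjugate gradient loop as a short C++ sketch, following the standard formulation in "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain". The matrix is passed as a callback so any storage format works, and the names are hypothetical. Each iteration costs one matrix-vector product, two dot products and three vector updates. On the GPU, cuSPARSE would supply the product and cuBLAS the rest.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<float>;
// Hypothetical callback: applyA(x, y) must write y = A * x for a symmetric
// positive definite A.
using MatVec = std::function<void(const Vec&, Vec&)>;

static float dot(const Vec& a, const Vec& b) {
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Unpreconditioned CG: improves x in place and returns the iteration count.
// Stops when ||r|| <= tol * ||b|| or after maxIter iterations.
int conjugateGradient(const MatVec& applyA, const Vec& b, Vec& x, int maxIter, float tol) {
    assert(x.size() == b.size());
    const std::size_t n = b.size();
    Vec r(n), p(n), Ap(n);
    applyA(x, Ap);                                // r = b - A x
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];
    p = r;                                        // first search direction
    float rr = dot(r, r);
    const float stop = tol * std::sqrt(dot(b, b));
    int it = 0;
    while (it < maxIter && std::sqrt(rr) > stop) {
        applyA(p, Ap);                            // the one matvec per iteration
        const float alpha = rr / dot(p, Ap);      // step length along p
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const float rrNew = dot(r, r);
        const float beta = rrNew / rr;            // keeps directions A-conjugate
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rrNew;
        ++it;
    }
    return it;
}
```

As a usage example, solving the 2x2 system `4x + y = 1`, `x + 3y = 2` from a zero initial guess converges to `(1/11, 7/11)`.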

Previous work on fluid simulation on the GPU:

http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html

http://www.cs.columbia.edu/cg/pdfs/28_GPUSim.pdf

Thursday, October 6, 2011

First post for my final project in GPU programming. I still haven't made up my mind what my project is so I am using a generic title and URL =D