So alright, I registered.

Hello!
Yes, CUDA is great for highly parallel operations. Convolution lends itself very well to this, which is why one of the first audio uses of CUDA (or, better said, GPGPU - we have OpenCL as well) was precisely convolution. There are things that won't work as well, because they depend on serially executed algorithms - delays, algorithmic reverbs, lookahead compressors/limiters and most filter designs. Why? Because each output sample depends on previously calculated samples, and with the massive parallelism we have in GPGPUs there is no efficient way to enforce that strict sample-by-sample ordering - every thread would have to wait for the result of the one before it.
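To make the contrast concrete, here's a minimal CUDA sketch (names, coefficients and parameters are mine, purely illustrative): an FIR convolution where each thread computes one output sample independently, next to a one-pole recursion that cannot be split up the same way because y[i] needs y[i-1] first.

```cuda
// FIR convolution: y[i] depends only on the input x and the taps h,
// so each GPU thread can compute one output sample on its own.
__global__ void fir_convolve(const float* x, const float* h,
                             float* y, int n, int taps)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int k = 0; k < taps && k <= i; ++k)
        acc += h[k] * x[i - k];   // only past *input* samples, no feedback
    y[i] = acc;
}

// A one-pole (IIR) filter, by contrast, feeds its own output back:
//     for (int i = 1; i < n; ++i)
//         y[i] = a * x[i] + b * y[i - 1];   // serial dependency on y[i-1]
// Thread i would have to wait for thread i-1, defeating the parallelism.
```

A launch like fir_convolve<<<(n + 255) / 256, 256>>>(d_x, d_h, d_y, n, taps) then covers all output samples at once, which is exactly why convolution was an early GPGPU win.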
So, this means that you cannot simply port the whole synth structure to a GPU, because the stages depend on each other - oscillators precede mixers, which precede filters, and so on. This would actually cause greater latency than using a regular CPU, which has SIMD instruction sets to speed up exactly these kinds of calculations (MMX, SSE, AltiVec, etc.). And that is why GPGPU is not yet used for offloading whole synth architectures from the main CPU.
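As a rough illustration of that dependency chain (a hypothetical, stripped-down voice of my own invention, not any real synth's code), note how each stage consumes the previous stage's result for every single sample:

```cpp
#include <cmath>

// Hypothetical single voice: oscillator -> mix/gain -> filter.
// Every stage needs the previous stage's output, sample by sample.
struct Voice {
    float phase = 0.0f, freq = 440.0f, sr = 48000.0f;
    float z1 = 0.0f;  // one-pole filter state

    float osc() {                       // naive sine oscillator
        phase += freq / sr;
        if (phase >= 1.0f) phase -= 1.0f;
        return std::sin(6.2831853f * phase);
    }
    float filter(float in) {            // one-pole lowpass
        z1 = 0.1f * in + 0.9f * z1;
        return z1;
    }
    float next() {                      // the serial chain itself:
        return filter(0.5f * osc());    // osc feeds mixer feeds filter
    }
};
```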
Now, SOME things can be offloaded to them, and when that is done well, it's a splendid thing. For example, the analytic zero-delay feedback filter calculation that's done in u-he Diva could be parallelized on a GPU to a great extent - but this is not yet in u-he's plans.
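For what it's worth, here is one way such an offload is often imagined (my sketch, not u-he's design, with a trivial one-pole standing in for the actual analytic ZDF math): time stays serial inside each voice's filter, but hundreds of voices advance in parallel, one per GPU thread.

```cuda
// One thread per voice: the filter still runs serially over the block,
// but all voices run at the same time. The one-pole below is only a
// stand-in for the real (and far heavier) zero-delay feedback solve.
__global__ void process_voices(const float* in, float* out,
                               float* state, int numVoices, int blockSize)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= numVoices) return;
    float z1 = state[v];
    for (int i = 0; i < blockSize; ++i) {
        float x = in[v * blockSize + i];
        z1 = 0.1f * x + 0.9f * z1;           // serial over time, per voice
        out[v * blockSize + i] = z1;
    }
    state[v] = z1;                            // carry state into next block
}
```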
There is also a problem with compatibility and the lack of a proper standard: we have CUDA, OpenCL and Microsoft's DirectCompute. As if it weren't enough already to have to support VST, AU, RTAS and AAX?
