Performance
From K-3D
From a recent mailing-list post by Bart Janssens <bart.janssens@lid.kviv.be>
I've looked into improving performance using multithreading on a dual-core machine, and using SSE2 instructions to vectorize certain calculations. Before rewriting any code, I did some profiling. It indicated that during transformation of points, most of the work was done in the loop over all points in the TweakPoints module, so I wrote a test program with a similar loop and tried to optimize it.

As it turns out, on my Pentium D the speed is actually limited by memory bandwidth, and neither SSE2 nor multithreading has much effect. If the data fits into the cache, the expected speedup is there, but of course for a 3D object this is almost never the case.

I also tried eliminating most of the work of the TweakPoints loop by using a map containing only the modified points, and then looping over that instead of over all points. This had no effect on the framerate, so I am guessing that most of the time is actually lost during the OpenGL drawing (which I could not get to show up in the profiling output).

Recently, there was some discussion about rewriting the OpenGL drawing code. Are there any plans for when this will be done, and who will do it? I can start on it in August at the earliest, since until then I'm caught up in my thesis.

I just wanted to share these findings on multithreading and SSE2 so that no one else wastes time trying to improve performance this way, since there are other bottlenecks that need to be tackled before these techniques will have a noticeable effect. I'm guessing the same is true for doing vector operations on the GPU, which has also been mentioned here. After the current bottlenecks have been identified and eliminated we can look at SSE2 and multi-core again, and hopefully there will be a noticeable speedup then.
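The kind of test program described in the post might look roughly like the sketch below. It is not the original test code and does not use K-3D's types: `Point3`, `Matrix4`, `transform_all` and `ns_per_point` are hypothetical names introduced here for illustration, and the loop is assumed to be a plain affine transform of every point.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are not K-3D's actual types.
struct Point3 { double x, y, z; };
struct Matrix4 { double m[4][4]; };  // row-major affine transform

// Apply an affine transform to one point (w is assumed to be 1).
inline Point3 transform(const Matrix4& t, const Point3& p)
{
    return {
        t.m[0][0] * p.x + t.m[0][1] * p.y + t.m[0][2] * p.z + t.m[0][3],
        t.m[1][0] * p.x + t.m[1][1] * p.y + t.m[1][2] * p.z + t.m[1][3],
        t.m[2][0] * p.x + t.m[2][1] * p.y + t.m[2][2] * p.z + t.m[2][3]
    };
}

// The loop under test: stream through every input point once.
void transform_all(const Matrix4& t, const std::vector<Point3>& in,
                   std::vector<Point3>& out)
{
    assert(in.size() == out.size());
    for (std::size_t i = 0; i != in.size(); ++i)
        out[i] = transform(t, in[i]);
}

// Run `passes` passes over `count` points and return the average
// wall-clock cost in nanoseconds per transformed point.
double ns_per_point(std::size_t count, std::size_t passes)
{
    const Matrix4 t = {{{1.0, 0.0, 0.0, 1.0},
                        {0.0, 1.0, 0.0, 2.0},
                        {0.0, 0.0, 1.0, 3.0},
                        {0.0, 0.0, 0.0, 1.0}}};
    std::vector<Point3> in(count, Point3{1.0, 2.0, 3.0});
    std::vector<Point3> out(count);

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t pass = 0; pass != passes; ++pass)
        transform_all(t, in, out);
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / (static_cast<double>(count) * static_cast<double>(passes));
}
```

Calling `ns_per_point` once with a point count small enough to fit in the cache and once with several million points gives a rough way to compare the two cases. If the loop is memory-bound, as the post reports for the Pentium D, the large case would be expected to cost more per point, and vectorizing or threading `transform_all` would be expected to help mainly in the small case. An optimizing compiler may remove work whose results are never read, so a real measurement should also consume `out`.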
See also: Mesh Painters and Array Based Mesh Design.