From a recent mailing-list post by Bart Janssens:

I've looked into improving performance using multithreading on a dual core 
machine, and using SSE2 instructions to vectorize certain calculations.

Before rewriting any code, I did some profiling. It indicated that, during 
transformation of points, most of the work was done in the loop over all 
points in the TweakPoints module, so I wrote a test program with a similar 
loop and tried to optimize it. As it turns out (on my Pentium D), speed is 
actually limited by memory bandwidth, and SSE2 and multithreading have very 
little effect. If the data fits into the cache, the expected speedup is 
there, but of course for a 3D object this is almost never the case.
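
For illustration, a test loop of this kind could look like the sketch below. 
It is not the original test program: the Point and Matrix types and the 
function names are hypothetical stand-ins, and doubles are assumed for the 
coordinates. The idea is to time the same loop on a point count that fits in 
the cache and on one that does not.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the real point and matrix types.
struct Point { double x, y, z; };
struct Matrix { double m[3][4]; };  // 3x4 affine transform, row-major

// Transform points [begin, end) in place. Assuming 8-byte doubles, each point
// is 24 bytes read and 24 bytes written for only 9 multiplies and 9 adds, so
// large arrays put far more pressure on memory than on the arithmetic units.
void transform_range(const Matrix& M, std::vector<Point>& pts,
                     std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= pts.size());
    for (std::size_t i = begin; i < end; ++i) {
        const Point p = pts[i];
        pts[i] = {
            M.m[0][0] * p.x + M.m[0][1] * p.y + M.m[0][2] * p.z + M.m[0][3],
            M.m[1][0] * p.x + M.m[1][1] * p.y + M.m[1][2] * p.z + M.m[1][3],
            M.m[2][0] * p.x + M.m[2][1] * p.y + M.m[2][2] * p.z + M.m[2][3]
        };
    }
}

// Run `passes` sweeps over `count` points and return the average time per
// point in seconds. The matrix is a unit translation along x.
double seconds_per_point(std::size_t count, int passes) {
    const Matrix shift = {{{1, 0, 0, 1}, {0, 1, 0, 0}, {0, 0, 1, 0}}};
    std::vector<Point> pts(count, Point{1.0, 2.0, 3.0});
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < passes; ++i) {
        transform_range(shift, pts, 0, count);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / (static_cast<double>(count) * passes);
}
```

Comparing seconds_per_point for a few thousand points with the result for 
several million points shows whether the loop is cache-bound on a given 
machine. Because transform_range takes a begin and end index, a dual-core 
variant only needs to call it on two halves of the array from two threads.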

I also tried eliminating most of the work of the TweakPoints loop by using a 
map containing only modified points, and then looping over that instead of 
all points. This has no effect on the framerate, so I am guessing most of the 
time is actually spent in the OpenGL drawing (which I could not get to show 
up in the profiling output).
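
A minimal sketch of the map-based approach, under the assumption that the map 
goes from point index to the tweaked position (the post does not say whether 
positions or offsets were stored). The names Tweaks and apply_tweaks are 
hypothetical and not part of TweakPoints.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for the real point type.
struct Point { double x, y, z; };

// Sparse edit set: index of a modified point -> its new position (assumed).
using Tweaks = std::map<std::size_t, Point>;

// Visit only the modified points instead of looping over the whole mesh.
// Entries whose index lies outside the point array are ignored.
void apply_tweaks(const Tweaks& tweaks, std::vector<Point>& pts) {
    for (const auto& entry : tweaks) {
        if (entry.first < pts.size()) {
            pts[entry.first] = entry.second;
        }
    }
}
```

The work now scales with the number of tweaked points rather than with the 
size of the mesh, which is why an unchanged framerate points at a bottleneck 
elsewhere.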

Recently, there was some discussion about rewriting the OpenGL drawing code. 
Are there any plans for when this will be done, and who will do it? I can 
start on it in August at the earliest, since until then I'm caught up in my 
thesis. I just wanted to share these findings on multithreading and SSE2 so 
that no one else wastes time trying to improve performance this way, since 
there are other bottlenecks that need to be tackled before these techniques 
will have a noticeable effect. I'm guessing the same is true for doing vector 
ops on the GPU, which has also been mentioned here.

After the current bottlenecks have been identified and eliminated, we can 
look at SSE2 and multi-core again, and hopefully there will be a noticeable 
speedup then.

See also: Mesh Painters and Array Based Mesh Design.