Monday, April 11, 2011

C++ Parallelization Libraries: OpenMP vs. Threading Building Blocks

Hi,

I'm going to retrofit my custom graphics engine so that it takes advantage of multicore CPUs. More specifically, I am looking for a library to parallelize loops.

It seems to me that both OpenMP and Intel's Threading Building Blocks are very well suited for the job. Also, both are supported by Visual Studio's C++ compiler and most other popular compilers. And both libraries seem quite straightforward to use.

So, which one should I choose? Has anyone tried both libraries and can give me some cons and pros of using either library? Also, what did you choose to work with in the end?

Thanks,

Adrian

From stackoverflow
  • From Intel's software blog: Compare Windows* threads, OpenMP*, Intel® Threading Building Blocks for parallel programming

    It is also a matter of style - for me TBB is very C++ like, while I don't like OpenMP pragmas that much (reeks of C a bit, would use it if I had to write in C). A small sketch contrasting the two styles follows below, after this answer.

    I would also consider the existing knowledge and experience of the team. Learning a new library (especially when it comes to threading/concurrency) does take some time. I think that for now, OpenMP is more widely known and deployed than TBB (but this is just my opinion).

    Yet another factor is portability - though considering the most common platforms, that is probably not an issue. The license, however, might be.

    • TBB incorporates some nice results originating from academic research, for example its recursive data-parallel approach.
    • There is some work on cache-friendliness, for example.
    • Reading the Intel blog seems really interesting.
    Adrian Grigore : Thanks for the link, but since it is hosted on Intel's website, I would not really trust it to provide a completely unbiased opinion. Clearly they wrote the article to promote usage of their own library.
    Anonymous : Yes, forgot the emoticon somewhere in the first line ;)
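
    To illustrate the style point above, here is a rough, untested sketch of the same tiny loop written both ways. The names scale_all, scale_all_tbb and data are made up purely for illustration, and the TBB version assumes a C++0x compiler for the lambda:

    #include <vector>
    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>

    // OpenMP: a pragma placed in front of an ordinary C-style loop.
    void scale_all(std::vector<float>& data, float factor)
    {
        #pragma omp parallel for
        for (int i = 0; i < (int)data.size(); i++)
            data[i] *= factor;
    }

    // TBB: the loop body becomes a callable object that works on a sub-range.
    void scale_all_tbb(std::vector<float>& data, float factor)
    {
        tbb::parallel_for(tbb::blocked_range<size_t>(0, data.size()),
            [&](const tbb::blocked_range<size_t>& r) {
                for (size_t i = r.begin(); i != r.end(); ++i)
                    data[i] *= factor;
            });
    }
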
  • I haven't used TBB extensively, but my impression is that they complement each other rather than compete. TBB provides threadsafe containers and some parallel algorithms, whereas OpenMP is more of a way to parallelise existing code.

    Personally I've found OpenMP very easy to drop into existing code where you have a parallelisable loop or a bunch of sections that can be run in parallel. However, it doesn't help you much in cases where you need to modify some shared data - which is where TBB's concurrent containers might be exactly what you want (a small sketch follows after this answer).

    If all you want is to parallelise loops where the iterations are independent (or can be fairly easily made so), I'd go for OpenMP. If you're going to need more interaction between the threads, I think TBB may offer a little more in that regard.

    Anonymous : Good point about the existing code. It's easier to plug a few pragmas in here and there. Plugging in TBB might be more difficult (much depends on the existing code style)
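
    For the shared-data case, a minimal, untested sketch of how a TBB concurrent container might be used; the names compute and collect_interesting are made up for illustration:

    #include <tbb/parallel_for.h>
    #include <tbb/concurrent_vector.h>
    #include <cmath>

    double compute(int i) { return std::sqrt((double)i); }   // stand-in for real work

    void collect_interesting(int n, tbb::concurrent_vector<double>& results)
    {
        // push_back on a concurrent_vector is thread-safe, so the parallel
        // loop can append results without any explicit locking.
        tbb::parallel_for(0, n, [&](int i) {
            double v = compute(i);
            if (v > 100.0)
                results.push_back(v);
        });
    }
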
  • Viva64 links: Parallel Programming.

  • In general I have found that using TBB requires much more time-consuming changes to the code base with a high payoff, while OpenMP gives a quick but moderate payoff. If you are starting a new module from scratch and thinking long term, go with TBB. If you want small but immediate gains, go with OpenMP.

    Also, TBB and OpenMP are not mutually exclusive.

  • I've actually used both, and my general impression is that if your algorithm is fairly easy to make parallel (e.g. loops of even size, not too much data interdependence) OpenMP is easier, and quite nice to work with. In fact, if you find you can use OpenMP, it's probably the better way to go, if you know your platform will support it. I haven't used OpenMP's new Task structures, which are much more general than the original loop and section options.

    TBB gives you more data structures up front, but definitely requires more up-front work. As a plus, it might be better at making you aware of race condition bugs. What I mean by this is that it is fairly easy in OpenMP to introduce race conditions by not marking something private (or shared) that should be. You only see this when you get bad results. I think this is a bit less likely to occur with TBB.

    Overall my personal preference was for OpenMP, especially given its increased expressiveness with tasks.
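
    For reference, a rough, untested sketch of the OpenMP task construct mentioned above. It needs an OpenMP 3.0 compiler (e.g. gcc 4.4+ or Intel's compiler; Visual C++ only implements OpenMP 2.0), and Node, process, walk and walk_tree are made-up names for illustration:

    struct Node { Node* left; Node* right; /* ... payload ... */ };

    void process(Node* n);   // the real per-node work, defined elsewhere

    void walk(Node* n)
    {
        if (!n) return;
        #pragma omp task firstprivate(n)   // each node becomes an independent task
        process(n);
        walk(n->left);
        walk(n->right);
    }

    void walk_tree(Node* root)
    {
        #pragma omp parallel
        #pragma omp single        // one thread generates the tasks...
        walk(root);               // ...the whole team executes them
    }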

  • In Visual Studio 2008 (with OpenMP support enabled via the /openmp compiler switch), you can add the following line to parallelize a "for" loop. It even works when the loop body contains further nested for loops. Here is an example:

    // i and j are declared outside the loop, so j must be listed as private to
    // give each thread its own copy (the loop variable i is made private
    // automatically, but listing it does no harm). Each p[i] is only ever
    // touched by one thread, so no further synchronization is needed.
    #pragma omp parallel for private(i,j)
    for (i=0; i<num_particles; i++)
    {
      p[i].fitness = fitnessFunction(p[i].present);
      if (p[i].fitness > p[i].pbestFitness)
      { 
         p[i].pbestFitness = p[i].fitness;
         for (j=0; j<p[i].numVars; j++) p[i].pbest[j] = p[i].present[j];
      }
    }  
    gbest = pso_get_best(num_particles, p);   // runs serially, after the parallel loop
    

    After we added the #pragma omp parallel for, both cores on my Core 2 Duo were used to their maximum capacity, so total CPU usage went from 50% to 100%. (A rough TBB equivalent of the same loop is sketched below.)
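
    For comparison, here is a rough, untested sketch of what the same loop might look like with TBB's parallel_for, reusing p, num_particles, fitnessFunction and pso_get_best from the snippet above, assuming num_particles is an int and a C++0x compiler is available for the lambda:

    #include <tbb/parallel_for.h>

    tbb::parallel_for(0, num_particles, [&](int i) {
        p[i].fitness = fitnessFunction(p[i].present);
        if (p[i].fitness > p[i].pbestFitness)
        {
            p[i].pbestFitness = p[i].fitness;
            // j is local to the lambda, so no private() clause is needed
            for (int j = 0; j < p[i].numVars; j++)
                p[i].pbest[j] = p[i].present[j];
        }
    });
    gbest = pso_get_best(num_particles, p);   // still runs serially, after the loop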
