Threading Building Blocks

Threading Building Blocks (TBB) is a C++ template library developed by Intel for parallel programming on multi-core processors. Using TBB, a computation is broken down into tasks that can run in parallel. The library manages and schedules threads to execute these tasks.

(Source: Wikipedia)

How to use

On macOS (tested on Catalina)

  1. brew install tbb

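  2. echo | gcc -v -x c++ -E -

    • Check whether /usr/local/include is listed under "#include <...> search starts here:" in the output.

    • If it is not, add that path to the compiler's include search path first (for example with -I/usr/local/include).

    • Also check that the library files are present: ls -l /usr/local/lib/*tbb*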

  3. ln -s /usr/local/Cellar/tbb/2020_U3_1/include/tbb /usr/local/include/tbb (the version directory, here 2020_U3_1, depends on the installed TBB version)

  4. g++ -g -std=c++17 <FILENAME> -pthread -ltbb

  5. ./a.out

There are some useful library files.

It's fast!

 

The following example is based on the parallel_reduce() example in the TBB documentation.

#include <iostream>
#include <vector>
#include <numeric>
#include <chrono>
#include <tbb/parallel_reduce.h>
#include <tbb/blocked_range.h>
using namespace std;
using namespace tbb;

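// Body object for the imperative form of parallel_reduce.
class SumFoo {
    float* my_a;
public:
    float my_sum;
    // Accumulate the elements of subrange r into my_sum.
    void operator()( const blocked_range<size_t>& r ) {
        float *a = my_a;
        float sum = my_sum;
        size_t end = r.end();
        for( size_t i=r.begin(); i!=end; ++i )
            sum += a[i];
        my_sum = sum;
    }

    // Splitting constructor: called when TBB splits off part of the range
    // to be processed by another body, possibly on a different thread.
    SumFoo( SumFoo& x, split ) : my_a(x.my_a), my_sum(0) {}

    // Merge the partial sum computed by a split-off body.
    void join( const SumFoo& y ) {my_sum+=y.my_sum;}

    SumFoo( float a[] ) :
        my_a(a), my_sum(0)
    {}
};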

float SerialSumFoo( float a[], size_t n ) {
    float sum = 0;
    for( size_t i=0; i!=n; ++i )
        sum += a[i];
    return sum;
}

float ParallelSumFoo( float a[], size_t n ) {
    SumFoo sf(a);
    parallel_reduce( blocked_range<size_t>(0,n), sf );
    return sf.my_sum;
}

const size_t MAX_SZ = 10'000'000;
float a[MAX_SZ];
int main() {
    for (size_t i = 0; i < MAX_SZ; i++) a[i] = 0.1f;

    {
        const auto t1 = std::chrono::high_resolution_clock::now();
        float result = SerialSumFoo(a, MAX_SZ);
        const auto t2 = std::chrono::high_resolution_clock::now();
        const std::chrono::duration<double, std::milli> ms = t2 - t1;
        cout << "Serial Sum = " << result << ", took " << ms.count() << " ms" << endl;
    }
    {
        const auto t1 = std::chrono::high_resolution_clock::now();
        float result = ParallelSumFoo(a, MAX_SZ);
        const auto t2 = std::chrono::high_resolution_clock::now();
        const std::chrono::duration<double, std::milli> ms = t2 - t1;
        cout << "Parallel Sum: " << result << ", took " << ms.count() << " ms" << endl;
    }

    return 0;
}

 

  • In addition, Parallel STL can be used.

  • Before including its headers, install pstl from oneDPL.
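As a rough illustration of what Parallel STL looks like, the sketch below sums a vector with a parallel execution policy. It assumes a toolchain whose standard library provides the C++17 <execution> header (for example GCC 9 or later, which uses TBB as its backend and therefore also needs -ltbb). With the standalone pstl package the header names and namespaces may differ. The name ParallelStlSum is made up for illustration.

```cpp
#include <execution>
#include <numeric>
#include <vector>

// ParallelStlSum is a hypothetical name. std::reduce with std::execution::par
// allows the library to split the work across threads.
float ParallelStlSum(const std::vector<float>& v) {
    return std::reduce(std::execution::par, v.begin(), v.end(), 0.0f);
}
```

Compared with the SumFoo class, the splitting and joining are handled entirely by the library; only the range, the initial value and (optionally) the binary operation are supplied.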

 
