Threading Building Blocks
Threading Building Blocks (TBB) is a C++ template library developed by Intel for parallel programming on multi-core processors. Using TBB, a computation is broken down into tasks that can run in parallel. The library manages and schedules threads to execute these tasks.
(Source: Wikipedia)
How to use
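On macOS (tested on Catalina):
- Install TBB with Homebrew:
  brew install tbb
- Print the compiler's include search paths:
  echo | gcc -v -x c++ -E -
- If /usr/local/include is not listed below "#include <...> search starts here:", add it to the include search path first.
- Also check that the TBB libraries are installed:
  ls -l /usr/local/lib/*tbb*
- Link the TBB headers into /usr/local/include (adjust the version directory 2020_U3_1 to match your installation):
  ln -s /usr/local/Cellar/tbb/2020_U3_1/include/tbb /usr/local/include/tbb
- Compile:
  g++ -g -std=c++17 <FILENAME> -pthread -ltbb
- Run:
  ./a.out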
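TBB ships a number of useful headers, such as parallel_reduce.h and blocked_range.h.
It's fast! The program below times a serial sum against a parallel one so you can compare them on your own machine.
The example is based on the reference for parallel_reduce().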
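As a rough mental model of what parallel_reduce() does with a reduction body, here is a sketch that uses only the standard library. It is not TBB's implementation: it runs sequentially, while TBB decides at run time when to split and runs the pieces on worker threads. The names split_join_sum, serial_sum and the grain parameter are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TBB): sums a[begin, end) by recursively
// splitting the range in half and then joining the two partial sums.
// `grain` is an assumed cutoff: ranges of at most `grain` elements are
// summed directly instead of being split further.
float split_join_sum(const std::vector<float>& a, std::size_t begin,
                     std::size_t end, std::size_t grain) {
    assert(begin <= end && end <= a.size());
    const std::size_t len = end - begin;
    if (len <= grain || len <= 1) {
        float sum = 0.0f;  // a freshly split body starts from zero
        for (std::size_t i = begin; i != end; ++i) sum += a[i];
        return sum;
    }
    const std::size_t mid = begin + len / 2;                  // "split"
    const float left = split_join_sum(a, begin, mid, grain);
    const float right = split_join_sum(a, mid, end, grain);
    return left + right;                                      // "join"
}

// Plain left-to-right loop, for comparison.
float serial_sum(const std::vector<float>& a) {
    float sum = 0.0f;
    for (std::size_t i = 0; i != a.size(); ++i) sum += a[i];
    return sum;
}
```

Because float addition is not associative, split_join_sum and serial_sum can return different values for the same input. The same effect is why a serial sum and a parallel sum over a large float array do not have to print the same number.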
#include <chrono>
#include <cstddef>
#include <iostream>
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>

using namespace std;
using namespace tbb;

// Reduction body for parallel_reduce: sums the elements of an array.
class SumFoo {
    float* my_a;
public:
    float my_sum;

    // Adds the elements of sub-range r to my_sum. The same body may be
    // applied to several sub-ranges, so it continues from the current my_sum.
    void operator()(const blocked_range<size_t>& r) {
        float* a = my_a;
        float sum = my_sum;
        size_t end = r.end();
        for (size_t i = r.begin(); i != end; ++i)
            sum += a[i];
        my_sum = sum;
    }

    // Splitting constructor: the new body shares the array and starts from zero.
    SumFoo(SumFoo& x, split) : my_a(x.my_a), my_sum(0) {}

    // Merges the partial sum of another body into this one.
    void join(const SumFoo& y) { my_sum += y.my_sum; }

    SumFoo(float a[]) : my_a(a), my_sum(0) {}
};

float SerialSumFoo(float a[], size_t n) {
    float sum = 0;
    for (size_t i = 0; i != n; ++i)
        sum += a[i];
    return sum;
}

float ParallelSumFoo(float a[], size_t n) {
    SumFoo sf(a);
    parallel_reduce(blocked_range<size_t>(0, n), sf);
    return sf.my_sum;
}

const size_t MAX_SZ = 10'000'000;
float a[MAX_SZ];  // global, so the 40 MB array does not live on the stack

int main() {
    // size_t index avoids a signed/unsigned comparison; 0.1f avoids a double-to-float conversion.
    for (size_t i = 0; i < MAX_SZ; i++) a[i] = 0.1f;

    {
        // Time the serial sum.
        const auto t1 = std::chrono::high_resolution_clock::now();
        float result = SerialSumFoo(a, MAX_SZ);
        const auto t2 = std::chrono::high_resolution_clock::now();
        const std::chrono::duration<double, std::milli> ms = t2 - t1;
        cout << "Serial Sum = " << result << ", took " << ms.count() << " ms" << endl;
    }
    {
        // Time the parallel sum.
        const auto t1 = std::chrono::high_resolution_clock::now();
        float result = ParallelSumFoo(a, MAX_SZ);
        const auto t2 = std::chrono::high_resolution_clock::now();
        const std::chrono::duration<double, std::milli> ms = t2 - t1;
        cout << "Parallel Sum = " << result << ", took " << ms.count() << " ms" << endl;
    }
    return 0;
}
- In addition, we can use Parallel STL (the parallel versions of the standard algorithms).
- Before including its headers, install PSTL from oneDPL.
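For illustration, the same sum written with a parallel algorithm might look like the sketch below. It assumes a toolchain whose standard library provides the C++17 <execution> header (for example a recent GCC, whose parallel backend is built on TBB, so -ltbb is still needed at link time). With oneDPL the headers and policy names come from that library instead, so check its documentation. The function name PstlSum is hypothetical.

```cpp
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical helper: sums a vector with std::reduce and the parallel
// execution policy. Assumes the standard library ships <execution>.
float PstlSum(const std::vector<float>& v) {
    // std::reduce may add the elements in any order, which is what allows
    // the work to be distributed across threads.
    return std::reduce(std::execution::par, v.begin(), v.end(), 0.0f);
}
```

Compared with the TBB versions, the range splitting and the joining of partial results are left entirely to the library.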