

Parallel Programming Essentials via the Intel TBB

Posted on 2011-09-15 07:05 by RTY · Reads (657) · Comments (0) · Category: TBB notes
http://www.codeproject.com/KB/Parallel_Programming/Threading_Blocks.aspx

Introduction

Intel released Threading Building Blocks (TBB) as part of its move to improve system performance by adding cores rather than by raising clock speed. Although it began as an in-house library, Intel offers both a commercial edition (with a trial download) and an open source edition. It scales well and is a powerful tool for anyone moving into parallel and multi-core programming. Note that some useful sample code can be downloaded from http://www.threadingbuildingblocks.org/.

Preface

Sometimes the terms concurrency and parallelism are used interchangeably, but they are not the same: a multithreaded application can have threads executing concurrently while residing on a single physical core, whereas parallelism is achieved only when those concurrent threads are executing on separate cores of a multi-core chip. When we write an algorithm to use concurrency, the most important design choice is how to partition the original problem into independent sub-parts. Because most programs spend much of their execution time running loops over an iteration range, those loops are a natural place to start. For example, loops that use C++ STL iterators walk over the contents of a collection of data; that iteration space must be identified before such loops can be parallelized.

Intel TBB is a C++ template-based library for loop-level parallelism that concentrates on defining tasks rather than explicit threads. Its components include generic parallel algorithms, concurrent containers, low-level synchronization primitives, and a task scheduler. At the time of this writing, TBB 3.0 is the most recent version. Programmers using TBB can parallelize the execution of loop iterations by treating chunks of iterations as tasks and letting the TBB task scheduler determine the task sizes, the number of threads to use, the assignment of tasks to those threads, and how those threads are scheduled for execution. The scheduler gives precedence to tasks that have most recently run on a core, with the idea of making the best use of the cache that likely still holds the task's data. When thinking in terms of tasks rather than threads, we should be concerned with performance, load balance between the sub-problems, data sharing, and control and data dependencies among the sub-problems.

The task scheduler uses a task-stealing mechanism to load balance the execution. Breaking a program into separate functional blocks and assigning a separate thread to each block usually does not scale well, because the number of functional blocks in a program is fixed and rarely large. In contrast, Threading Building Blocks emphasizes data-parallel programming: a data set is divided into smaller pieces, the iteration space is broken into independent units of work, and multiple threads cooperate on those units. This approach scales naturally to larger numbers of processors.

Any thread that uses an algorithm template from the library or the task scheduler must have an initialized tbb::task_scheduler_init object. A thread may have more than one of these objects initialized at a time. The task scheduler shuts down when all task_scheduler_init objects terminate. By default, the constructor for task_scheduler_init does the initialization and the destructor does the termination. Thus, declaring a task_scheduler_init in main(), as in the example shown below, both starts and shuts down the scheduler.

#include "tbb/task_scheduler_init.h"
using namespace tbb;
int main( ) {
task_scheduler_init init;
...
return 0;
}

The parallel_for template parallelizes the work contained in a for loop. The template requires two arguments: a range type over which to iterate and a body type that executes iterations over the range or a subrange. The range class must define a copy constructor and a destructor, the methods is_empty() (which returns true if the range is empty) and is_divisible() (which returns true if the range can be split), and a splitting constructor (to divide the range in half). A partitioner object can be used to heuristically find the smallest number of iterations that should be assigned to a task. The TBB library contains two predefined range types, blocked_range and blocked_range2d, for one- and two-dimensional ranges, respectively. The body class must define a copy constructor and a destructor as well as operator(). The operator() contains a copy of the original serial loop, modified to run over a subrange of values supplied by the range type.

The parallel_reduce template iterates over a range and combines the partial results computed by each task into a final (reduction) value. The range type for parallel_reduce has the same requirements as for parallel_for. The body type additionally needs a splitting constructor and a join method. The splitting constructor copies the read-only data required to run the loop body and initializes the reduction variable to the identity element of the reduction operation. The join method combines the partial results of two tasks according to the reduction operation being used. A minimal parallel_for sketch follows; after that, let's look at a complete parallel_reduce example.
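Before the reduction example, here is a minimal parallel_for sketch, assuming a simple element-wise update; ApplyFoo and ParallelApplyFoo are illustrative names rather than code from this article's samples:

#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"
#include "tbb/task_scheduler_init.h"

// Body type: copy-constructible, with operator() running the loop over a subrange.
class ApplyFoo {
    float* const my_a;
public:
    ApplyFoo( float a[] ) : my_a(a) {}
    void operator()( const tbb::blocked_range<size_t>& r ) const {
        float* a = my_a;
        for( size_t i = r.begin(); i != r.end(); ++i )
            a[i] = 2.0f*a[i] + 1.0f;   // the original serial loop body goes here
    }
};

void ParallelApplyFoo( float a[], size_t n ) {
    tbb::task_scheduler_init init;   // harmless if main() already holds one
    // blocked_range supplies is_empty(), is_divisible(), and the splitting
    // constructor required of a range type; the library splits [0,n) into
    // subranges and runs ApplyFoo on each subrange as a task.
    tbb::parallel_for( tbb::blocked_range<size_t>( 0, n ), ApplyFoo(a) );
}

blocked_range can also take a third grainsize argument, as the reduction example below does.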

A Simple Example of Parallel Sum

The parallel sum operation works with any associative and commutative combining operation; multiplication, maximum, minimum, and some logical operations would all be appropriate. In a serial program the commutativity of the operation is not important, since a serial execution combines the elements in the same order every time. To have any chance of performing the computation concurrently, however, we rely on these properties, since concurrent execution does not impose a fixed order on independent computations.

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <ctype.h>
#include "parallel_reduce.h"
#include "blocked_range.h"
#include "task_scheduler_init.h"
#include "tick_count.h"

using namespace tbb;

// Comment out the line below to use an explicit grain size instead of the auto_partitioner
#define AUTO_GRAIN

struct Sum {
    float value;
    Sum() : value(0) {}
    Sum( Sum& s, split ) {value = 0;}
    void operator()( const blocked_range<float*>& range ) {
        float temp = value;
        for( float* a=range.begin(); a!=range.end(); ++a ) {
            temp += *a;
        }
        value = temp;
    }
    void join( Sum& rhs ) {value += rhs.value;}
};

float ParallelSum( float array[], size_t n ) {
    Sum total;
#ifndef AUTO_GRAIN
    parallel_reduce( blocked_range<float*>( array, array+n, 1000 ), 
                     total );
#else /* AUTO_GRAIN */
    parallel_reduce( blocked_range<float*>( array, array+n ), 
                     total, auto_partitioner() );
#endif /* AUTO_GRAIN */
    return total.value;
}

//! Problem size
const int N = 1000000;

//! Number of threads to use.
static int MaxThreads = 4;

//! If true, print out bits of the array
static bool VerboseFlag = false;

//! Parse the command line.
static void ParseCommandLine( int argc, char* argv[] ) {
    int i = 1;
    if( i<argc && strcmp(argv[i],"verbose")==0 ) {
        VerboseFlag = true;
        ++i;
    }
    if( i<argc )
        MaxThreads = (int)strtol(argv[i++],0,0);
}

//! The input data, filled with pseudo-random values in main().
static float input[N];

int main( int argc, char* argv[] ) {
    ParseCommandLine(argc, argv);

    for( size_t i = 0; i < N; ++i ) {
      input[i] = (float)(rand() % 10);
    }

    if( VerboseFlag ) {
      printf(" Input: ");
      for ( size_t i = 0; i < 7; ++i ) {
        printf("%7.2f ", input[i]);
      }
      printf("...\n");
    }

    // Try different numbers of threads
    for( int p=1; p<=MaxThreads; ++p ) {

        task_scheduler_init init(p);

        tick_count t0 = tick_count::now();
        float output = ParallelSum(input, N);
        tick_count t1 = tick_count::now();
        
        if( VerboseFlag ) {
          printf("Output: %7.2f\n", output);
        }
        printf("%2d threads: %.3f msec\n", p, (t1-t0).seconds()*1000);
    }
    return 0;
}

This Intel code was compiled with the Intel C++ Professional Compiler, which integrates into Microsoft Visual Studio 2008 or 2010 when installed. Once the compiler installation is complete, install the Intel TBB 3.0 performance library. The included documentation steps you through configuring the environment so that the compiler can find the include headers and the library files. Notice the results:

 1 threads: 6.513 msec
 2 threads: 3.333 msec
 3 threads: 3.496 msec
 4 threads: 3.345 msec 

It turns out that executing with two or more threads via the Intel TBB takes roughly half the time of the single-threaded run. Another benefit is how the Intel TBB library manages to keep things thread-safe. After ascertaining that the loop body is safe to run in parallel, the simplest approach to parallelizing the loop would seem to be dividing the size of the iteration space (the upper bound minus the lower bound, assuming the iterations run in ascending order) by the number of processors. That job, however, is best left to the task scheduler: a simple static division can easily lead to an inefficient use of those processors when some chunks take longer than others. Two ways of expressing the partitioning are sketched below.
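For illustration, here is a minimal sketch of the two partitioning styles, assuming a hypothetical element-wise Scale operation (none of these names come from the article's samples):

#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"

// Hypothetical element-wise operation, used only to illustrate partitioning.
struct Scale {
    float* data;
    float factor;
    void operator()( const tbb::blocked_range<size_t>& r ) const {
        for( size_t i = r.begin(); i != r.end(); ++i )
            data[i] *= factor;
    }
};

// Explicit grain size: the range is never split into pieces smaller than
// about 10000 iterations, so that is the smallest unit of work a task gets.
void ScaleWithGrainSize( float* data, size_t n, float factor ) {
    Scale body = { data, factor };
    tbb::parallel_for( tbb::blocked_range<size_t>( 0, n, 10000 ), body );
}

// auto_partitioner: the library picks chunk sizes heuristically, and task
// stealing rebalances the work, instead of us dividing n by the processor count.
void ScaleWithAutoPartitioner( float* data, size_t n, float factor ) {
    Scale body = { data, factor };
    tbb::parallel_for( tbb::blocked_range<size_t>( 0, n ), body, tbb::auto_partitioner() );
}

In both cases the scheduler, not the programmer, decides how many chunks each thread ultimately executes, and task stealing keeps otherwise idle processors busy.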

Now consider a composite integer, composite meaning that it is not prime: it can be written as a product of smaller integers. A prime number, by contrast, is divisible only by itself and one, and any composite integer can be broken down into a product of primes. Counting the primes in a range of integers therefore involves a sustained computational procedure, which makes it a good candidate for parallelism. Examine the (abridged) code below:

#include <cctype>
#include <cstdlib>
#include <cstdio>
#include <cstring>
#include <math.h>
#include <cassert>
#include "parallel_reduce.h"
#include "task_scheduler_init.h"
#include "tick_count.h"

using namespace std;
using namespace tbb;

typedef unsigned long Number;

static bool PrintPrimes = false;

static Number GrainSize = 1000;

class Multiples {
    inline Number strike( Number start, Number limit, Number stride ) {
        // Hoist "my_is_composite" into a local for speed.
        bool* is_composite = my_is_composite;
        assert( stride>=2 );
        for( ; start<limit; start+=stride )
            is_composite[start] = true;
        return start;
    }
    bool* my_is_composite;   // window into the conceptual sieve
    Number* my_striker;      // my_striker[k]: next index to strike for my_factor[k]
    Number* my_factor;       // odd prime factors found so far
public:
    Number n_factor;         // number of factors stored in my_factor
    Number m;
    Multiples( Number n ) {
        // Find the primes up to sqrt(n) that will be used to strike out composites.
        m = Number(sqrt(double(n)));
        m += m&1;            // round up to even
        my_is_composite = new bool[m/2];
        my_striker = new Number[m/2];
        my_factor = new Number[m/2];
        n_factor = 0;
        memset( my_is_composite, 0, m/2 );
        for( Number i=3; i<m; i+=2 )
            if( !my_is_composite[i/2] ) {
                if( PrintPrimes ) printf("%d\n", (int)i);
                my_striker[n_factor] = strike( i/2, m/2, i );
                my_factor[n_factor++] = i;
            }
    }
    // ... remainder of the sieve (the parallel_reduce driver over subranges) omitted
};
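Because the listing above is abridged, here is a separate, self-contained sketch of the same idea under simpler assumptions: it counts primes over a range with parallel_reduce using naive trial division rather than a sieve. IsPrime, PrimeCounter, and CountPrimes are illustrative names, not code from the Intel sample:

#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"

// Naive trial-division primality test (illustration only).
static bool IsPrime( long n ) {
    if( n < 2 ) return false;
    for( long d = 2; d*d <= n; ++d )
        if( n % d == 0 ) return false;
    return true;
}

class PrimeCounter {
public:
    long count;
    PrimeCounter() : count(0) {}
    // Splitting constructor: a fresh, zero-initialized partial count.
    PrimeCounter( PrimeCounter&, tbb::split ) : count(0) {}
    void operator()( const tbb::blocked_range<long>& r ) {
        long c = count;
        for( long i = r.begin(); i != r.end(); ++i )
            if( IsPrime(i) ) ++c;
        count = c;
    }
    // Join: combine partial counts from two tasks.
    void join( PrimeCounter& rhs ) { count += rhs.count; }
};

long CountPrimes( long limit ) {
    PrimeCounter pc;
    tbb::parallel_reduce( tbb::blocked_range<long>( 2, limit ), pc );
    return pc.count;
}

Trial division is far slower than the sieve, but the splitting constructor / operator() / join structure is exactly the one described above.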

As we would expect from the Intel primes sample, the higher the degree of parallelism, the more quickly the computation produces its results:

#primes from [2..100000000] = 5761455 (1.10 sec with serial code)
#primes from [2..100000000] = 5761455 (1.10 sec with 1-way parallelism)
#primes from [2..100000000] = 5761455 (0.59 sec with 2-way parallelism)
#primes from [2..100000000] = 5761455 (0.57 sec with 3-way parallelism)
#primes from [2..100000000] = 5761455 (0.60 sec with 4-way parallelism)

Where Are We Now?

Up to this point, we have noted that part of a computation sometimes involves summing a large set of values. If multiple processors are available, instead of adding the values together sequentially, the set can be partitioned and the sums of the subsets computed simultaneously, each on a different processor; the partial sums are then combined to get the final answer. Using multiple processors to compute in parallel may therefore give us a solution sooner. Also, if each processor has its own memory, partitioning the data between the processors may allow larger problems to be handled than a single processor could manage. As an aside on limits, consider the geometric series 1 + 1/2 + 1/4 + 1/8 + ...: we can keep adding terms, but the partial sums never reach 2. The series approaches 2 as a limit without ever equaling it, and the more terms we add, the smaller each new term becomes.

Here is a simple seismic wave simulation (wave propagation) based on parallel_for and blocked_range. The main body of the program steps through the simulation in a core loop that sets the impulse from the source of the disturbance, performs the two expensive computations of the stress and velocity updates, and finally cleans up the edges of the simulation. Seismic waves are sound waves that travel through the earth, perhaps as the result of an earthquake or an explosion. Sound waves form a chain of waves, much as a train grows louder as it arrives and fades after it departs until it can no longer be heard; that fading marks the end of the chain. In earlier times, scientists would set off dynamite and, with a device similar in principle to a seismograph, track the directions of the resulting waves. A change of direction meant diffraction off a denser medium, and by following the waves they learned where such changes occurred. Noting one type of change of direction on the graph, they would mine that area and perhaps find salt, sometimes even oil, though their intention was often to find gold. Similarly, when a particle has been displaced, some force displaced it; it moves along a vector path, meaning the displacement is measured by both direction and magnitude. Tracing the path and plotting each point of it on a grid reveals where the change in direction took place.

I introduce this program because it illustrates an underlying concept used in fields ranging from DNA sequencing to computer animation. In computer animation, rendering is the step where information from the animation files, such as lighting, textures, and shading, is applied to 3D models to generate the 2D image that makes up a frame of the film. Parallel computing is essential to generate the needed number of frames (24 per second) for a feature-length film. Now notice the rippling effect in the image below:

wave.jpg

Seismic Wave Propagation

//#define _CRT_SECURE_NO_DEPRECATE
#define VIDEO_WINMAIN_ARGS
#include "../../common/gui/video.h"
#include <cstring>
#include <cstdlib>
#include <math.h>
#include <cctype>
#include <cstdio>
#include <cassert>
#include "tbb/task_scheduler_init.h"
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include "tbb/tick_count.h"

using namespace std;

#ifdef _MSC_VER
// warning C4068: unknown pragma
#pragma warning(disable: 4068)
#endif

#define DEFAULT_NUMBER_OF_FRAMES 100
int number_of_frames = -1;
const size_t MAX_WIDTH = 1024;
const size_t MAX_HEIGHT = 512;

int UniverseHeight=MAX_HEIGHT;
int UniverseWidth=MAX_WIDTH;

typedef float value;

//! Velocity at each grid point
static value V[MAX_HEIGHT][MAX_WIDTH];

//! Horizontal stress
static value S[MAX_HEIGHT][MAX_WIDTH];

//! Vertical stress
static value T[MAX_HEIGHT][MAX_WIDTH];

//! Coefficient related to modulus
static value M[MAX_HEIGHT][MAX_WIDTH];

//! Coefficient related to lightness
static value L[MAX_HEIGHT][MAX_WIDTH];

//! Damping coefficients
static value D[MAX_HEIGHT][MAX_WIDTH];

static tbb::affinity_partitioner Affinity;

enum MaterialType {
    WATER=0,
    SANDSTONE=1,
    SHALE=2
};

//! Values are MaterialType, cast to an unsigned char to save space.
static unsigned char Material[MAX_HEIGHT][MAX_WIDTH];

static const colorcomp_t MaterialColor[4][3] = { // BGR
    {96,0,0},     // WATER
    {0,48,48},    // SANDSTONE
    {32,32,23}    // SHALE
};

static const int DamperSize = 32;

static const int ColorMapSize = 1024;
static color_t ColorMap[4][ColorMapSize];

static int PulseTime = 100;
static int PulseCounter;
static int PulseX = UniverseWidth/3;
static int PulseY = UniverseHeight/4;

static bool InitIsParallel = true;
const char *titles[2] = {"Seismic Simulation: Serial", "Seismic Simulation: Parallel"};
int threads_low = 0, threads_high = tbb::task_scheduler_init::automatic;

static void UpdatePulse() {
    if( PulseCounter>0 ) {
        value t = (PulseCounter-PulseTime/2)*0.05f;
        V[PulseY][PulseX] += 64*sqrt(M[PulseY][PulseX])*exp(-t*t);
        --PulseCounter;
    }
}

static void SerialUpdateStress() {
    drawing_area drawing(0, 0, UniverseWidth, UniverseHeight);
    for( int i=1; i<UniverseHeight-1; ++i ) {
        drawing.set_pos(1, i);
#pragma ivdep
        for( int j=1; j<UniverseWidth-1; ++j ) {
            S[i][j] += M[i][j]*(V[i][j+1]-V[i][j]);
            T[i][j] += M[i][j]*(V[i+1][j]-V[i][j]);
            int index = (int)(V[i][j]*(ColorMapSize/2)) + ColorMapSize/2;
            if( index<0 ) index = 0;
            if( index>=ColorMapSize ) index = ColorMapSize-1;
            color_t* c = ColorMap[Material[i][j]];
            drawing.put_pixel(c[index]);
        }
    }
}

struct UpdateStressBody {
    void operator()( const tbb::blocked_range<int>& range ) const {
        drawing_area drawing(0, range.begin(), 
UniverseWidth, range.end()-range.begin());
        int i_end = range.end();
        for( int y = 0, i=range.begin(); i!=i_end; ++i,y++ ) {
            drawing.set_pos(1, y);
#pragma ivdep
            for( int j=1; j<UniverseWidth-1; ++j ) {
                S[i][j] += M[i][j]*(V[i][j+1]-V[i][j]);
                T[i][j] += M[i][j]*(V[i+1][j]-V[i][j]);
                int index = (int)(V[i][j]*(ColorMapSize/2)) + ColorMapSize/2;
                if( index<0 ) index = 0;
                if( index>=ColorMapSize ) index = ColorMapSize-1;
                color_t* c = ColorMap[Material[i][j]];
                drawing.put_pixel(c[index]);
            }
        }
    }
};

static void ParallelUpdateStress() {
    tbb::parallel_for( tbb::blocked_range<int>( 1, UniverseHeight-1 ), 
// Index space for loop
                       UpdateStressBody(),                            
 // Body of loop
                       Affinity );                                     
// Affinity hint
}

static void SerialUpdateVelocity() {
    for( int i=1; i<UniverseHeight-1; ++i )
#pragma ivdep
        for( int j=1; j<UniverseWidth-1; ++j )
            V[i][j] = D[i][j]*(V[i][j] + L[i][j]*(S[i][j] - S[i][j-1] + T[i][j] - T[i-1][j]));
}

struct UpdateVelocityBody {
    void operator()( const tbb::blocked_range<int>& range ) const {
        int i_end = range.end();
        for( int i=range.begin(); i!=i_end; ++i )
#pragma ivdep
            for( int j=1; j<UniverseWidth-1; ++j )
                V[i][j] = D[i][j]*(V[i][j] + L[i][j]*(S[i][j] - S[i][j-1] + T[i][j] - T[i-1][j]));
    }
};

static void ParallelUpdateVelocity() {
    tbb::parallel_for( tbb::blocked_range<int>( 1, UniverseHeight-1 ), // Index space for loop
                       UpdateVelocityBody(),                           // Body of loop
                       Affinity );                                     // Affinity hint
}

void SerialUpdateUniverse() {
    UpdatePulse();
    SerialUpdateStress();
    SerialUpdateVelocity();
}

void ParallelUpdateUniverse() {
    UpdatePulse();
    ParallelUpdateStress();
    ParallelUpdateVelocity();
}

class seismic_video : public video
{
    void on_mouse(int x, int y, int key) {
        if(key == 1 && PulseCounter == 0) {
            PulseCounter = PulseTime;
            PulseX = x; PulseY = y;
        }
    }
    void on_key(int key) {
        key &= 0xff;
        if(char(key) == ' ') InitIsParallel = !InitIsParallel;
        else if(char(key) == 'p') InitIsParallel = true;
        else if(char(key) == 's') InitIsParallel = false;
        else if(char(key) == 'e') updating = true;
        else if(char(key) == 'd') updating = false;
        else if(key == 27) running = false;
        title = InitIsParallel?titles[1]:titles[0];
    }
    void on_process() {
        tbb::task_scheduler_init Init(threads_high);
        do {
            if( InitIsParallel )
                ParallelUpdateUniverse();
            else
                SerialUpdateUniverse();
            if( number_of_frames > 0 ) --number_of_frames;
        } while(next_frame() && number_of_frames);
    }
} video;

void InitializeUniverse() {
    PulseCounter = PulseTime;
    // Initialize V, S, and T to slightly non-zero values to avoid denormal waves.
    for( int i=0; i<UniverseHeight; ++i )
        for( int j=0; j<UniverseWidth; ++j )
            T[i][j] = S[i][j] = V[i][j] = value(1.0E-6);
    for( int i=1; i<UniverseHeight-1; ++i ) {
        for( int j=1; j<UniverseWidth-1; ++j ) {
            float x = float(j-UniverseWidth/2)/(UniverseWidth/2);
            value t = (value)i/UniverseHeight;
            MaterialType m;
            D[i][j] = 1.0;
            if( t<0.3f ) {
                m = WATER;
                M[i][j] = 0.125;
                L[i][j] = 0.125;
            } else if( fabs(t-0.7+0.2*exp(-8*x*x)+0.025*x)<=0.1 ) {
                m = SHALE;
                M[i][j] = 0.5;
                L[i][j] = 0.6;
            } else {
                m = SANDSTONE;
                M[i][j] = 0.3;
                L[i][j] = 0.4;
            } 
            Material[i][j] = m;
        }
    }
    value scale = 2.0f/ColorMapSize;
    for( int k=0; k<4; ++k ) {
        for( int i=0; i<ColorMapSize; ++i ) {
            colorcomp_t c[3];
            value t = (i-ColorMapSize/2)*scale;
            value r = t>0 ? t : 0;
            value b = t<0 ? -t : 0;
            value g = 0.5f*fabs(t);
            memcpy(c, MaterialColor[k], sizeof(c));
            c[2] = colorcomp_t(r*(255-c[2])+c[2]);
            c[1] = colorcomp_t(g*(255-c[1])+c[1]);
            c[0] = colorcomp_t(b*(255-c[0])+c[0]);
            ColorMap[k][i] = video.get_color(c[2], c[1], c[0]);
        }
    }
    // Set damping coefficients around border to reduce reflections from boundaries.
    value d = 1.0;
    for( int k=DamperSize-1; k>0; --k ) {
        d *= 1-1.0f/(DamperSize*DamperSize);
        for( int j=1; j<UniverseWidth-1; ++j ) {
            D[k][j] *= d;
            D[UniverseHeight-1-k][j] *= d;
        }
        for( int i=1; i<UniverseHeight-1; ++i ) {
            D[i][k] *= d;
            D[i][UniverseWidth-1-k] *= d;
        }
    }
}

int main(int argc, char *argv[])
{
    // Parse the optional thread-count argument, e.g. "2" or "1:4".
    if (argc > 1 && isdigit(argv[1][0])) {
        char* end; threads_high = threads_low = (int)strtol(argv[1],&end,0);
        switch( *end ) {
            case ':': threads_high = (int)strtol(end+1,0,0); break;
            case '\0': break;
            default: printf("unexpected character = %c\n",*end);
        }
    }
    if (argc > 2 && isdigit(argv[2][0])){
        number_of_frames = (int)strtol(argv[2],0,0);
    }
    // video layer init
    video.title = InitIsParallel?titles[1]:titles[0];
#ifdef _WINDOWS
    #define MAX_LOADSTRING 100
    TCHAR szWindowClass[MAX_LOADSTRING];    // the main window class name
    LoadStringA(video::win_hInstance, 
    IDC_SEISMICSIMULATION, szWindowClass, MAX_LOADSTRING);
    LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam);
    WNDCLASSEX wcex; memset(&wcex, 0, sizeof(wcex));
    wcex.lpfnWndProc    = (WNDPROC)WndProc;
    wcex.hIcon          = 
     LoadIcon(video::win_hInstance, MAKEINTRESOURCE(IDI_SEISMICSIMULATION));
    wcex.hCursor        = LoadCursor(NULL, IDC_ARROW);
    wcex.hbrBackground  = (HBRUSH)(COLOR_WINDOW+1);
    wcex.lpszMenuName   = LPCTSTR(IDC_SEISMICSIMULATION);
    wcex.lpszClassName  = szWindowClass;
    wcex.hIconSm        = LoadIcon(video::win_hInstance, MAKEINTRESOURCE(IDI_SMALL));
    video.win_set_class(wcex); // ascii convention here
    video.win_load_accelerators(IDC_SEISMICSIMULATION);
#endif
    if(video.init_window(UniverseWidth, UniverseHeight)) {
        video.calc_fps = true;
        video.threaded = threads_low > 0;
        // video is ok, init universe
        InitializeUniverse();
        // main loop
        video.main_loop();
    }
    else if(video.init_console()) {
        // do console mode
        if(number_of_frames <= 0) number_of_frames = DEFAULT_NUMBER_OF_FRAMES;
        if(threads_high == tbb::task_scheduler_init::automatic) threads_high = 4;
        if(threads_high < threads_low) threads_high = threads_low;
        for( int p = threads_low; p <= threads_high; ++p ) {
            InitializeUniverse();
            tbb::task_scheduler_init init(tbb::task_scheduler_init::deferred);
            if( p > 0 )
                init.initialize( p );
            tbb::tick_count t0 = tbb::tick_count::now();
            if( p > 0 )
                for( int i=0; i<number_of_frames; ++i )
                    ParallelUpdateUniverse();
            else
                for( int i=0; i<number_of_frames; ++i )
                    SerialUpdateUniverse();
            tbb::tick_count t1 = tbb::tick_count::now();
            printf("%s: %g fps", p==0 ? "serial" : "parallel",
                   number_of_frames/(t1-t0).seconds());
            if( p > 0 )
                printf(" with %d way parallelism\n",p);
            else
                printf(" with serial version\n"); 
        }
    }
    video.terminate();
    return 0;
}

#ifdef _WINDOWS
LRESULT CALLBACK About(HWND hDlg, UINT message, WPARAM wParam, LPARAM lParam)
{
    switch (message)
    {
    case WM_INITDIALOG: return TRUE;
    case WM_COMMAND:
        if (LOWORD(wParam) == IDOK || LOWORD(wParam) == IDCANCEL) {
            EndDialog(hDlg, LOWORD(wParam));
            return TRUE;
        }
        break;
    }
    return FALSE;
}

LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    int wmId, wmEvent;
    switch (message) {
    case WM_COMMAND:
        wmId    = LOWORD(wParam); 
        wmEvent = HIWORD(wParam); 
        // Parse the menu selections:
        switch (wmId)
        {
        case IDM_ABOUT:
            DialogBox(video::win_hInstance, MAKEINTRESOURCE(IDD_ABOUTBOX), hWnd, 
            (DLGPROC)About);
            break;
        case IDM_EXIT:
            PostQuitMessage(0);
            break;
        case ID_FILE_PARALLEL:
            if( !InitIsParallel ) {
                InitIsParallel = true;
                video.title = titles[1];
            }
            break;
        case ID_FILE_SERIAL:
            if( InitIsParallel ) {
                InitIsParallel = false;
                video.title = titles[0];
            }
            break;
        case ID_FILE_ENABLEGUI:
            video.updating = true;
            break;
        case ID_FILE_DISABLEGUI:
            video.updating = false;
            break;
        default:
            return DefWindowProc(hWnd, message, wParam, lParam);
        }
        break;
    default:
        return DefWindowProc(hWnd, message, wParam, lParam);
    }
    return 0;
}

#endif

Admittedly, the concept of wave propagation is difficult enough without adding both a serial and a parallel approach. But the number of professional fields that use parallel programming is astounding, which is why choosing a powerful threading library is essential. Examine the output; the image takes a few minutes to complete:

a.jpg

A disturbance occurs in the earth or some elastic medium and triggers a chain, or sequence, of waves that travel with a velocity and a direction (acceleration is not really an accurate measure here). Now examine the image around 30 seconds later; the frames per second remain in roughly the same range:

b.jpg

The Ray Tracer Example

The MSDN Parallel Computing Center offers a download of parallel programming examples, in both native and managed code. The ray tracer example has become famous because ray tracing is fundamentally a "delightfully parallel" problem. Each individual pixel in the image is generated by firing an imaginary ray of light, examining the color of that ray as it bounces off of and through objects in the scene, and storing the resulting color. Every pixel is thus independent of every other pixel, allowing them all to be processed in parallel. The example can be downloaded from http://code.msdn.microsoft.com/ParExtSamples. Now, if we consider some underlying principles that involve medical imaging, we could take a look at an X-ray. An x-ray has one of the shortest wavelengths in the electromagnetic spectrum; its frequency is higher than that of ultraviolet light. When shot from an x-ray tube, the sudden burst captures images of solid bodies. Nowadays gamma rays are often used instead, but in either case the resolution is poor.

To solve this problem, models are designed to show how radiation propagates through the body. Randomly selected points within the body are assumed to emit radiation (usually a gamma ray), and the trajectory of each ray is followed. As a particle (ray) passes through the body, it is attenuated by the different organs it traverses, continuing until the particle leaves the body and hits a camera model, thereby defining a full trajectory. To create a statistically significant simulation, thousands, if not millions, of trajectories are followed. Because each trajectory is independent, this problem can be parallelized in two ways: by associating each trajectory (or a chunk of trajectories) with a task, or by partitioning the set of trajectories as data, as sketched below.
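To make those two options concrete, here is a minimal sketch that treats each independent trajectory as a unit of work for tbb::parallel_for; Detector, TraceTrajectory, TraceBatch, and SimulateTrajectories are hypothetical stand-ins, not code from the MSDN samples:

#include <cmath>
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"
#include "tbb/atomic.h"

// Minimal stand-ins: a "detector" that just counts hits, and a toy trajectory
// that attenuates a particle through a few layers. Illustrative placeholders only.
struct Detector {
    tbb::atomic<long> hits;   // atomic, because many trajectories record hits concurrently
                              // (set hits to 0 before use; tbb::atomic is not auto-initialized)
};

static void TraceTrajectory( unsigned seed, Detector& camera ) {
    // Pseudo-random attenuation derived from the trajectory index (toy model).
    double intensity = 1.0;
    for( int layer = 0; layer < 8; ++layer )
        intensity *= 0.75 + 0.25*std::cos( double(seed)*(layer+1) );
    if( intensity > 0.5 )
        ++camera.hits;        // particle left the body and reached the camera
}

struct TraceBatch {
    Detector* camera;
    void operator()( const tbb::blocked_range<int>& r ) const {
        for( int t = r.begin(); t != r.end(); ++t )
            TraceTrajectory( (unsigned)t, *camera );   // trajectories are independent
    }
};

void SimulateTrajectories( int n_trajectories, Detector& camera ) {
    TraceBatch body;
    body.camera = &camera;
    // One task per chunk of trajectories; task stealing keeps the cores busy.
    tbb::parallel_for( tbb::blocked_range<int>( 0, n_trajectories ),
                       body, tbb::auto_partitioner() );
}

A data-parallel variant would instead partition the output image or detector bins and have each task accumulate into its own region, avoiding even the atomic counter.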

So, at this point, we will conclude with a simple example of a ray tracer rendering a teacup. The program reads a data file that describes the scene to be rendered; we then run it and note how much more quickly the frames-per-second figures are produced, as well as the improved resolution. The code is downloadable at the top of this article. The reader should download a trial evaluation of the Intel C++ Compiler; once installed, you'll see how it integrates into Visual Studio 2008. Next, install Intel TBB 3.0 as your threading library. After that, extract the files into a new folder within your projects directory. We are going to use the "nmake" utility to build the files and their executables; this will be made clearer shortly.

tea.jpg

The command that provided the above image is:

C>\..\tachyon.tbb.exe teacup.dat

To build this executable, and the other comparison executables that run against the files held in the dat directory, first create a directory and unzip the downloadable dat.zip files into it. Then set the environment variables, because we are going to use "nmake" from Visual Studio 2008. We first add the Intel compiler's path and run its batch script to set the environment; afterwards, we do the same for the Microsoft VC++ compiler and run its batch script:

set PATH=%PATH%;.;c:\Program Files\Intel\compiler\11.1\065\bin\ia32
iclvars_ia32.bat
set PATH=%PATH%;.;C:\Program Files\Microsoft Visual Studio 9.0\VC\bin
vcvars32.bat
(now run nmake)
nmake.exe 

The build will show the balls scene appearing after the serial and parallel executions. The data files are balls.dat, teapot.dat, etc. Copy those .dat files into the directory where nmake built the executables.

References

  • Intel Threading Building Blocks, by James Reinders
  • Patterns of Parallel Programming by Stephen Toub

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

logicchild

United States
I started electronics training at age 33. I began studying microprocessor technology in an RF communications oriented program. I am 43 years old now. I have studied C code and opcode (mainly x86 and AT&T syntax) for around 3 years in order to learn how to recognize viral code and the use of procedural languages. I am currently learning C# and the other virtual runtime system languages. I guess I started with the egg rather than the chicken. My past work would indicate that my primary strength is in applied mathematics.

