青青草原综合久久大伊人导航_色综合久久天天综合_日日噜噜夜夜狠狠久久丁香五月_热久久这里只有精品

posts - 319, comments - 22, trackbacks - 0, articles - 11
  C++博客 :: 首頁 :: 新隨筆 :: 聯系 :: 聚合  :: 管理

Parallel Programming Essentials via the Intel TBB

Posted on 2011-09-15 07:05 RTY 閱讀(657) 評論(0)  編輯 收藏 引用 所屬分類: TBB需要注意的
http://www.codeproject.com/KB/Parallel_Programming/Threading_Blocks.aspx

Introduction

The Intel TBB was released by Intel during their movement to enhance system performance by cores, rather than by Gigahertz. It was meant for their use, but they released a trial download and an open source edition. It will scale to function as a powerful tool for those going into parallel programming and multi-core programming. Note that some powerful sample code can be downloaded from http://www.threadingbuildingblocks.org/.

Preface

Sometimes, the terms concurrency and parallelism are used interchangeably. But a multithreaded application can have threads execute simultaneously while residing on one physical chip. Parallelism, however, is achieved only when those concurrent threads are executing on separate cores of a multicore chip. When we write an algorithm to use concurrency, our most important design choice should be to figure out how to partition the original problem into individual sub-parts. Because most programs spend a lot of their execution time running loops over an iteration range, we should standardize the use of the Intel TBB. For example, loops which use C++ STL iterators loop over the contents of a collection of data. This iteration space must be determined before we can parallelize those loops. IntelTBB is a C++ template-based library for loop-level parallelism that concentrates on defining tasks rather than explicit threads. The components of TBB include generic parallel algorithms, concurrent containers, low-level synchronization primitives, and a task scheduler. At the time of this writing, TBB 3.0 is the most recent version. Programmers usingTBB can parallelize the execution of loop iterations by treating chunks of iterations as tasks and allowing the TBBtask scheduler to determine the task sizes, number of threads to use assignment of tasks to those threads, and how those threads are scheduled for execution. The task scheduler will give precedence to tasks that have been most recently in a core with the idea of making best use of the cache that likely contains the task’s data. When thinking tasks as opposed to threads, we should be concerned about performance, load balance between different sub-problems, data sharing, control, and data dependencies among the sub-problems.

The task scheduler utilizes a task-stealing mechanism to load balance the execution. Furthermore, breaking a program into separate functional blocks and assigning a separate thread to each block is a solution that usually does not scale well because, typically, the number of functional blocks is fixed. In contrast, Threading Building Blocks emphasizes data-parallel programming, enabling multiple threads to work most efficiently together. Data-parallel programming scales well to larger numbers of processors by dividing a data set into smaller pieces. Breaking apart a function into smaller bits of work will limit scalability and possibly render a performance hit, as the number of methods in a program is rarely dynamic. Taking a data parallelism approach resolves that issue by breaking the iteration space into independent units of work. Any thread that uses an algorithm template from the library or the task scheduler must have an initialized tbb::task_scheduler_init object. A thread may have more than one of these objects initialized at a time. The task scheduler shuts down when all task_scheduler_init objects terminate. By default, the constructor for task_scheduler_init does the initialization and the destructor does the termination. Thus, declaring a task_scheduler_init in main( ), as in the example shown below, both starts and shuts down the scheduler.

#include "tbb/task_scheduler_init.h"
using namespace tbb;
int main( ) {
task_scheduler_init init;
...
return 0;
}

The parallel_for template parallelizes tasks that are contained within a for loop. The template requires two parameters: a range type over which to iterate and a body type that executes iterations over the range or a subrange. The range class must define a copy constructor and a destructor, the methods is_empty() (which returns TRUE if the range is empty) and is_divisible() (which returns TRUE if the range can be split), and a splitting constructor (to divide the range in half). A partitioner class object can be used to heuristically find the smallest number of iterations that should be assigned. The TBB library contains two predefined range types:blocked_range and blocked_range2D. These ranges are used for single- and two dimensional ranges, respectively. The body class must define a copy constructor and a destructor as well as the operator(). Theoperator() will contain a copy of the original serial loop that has been modified to run over a subrange of values that come from the range type. The parallel_reduce template will iterate over a range and combine partial results computed by each task into a final (reduction) value. The range type for parallel_reduce has the same requirements as parallel_for. The body type needs a splitting constructor and a join method. The splitting constructor in the body copies read-only data required to run the loop body and to assign the identity element of the reduction operation that initializes the reduction variable. The join method combines partial results of tasks based on the reduction operation being used. So having that, let’s look at an example.

A Simple Example of Parallel Sum

The parallel sum operation can work with any associative and commutative combining operation. For example, the operations of multiplication, maximum, minimum, and some logical operations would all be appropriate. In a serial program, the commutative property of the operation is not important, since the order with which the serial execution combines the elements is the same each and every time. Nevertheless, to have any chance of performing the computation concurrently, we rely on this property of the operation, since concurrent execution does not impose a fixed schedule on the order of independent computations.

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <ctype.h>
#include "parallel_reduce.h"
#include "blocked_range.h"
#include "task_scheduler_init.h"
#include "tick_count.h"

using namespace tbb;

// Uncomment the line below to enable the auto_partitioner
#define AUTO_GRAIN

struct Sum {
    float value;
    Sum() : value(0) {}
    Sum( Sum& s, split ) {value = 0;}
    void operator()( const blocked_range<float*>& range ) {
        float temp = value;
        for( float* a=range.begin(); a!=range.end(); ++a ) {
            temp += *a;
        }
        value = temp;
    }
    void join( Sum& rhs ) {value += rhs.value;}
};

float ParallelSum( float array[], size_t n ) {
    Sum total;
#ifndef AUTO_GRAIN
    parallel_reduce( blocked_range<float*>( array, array+n, 1000 ), 
                     total );
#else /* AUTO_GRAIN */
    parallel_reduce( blocked_range<float*>( array, array+n ), 
                     total, auto_partitioner() );
#endif /* AUTO_GRAIN */
    return total.value;
}

//! Problem size
const int N = 1000000;

//! Number of threads to use.
static int MaxThreads = 4;

//! If true, print out bits of the array
static bool VerboseFlag = false;

//! Parse the command line.
static void ParseCommandLine( int argc, char* argv[] ) {
    int i = 1;
    if( i<argc )="=0" verboseflag="true;"><argc>
	<argc maxthreads="strtol(argv[i++],0,0);" i="0;">< N; ++i ) {
      input[i] = (float)(rand() % 10);
    }

    if( VerboseFlag ) {
      printf(" Input: ");
      for ( size_t i = 0; i < 7; ++i ) {
        printf("%7.2f ", input[i]);
      }
      printf("...\n");
    }

    // Try different numbers of threads
    for( int p=1; p<=MaxThreads; ++p ) {

        task_scheduler_init init(p);

        tick_count t0 = tick_count::now();
        float output = ParallelSum(input, N);
        tick_count t1 = tick_count::now();
        
        if( VerboseFlag ) {
          printf("Output: %7.2f\n", output);
        }
        printf("%2d threads: %.3f msec\n", p, (t1-t0).seconds()*1000);
    }
    return 0;
}

This Intel code was compiled via the Intel C++ Professional Compiler, a compiler that when installed, integrates into Microsoft Visual Studio 2008 or 2010. Once the installation is completed, then it is time to install the performance library Intel TBB 3.0. The included documentation will step you through configuring the environment in order for the compiler to reference the include header files and the Lib files. Notice the results:

 1 threads: 6.513 msec
 2 threads: 3.333 msec
 3 threads: 3.496 msec
 4 threads: 3.345 msec 

It turns out that executing 4 threads in parallel via the Intel TBB takes the time that the execution of one thread does. Another performance gain is how the Intel TBB library manages to keep things thread-safe. After ascertaining that the body is safe to run in parallel, then the simplest approach to parallelizing the loop would be to divide the size of the loop (the upper range – lowest range – assuming that the iterations are in ascending order) by the number of processors. This approach however, is best left to the task scheduler. A simple division process could easily lead to an inefficient use those processors.

Now consider a composite integer, composite meaning that the integer is either odd or even. We know that a prime number is divisible by itself and one. But any composite integer can be broken down into a factor of primes. Therefore, calculating and then locating the number of primes in a range of integers would involve a continual computational procedure. Examine this code below:

#include <cctype>
#include <cstdlib>
#include <cstdio>
#include <cstring>
#include <math.h>
#include <cassert>
#include "parallel_reduce.h"
#include "task_scheduler_init.h"
#include "tick_count.h"

using namespace std;
using namespace tbb;

typedef unsigned long Number;

static bool PrintPrimes = false;

static Number GrainSize = 1000;

class Multiples {
    inline Number strike( Number start, 
                          Number limit, 
                          Number stride ) {
        
bool* is_composite = my_is_composite;

assert( stride>=2 );

for(start =0;start < limit; start+="stride" i+="2" +="m&1;" i="3;"
 n_factor="0;"
 my_factor="new" my_striker="new" 
my_is_composite="new"
 m="Number(sqrt(double(n)));" 
is_composite[start]="true;" k="0;"
is_composite="my_is_composite;" my_factor[n_factor++]="i;" 
my_striker[n_factor]="strike(" 

As we would expect, the higher degree of parallelism we go, the quicker the computation outputs its results:

#primes from [2..100000000] = 5761455 (1.10 sec with serial code)
#primes from [2..100000000] = 5761455 (1.10 sec with 1-way parallelism)
#primes from [2..100000000] = 5761455 (0.59 sec with 2-way parallelism)
#primes from [2..100000000] = 5761455 (0.57 sec with 3-way parallelism)
#primes from [2..100000000] = 5761455 (0.60 sec with 4-way parallelism)

Where Are We Now?

Up to this point, we have been noting that sometimes part of a computation involves computing the summation of a large set of values. If multiple processors are available, instead of adding the values together sequentially, the set can be partitioned and the summations of the subsets computed simultaneously, each on a different processor. The partial sums are then combined to get the final answer. Thus, using multiple processors to compute in parallel may allow us to obtain a solution sooner. Also, if each processor has its own memory, partitioning the data between the processors may allow larger problems to be handled than could be handled on a single processor. So if we wanted to form a set of integers that start at 1, and goes from 1/1 +n.. we could add them but the sequence would not end at the number 2. The sequence would be an infinite set of fractional values that approach to 2 as a limit, never actually equaling 2. But the more values that we add, the smaller they become.

Here is a simple seismic wave simulation (wave propagation) based on parallel_for and blocked_range. The main body of the program steps through the simulation of a seismic wave in a core loop that sets the impulse from the source of the disturbance, does the two tough computations of stress updates and velocities, and finally cleans the edges of the simulation. Seismic waves are waves of sound that travel through the core of the earth, perhaps because of the result of an earthquake or explosion. Because sound waves form a chain of waves (similar to how a train gets louder upon arrival; upon departure, the sound minimizes until it cannot be heard. That is the end of the chain of sound waves). In earlier times, scientists would blow some dynamite. With a device (similar in principle to a seismograph) that would track the directions of the sound waves. One direction meant diffraction off of a denser medium, and they learned by following those waves until diffraction would cause a change in direction. After a while, they would that one type of change in direction on the graph, mine that area, and perhaps would find salt. Sometimes they would even find oil, but their intentions were to find gold. Similarly, when a particle has been displaced, some force displaced it. It has moved in a vector path, meaning the displacement is measured by both the direction and magnitude. Tracing the path, and then plotting each point of that path on a grid would reveal where the change in direction took place.

I introduce this program because it espouses an underlying concept used in DNA sequencing, etc. in computer animation, rendering is the step where information from the animation files, such as lighting, textures, and shading, is applied to 3D models to generate the 2D image that makes up a frame of the film. Parallel computing is essential to generate the needed number of frames (24 per second) for a feature-length film. Now notice the rippling effect in the image below:

wave.jpg

Seismic Wave Propagation

//#define _CRT_SECURE_NO_DEPRECATE
#define VIDEO_WINMAIN_ARGS
#include "../../common/gui/video.h"
#include <cstring>
#include <cstdlib>
#include <math.h
#include <cctype>
#include <cstdio>
#include <cassert>
#include "tbb/task_scheduler_init.h"
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include "tbb/tick_count.h"

using namespace std;

#ifdef _MSC_VER
// warning C4068: unknown pragma
#pragma warning(disable: 4068)
#endif

#define DEFAULT_NUMBER_OF_FRAMES 100
int number_of_frames = -1;
const size_t MAX_WIDTH = 1024;
const size_t MAX_HEIGHT = 512;

int UniverseHeight=MAX_HEIGHT;
int UniverseWidth=MAX_WIDTH;

typedef float value;

//! Velocity at each grid point
static value V[MAX_HEIGHT][MAX_WIDTH];

//! Horizontal stress
static value S[MAX_HEIGHT][MAX_WIDTH];

//! Vertical stress
static value T[MAX_HEIGHT][MAX_WIDTH];

//! Coefficient related to modulus
static value M[MAX_HEIGHT][MAX_WIDTH];

//! Coefficient related to lightness
static value L[MAX_HEIGHT][MAX_WIDTH];

//! Damping coefficients
static value D[MAX_HEIGHT][MAX_WIDTH];

static tbb::affinity_partitioner Affinity;

enum MaterialType {
    WATER=0,
    SANDSTONE=1,
    SHALE=2
};

//! Values are MaterialType, cast to an unsigned char to save space.
static unsigned char Material[MAX_HEIGHT][MAX_WIDTH];

static const colorcomp_t MaterialColor[4][3] = { // BGR
    {96,0,0},     // WATER
    {0,48,48},    // SANDSTONE
    {32,32,23}    // SHALE
};

static const int DamperSize = 32;

static const int ColorMapSize = 1024;
static color_t ColorMap[4][ColorMapSize];

static int PulseTime = 100;
static int PulseCounter;
static int PulseX = UniverseWidth/3;
static int PulseY = UniverseHeight/4;

static bool InitIsParallel = true;
const char *titles[2] = {"Seismic Simulation: Serial", "Seismic Simulation: Parallel"};
int threads_low = 0, threads_high = tbb::task_scheduler_init::automatic;

static void UpdatePulse() {
    if( PulseCounter>0 ) {
        value t = (PulseCounter-PulseTime/2)*0.05f;
        V[PulseY][PulseX] += 64*sqrt(M[PulseY][PulseX])*exp(-t*t);
        --PulseCounter;
    }
}

static void SerialUpdateStress() {
    drawing_area drawing(0, 0, UniverseWidth, UniverseHeight);
    for( int i=1; i<universeheight-1; j="1;">
	<universewidth-1; +="M[i][j]*(V[i][j+1]-V[i][j]);" 
	index="(int)(V[i][j]*(ColorMapSize/2))"><0 ) index = 0;
            if( index>=ColorMapSize ) index = ColorMapSize-1;
            color_t* c = ColorMap[Material[i][j]];
            drawing.put_pixel(c[index]);
        }
    }
}

struct UpdateStressBody {
    void operator()( const tbb::blocked_range<int>& range ) const {
        drawing_area drawing(0, range.begin(), 
UniverseWidth, range.end()-range.begin());
        int i_end = range.end();
        for( int y = 0, i=range.begin(); i!=i_end; ++i,y++ ) {
            drawing.set_pos(1, y);
#pragma ivdep
            for( int j=1; j<universewidth-1; 
		+="M[i][j]*(V[i][j+1]-V[i][j]);" 
		index="(int)(V[i][j]*(ColorMapSize/2))"><0 ) index = 0;
                if( index>=ColorMapSize ) index = ColorMapSize-1;
                color_t* c = ColorMap[Material[i][j]];
                drawing.put_pixel(c[index]);
            }
        }
    }
};

static void ParallelUpdateStress() {
    tbb::parallel_for( tbb::blocked_range<int>( 1, UniverseHeight-1 ), 
// Index space for loop
                       UpdateStressBody(),                            
 // Body of loop
                       Affinity );                                     
// Affinity hint
}

static void SerialUpdateVelocity() {
    for( int i=1; i<universeheight-1; j="1;">
		<universewidth-1; v[i][j]="D[i][j]*(V[i][j]"><int>& range ) const {
        int i_end = range.end();
        for( int i=range.begin(); i!=i_end; ++i ) 
#pragma ivdep
            for( int j=1; j<universewidth-1; 
		v[i][j]="D[i][j]*(V[i][j]"><int>( 1, 
		UniverseHeight-1 ), // Index space for loop
                       UpdateVelocityBody(),                           

                       Affinity );                                    
 
}

void SerialUpdateUniverse() {
    UpdatePulse();
    SerialUpdateStress();
    SerialUpdateVelocity();
}

void ParallelUpdateUniverse() {
    UpdatePulse();
    ParallelUpdateStress();
    ParallelUpdateVelocity();
}

class seismic_video : public video
{
    void on_mouse(int x, int y, int key) {
        if(key == 1 && PulseCounter == 0) {
            PulseCounter = PulseTime;
            PulseX = x; PulseY = y;
        }
    }
    void on_key(int key) {
        key &= 0xff;
        if(char(key) == ' ') InitIsParallel = !InitIsParallel;
        else if(char(key) == 'p') InitIsParallel = true;
        else if(char(key) == 's') InitIsParallel = false;
        else if(char(key) == 'e') updating = true;
        else if(char(key) == 'd') updating = false;
        else if(key == 27) running = false;
        title = InitIsParallel?titles[1]:titles[0];
    }
    void on_process() {
        tbb::task_scheduler_init Init(threads_high);
        do {
            if( InitIsParallel )
                ParallelUpdateUniverse();
            else
                SerialUpdateUniverse();
            if( number_of_frames > 0 ) --number_of_frames;
        } while(next_frame() && number_of_frames);
    }
} video;

void InitializeUniverse() {
    PulseCounter = PulseTime;
    // Initialize V, S, and T to slightly non-zero values,
 in order to avoid denormal waves.
    for( int i=0; i<universeheight; j="0;"><universewidth; 
		t[i][j]="S[i][j]" i="1;"><universeheight-1; j="1;">
		<universewidth-1; x="float(j-UniverseWidth/2)/
		(UniverseWidth/2);" t="(value)i/UniverseHeight;" 
		d[i][j]="1.0;"><0.3f ) {
                m = WATER;
                M[i][j] = 0.125;
                L[i][j] = 0.125;
            } else if( fabs(t-0.7+0.2*exp(-8*x*x)+0.025*x)<=0.1 ) {
                m = SHALE;
                M[i][j] = 0.5;
                L[i][j] = 0.6;
            } else {
                m = SANDSTONE;
                M[i][j] = 0.3;
                L[i][j] = 0.4;
            } 
            Material[i][j] = m;
        }
    }
    value scale = 2.0f/ColorMapSize;
    for( int k=0; k<4; ++k ) {
        for( int i=0; i<colormapsize; t="(i-ColorMapSize/2)*scale;" r="t">0 ? t : 0;
            value b = t<0 ? -t : 0;
            value g = 0.5f*fabs(t);
            memcpy(c, MaterialColor[k], sizeof(c));
            c[2] = colorcomp_t(r*(255-c[2])+c[2]);
            c[1] = colorcomp_t(g*(255-c[1])+c[1]);
            c[0] = colorcomp_t(b*(255-c[0])+c[0]);
            ColorMap[k][i] = video.get_color(c[2], c[1], c[0]);
        }
    }
    // Set damping coefficients around border to reduce reflections from boundaries.
    value d = 1.0;
    for( int k=DamperSize-1; k>0; --k ) {
        d *= 1-1.0f/(DamperSize*DamperSize);
        for( int j=1; j<universewidth-1; *="d;" i="1;">
		<universeheight-1; *="d;"> 1 && isdigit(argv[1][0])) {
        char* end; threads_high = threads_low = (int)strtol(argv[1],&end,0);
        switch( *end ) {
            case ':': threads_high = (int)strtol(end+1,0,0); break;
            case '\0': break;
            default: printf("unexpected character = %c\n",*end);
        }
    }
    if (argc > 2 && isdigit(argv[2][0])){
        number_of_frames = (int)strtol(argv[2],0,0);
    }
    // video layer init
    video.title = InitIsParallel?titles[1]:titles[0];
#ifdef _WINDOWS
    #define MAX_LOADSTRING 100
    TCHAR szWindowClass[MAX_LOADSTRING];    // the main window class name
    LoadStringA(video::win_hInstance, 
    IDC_SEISMICSIMULATION, szWindowClass, MAX_LOADSTRING);
    LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam);
    WNDCLASSEX wcex; memset(&wcex, 0, sizeof(wcex));
    wcex.lpfnWndProc    = (WNDPROC)WndProc;
    wcex.hIcon          = 
     LoadIcon(video::win_hInstance, MAKEINTRESOURCE(IDI_SEISMICSIMULATION));
    wcex.hCursor        = LoadCursor(NULL, IDC_ARROW);
    wcex.hbrBackground  = (HBRUSH)(COLOR_WINDOW+1);
    wcex.lpszMenuName   = LPCTSTR(IDC_SEISMICSIMULATION);
    wcex.lpszClassName  = szWindowClass;
    wcex.hIconSm        = LoadIcon(video::win_hInstance, MAKEINTRESOURCE(IDI_SMALL));
    video.win_set_class(wcex); // ascii convention here
    video.win_load_accelerators(IDC_SEISMICSIMULATION);
#endif
    if(video.init_window(UniverseWidth, UniverseHeight)) {
        video.calc_fps = true;
        video.threaded = threads_low > 0;
        // video is ok, init universe
        InitializeUniverse();
        // main loop
        video.main_loop();
    }
    else if(video.init_console()) {
        // do console mode
        if(number_of_frames <= 0) number_of_frames = DEFAULT_NUMBER_OF_FRAMES;
        if(threads_high == tbb::task_scheduler_init::automatic) threads_high = 4;
        if(threads_high < threads_low) threads_high = threads_low;
        for( int p = threads_low; p <= threads_high; ++p ) {
            InitializeUniverse();
            tbb::task_scheduler_init init(tbb::task_scheduler_init::deferred);
            if( p > 0 )
                init.initialize( p );
            tbb::tick_count t0 = tbb::tick_count::now();
            if( p > 0 )
                for( int i=0; i<number_of_frames; i="0;">
		<number_of_frames; t1="tbb::tick_count::now();"> 0 ) 
                printf(" with %d way parallelism\n",p);
            else
                printf(" with serial version\n"); 
        }
    }
    video.terminate();
    return 0;
}

#ifdef _WINDOWS
LRESULT CALLBACK About(HWND hDlg, UINT message, WPARAM wParam, LPARAM lParam)
{
    switch (message)
    {
    case WM_INITDIALOG: return TRUE;
    case WM_COMMAND:
        if (LOWORD(wParam) == IDOK || LOWORD(wParam) == IDCANCEL) {
            EndDialog(hDlg, LOWORD(wParam));
            return TRUE;
        }
        break;
    }
    return FALSE;
}

LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    int wmId, wmEvent;
    switch (message) {
    case WM_COMMAND:
        wmId    = LOWORD(wParam); 
        wmEvent = HIWORD(wParam); 
        // Parse the menu selections:
        switch (wmId)
        {
        case IDM_ABOUT:
            DialogBox(video::win_hInstance, MAKEINTRESOURCE(IDD_ABOUTBOX), hWnd, 
            (DLGPROC)About);
            break;
        case IDM_EXIT:
            PostQuitMessage(0);
            break;
        case ID_FILE_PARALLEL:
            if( !InitIsParallel ) {
                InitIsParallel = true;
                video.title = titles[1];
            }
            break;
        case ID_FILE_SERIAL:
            if( InitIsParallel ) {
                InitIsParallel = false;
                video.title = titles[0];
            }
            break;
        case ID_FILE_ENABLEGUI:
            video.updating = true;
            break;
        case ID_FILE_DISABLEGUI:
            video.updating = false;
            break;
        default:
            return DefWindowProc(hWnd, message, wParam, lParam);
        }
        break;
    default:
        return DefWindowProc(hWnd, message, wParam, lParam);
    }
    return 0;
}

#endif

Admittedly, the concept of wave propagation is difficult enough with adding both a serial and a parallel approach. But the number of professional areas that use parallel programming is astounding. This is why choosing a powerful threading library is essential. Examine the output. The image takes a few minutes to complete:

a.jpg

A disturbance occurs in the earth or some elastic medium and triggers a chain, or sequence of waves, waves that traveling with velocity and direction. Acceleration is not really an accurate measure. Now examine the image after around 30 seconds later. The frames second pretty much remain in the same range:

b.jpg

The Ray Tracer Example

The MSDN Parallel Computing Center offers a download of parallel programming examples, both in native code and managed code. The Ray Tracer examples has become famous because Ray tracing is fundamentally a “delightfully parallel problem”. Each individual pixel in the image is generated by firing an imaginary ray of light, examining the color of that ray as it bounces off of and through objects in the scene, and storing the resulting color. Every pixel is thus independent of every other pixel, allowing them all to be processed in parallel. The example can be downloaded fromhttp://code.msdn.microsoft.com/ParExtSamples. Now if we consider some underlying principles that involve medical imaging, then we could take a look at an X-Ray. An x-ray has the shortest wave length of any of the light waves that there are. An x-ray’s frequency is higher than ultra-violet waves. So when shot from an x-ray tube, the sudden burst captures solid body images. Nowadays they use gamma rays. But in any event, the resolution is always poor.

To solve this problem, models are designed to show how radiation propagates through the body. Randomly selected points within the body are assumed to emit radiation (usually a gamma ray), and the trajectory of each ray is followed. As a particle (ray) passes through the body, it is attenuated by the different organs it traverses, continuing until the particle leaves the body and hits a camera model, thereby defining a full trajectory. To create a statistically significant simulation, thousands, if not millions, of trajectories are followed. This problem can be parallelized in two ways. Because each trajectory is independent, it is possible to parallelize the application by associating each trajectory with a task or data parallelization.

So, at this point, we will conclude with a simple example of a 2-D Ray Tracer, a tea cup. The code executes by having a data file that contains the image modeling of an image. Then we execute the program and note quicker the frames per second calculations are performed, as well as the improved resolution. The code is downloadable at the top section of this article. The reader should download a trial evaluation of the Intel C++ Compiler. Once installed, you’ll see how it integrates with Visual Studio 2008 as a built-in. Next, install the Intel TBB beta 3.0 version as your threading library. After that, extract the files into a newly-made folder within your projects directory folder . But we are going to use the “nmake” utility to build the files and their executables. This will be made more clear shortly.

tea.jpg

The command that provided the above image is:

C>\..\tachyon.tbb.exe teacup.dat

To build this executable, and other comparison executable that runs with the files held in the dat directory, then build a directory to unzip the downloadable dat.zip files. Then set the environmental variables because we are going to use “nmake” from VS Studio 2008. We first start by configuring the Intel compilers path, and then run a batch script to set the environment. Afterwards, we do the same for the Microsoft VC++ compiler and run a similar batch script to set the environment:

set PATH=%PATH%;.;c:\Program Files\Intel\compiler\11.1\065\bin\ia32
iclvars_ia32.bat
set PATH=%PATH%;.;C:\Program Files\Microsoft Visual Studio 9.0\VC\bin
vcvars32.bat
(now run nmake)
nmake.exe 

The build will show balls appearing after serial and parallel executions. Thus, the files are balls.datteapot.dat, etc. Try and copy and then paste those .dat files into the prior directory where nmake built the executables.

References

  • Intel Threading Building Blocks, by James Reiner
  • Patterns of Parallel Programming by Stephen Toub

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

logicchild

Other
Pref. Trust
United States United States

Member
I started electronics training at age 33. I began studying microprocessor technology in an RF communications oriented program. I am 43 years old now. I have studied C code, opcode (mainly x86 and AT+T) for around 3 years in order to learn how to recognize viral code and the use of procedural languages. I am currently learning C# and the other virtual runtime system languages. I guess I started with the egg rather than the chicken. My past work would indicate that my primary strength is in applied mathematics.


青青草原综合久久大伊人导航_色综合久久天天综合_日日噜噜夜夜狠狠久久丁香五月_热久久这里只有精品
  • <ins id="pjuwb"></ins>
    <blockquote id="pjuwb"><pre id="pjuwb"></pre></blockquote>
    <noscript id="pjuwb"></noscript>
          <sup id="pjuwb"><pre id="pjuwb"></pre></sup>
            <dd id="pjuwb"></dd>
            <abbr id="pjuwb"></abbr>
            在线观看精品视频| 久久精品国产精品| 久久精品一区二区三区不卡牛牛| 国产精品99久久久久久白浆小说 | 欧美专区福利在线| 国产精品日韩欧美一区二区三区| 亚洲国产精品久久久久久女王| 日韩午夜剧场| 香蕉av福利精品导航| 国产视频精品va久久久久久| 久久成人综合视频| 久久精品亚洲精品国产欧美kt∨| 久久久精品国产一区二区三区| 久久久久女教师免费一区| 欧美激情一区二区三区全黄| 亚洲一区在线播放| 国产专区欧美精品| 欧美精品日韩| 久久精品视频导航| 欧美高清视频免费观看| 欧美在线视频播放| 亚洲国产1区| 国产麻豆午夜三级精品| 欧美日本亚洲视频| 亚洲视频 欧洲视频| 99re66热这里只有精品3直播| 亚洲资源av| 亚洲欧洲美洲综合色网| 欧美自拍丝袜亚洲| 亚洲高清av| 亚洲免费在线看| 老司机午夜精品| 欧美与黑人午夜性猛交久久久| 久久一区二区三区四区| 久久精品在线免费观看| 欧美日韩国产精品专区| 欧美高清在线播放| 国产精品三级久久久久久电影| 亚洲国产精品va| 在线观看精品视频| 亚洲一区中文字幕在线观看| 狂野欧美激情性xxxx| 性欧美在线看片a免费观看| 欧美成人午夜激情视频| 狂野欧美激情性xxxx欧美| 一区二区三区久久网| 欧美激情一区二区在线| 亚洲免费影视| 欧美日韩国产经典色站一区二区三区| 国产午夜精品全部视频在线播放| 在线亚洲精品| 午夜亚洲福利| 久久成人免费| 亚洲视频专区在线| 欧美在线1区| 国产精品久久久久毛片大屁完整版 | 鲁大师成人一区二区三区| 久热精品视频在线观看一区| 国产欧美日韩亚洲一区二区三区| 国产亚洲精品aa| 在线成人www免费观看视频| 欧美一区=区| 久久综合免费视频影院| 久久av最新网址| 亚洲欧美电影院| 欧美视频免费看| 国产亚洲精品成人av久久ww| 小嫩嫩精品导航| 亚洲欧美在线x视频| 久久一区中文字幕| 亚洲第一页中文字幕| 亚洲精品韩国| 亚洲嫩草精品久久| 欧美成人一区在线| 久久亚洲高清| 免费久久99精品国产自| 久久精品亚洲| 国产精品香蕉在线观看| 欧美永久精品| 午夜精品美女久久久久av福利| 国产色产综合色产在线视频| 久久久久五月天| 亚洲日韩欧美一区二区在线| 一区二区三区福利| 国产农村妇女精品一区二区| 久久精品综合| 日韩视频在线一区| 国产精品蜜臀在线观看| 久久精品1区| 亚洲免费观看高清完整版在线观看熊 | 欧美一区免费视频| 校园激情久久| 欧美久久久久久| 亚洲欧美在线视频观看| 欧美制服丝袜| 日韩午夜激情电影| 午夜精品国产| 亚洲精品专区| 欧美成人三级在线| 欧美午夜影院| 日韩一级精品视频在线观看| 亚洲免费视频在线观看| 91久久精品www人人做人人爽| 久久精品视频免费观看| 欧美黄色免费| 久久在线免费观看| 欧美三级欧美一级| 免费日韩av电影| 久久精品噜噜噜成人av农村| 日韩视频精品在线观看| 先锋影音网一区二区| 在线亚洲一区| 久久久久久穴| 久久精品道一区二区三区| 欧美日韩在线一区| 亚洲国产毛片完整版| 狠狠色丁香婷综合久久| 久久久噜噜噜久久中文字免| 欧美日韩亚洲网| 日韩一级黄色av| 另类尿喷潮videofree | 欧美在线观看网址综合| 欧美日韩免费看| 亚洲国产mv| 在线精品观看| 久久九九国产精品怡红院| 欧美在线视频日韩| 国产精品一区二区在线| 亚洲一卡二卡三卡四卡五卡| 亚洲婷婷综合久久一本伊一区| 欧美成人在线网站| 日韩午夜中文字幕| 久久婷婷综合激情| 美女成人午夜| 亚洲影院污污.| 午夜在线视频观看日韩17c| 亚洲视频一起| 欧美日韩在线一区| 亚洲深夜福利视频| 狠狠色丁香婷婷综合久久片| 销魂美女一区二区三区视频在线| 香蕉久久一区二区不卡无毒影院| 欧美人妖在线观看| 亚洲免费观看| 性欧美精品高清| 国产午夜精品全部视频播放| 久久久精品国产一区二区三区| 女仆av观看一区| 欧美午夜无遮挡| 99天天综合性| 国产一区二区三区视频在线观看 | 一区二区日韩| 欧美亚洲免费在线| 国产亚洲精品一区二区| 久久爱www.| 亚洲高清视频一区| 亚洲中字在线| 激情综合久久| 99视频+国产日韩欧美| 亚洲免费视频成人| 国产日韩成人精品| 欧美一级播放| 亚洲天堂网站在线观看视频| 欧美色道久久88综合亚洲精品| 亚洲一区二区免费在线| 久久成人18免费观看| 亚洲电影免费在线| 欧美小视频在线| 久久成人羞羞网站| 亚洲国产精品第一区二区| 亚洲一本视频| 欧美激情综合五月色丁香小说| 洋洋av久久久久久久一区| 欧美亚洲系列| 亚洲高清激情| 国产精品成人国产乱一区| 亚洲人成艺术| 久久爱www.| 日韩小视频在线观看| 国产精品美女一区二区在线观看| 久久久国产精品一区| 99这里只有久久精品视频| 久久综合给合久久狠狠色| 99精品久久久| 狠狠88综合久久久久综合网| 欧美精品三级| 久久色在线观看| 亚洲欧美国产日韩中文字幕| 亚洲动漫精品| 久久免费一区| 亚洲欧美中文日韩在线| 亚洲精品小视频在线观看| 国产在线视频欧美一区二区三区| 欧美日韩一区二区欧美激情| 久久久久女教师免费一区| 亚洲欧美日韩国产成人| 亚洲欧洲综合| 亚洲国产婷婷| 欧美激情视频网站| 米奇777超碰欧美日韩亚洲|