
            chaogu --- A Person in Capital Letters!

            LOG-2011-04

             

            //============================================================

            //============================================================

            DATE:2011-4-12

            TIME:01:18

            ICBC.pdf: finished

            //============================================================

            //============================================================

            DATE:2011-4-15

            TIME:00:00

            Reading “NoSQL Databases”

               Reasons to use NoSQL

            1. Avoidance of Unneeded Complexity

            2. High Throughput

            3. Horizontal Scalability and Running on Commodity Hardware

            4. Avoidance of Expensive Object-Relational Mapping

            5. Complexity and Cost of Setting up Database Clusters

            6. Compromising Reliability for Better Performance

            7. The Current “One Size Fits All” Database Thinking Was and Is Wrong

            8. The Myth of Effortless Distribution and Partitioning of Centralized Data Models

            9. Movements in Programming Languages and Development Frameworks

            10. Requirements of Cloud Computing

            11. The RDBMS plus Caching-Layer Pattern/Workaround vs. Systems Built from Scratch with Scalability in Mind

            12. Yesterday’s vs. Today’s Needs

            Nosqldbs.pdf, page 19

             

            //============================================================
            //============================================================
            DATE:2011-4-16

            TIME:00:24

            Reading cudaArticle 05

            A multiprocessor takes four clock cycles to issue one memory instruction for a "warp"

            Accessing local or global memory incurs an additional 400 to 600 clock cycles of memory latency

            -----------------------------------

            Cuda Memory

            Registers:

            The fastest form of memory on the multi-processor.

            Is only accessible by the thread.

            Has the lifetime of the thread.

            Shared Memory:

            Can be as fast as a register when there are no bank conflicts or when reading from the same address.

            Accessible by any thread of the block from which it was created.

            Has the lifetime of the block.

            Global memory:

            Potentially 150x slower than register or shared memory -- watch out for uncoalesced reads and writes which will be discussed in the next column.

            Accessible from either the host or device.

            Has the lifetime of the application.

            Local memory:

            A potential performance gotcha: it resides in global memory and can be 150x slower than register or shared memory.

            Is only accessible by the thread.

            Has the lifetime of the thread.

             

            // includes, system
            #include <stdio.h>
            #include <stdlib.h>   // malloc, free, exit, EXIT_FAILURE
            #include <assert.h>
             
            // Simple utility function to check for CUDA runtime errors
            void checkCUDAError(const char* msg);
             
            // Part 2 of 2: implement the fast kernel using shared memory
            __global__ void reverseArrayBlock(int *d_out, int *d_in)
            {
                extern __shared__ int s_data[];
             
                int inOffset = blockDim.x * blockIdx.x;
                int in = inOffset + threadIdx.x;
             
                // Load one element per thread from device memory and store it 
                // *in reversed order* into temporary shared memory
                s_data[blockDim.x - 1 - threadIdx.x] = d_in[in];
             
                // Block until all threads in the block have written
                // their data to shared memory
                __syncthreads();
             
                // write the data from shared memory in forward order, 
                // but to the reversed block offset as before
             
                int outOffset = blockDim.x * (gridDim.x - 1 - blockIdx.x);
             
                int out = outOffset + threadIdx.x;
                d_out[out] = s_data[threadIdx.x];
            }
             
            ////////////////////////////////////////////////////////////////////
            // Program main
            ////////////////////////////////////////////////////////////////////
            int main( int argc, char** argv) 
            {
                // pointer for host memory and size
                int *h_a;
                int dimA = 256 * 1024; // 256K elements (1MB total)
             
                // pointer for device memory
                int *d_b, *d_a;
             
                // define grid and block size
                int numThreadsPerBlock = 256;
             
                // Compute number of blocks needed based on array size
                // and desired block size (dimA must be a multiple of numThreadsPerBlock)
                int numBlocks = dimA / numThreadsPerBlock;
             
                // Part 1 of 2: Compute the number of bytes of shared memory needed
                // This is used in the kernel invocation below
                int sharedMemSize = numThreadsPerBlock * sizeof(int);
             
                // allocate host and device memory
                size_t memSize = numBlocks * numThreadsPerBlock * sizeof(int);
                h_a = (int *) malloc(memSize);
                cudaMalloc( (void **) &d_a, memSize );
                cudaMalloc( (void **) &d_b, memSize );
             
                // Initialize input array on host
                for (int i = 0; i < dimA; ++i) {
                    h_a[i] = i;
                }
             
                // Copy host array to device array
                cudaMemcpy( d_a, h_a, memSize, cudaMemcpyHostToDevice );
             
                // launch kernel
                dim3 dimGrid(numBlocks);
                dim3 dimBlock(numThreadsPerBlock);
                reverseArrayBlock<<< dimGrid, dimBlock, sharedMemSize >>>( d_b, d_a );
             
                // block until the device has completed
                // (cudaThreadSynchronize() is deprecated; cudaDeviceSynchronize() replaces it)
                cudaDeviceSynchronize();
             
                // check if kernel execution generated an error
                checkCUDAError("kernel invocation");
             
                // device to host copy
                cudaMemcpy( h_a, d_b, memSize, cudaMemcpyDeviceToHost );
             
                // Check for any CUDA errors
                checkCUDAError("memcpy");
             
                // verify the data returned to the host is correct
                for (int i = 0; i < dimA; i++){
                    assert(h_a[i] == dimA - 1 - i );
                }
             
                // free device memory
                cudaFree(d_a);
                cudaFree(d_b);
             
                // free host memory
                free(h_a);
             
                // If the program makes it this far, then the results are
                // correct and there are no run-time errors. Good work!
                printf("Correct!\n");
             
                return 0;
            }
             
            void checkCUDAError(const char *msg)
            {
                cudaError_t err = cudaGetLastError();
                if( cudaSuccess != err) 
                {
                    fprintf(stderr, "Cuda error: %s: %s.\n", msg, 
                                      cudaGetErrorString( err) );
                    exit(EXIT_FAILURE);
                }                         
            }
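The two-stage index arithmetic in reverseArrayBlock (reverse inside each block, then write to the mirrored block offset) can be checked on the CPU. Below is a minimal host-only sketch, not part of the original article: `reverseArrayBlockHost` is a hypothetical name, the `blockDim` parameter and the local `gridDim` stand in for `blockDim.x` and `gridDim.x`, and the `shared` vector plays the role of `s_data`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only emulation of the index arithmetic used by reverseArrayBlock.
// Hypothetical helper for illustration; it runs the "blocks" one after another.
std::vector<int> reverseArrayBlockHost(const std::vector<int>& in,
                                       std::size_t blockDim)
{
    // Like the kernel launch, this assumes the size is a multiple of blockDim.
    assert(blockDim > 0 && in.size() % blockDim == 0);

    const std::size_t gridDim = in.size() / blockDim;
    std::vector<int> out(in.size());
    std::vector<int> shared(blockDim);   // plays the role of s_data

    for (std::size_t block = 0; block < gridDim; ++block) {
        // Stage 1: each "thread" t stores its element reversed within the block.
        const std::size_t inOffset = blockDim * block;
        for (std::size_t t = 0; t < blockDim; ++t)
            shared[blockDim - 1 - t] = in[inOffset + t];

        // On the GPU, __syncthreads() separates the two stages.

        // Stage 2: write the block in forward order at the mirrored block offset.
        const std::size_t outOffset = blockDim * (gridDim - 1 - block);
        for (std::size_t t = 0; t < blockDim; ++t)
            out[outOffset + t] = shared[t];
    }
    return out;
}
```

Calling `reverseArrayBlockHost` on a small array with a block size that divides its length gives a quick way to convince yourself that reversing within blocks plus mirroring the block order reverses the whole array.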

             

            //============================================================

            TIME:01:16

            Finished reading cudaArticle 06

             

            //============================================================

            DATE:2011-4-23

            TIME:09:31

            Reading the Berkeley view on cloud computing

               Page 10: classes of utility computing

             

            //============================================================

            DATE:2011-4-24

            TIME:00:16

            Reading Makefile.pdf

             

            --------------------------------------------------------------

            List macros specified by default (Makefile)

               Using: make -p

            $@ name of the target

            $? list of dependencies more recent than the target

            $^ all dependencies, whether or not they are more recent than the target (duplicates removed)

            $+ same as $^, but keeps duplicate names

            $< the first dependency
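To make the difference between these automatic variables concrete, here is a toy C++ model (not how make itself is implemented). The `Prereq` struct, the integer timestamps, and the function names are all hypothetical, and treating `$?` as deduplicated is an assumption of this sketch.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one prerequisite: its name and modification time.
struct Prereq {
    std::string name;
    long mtime;
};

// Collect names in order, separated by spaces. Optionally drop repeated
// names and/or keep only prerequisites strictly newer than targetMtime.
static std::string collect(const std::vector<Prereq>& deps,
                           bool keepDuplicates,
                           bool onlyNewer,
                           long targetMtime)
{
    std::set<std::string> seen;
    std::string out;
    for (std::size_t i = 0; i < deps.size(); ++i) {
        if (onlyNewer && deps[i].mtime <= targetMtime) continue;
        if (!keepDuplicates && !seen.insert(deps[i].name).second) continue;
        if (!out.empty()) out += ' ';
        out += deps[i].name;
    }
    return out;
}

// $< : the first prerequisite
std::string autoFirst(const std::vector<Prereq>& deps)
{
    return deps.empty() ? std::string() : deps[0].name;
}

// $^ : all prerequisites, duplicates removed
std::string autoCaret(const std::vector<Prereq>& deps)
{
    return collect(deps, false, false, 0);
}

// $+ : all prerequisites, duplicates kept
std::string autoPlus(const std::vector<Prereq>& deps)
{
    return collect(deps, true, false, 0);
}

// $? : prerequisites newer than the target
std::string autoQuestion(const std::vector<Prereq>& deps, long targetMtime)
{
    return collect(deps, false, true, targetMtime);
}
```

With prerequisites `a.o` (older than the target), `b.o` (newer), and `a.o` listed a second time, the model yields `a.o` for `$<`, `a.o b.o` for `$^`, `a.o b.o a.o` for `$+`, and `b.o` for `$?`.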

             

            --------------------------------------------------------------

            Reading the Berkeley view on cloud computing

               Page 19, Obstacle Number 5: Performance Unpredictability

             

            //============================================================

            //============================================================

            DATE:2011-4-25

            TIME:01:40

            Finished reading the Berkeley view on cloud computing

             

            //============================================================

            //============================================================

            DATE:2011-4-28

            TIME:21:22

            Coding the motion project

            Visual Studio 2005 reports a stack overflow error:

            “Unhandled exception at 0x00439a57 in motion.exe: 0xC00000FD: Stack overflow.”

             

            --------------------------------------------------------------

            'motion.exe': Unloaded 'C:\WINDOWS\WinSxS\x86_Microsoft.VC80.CRT_1fc8b3b9a1e18e3b_8.0.50727.4053_x-ww_e6967989\msvcr80.dll'

            'motion.exe': Unloaded 'C:\WINDOWS\system32\psapi.dll'

            'motion.exe': Unloaded 'C:\WINDOWS\system32\shimeng.dll'

            First-chance exception at 0x00439a57 in motion.exe: 0xC00000FD: Stack overflow.

            Unhandled exception at 0x00439a57 in motion.exe: 0xC00000FD: Stack overflow.

            The program '[2388] motion.exe: Native' has exited with code 0 (0x0).

            --------------------------------------------------------------

            Problem: using a huge object
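The log does not say which object was too large, so the following is only a sketch of one common cause and fix, assuming the object was a large local (stack) buffer. The size `kFrameBytes` and the function `processFrame` are hypothetical. The MSVC linker reserves 1 MB of stack by default, so a multi-megabyte local array can raise 0xC00000FD, while `std::vector` keeps its elements on the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer size: 4 MB, larger than the 1 MB default stack
// reserved by the MSVC linker.
const std::size_t kFrameBytes = 4 * 1024 * 1024;

// Risky version (left as a comment on purpose): a local array of this size
// lives on the stack and can trigger a stack overflow.
//     unsigned char frame[kFrameBytes];

// Safer version: std::vector keeps only a small handle on the stack and
// stores the elements on the heap; the memory is released automatically
// when the vector goes out of scope.
std::size_t processFrame()
{
    std::vector<unsigned char> frame(kFrameBytes, 0);
    assert(!frame.empty());

    // Touch both ends to show that the whole buffer is usable.
    frame[0] = 1;
    frame[kFrameBytes - 1] = 2;
    return frame.size();
}
```

If the object really must live on the stack, the other usual option is to raise the stack size with the linker's /STACK option, but moving the storage to the heap is normally the simpler change.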

             

            //============================================================

            //============================================================

            DATE:2011-4-30

            TIME:01:40

            Coding CSE332 project 2

               Adding other data-counter implementations

             

            posted on 2011-05-03 21:57 by chaogu
