In this project, you will develop a complete CUDA program to compute the histogram of an input array. You will implement the histogram on the GPU device. After the device histogram kernel completes, your program will also compute the histogram sequentially on the CPU and compare that result with the device-computed result. If the two match, it will print "Test PASSED" to the screen before exiting. Assume the histogram has 256 bins, i.e., bin 0, bin 1, …, bin 255. Input value i is mapped to bin i.
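The sequential CPU reference and the comparison step described above might be sketched in plain C as follows (function names such as cpu_histogram and check are illustrative, not prescribed):

```c
#include <stdio.h>

/* Sequential reference histogram: 256 bins, input values 0..255. */
void cpu_histogram(const int *A, int N, unsigned int hist[256]) {
    for (int b = 0; b < 256; b++)
        hist[b] = 0;
    for (int i = 0; i < N; i++)
        hist[A[i]]++;  /* value i goes to bin i */
}

/* Compare the device result against the host result; 1 means match. */
int check(const unsigned int *gpu, const unsigned int *cpu) {
    for (int b = 0; b < 256; b++)
        if (gpu[b] != cpu[b])
            return 0;
    return 1;
}
```

After copying the device histogram back to the host, the program can print the required message with, e.g., `if (check(h_hist_gpu, h_hist_cpu)) printf("Test PASSED\n");`.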
Use the following C code for array initialization:

    int *A = (int *)malloc(sizeof(int) * N);  /* N is the array size */
    int init = 1325;
    for (int i = 0; i < N; i++) {
        init = 3125 * init % 65537;
        A[i] = init % 256;
    }
Task 1 - Basic CUDA Program using global memory
Develop a CUDA program in which the GPU threads collectively perform the histogram calculation. Use an atomic instruction so that only one thread at a time updates each location in the global histogram array.
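A minimal sketch of such a kernel, assuming a zero-initialized 256-bin histogram in global memory (the kernel and parameter names are illustrative, not prescribed):

```cuda
// Task 1 sketch: every update goes straight to the global histogram.
__global__ void histogram_global(const int *A, unsigned int *hist, int N) {
    // Grid-stride loop so any grid size covers the whole input.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < N;
         i += gridDim.x * blockDim.x) {
        // atomicAdd serializes concurrent updates to the same bin.
        atomicAdd(&hist[A[i]], 1u);
    }
}
```

A launch such as `histogram_global<<<(N + 255) / 256, 256>>>(d_A, d_hist, N);` would then cover the input with 256-thread blocks.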
Task 2 – CUDA program that takes advantage of shared memory
In Task 1, you will find that your GPU program's speedup over the CPU version is very limited due to the atomic accesses to the global histogram array. Modify the Task 1 code to improve the speedup by using GPU shared memory and registers.
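One common approach, sketched here with illustrative names, is to privatize the histogram: each block accumulates into a shared-memory copy, where atomics are much cheaper, and merges it into the global array once at the end:

```cuda
#define NUM_BINS 256

// Task 2 sketch: per-block shared-memory histogram, merged once per block.
__global__ void histogram_shared(const int *A, unsigned int *hist, int N) {
    __shared__ unsigned int local[NUM_BINS];

    // Zero the private histogram cooperatively.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        local[b] = 0;
    __syncthreads();

    // Shared-memory atomics contend only within this block.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < N;
         i += gridDim.x * blockDim.x)
        atomicAdd(&local[A[i]], 1u);
    __syncthreads();

    // One global atomic per bin per block, instead of one per element.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&hist[b], local[b]);
}
```

The merge step reduces global-memory atomics from one per input element to at most 256 per block, which is where most of the speedup comes from.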
Record your runtimes for Tasks 1 and 2 with respect to the different input array sizes shown in the following table, and compute the speedup from the GPU computation time and the CPU computation time. The thread block size is not specified; you can explore different thread block sizes to find the best one for each input array size. A thread block size of 256 is the most obvious choice.
Optional: You can also include the memory transfer time between the CPU and GPU in the GPU computation time (in that case, it would be fair to also include the array initialization time in the CPU computation time), and re-compute the speedup.
    Time                     | 131072 (128*1024) | 1048576 (1024*1024)
    -------------------------|-------------------|--------------------
    CPU computation time     |                   |
    GPU computation time     |                   |
    GPU memory transfer time |                   |
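One way to fill in the GPU rows of the table is with CUDA events; the sketch below assumes device buffers d_A and d_hist and a kernel named histogram_global already exist (all of these names are assumptions, not part of the assignment):

```cuda
cudaEvent_t start, stop;
float kernel_ms = 0.0f;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);
histogram_global<<<blocks, threads>>>(d_A, d_hist, N);  // assumed kernel
cudaEventRecord(stop);
cudaEventSynchronize(stop);                 // wait for the kernel to finish
cudaEventElapsedTime(&kernel_ms, start, stop);  // elapsed time in ms
```

The memory transfer row can be measured the same way by recording events around the cudaMemcpy calls; the CPU row can use any host timer (e.g. clock() or gettimeofday()).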
Note that the compile command for a CUDA program using atomic instructions should add the -arch compiler option. The following command can be used to compile the source CUDA program with file name histogram.cu:

    nvcc histogram.cu -o histogram -arch=sm_30
I have read your description and I am very interested in your project.
I am confident I can finish it on time.
I am an experienced and skillful CUDA/OpenMP/MPI programmer.
I have more than 5 years of experience in software development.
I have finished many projects like this one.
I guarantee high quality and will keep your deadline.
Please contact me so we can discuss the details.
Working with me, you will have a good experience and save time and money.
Best regards!
Hello, I am a CUDA expert with experience in algorithm design. I have developed many algorithms using CUDA and I would like to implement the histogram algorithm using CUDA. Please contact me to discuss the details and the timeline.
Hi there. I have plenty of experience in C++ and CUDA, and I have done a similar project before. Please start a chat about the project; I would be glad to work on it.
No problem!
I have read your description carefully and am very interested in your project.
I have been building desktop apps with C/C++, C#, Python, and Java for 7 years.
I think I can do it perfectly.
If you hire me, you will get great results.
I can work full-time in your time zone.
Best regards
Hi,
I have about seven years of experience in C and CUDA and have developed similar algorithms in CUDA. I have two GPU cards, a Tesla and a Pascal. I will be able to complete your project as per your requirements and well within time.
Thanks,
Ajay