CUDA Programming

A simple way to understand the difference between a GPU and a CPU is to compare how they process tasks. A CPU consists of a few cores optimized for sequential, serial processing, while a GPU has a massively parallel architecture consisting of thousands of smaller, more efficient cores designed for handling many tasks simultaneously. The GPU takes on the mathematically heavy work that would otherwise strain the CPU.

CUDA programming was invented to bring great software to the GPU, so we could run C, C++, or Fortran code directly on GPUs. NVIDIA built CUDA so developers could take advantage of this hardware for the kinds of tasks that strain the CPU; NVIDIA's main goal was to bring software and hardware together. Now we use the GPU to perform tasks that burden the CPU, for example heavy floating-point operations. NVIDIA made it possible to harness the GPU's true potential, removing the steep learning curve that writing GPU code previously required.

So by using CUDA extensions, we can write kernel code in C.

[Image: CUDA.png, a sample of CUDA code]

Source: https://blogs.nvidia.com/blog/2012/09/10/what-is-cuda-2/

CUDA is not an API or a language, for that matter; it is a platform for writing GPU code seamlessly. In the picture above, you can see how a few simple keywords let the developer write CUDA code.
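To get a feel for those keywords, here is a minimal vector-addition sketch (my own illustrative example, not the code from the image): __global__ marks a function that runs on the GPU, and the <<<blocks, threads>>> triple-bracket syntax launches it across many threads at once.

#include <stdio.h>

// __global__ marks a kernel: a C function that runs on the GPU.
__global__ void add(int n, const float *x, const float *y, float *out) {
    // Each thread computes its own global index and handles one element.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = x[i] + y[i];
}

int main(void) {
    const int n = 1024;
    float *x, *y, *out;
    // Unified memory is visible to both the CPU and the GPU.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch 4 blocks of 256 threads: one thread per element.
    add<<<4, 256>>>(n, x, y, out);
    cudaDeviceSynchronize();

    printf("out[0] = %f\n", out[0]); // expect 3.0
    cudaFree(x); cudaFree(y); cudaFree(out);
    return 0;
}

Compile it with NVIDIA's nvcc and it runs like any C program; the only new concepts are the kernel qualifier, the launch syntax, and the thread-index built-ins.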

First dip into Parallel Programming

Parallel computing: the words themselves sound intriguing. The motivation behind parallel computing is the growing power of our present systems and the chance to exploit that power.

So I was very eager to dive into some parallel programming. And what is the first program every computer science engineer writes in a new environment? The Hello World program.

// compile with: gcc -fopenmp hello.c
#include <stdio.h>
#include "omp.h"   // OpenMP runtime, for omp_get_thread_num()

int main() {
    // Tell the compiler that the iterations of this loop may run in parallel.
    #pragma omp parallel for
    for (int i = 0; i < 5; i++) {
        printf(" Hello World %d \n", omp_get_thread_num());
        // Report which iteration ran on which thread (see the outputs below).
        printf("iteration %d in Thread %d \n", i, omp_get_thread_num());
    }
    return 0;
}

The outputs I got were pretty interesting.
Test 1:

Hello World 2
3rd iteration in Thread 2
Hello World 1
2nd iteration in Thread 1
Hello World 0
0th iteration in Thread 0
Hello World 0
1st iteration in Thread 0
Hello World 3
4th iteration in Thread 3

Test 2:

 Hello World 0 
0th iteration in Thread 0 
 Hello World 0 
1st iteration in Thread 0 
 Hello World 2 
3rd iteration in Thread 2 
 Hello World 1 
2nd iteration in Thread 1 
 Hello World 3 
4th iteration in Thread 3 

The #pragma omp parallel for directive tells the compiler that this for loop can be executed in parallel. (Yes, as simple as that.) It created four threads on my machine, which runs an Intel i5 processor: two physical cores plus two virtual (hyper-threaded) cores, which add up to four threads.
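To check the thread count on your own machine, a small sketch like this works (assuming an OpenMP-enabled compiler, e.g. gcc -fopenmp):

#include <stdio.h>
#include "omp.h"

int main() {
    // Default number of threads OpenMP will use for a parallel region.
    printf("max threads: %d\n", omp_get_max_threads());

    #pragma omp parallel
    {
        // Only one thread prints the actual team size.
        #pragma omp single
        printf("threads in this region: %d\n", omp_get_num_threads());
    }
    return 0;
}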

The outputs show an interesting behaviour: the for-loop iterations may occur in any order when executed in parallel.
With that I found my first lesson in parallel computing: if the output of a program must remain consistent across multiple runs, we must make sure the tasks we flag as parallel are well and truly independent of the other iterations.
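To make that lesson concrete, here is a minimal sketch (my own example, not one of the tests above): when every iteration updates the same shared variable, the iterations are not independent, and the parallel loop can lose updates; OpenMP's reduction clause is one way to make such a loop safe.

#include <stdio.h>
#include "omp.h"

int main() {
    long racy = 0, safe = 0;

    // Iterations are NOT independent: they all write to the same variable,
    // so updates from different threads can be lost (a data race).
    #pragma omp parallel for
    for (int i = 0; i < 1000000; i++) {
        racy += 1;
    }

    // reduction gives each thread a private copy of safe
    // and combines the copies at the end of the loop.
    #pragma omp parallel for reduction(+:safe)
    for (int i = 0; i < 1000000; i++) {
        safe += 1;
    }

    printf("racy = %ld (may vary), safe = %ld (always 1000000)\n", racy, safe);
    return 0;
}

Because each thread accumulates into its own private copy before the results are combined, the reduced total no longer depends on the order in which threads are scheduled.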