How to Use Your GPU in .NET

Introduction

This post argues that a modern Core i7 is most probably the slowest piece of programmable hardware in your PC. Modern quad-core CPUs deliver almost 6 GFLOPS, whereas modern GPUs offer about 6 TFLOPS of computational power.

The helper project presented here can dynamically execute simple programs written in a C dialect (OpenCL C) on your GPU, your CPU, or both. They are compiled and executed at run time.

It also shows that GPU programming is not that hard. Basic programming skills are enough.

Why Do I Need This?

Your PC is a powerful machine. If you use only the CPU to execute tasks, you might waste about 90% of its potential.

Basically, all data can be expressed as some kind of numeric array.

Cases where you can expect a large speedup:

  • Working on images or video

  • Work that can be done in parallel

  • Saving time and energy by using the GPU and CPU in parallel

  • Using your GPU for a task while keeping your CPU free for something else

Keep in mind that this post is about OpenCL. Unlike CUDA, it runs on any GPU (AMD, NVIDIA, Intel) and also on the CPU, so a program you write once can be used on any device with an OpenCL driver, even phones.

Tested on NVIDIA, AMD, and Intel.

These are the results for some prime calculations (the benchmark figure is not reproduced here):

You can really speed up your program. Native C# is about 5 times slower than the best speed you can get on your PC, and the speedup factor can approach 500 in pure multiply-and-add workloads. With this class, it is easy to write a program for your GPU and CPU.

For array workloads, OpenCL code usually runs faster than C#, provided the arrays are large enough to outweigh the cost of compiling the kernel and copying the data to the device.

How To Use It?

Programming OpenCL directly is very time-consuming. This helper project reduces the boilerplate so that you can focus on the core problem. It is written in C# but could be adapted to any .NET language, and also to C++.

Suppose you want to know all the prime numbers from 2 to 10^8. Here is a simple implementation in C#:

static void IsPrimeNet(int[] message)
{
    // Requires: using System; using System.Threading.Tasks;
    Parallel.ForEach(message, (number, state, index) =>
    {
        int upperlimit = (int)Math.Sqrt(number);
        for (int i = 2; i <= upperlimit; i++)
        {
            if (message[index] % i == 0) // no lock required; every index is independent
            {
                message[index] = 0; // mark composites with 0
                break;
            }
        }
    });
}

Now translate this code to OpenCL C. The following kernel is declared as a string, either inline, in a file, or in a resource file.

kernel void GetIfPrime(global int* message)
{
    int index = get_global_id(0);
    // Note: float holds integers exactly only up to 2^24, so this bound
    // may be inexact for larger inputs.
    int upperl = (int)sqrt((float)message[index]);
    for (int i = 2; i <= upperl; i++)
    {
        if (message[index] % i == 0)
        {
            //printf(" %d / %d\n", index, i);
            message[index] = 0;
            return;
        }
    }
    //printf(" % d", index);
}

OpenCL wraps your kernel in an implicit loop: the kernel body runs once for every index. For simple 1D arrays, you get the current index by calling get_global_id(0). The upper bound of the index range is passed when you invoke the kernel.

Where C# uses int[], the kernel uses int*, and so on.

You have to pass the arguments in the same order in which the kernel declares them. You can also call printf inside your kernel for debugging. You can define as many functions as you want in the kernel source and pick the entry point afterwards by calling Invoke("Name Here").

OpenCL C is based on C99, but it does not allow function pointers or recursion, and it adds some special data types such as vector types (for example float4).
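The implicit loop can be pictured with a small C# sketch. This is not how the OpenCL runtime is implemented; the class KernelLoopSketch and its method names are hypothetical, and the loop runs sequentially, whereas a real device runs the work items in parallel and in no fixed order.

```csharp
using System;

public static class KernelLoopSketch
{
    // Plays the role of the kernel body: it handles exactly one element.
    // "globalId" stands in for the value get_global_id(0) would return.
    public static void GetIfPrime(int[] message, int globalId)
    {
        int n = message[globalId];
        int upper = (int)Math.Sqrt(n);
        for (int i = 2; i <= upper; i++)
        {
            if (n % i == 0)
            {
                message[globalId] = 0; // mark composites with 0
                return;
            }
        }
    }

    // Plays the role of the runtime: one call of the kernel body per index.
    // "workItems" corresponds to the upper bound passed when invoking the kernel.
    public static void Run(int[] message, int workItems)
    {
        for (int id = 0; id < workItems; id++)
        {
            GetIfPrime(message, id);
        }
    }
}
```

Calling KernelLoopSketch.Run(data, data.Length) on an array holding 2 through 11 leaves the primes in place and sets every composite entry to 0.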

Here is how you can use this project:

  • Add the NuGet package Cloo.
  • Download OpenCLLib.zip and add a reference to OpenCLlib.dll.
  • Add "using OpenCL;" to your source file.

static void Main(string[] args)
{
    int[] Primes = Enumerable.Range(2, 1000000).ToArray();
    EasyCL cl = new EasyCL();
    cl.Accelerator = Accelerator.Gpu; // You can also set the accelerator after loading the kernel
    cl.LoadKernel(IsPrime); // Load kernel string here (compiles in the background)
    cl.Invoke("GetIfPrime", Primes.Length, Primes); // Call function by name with parameters
    // Primes now holds only the primes; every composite entry has been set to 0
}

static string IsPrime
{
    get
    {
        return @"
        kernel void GetIfPrime(global int* message)
        {
            int index = get_global_id(0);
            int upperl = (int)sqrt((float)message[index]);
            for (int i = 2; i <= upperl; i++)
            {
                if (message[index] % i == 0)
                {
                    //printf("" %d / %d\n"", index, i);
                    message[index] = 0;
                    return;
                }
            }
            //printf("" % d"", index);
        }";
    }
}

With this, you can compile and invoke OpenCL kernels.

If you want to use every bit of computational power in your PC, you can use the class MultiCL. This class splits your work into N parts, and each part is pushed onto the GPU or the CPU as needed. This way, you get the maximum performance out of your PC.
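Because the kernel overwrites composite numbers with 0 instead of removing them, the array still has its original length after the call. A minimal sketch of collecting the surviving primes follows; the helper name PrimeResults.ExtractPrimes is made up for illustration and is not part of the library.

```csharp
using System.Linq;

public static class PrimeResults
{
    // After the kernel has run, composite entries are 0 and primes are unchanged,
    // so keeping the non-zero entries yields the list of primes.
    public static int[] ExtractPrimes(int[] processed)
    {
        return processed.Where(value => value != 0).ToArray();
    }
}
```

This relies on the input range starting at 2, so that 0 never appears as a genuine input value.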

static void Main(string[] args)
{
    int[] Primes = Enumerable.Range(2, 1000000).ToArray();
    int N = 200;
    MultiCL cl = new MultiCL();
    cl.ProgressChangedEvent += Cl_ProgressChangedEvent1;
    cl.SetKernel(IsPrime, "GetIfPrime");
    cl.SetParameter(Primes);
    cl.Invoke(0, Primes.Length, N);
}

private static void Cl_ProgressChangedEvent1(object sender, double e)
{
    Console.WriteLine(e.ToString("0.00%"));
}
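The idea of splitting the work into N parts and handing them out to whichever device is idle can be sketched in plain C#. This is only an illustration under stated assumptions, not MultiCL's actual implementation: the class WorkSplitSketch is hypothetical, and ordinary tasks stand in for the GPU and CPU.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class WorkSplitSketch
{
    // Splits [start, start + count) into at most "parts" contiguous chunks.
    public static List<(int Offset, int Length)> Split(int start, int count, int parts)
    {
        var chunks = new List<(int Offset, int Length)>();
        if (count <= 0 || parts <= 0) return chunks;
        int size = (count + parts - 1) / parts; // ceiling division
        for (int offset = start; offset < start + count; offset += size)
        {
            chunks.Add((offset, Math.Min(size, start + count - offset)));
        }
        return chunks;
    }

    // Each "device" (here just a task) takes the next chunk as soon as it is idle,
    // so a faster device naturally ends up processing more chunks.
    public static void Run(int start, int count, int parts, int deviceCount,
                           Action<int, int, int> processChunk)
    {
        var queue = new ConcurrentQueue<(int Offset, int Length)>(Split(start, count, parts));
        var workers = new Task[deviceCount];
        for (int d = 0; d < deviceCount; d++)
        {
            int device = d; // capture a copy for the lambda
            workers[d] = Task.Run(() =>
            {
                while (queue.TryDequeue(out var chunk))
                {
                    processChunk(device, chunk.Offset, chunk.Length);
                }
            });
        }
        Task.WaitAll(workers);
    }
}
```

A larger N gives finer-grained load balancing between a fast and a slow device, at the cost of more dispatch overhead per chunk.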

How It Works

This project builds on the NuGet package Cloo, which makes calls to OpenCL possible from .NET.

It hides the details you would otherwise need to know to use OpenCL and Cloo. To get more information about your kernel or device, use the class OpenCL.

There are 3 classes in this project:

  • EasyCL
  • MultiCL
  • OpenCL

Every call to Invoke calls the corresponding methods in the OpenCL API:

void Setargument(ComputeKernel kernel, int index, object arg)
{
    if (arg == null) throw new ArgumentException("Argument " + index + " is null");
    Type argtype = arg.GetType();
    if (argtype.IsArray)
    {
        Type elementtype = argtype.GetElementType();
        // Equivalent for int[]:
        //ComputeBuffer<int> messageBuffer = new ComputeBuffer<int>(context,
        //    ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.UseHostPointer, (int[])arg);

        // Build ComputeBuffer<T> for the actual element type instead of hard-coding int
        ComputeMemory messageBuffer = (ComputeMemory)Activator.CreateInstance(
            typeof(ComputeBuffer<>).MakeGenericType(elementtype), new object[]
            {
                context,
                ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.UseHostPointer,
                arg
            });
        kernel.SetMemoryArgument(index, messageBuffer); // set the array
    }
    else
    {
        // Equivalent for int:
        //kernel.SetValueArgument(index, (int)arg); // set a scalar value such as the array size
        typeof(ComputeKernel).GetMethod("SetValueArgument").MakeGenericMethod(argtype).Invoke(
            kernel, new object[] { index, arg });
    }
}
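The reflection technique used here, closing a generic type or method over a type that is only known at run time, can be tried without any OpenCL dependency. In the sketch below, DemoBuffer<T> and GenericDispatchSketch are hypothetical stand-ins invented for illustration; only the System.Reflection calls are real.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for a generic buffer type such as ComputeBuffer<T>.
public class DemoBuffer<T> where T : struct
{
    public T[] Data { get; }
    public DemoBuffer(T[] data) { Data = data; }
}

public static class GenericDispatchSketch
{
    // Builds DemoBuffer<T> for whatever element type the array has at run time.
    public static object CreateBuffer(object array)
    {
        if (array == null) throw new ArgumentNullException(nameof(array));
        Type arrayType = array.GetType();
        if (!arrayType.IsArray) throw new ArgumentException("Expected an array.", nameof(array));

        Type elementType = arrayType.GetElementType();
        Type bufferType = typeof(DemoBuffer<>).MakeGenericType(elementType);
        return Activator.CreateInstance(bufferType, new object[] { array });
    }

    // Generic target that this sketch only reaches through reflection.
    public static string Describe<T>(int index, T value)
    {
        return index + ":" + typeof(T).Name + ":" + value;
    }

    // Calls Describe<T> with T chosen from the run-time type of "value".
    public static string DescribeDynamic(int index, object value)
    {
        MethodInfo open = typeof(GenericDispatchSketch).GetMethod(nameof(Describe));
        MethodInfo closed = open.MakeGenericMethod(value.GetType());
        return (string)closed.Invoke(null, new object[] { index, value });
    }
}
```

The price of this flexibility is that type errors surface at run time rather than at compile time, for example when the element type does not satisfy the struct constraint.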

Every time you change the kernel, the program is recompiled. To speed up prototyping, this class reports why a kernel failed to compile:

public void LoadKernel(string Kernel)
{
    this.kernel = Kernel;
    program = new ComputeProgram(context, Kernel);
    try
    {
        program.Build(null, null, null, IntPtr.Zero); // compile
    }
    catch (BuildProgramFailureComputeException)
    {
        // Surface the compiler's build log as the exception message
        string message = program.GetBuildLog(platform.Devices[0]);
        throw new ArgumentException(message);
    }
}

It is very important to know that if your GPU driver crashes, or a kernel uses 100% of your GPU for more than 3 seconds (on pre-Windows 10 machines), the kernel will be aborted. You should dispose of the EasyCL object after that.

// On Windows Vista, 7, 8, or 8.1, be ready to handle an aborted invocation:
EasyCL cl = new EasyCL();
cl.InvokeAborted += (sender, e) => Cl_InvokeAborted(cl, e);

private void Cl_InvokeAborted(EasyCL sender, string e)
{
    // your logic here
}

What Is Missing?

You cannot choose whether an int[] passed to the kernel uses the host pointer or read/write access; both flags are always set. In any case, that choice seems to be a legacy feature.

This class is written for PCs. With Visual Studio and Xamarin, it could easily be adapted for phones.

