HSA and C++ AMP

Category: visual studio parallelcpp


LKeene on Tue, 14 Jan 2014 19:09:33

While reading a review of the AMD Kaveri chip this morning I came across the following quote:

"C++ AMP was pioneered by Microsoft and the Microsoft stack is built on top of DirectCompute.  DirectCompute does not really expose unified memory, and even Direct3D 11.2 only takes only preliminary steps towards unified memory. Microsoft's C++ AMP implementation targets DirectCompute and thus won't be able to take full advantage of features offered by HSA enabled systems."

Can someone from the C++ AMP team please comment on the validity of the above statement? Thank you in advance.



Ade Miller on Wed, 15 Jan 2014 05:56:04

I don't believe this is entirely true, although it depends on exactly what you class as HSA.

Support for shared CPU\GPU memory

The CPU\GPU data transfer efficiency on accelerators that share physical memory with the CPU is now significantly enhanced due to the elimination of redundant copying of data between GPU and CPU memory. Depending upon how the code was written, C++ AMP applications that run on integrated GPU and WARP accelerators should see no (or significantly reduced) time spent on copying data. This feature is available only on Windows 8.1 and is turned on by default for WARP and some integrated GPUs. Additionally, developers can also opt into the feature programmatically through a set of APIs.

Taken from What's New for C++ AMP in Visual Studio 2013

So it would appear that C++ AMP does support some aspects of unified memory, in that it removes redundant copies between the CPU and GPU by taking advantage of shared memory. For developers targeting mobile platforms this should be a big plus. There may be other features of the Kaveri architecture that are not exposed through DirectCompute.
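For anyone wondering what the programmatic opt-in mentioned in the release notes looks like, here is a minimal sketch. It assumes the Visual Studio 2013 C++ AMP headers and a Windows 8.1 machine. The enum SharedMemoryStatus and the helpers request_shared_cpu_access and describe are hypothetical names made up for this post; only the concurrency:: calls come from C++ AMP. The #if guard is there so the snippet still compiles on other toolchains, where it simply reports that no AMP runtime is available.

```cpp
#if defined(_MSC_VER)
#include <amp.h>  // C++ AMP ships only with the Microsoft toolchain
#endif

// Hypothetical status type for this sketch; not part of C++ AMP.
enum class SharedMemoryStatus { NoAmpRuntime, NotSupported, OptInRejected, OptedIn };

// Hypothetical helper: asks the default accelerator to give the CPU
// read/write access to arrays allocated on it from this point on.
SharedMemoryStatus request_shared_cpu_access() {
#if defined(_MSC_VER)
    concurrency::accelerator acc;  // default accelerator picked by the runtime

    // Discrete GPUs with their own memory typically report false here.
    if (!acc.get_supports_cpu_shared_memory()) {
        return SharedMemoryStatus::NotSupported;
    }

    // Returns false if the default was already overridden, or if the
    // runtime-selected default has already been used for an allocation.
    const bool ok =
        acc.set_default_cpu_access_type(concurrency::access_type_read_write);
    return ok ? SharedMemoryStatus::OptedIn : SharedMemoryStatus::OptInRejected;
#else
    return SharedMemoryStatus::NoAmpRuntime;
#endif
}

// Hypothetical helper: human-readable text for the status above.
const char* describe(SharedMemoryStatus s) {
    switch (s) {
        case SharedMemoryStatus::NoAmpRuntime:  return "C++ AMP not available in this build";
        case SharedMemoryStatus::NotSupported:  return "accelerator does not share memory with the CPU";
        case SharedMemoryStatus::OptInRejected: return "opt-in rejected (default already set or used)";
        case SharedMemoryStatus::OptedIn:       return "CPU read/write access enabled";
    }
    return "unknown";
}
```

Call request_shared_cpu_access() once at startup, before creating any array or array_view on that accelerator, and print describe(...) of the result to see what happened. The access type can also be requested per array, by passing an access_type value as the last constructor argument, instead of changing the accelerator-wide default.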

The general statement that C++ AMP is built on top of DirectCompute is, however, correct. The huge plus for this is portability across different hardware from AMD, Nvidia and Intel. There are minuses, mainly in the places where C++ AMP does not expose hardware features that are vendor-specific or that DirectCompute has no need to expose, for example the warp size of the GPU. IMO this is a very acceptable tradeoff.


samlebon23 on Mon, 20 Jan 2014 20:00:26

Check this out:

Bolt Library uses C++ AMP under the covers

LKeene on Tue, 21 Jan 2014 18:03:37

Thanks for chiming in, guys.