cuBLAS is an implementation of the BLAS library that leverages the teraflops of performance provided by NVIDIA GPUs. However, cuBLAS cannot be used as a direct BLAS replacement for applications originally intended to run on the CPU. Using the cuBLAS API requires the application to manage GPU resources and data transfers explicitly. Such an API permits the fine tuning required to minimize redundant data copies to and from the GPU in arbitrarily complicated...
]]>