CUDA 5 added a powerful new tool to the CUDA Toolkit: nvprof. nvprof is a command-line profiler available for Linux, Windows, and OS X. At first glance, nvprof looks like a GUI-less version of the graphical profiling features available in the NVIDIA Visual Profiler and NSight Eclipse Edition. But nvprof is much more than that. To me, nvprof is the lightweight profiler that reaches where other tools cannot.
Use nvprof for Quick Checks
I often wonder whether my CUDA application is running as I expect. Sometimes this is just a sanity check: is the app running kernels on the GPU at all, and is it performing excessive memory copies? Running the application with nvprof ./myApp quickly shows a summary of all the kernels and memory copies it used, as in the following sample output.
==9261== Profiling application: ./tHogbomCleanHemi
==9261== Profiling result:
Time(%) Time Calls Avg Min Max Name
58.73% 737.97ms 1000 737.97us 424.77us 1.1405ms subtractPSFLoop_kernel(float const *, int, float*, int, int, int, int, int, int, int, float, float)
38.39% 482.31ms 1001 481.83us 475.74us 492.16us findPeakLoop_kernel(MaxCandidate*, float const *, int)
1.87% 23.450ms 2 11.725ms 11.721ms 11.728ms [CUDA memcpy HtoD]
1.01% 12.715ms 1002 12.689us 2.1760us 10.502ms [CUDA memcpy DtoH]
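In its default summary mode, nvprof presents an overview of the GPU kernels and memory copies in your application. The summary groups all calls to the same kernel together and reports, for each kernel, the total time and its percentage of the total profiled time. Beyond summary mode, nvprof supports GPU-trace and API-trace modes, which list every kernel launch and memory copy and, in API-trace mode, every CUDA API call.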
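To make the grouping concrete, the short Python sketch below recomputes the Avg and Time(%) columns from the Time and Calls columns of the sample summary above. The row values are copied from that output; the tuple layout and variable names are illustrative choices, and the percentage is taken relative to the sum of the four listed rows, which is an assumption that happens to match the reported figures.

```python
# Rows copied from the sample summary output:
# (reported Time(%), total time in ms, calls, reported Avg in us, name)
rows = [
    (58.73, 737.97, 1000, 737.97, "subtractPSFLoop_kernel"),
    (38.39, 482.31, 1001, 481.83, "findPeakLoop_kernel"),
    (1.87, 23.450, 2, 11725.0, "[CUDA memcpy HtoD]"),
    (1.01, 12.715, 1002, 12.689, "[CUDA memcpy DtoH]"),
]

# Total time across all listed GPU activities, in milliseconds.
total_ms = sum(time_ms for _, time_ms, _, _, _ in rows)

# Avg = Time / Calls (converted to microseconds).
derived_avg_us = [time_ms * 1000.0 / calls for _, time_ms, calls, _, _ in rows]

# Time(%) = Time / sum of all listed times.
derived_pct = [100.0 * time_ms / total_ms for _, time_ms, _, _, _ in rows]

for (pct, _, _, avg_us, name), d_avg, d_pct in zip(rows, derived_avg_us, derived_pct):
    print(f"{name:24s} avg {d_avg:10.3f} us (reported {avg_us})  "
          f"share {d_pct:5.2f}% (reported {pct})")
```

The derived values agree with the reported columns up to rounding, which is a quick way to confirm how to read the table: a large Time(%) with a small Avg means many short calls, not one slow one.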
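The following example uses nvprof --print-gpu-trace to profile the nbody sample application running on two GPUs in my PC. The trace shows which GPU each kernel ran on, along with the grid dimensions used for each launch. This is very useful for verifying that a multi-GPU application is running as you expect.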
nvprof --print-gpu-trace ./nbody --benchmark -numdevices=2 -i=1
...
==4125== Profiling application: ./nbody --benchmark -numdevices=2 -i=1
==4125== Profiling result:
Start Duration Grid Size Block Size Regs* SSMem* DSMem* Size Throughput Device Context Stream Name
260.78ms 864ns - - - - - 4B 4.6296MB/s Tesla K20c (0) 2 2 [CUDA memcpy HtoD]
260.79ms 960ns - - - - - 4B 4.1667MB/s GeForce GTX 680 1 2 [CUDA memcpy HtoD]
260.93ms 896ns - - - - - 4B 4.4643MB/s Tesla K20c (0) 2 2 [CUDA memcpy HtoD]
260.94ms 672ns - - - - - 4B 5.9524MB/s GeForce GTX 680 1 2 [CUDA memcpy HtoD]
268.03ms 1.3120us - - - - - 8B 6.0976MB/s Tesla K20c (0) 2 2 [CUDA memcpy HtoD]
268.04ms 928ns - - - - - 8B 8.6207MB/s GeForce GTX 680 1 2 [CUDA memcpy HtoD]
268.19ms 864ns - - - - - 8B 9.2593MB/s Tesla K20c (0) 2 2 [CUDA memcpy HtoD]
268.19ms 800ns - - - - - 8B 10.000MB/s GeForce GTX 680 1 2 [CUDA memcpy HtoD]
274.59ms 2.2887ms (52 1 1) (256 1 1) 36 0B 4.0960KB - - Tesla K20c (0) 2 2 void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [242]
274.67ms 981.47us (32 1 1) (256 1 1) 36 0B 4.0960KB - - GeForce GTX 680 1 2 void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [257]
276.94ms 2.3146ms (52 1 1) (256 1 1) 36 0B 4.0960KB - - Tesla K20c (0) 2 2 void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [275]
276.99ms 979.36us (32 1 1) (256 1 1) 36 0B 4.0960KB - - GeForce GTX 680 1 2 void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [290]
Regs: Number of registers used per CUDA thread.
SSMem: Static shared memory allocated per CUDA block.
DSMem: Dynamic shared memory allocated per CUDA block.
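As a reading aid for the trace above, the Throughput column can be reproduced from the Size and Duration columns. The Python sketch below does this for a few of the memcpy rows; the values are copied from the trace, and treating 1 MB as 10^6 bytes is an assumption that matches the reported numbers.

```python
# (size in bytes, duration in ns, throughput reported by nvprof in MB/s)
copies = [
    (4, 864, 4.6296),
    (4, 960, 4.1667),
    (8, 1312, 6.0976),
    (8, 800, 10.000),
]

# Throughput = Size / Duration, expressed in MB/s with 1 MB = 1e6 bytes.
derived = [size / (ns * 1e-9) / 1e6 for size, ns, _ in copies]

for (size, ns, reported), mb_s in zip(copies, derived):
    print(f"{size} B in {ns} ns -> {mb_s:.4f} MB/s (nvprof reported {reported})")
```

Copies this small are dominated by fixed per-transfer overhead, so these single-digit MB/s figures say little about the bandwidth the bus can sustain for large transfers.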
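Use nvprof to Profile Anything
nvprof can profile CUDA kernels running on NVIDIA GPUs no matter what language they are written in, as long as they are launched through the CUDA runtime API or driver API. That means nvprof can profile OpenACC programs, which have no explicit kernels, and even programs that generate PTX assembly kernels internally. Mark Ebersole showed a great example of this in his CUDACast (Episode 10) on CUDA Python, where he used the NumbaPro compiler (from Continuum Analytics) to just-in-time compile a Python function and run it in parallel on the GPU.
During the initial implementation of an OpenACC or CUDA Python program, it may not be obvious whether a function is running on the GPU or the CPU (especially if you are not timing it). In Mark's example, he ran the Python interpreter under nvprof and captured a trace of the application's CUDA function calls and kernel launches. The trace showed that the kernel really was running on the GPU, along with the cudaMemcpy calls used to transfer data from the CPU to the GPU. This is a great example of the sanity-check ability of a lightweight command-line GPU profiler like nvprof.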
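Because nvprof wraps whatever command follows it, profiling a Python program comes down to putting the interpreter on the nvprof command line. The sketch below builds such a command line; the helper nvprof_command and the script name my_gpu_script.py are hypothetical and chosen only for illustration, while --print-gpu-trace and --print-api-trace are the nvprof trace flags discussed earlier. The profiler is launched only if both nvprof and the script are actually present.

```python
import os
import shutil
import subprocess
import sys


def nvprof_command(target, trace="api"):
    """Build an nvprof command line that wraps an arbitrary program.

    target: the program and its arguments, as a list of strings.
    trace:  "summary" (default nvprof mode), "gpu", or "api".
    """
    flags = {
        "summary": [],
        "gpu": ["--print-gpu-trace"],
        "api": ["--print-api-trace"],
    }
    return ["nvprof", *flags[trace], *target]


# Hypothetical script name; any program that launches CUDA kernels would do.
cmd = nvprof_command([sys.executable, "my_gpu_script.py"], trace="api")
print(" ".join(cmd))

# Run the profiler only when nvprof is installed and the script exists.
if shutil.which("nvprof") and os.path.exists("my_gpu_script.py"):
    subprocess.run(cmd, check=False)
```

If the resulting trace contains no kernel launches, the function in question is most likely still running on the CPU.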
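Use nvprof for Remote Profiling
Sometimes the system you deploy on is not your desktop system. For example, you may use a GPU cluster or a cloud system such as Amazon EC2, where you have only terminal access to the machine. This is another great use for nvprof: connect to the remote machine (with ssh, for example) and run your application under nvprof.
The --output-profile command-line option writes a data file that you can later import into either nvprof or the NVIDIA Visual Profiler. This means you can capture a profile on a remote machine and then visualize and analyze the results on your desktop in the Visual Profiler (see "Remote Profiling" for details).
nvprof also provides a handy option (--analysis-metrics) that captures all of the GPU metrics the Visual Profiler needs for its "guided analysis" mode. With that data, the Visual Profiler can be used to identify the bottleneck of a kernel. The data for such an analysis is captured from the nvprof command line.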
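A minimal sketch of the remote workflow, written as a Python helper that only assembles the commands rather than running them. The host name gpu-node, the profile file name, and the helper remote_profile_commands are all hypothetical; the sketch also assumes the application lives in the remote login directory and uses the short -o form of the output option together with --analysis-metrics.

```python
import shlex


def remote_profile_commands(host, app, profile="profile.nvprof"):
    """Return (capture, fetch) command lines for profiling on a remote host.

    capture: runs the application under nvprof on the remote machine and
             writes the collected data to `profile` there.
    fetch:   copies the profile file back to the current local directory.
    """
    remote = shlex.join(["nvprof", "--analysis-metrics", "-o", profile, *app])
    capture = ["ssh", host, remote]
    fetch = ["scp", f"{host}:{profile}", "."]
    return capture, fetch


# Hypothetical host and file names, for illustration only.
capture, fetch = remote_profile_commands(
    "gpu-node",
    ["./nbody", "--benchmark", "-numdevices=2", "-i=1"],
    profile="nbody-analysis.nvprof",
)
print(shlex.join(capture))
print(shlex.join(fetch))
```

Once the file has been copied back, it can be imported into the Visual Profiler on the desktop for guided analysis.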
A Very Handy Tool
If you are a fan of command-line tools, you will love nvprof. It can do a lot more than I have covered here, such as collecting profile metrics for analysis in the NVIDIA Visual Profiler. See the nvprof documentation for full details.
I hope this post encourages you to put nvprof to use.