• <xmp id="om0om">
  • CUDA Pro Tip: nvprof is Your Handy Universal GPU Profiler


    CUDA 5 added a powerful new tool to the CUDA Toolkit: nvprof. nvprof is a command-line profiler available for Linux, Windows, and OS X. At first glance, nvprof looks like a GUI-less version of the graphical profiling features found in the NVIDIA Visual Profiler and NSight Eclipse Edition. But nvprof is much more than that. To me, nvprof is the lightweight profiler that reaches where other tools cannot.

    Use nvprof for Quick Checks

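    I often wonder whether my CUDA application is running the way I expect. Sometimes this is just a sanity check: is the application running kernels on the GPU at all? Is it performing excessive memory copies? By running the application with nvprof ./myApp, I can quickly see a summary of all the kernels and memory copies it used, as in the following sample output.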

        ==9261== Profiling application: ./tHogbomCleanHemi
        ==9261== Profiling result:
        Time(%)      Time     Calls       Avg       Min       Max  Name
         58.73%  737.97ms      1000  737.97us  424.77us  1.1405ms  subtractPSFLoop_kernel(float const *, int, float*, int, int, int, int, int, int, int, float, float)
         38.39%  482.31ms      1001  481.83us  475.74us  492.16us  findPeakLoop_kernel(MaxCandidate*, float const *, int)
          1.87%  23.450ms         2  11.725ms  11.721ms  11.728ms  [CUDA memcpy HtoD]
          1.01%  12.715ms      1002  12.689us  2.1760us  10.502ms  [CUDA memcpy DtoH]

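    In its default summary mode, nvprof presents an overview of the GPU kernels and memory copies in your application. The summary groups all calls to the same kernel together and reports the total time and the percentage of total application time for each kernel. Beyond summary mode, nvprof supports GPU-Trace and API-Trace modes, which show a complete list of all kernel launches and memory copies and, in API-Trace mode, all CUDA API calls.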
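    As a rough illustration of how the summary columns relate to each other, the following Python sketch parses the four rows of the sample output above and computes the share of GPU time spent in memory copies. The helper names (`to_seconds`, `parse_summary`) are hypothetical, and the parsing assumes columns are separated by at least two spaces, which holds for this sample but is not a documented nvprof guarantee.

```python
import re

# Summary rows copied from the sample output above (column padding condensed).
SUMMARY = """\
58.73%  737.97ms  1000  737.97us  424.77us  1.1405ms  subtractPSFLoop_kernel(float const *, int, float*, int, int, int, int, int, int, int, float, float)
38.39%  482.31ms  1001  481.83us  475.74us  492.16us  findPeakLoop_kernel(MaxCandidate*, float const *, int)
1.87%  23.450ms  2  11.725ms  11.721ms  11.728ms  [CUDA memcpy HtoD]
1.01%  12.715ms  1002  12.689us  2.1760us  10.502ms  [CUDA memcpy DtoH]
"""

UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def to_seconds(text):
    """Convert a duration such as '737.97ms' to seconds."""
    match = re.fullmatch(r"([\d.]+)(ns|us|ms|s)", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]


def parse_summary(text):
    """Return a list of (name, total_seconds, calls) tuples, one per row."""
    rows = []
    for line in text.splitlines():
        # Assumption: columns are separated by two or more spaces.
        fields = re.split(r"\s{2,}", line.strip(), maxsplit=6)
        if len(fields) != 7:
            continue  # skip blank or unexpected lines
        _percent, total, calls, _avg, _min, _max, name = fields
        rows.append((name, to_seconds(total), int(calls)))
    return rows


rows = parse_summary(SUMMARY)
total_time = sum(seconds for _, seconds, _ in rows)
memcpy_time = sum(
    seconds for name, seconds, _ in rows if name.startswith("[CUDA memcpy")
)
memcpy_share = memcpy_time / total_time

print(f"GPU time in memory copies: {memcpy_share:.1%}")
```

    For these sample rows the memory copies account for roughly 2.9% of the profiled GPU time, which answers the "excessive memory copies?" question at a glance.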

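    The following example profiles the nbody sample application running on two GPUs in my PC, using nvprof --print-gpu-trace. The trace shows which GPU each kernel ran on, along with the grid dimensions used for each launch. This is very useful when you want to verify that a multi-GPU application is running as you expect.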

    nvprof --print-gpu-trace ./nbody --benchmark -numdevices=2 -i=1
    ...
    ==4125== Profiling application: ./nbody --benchmark -numdevices=2 -i=1
    ==4125== Profiling result:
       Start  Duration            Grid Size      Block Size     Regs*    SSMem*    DSMem*      Size  Throughput           Device   Context    Stream  Name
    260.78ms     864ns                    -               -         -         -         -        4B  4.6296MB/s   Tesla K20c (0)         2         2  [CUDA memcpy HtoD]
    260.79ms     960ns                    -               -         -         -         -        4B  4.1667MB/s  GeForce GTX 680         1         2  [CUDA memcpy HtoD]
    260.93ms     896ns                    -               -         -         -         -        4B  4.4643MB/s   Tesla K20c (0)         2         2  [CUDA memcpy HtoD]
    260.94ms     672ns                    -               -         -         -         -        4B  5.9524MB/s  GeForce GTX 680         1         2  [CUDA memcpy HtoD]
    268.03ms  1.3120us                    -               -         -         -         -        8B  6.0976MB/s   Tesla K20c (0)         2         2  [CUDA memcpy HtoD]
    268.04ms     928ns                    -               -         -         -         -        8B  8.6207MB/s  GeForce GTX 680         1         2  [CUDA memcpy HtoD]
    268.19ms     864ns                    -               -         -         -         -        8B  9.2593MB/s   Tesla K20c (0)         2         2  [CUDA memcpy HtoD]
    268.19ms     800ns                    -               -         -         -         -        8B  10.000MB/s  GeForce GTX 680         1         2  [CUDA memcpy HtoD]
    274.59ms  2.2887ms             (52 1 1)       (256 1 1)        36        0B  4.0960KB         -           -   Tesla K20c (0)         2         2  void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [242]
    274.67ms  981.47us             (32 1 1)       (256 1 1)        36        0B  4.0960KB         -           -  GeForce GTX 680         1         2  void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [257]
    276.94ms  2.3146ms             (52 1 1)       (256 1 1)        36        0B  4.0960KB         -           -   Tesla K20c (0)         2         2  void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [275]
    276.99ms  979.36us             (32 1 1)       (256 1 1)        36        0B  4.0960KB         -           -  GeForce GTX 680         1         2  void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [290]
    
    Regs: Number of registers used per CUDA thread.
    SSMem: Static shared memory allocated per CUDA block.
    DSMem: Dynamic shared memory allocated per CUDA block.
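    To make the multi-GPU check concrete, here is a minimal Python sketch that groups the kernel launches in the trace above by device and lists the grid size of each launch. It uses six rows copied from the sample output. The function name `kernel_launches_by_device` is hypothetical, and the column positions are an assumption based on this particular trace layout.

```python
import re
from collections import defaultdict

# Rows copied from the GPU trace above (column padding condensed to two spaces).
TRACE = """\
268.19ms  864ns  -  -  -  -  -  8B  9.2593MB/s  Tesla K20c (0)  2  2  [CUDA memcpy HtoD]
268.19ms  800ns  -  -  -  -  -  8B  10.000MB/s  GeForce GTX 680  1  2  [CUDA memcpy HtoD]
274.59ms  2.2887ms  (52 1 1)  (256 1 1)  36  0B  4.0960KB  -  -  Tesla K20c (0)  2  2  void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [242]
274.67ms  981.47us  (32 1 1)  (256 1 1)  36  0B  4.0960KB  -  -  GeForce GTX 680  1  2  void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [257]
276.94ms  2.3146ms  (52 1 1)  (256 1 1)  36  0B  4.0960KB  -  -  Tesla K20c (0)  2  2  void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [275]
276.99ms  979.36us  (32 1 1)  (256 1 1)  36  0B  4.0960KB  -  -  GeForce GTX 680  1  2  void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [290]
"""


def kernel_launches_by_device(text):
    """Map each device name to the grid sizes of the kernels launched on it."""
    launches = defaultdict(list)
    for line in text.splitlines():
        # Assumption: the 13 columns are separated by two or more spaces.
        fields = re.split(r"\s{2,}", line.strip(), maxsplit=12)
        if len(fields) != 13:
            continue  # skip blank or unexpected lines
        grid, device = fields[2], fields[9]
        if grid == "-":
            continue  # memory copies have no grid size
        launches[device].append(grid)
    return dict(launches)


launches = kernel_launches_by_device(TRACE)
for device, grids in sorted(launches.items()):
    print(f"{device}: {len(grids)} kernel launches, grids {grids}")
```

    In this sample both devices receive kernel launches, with a different grid size on each GPU (52 blocks on the Tesla K20c, 32 on the GeForce GTX 680), so the work is indeed being split across the two devices.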

    Use nvprof to Profile Anything

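    nvprof can profile CUDA kernels running on NVIDIA GPUs no matter what language they are written in, as long as they are launched through the CUDA runtime API or driver API. This means nvprof can profile OpenACC programs, which have no explicit kernels, and even programs that generate PTX assembly kernels internally. Mark Ebersole showed a great example of this in his CUDACast (Episode 10) about CUDA Python, in which he used the NumbaPro compiler (from Continuum Analytics) to just-in-time compile a Python function and run it in parallel on the GPU.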

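    During the initial implementation of an OpenACC or CUDA Python program, it may not be obvious whether a function is running on the GPU or the CPU (especially if you are not timing it). In Mark's example, he ran the Python interpreter under nvprof and captured a trace of the application's CUDA function calls and kernel launches. The trace showed that the kernel was indeed running on the GPU, and it also showed the cudaMemcpy calls used to transfer data from the CPU to the GPU. This is a great example of the "sanity check" ability of a lightweight command-line GPU profiler like nvprof.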

    Use nvprof for Remote Profiling

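    Sometimes the system you deploy on is not your desktop system. For example, you may use a GPU cluster or a cloud system such as Amazon EC2, where you have only terminal access to the machine. This is another great use for nvprof. Simply connect to the remote machine (using ssh, for example) and run your application under nvprof.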

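    With the --output-profile command-line option, you can write a data file for later import into either nvprof or the NVIDIA Visual Profiler. This means you can capture a profile on a remote machine and then visualize and analyze the results on your desktop in the Visual Profiler (see "Remote Profiling" for more details).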

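    nvprof provides a handy option (--analysis-metrics) to capture all of the GPU metrics that the Visual Profiler needs for its "guided analysis" mode. The screenshot below shows the Visual Profiler being used to determine the bottleneck of a kernel. The data for this analysis were captured using the following command line.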

    Using the NVIDIA Visual Profiler (nvvp) to import nvprof analysis data and locate bottlenecks in an application.
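    As an illustrative sketch (not the command used for the analysis above), the following Python helper assembles an nvprof command line that combines --analysis-metrics with --output-profile. The helper name `nvprof_command` and the output file name `nbody.nvprof` are hypothetical. The application arguments are the nbody arguments shown earlier in this post. The sketch only builds and prints the command; it does not run nvprof.

```python
import shlex


def nvprof_command(app_args, output_file=None, analysis_metrics=False,
                   gpu_trace=False):
    """Build an nvprof argument list for the given application command."""
    cmd = ["nvprof"]
    if gpu_trace:
        cmd.append("--print-gpu-trace")
    if analysis_metrics:
        cmd.append("--analysis-metrics")
    if output_file is not None:
        # Write the profile to a file for later import into nvprof or nvvp.
        cmd += ["--output-profile", output_file]
    return cmd + list(app_args)


app = ["./nbody", "--benchmark", "-numdevices=2", "-i=1"]

# Hypothetical output file name; choose any path on the remote machine.
capture = nvprof_command(app, output_file="nbody.nvprof", analysis_metrics=True)
print(shlex.join(capture))
```

    The printed command can be pasted into an ssh session on the remote machine, and the resulting file copied back and opened in the Visual Profiler. The same helper also covers interpreted programs: passing something like `["python", "my_script.py"]` (a hypothetical script name) as `app_args` profiles the Python interpreter, as in the CUDA Python example earlier.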

    A Very Handy Tool

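    If you are a fan of command-line tools, I think you will love nvprof. There is a lot more that nvprof can do that I have not touched on here, such as collecting profiling metrics for analysis in the NVIDIA Visual Profiler. See the nvprof documentation for full details.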

    I hope that after reading this post you will find yourself using nvprof routinely.

