An Accelerated Server Platform for AI and High-Performance Computing: NVIDIA HGX H100


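NVIDIA's mission is to accelerate the work of the da Vincis and Einsteins of our time and to empower them to solve society's grand challenges. The complexity of artificial intelligence (AI), high-performance computing (HPC), and data analytics is growing exponentially. To solve these extraordinary challenges, scientists need an advanced computing platform that can deliver million-x speedups within a single decade.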

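To meet this need, we are introducing NVIDIA HGX H100, a key GPU server building block powered by the NVIDIA Hopper architecture. This state-of-the-art platform securely delivers high performance with low latency. It integrates a full stack of capabilities, from networking to compute, at data center scale, the new unit of computing.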

In this post, we discuss how NVIDIA HGX H100 helps deliver the next massive leap in our accelerated computing data center platform.

HGX H100 8-GPU

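The HGX H100 8-GPU is the key building block of the new Hopper-generation GPU server. It hosts eight H100 Tensor Core GPUs and four third-generation NVSwitches. Each H100 GPU connects to all four NVSwitches through multiple fourth-generation NVLink ports. Each NVSwitch is a fully non-blocking switch that fully connects all eight H100 Tensor Core GPUs.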

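This fully connected NVSwitch topology lets any H100 communicate with any other H100 concurrently. Notably, this communication runs at the NVLink bidirectional speed of 900 gigabytes per second (GB/s). That is more than 14x the bandwidth of the current PCIe Gen4 x16 bus.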
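The 900 GB/s and "more than 14x" figures can be reproduced with back-of-the-envelope arithmetic. This is a minimal sketch that assumes link parameters this post does not state:

- each H100 exposes 18 fourth-generation NVLink links at 50 GB/s bidirectional per link;
- PCIe x16 uses 128b/130b encoding, at 16 GT/s per lane for Gen4 and 32 GT/s per lane for Gen5.

```python
# Back-of-the-envelope check of the 900 GB/s and "more than 14x" figures.
# Assumed link parameters (not stated in this post):
NVLINK4_LINKS_PER_H100 = 18        # fourth-generation NVLink links per H100
NVLINK4_GBPS_PER_LINK_BIDIR = 50   # GB/s per link, both directions combined


def pcie_bidir_gbps(gt_per_lane: float, lanes: int = 16) -> float:
    """Bidirectional PCIe bandwidth in GB/s, assuming 128b/130b encoding (Gen3+)."""
    per_direction = gt_per_lane * lanes * (128 / 130) / 8  # Gbit/s of payload -> GB/s
    return 2 * per_direction


nvlink_gbps = NVLINK4_LINKS_PER_H100 * NVLINK4_GBPS_PER_LINK_BIDIR
pcie_gen4_gbps = pcie_bidir_gbps(16.0)  # PCIe Gen4: 16 GT/s per lane
pcie_gen5_gbps = pcie_bidir_gbps(32.0)  # PCIe Gen5: 32 GT/s per lane

print(f"NVLink per H100: {nvlink_gbps} GB/s")
print(f"PCIe Gen4 x16:   {pcie_gen4_gbps:.1f} GB/s -> {nvlink_gbps / pcie_gen4_gbps:.1f}x")
print(f"PCIe Gen5 x16:   {pcie_gen5_gbps:.1f} GB/s -> {nvlink_gbps / pcie_gen5_gbps:.1f}x")
```

Under these assumptions, PCIe Gen4 x16 carries about 63 GB/s in both directions combined, so 900 GB/s is roughly 14.3x. Even the PCIe Gen5 host interface listed in Table 1 stays around 7x below NVLink.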

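The third-generation NVSwitch also provides new hardware acceleration for collective operations, with multicast and NVIDIA SHARP in-network reductions. Combined with the faster NVLink speed, this triples the effective bandwidth of common AI collective operations such as all-reduce compared with HGX A100. NVSwitch acceleration of collectives also significantly reduces the load on the GPU.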
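Without switch support, all-reduce is commonly run as a ring: a reduce-scatter followed by an all-gather. In a ring, each of N GPUs sends 2(N-1)/N of its buffer and performs the additions itself. The toy Python model below contrasts that with an idealized in-switch reduction, where each GPU sends its buffer once and the switch does the summing. The function names are hypothetical, and the "switch" path is a simplifying assumption, not NVIDIA's SHARP implementation. The model also does not try to reproduce the 3x figure, which additionally includes the faster NVLink.

```python
# Toy model: ring all-reduce vs. an idealized in-network (switch) reduction.
# Assumption: the switch path is idealized (each GPU sends its buffer once and
# receives the reduced result once); real NVLink SHARP behavior is more involved.
from typing import List, Tuple

Buffers = List[List[float]]


def ring_all_reduce(buffers: Buffers) -> Tuple[Buffers, int]:
    """Sum all-reduce via ring reduce-scatter + all-gather.

    Returns (per-GPU results, elements sent by each GPU).
    Assumes len(buffer) is divisible by the number of GPUs.
    """
    n, size = len(buffers), len(buffers[0])
    chunk = size // n
    data = [list(b) for b in buffers]  # copy so the inputs are not modified
    sent = 0

    def span(c: int) -> range:
        return range(c * chunk, (c + 1) * chunk)

    # Reduce-scatter: afterwards GPU i holds the full sum of chunk (i + 1) % n.
    for step in range(n - 1):
        out = [[data[i][k] for k in span((i - step) % n)] for i in range(n)]
        for i in range(n):
            dst, c = (i + 1) % n, (i - step) % n
            for j, k in enumerate(span(c)):
                data[dst][k] += out[i][j]  # the receiving GPU does the addition
        sent += chunk

    # All-gather: pass the fully reduced chunks around the ring.
    for step in range(n - 1):
        out = [[data[i][k] for k in span((i + 1 - step) % n)] for i in range(n)]
        for i in range(n):
            dst, c = (i + 1) % n, (i + 1 - step) % n
            for j, k in enumerate(span(c)):
                data[dst][k] = out[i][j]
        sent += chunk

    return data, sent


def switch_all_reduce(buffers: Buffers) -> Tuple[Buffers, int]:
    """Idealized in-network reduction: the switch sums element-wise and
    multicasts the result, so each GPU sends its buffer exactly once."""
    reduced = [sum(vals) for vals in zip(*buffers)]
    return [list(reduced) for _ in buffers], len(buffers[0])


n_gpus, size = 8, 64
bufs = [[float(g * size + k) for k in range(size)] for g in range(n_gpus)]
ring_out, ring_sent = ring_all_reduce(bufs)
sw_out, sw_sent = switch_all_reduce(bufs)
assert ring_out == sw_out  # both produce the same element-wise sums
print(ring_sent, sw_sent)
```

With 8 GPUs, the ring sends 1.75x more data per GPU (112 versus 64 elements here), and every GPU also spends cycles on additions that the in-network path moves into the switch.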

	HGX A100 8-GPU	HGX H100 8-GPU	Improvement Ratio
FP8	–	32,000 TFLOPS	6X (vs A100 FP16)
FP16	4,992 TFLOPS	16,000 TFLOPS	3X
FP64	156 TFLOPS	480 TFLOPS	3X
In-Network Compute	0	3.6 TFLOPS	Infinite
Interface to host CPU	8x PCIe Gen4 x16	8x PCIe Gen5 x16	2X
Bisection Bandwidth	2.4 TB/s	3.6 TB/s	1.5X

Table 1. Comparing HGX A100 8-GPU with the new HGX H100 8-GPU

*Note: FP performance includes sparsity
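Each entry in the Improvement Ratio column is the H100 system value divided by the A100 system value, rounded. The short sketch below recomputes those quotients from the numbers in Table 1, with nothing assumed beyond the table itself.

```python
# Recompute Table 1's "Improvement Ratio" column from the table's own values.
# Throughput in TFLOPS (with sparsity); bisection bandwidth in TB/s.
table1 = {
    # metric: (HGX A100 8-GPU, HGX H100 8-GPU)
    "FP8 (vs A100 FP16)": (4_992, 32_000),
    "FP16": (4_992, 16_000),
    "FP64": (156, 480),
    "Bisection bandwidth": (2.4, 3.6),
}

ratios = {metric: h100 / a100 for metric, (a100, h100) in table1.items()}
for metric, ratio in ratios.items():
    print(f"{metric:<20} {ratio:.2f}x")

# Dividing the 8-GPU totals by 8 gives the per-GPU figures.
per_gpu_h100_fp16 = table1["FP16"][1] / 8  # TFLOPS per H100
```

The exact quotients are about 6.41 for FP8, 3.21 for FP16, and 3.08 for FP64, which the table reports as 6X and 3X. Dividing the HGX H100 FP16 total by eight gives 2,000 TFLOPS per GPU.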

HGX H100 8-GPU with NVLink-Network support

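Emerging exascale HPC workloads and trillion-parameter AI models, for tasks such as accurate conversational AI, can take months to train even on supercomputers. Compressing that to the speed of business, so that training finishes within hours, requires high-speed, seamless communication between every GPU in a server cluster.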

To tackle these large use cases, the new NVLink and NVSwitch are designed to let HGX H100 8-GPU scale up. With the new NVLink-Network, it supports a much larger NVLink domain. Another version of HGX H100 8-GPU provides this NVLink-Network support.

*Figure 2. High-level block diagram of HGX H100 8-GPU with NVLink-Network support*

System nodes built with the NVLink-Network-capable HGX H100 8-GPU can fully connect to other systems. They do so through Octal Small Form Factor Pluggable (OSFP) LinkX cables and the new external NVLink Switch. This enables NVLink domains of up to 256 GPUs. Figure 3 shows the cluster topology.

*Figure 3. 256 H100 GPU Pod*

	256 A100 GPU Pod	256 H100 GPU Pod	Improvement Ratio
NVLink Domain	8 GPU	256 GPU	32X
FP8	–	1,024 PFLOPS	6X (vs A100 FP16)
FP16	160 PFLOPS	512 PFLOPS	3X
FP64	5 PFLOPS	15 PFLOPS	3X
In-Network Compute	0	192 TFLOPS	Infinite
Bisection Bandwidth	6.4 TB/s	70 TB/s	11X

Table 2. Comparing 256 A100 GPU Pod vs. 256 H100 GPU Pod

*Note: FP performance includes sparsity
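A 256-GPU pod is 32 HGX 8-GPU nodes. The sketch below scales the per-node compute figures from Table 1 by 32 to show where the pod-level FLOPS in Table 2 come from. Bisection bandwidth and in-network compute are left out on purpose. They are not simple multiples of the per-node values (for example, 3.6 TFLOPS × 32 would be 115 TFLOPS, not 192) because they depend on the external NVLink Switch fabric.

```python
# A 256-GPU pod is 32 HGX 8-GPU nodes; scale Table 1 per-node throughput by 32
# and compare with Table 2. Inputs in TFLOPS (with sparsity), copied from Table 1.
GPUS_PER_NODE = 8
POD_GPUS = 256
nodes = POD_GPUS // GPUS_PER_NODE  # 32 nodes per pod

per_node_tflops = {
    "A100 FP16": 4_992,
    "A100 FP64": 156,
    "H100 FP8": 32_000,
    "H100 FP16": 16_000,
    "H100 FP64": 480,
}

# TFLOPS per node * nodes / 1000 -> PFLOPS per pod
pod_pflops = {name: tf * nodes / 1_000 for name, tf in per_node_tflops.items()}
for name, pf in pod_pflops.items():
    print(f"{name:<10} {pf:8.2f} PFLOPS")

# NVLink domain growth: 8 GPUs (one A100 node) -> 256 GPUs (whole H100 pod).
domain_ratio = POD_GPUS // GPUS_PER_NODE
```

The scaled values (about 160, 5, 1,024, 512, and 15 PFLOPS) match Table 2 after rounding. The domain ratio of 32 is the table's 32X NVLink Domain row.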

Target use cases and performance benefits

The dramatic increase in HGX H100 compute and networking capabilities greatly improves the performance of AI and HPC applications.

Today's mainstream AI and HPC models can fit entirely in the aggregate GPU memory of a single node. For models such as BERT-Large and Mask R-CNN, HGX H100 is the most performance-efficient training solution.

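More advanced and larger AI and HPC models need the aggregate GPU memory of multiple nodes to fit. Examples include a deep learning recommendation model (DLRM) with terabytes of embedding tables and a large mixture-of-experts (MoE) natural language processing model. For these workloads, HGX H100 with NVLink-Network accelerates the key communication bottleneck and is the best solution.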
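Whether a model fits in one node or needs several comes down to capacity arithmetic. The sketch below uses a hypothetical helper, `min_nodes_for`, that divides a model's memory footprint by the aggregate HBM of one 8-GPU node. It assumes 80 GB per H100 (the 80 GB SXM variant) and ignores activations, optimizer state, and framework overhead. The example sizes are illustrative and not taken from this post.

```python
import math

# Hypothetical sizing helper: minimum number of HGX H100 8-GPU nodes needed just
# to hold a model's parameters or embedding tables in aggregate GPU memory.
# Assumptions: 80 GB of HBM per H100, all of it usable; activations, optimizer
# state, and framework overhead are ignored.
GB_PER_GPU = 80
GPUS_PER_NODE = 8


def min_nodes_for(model_gb: float) -> int:
    node_capacity_gb = GB_PER_GPU * GPUS_PER_NODE  # 640 GB per 8-GPU node
    return max(1, math.ceil(model_gb / node_capacity_gb))


# Illustrative sizes only (not from this post).
examples = {
    "BERT-Large, ~340M FP32 parameters": 340e6 * 4 / 1e9,  # ~1.4 GB
    "DLRM with 4 TB of embedding tables": 4_000,
}
for name, size_gb in examples.items():
    print(f"{name}: at least {min_nodes_for(size_gb)} node(s)")
```

A hypothetical 4 TB embedding table spans at least seven nodes. Lookups and gradient exchanges that cross node boundaries then travel over the inter-node network, the communication bottleneck that NVLink-Network is meant to relieve.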

Figure 4, from the NVIDIA H100 GPU architecture whitepaper, shows the extra performance boost enabled by NVLink-Network.

*Figure 4. Application performance gains across different system configurations (HPC, AI inference, and AI training)*

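All performance numbers are preliminary, based on current expectations, and subject to change in shipping products. A100 cluster: HDR InfiniBand network. H100 cluster: NDR InfiniBand network, with NVLink-Network where indicated.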

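Number of GPUs and batch sizes per workload:

- Climate Modeling: 1K GPUs.
- LQCD: 1K GPUs.
- Genomics: 8 GPUs.
- 3D-FFT: 256 GPUs.
- MT-NLG: 32 GPUs. Batch sizes are 4 for A100 and 60 for H100 at 1 sec, and 8 for A100 and 64 for H100 at 1.5 and 2 sec.
- MRCNN: 8 GPUs, batch 32.
- GPT-3 16B: 512 GPUs, batch 256.
- DLRM: 128 GPUs, batch 64K.
- GPT-3: 16K GPUs, batch 512.
- MoE: 8K GPUs, batch 512, one expert per GPU.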

HGX H100 4-GPU

In addition to the 8-GPU version, the HGX family includes a 4-GPU version whose GPUs are directly connected with fourth-generation NVLink.

*Figure 5. High-level block diagram of HGX H100 4-GPU*

H100-to-H100 point-to-point (P2P) peer NVLink bandwidth is 300 GB/s bidirectional, about 5x faster than today's PCIe Gen4 x16 bus.
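The 300 GB/s peer figure is consistent with splitting each GPU's NVLink links evenly across its three peers on the 4-GPU board. This is a minimal sketch under assumptions this post does not state:

- 18 fourth-generation NVLink links per H100 at 50 GB/s bidirectional each, split evenly among the three peers;
- a PCIe Gen4 x16 baseline of 16 GT/s per lane with 128b/130b encoding.

```python
from itertools import combinations

# Assumptions (not stated in this post): each H100 has 18 NVLink4 links at
# 50 GB/s bidirectional, split evenly across its 3 peers on the 4-GPU board.
LINKS_PER_GPU = 18
GBPS_PER_LINK = 50
N_GPUS = 4

peers = N_GPUS - 1
links_per_pair = LINKS_PER_GPU // peers    # links dedicated to each peer
p2p_gbps = links_per_pair * GBPS_PER_LINK  # bidirectional GB/s per GPU pair

# Direct all-to-all wiring: every GPU pair gets its own bundle of links.
pairs = list(combinations(range(N_GPUS), 2))
print(len(pairs), "GPU pairs,", links_per_pair, "links each,", p2p_gbps, "GB/s per pair")

# Compare with PCIe Gen4 x16 (16 GT/s per lane, 128b/130b, both directions).
pcie_gen4_gbps = 2 * 16 * 16 * (128 / 130) / 8
print(f"vs PCIe Gen4 x16: {p2p_gbps / pcie_gen4_gbps:.1f}x")
```

With six links per pair, each of the six GPU pairs gets 300 GB/s. That is about 4.8x the roughly 63 GB/s of PCIe Gen4 x16, which the text rounds to 5x.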

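The HGX H100 4-GPU form factor is optimized for dense HPC deployments:

- To maximize GPU density per rack, multiple HGX H100 4-GPU boards can be packed into a 1U-high liquid-cooled system.
- HGX H100 4-GPU uses a fully PCIe-switchless architecture that connects directly to the CPU, lowering the system bill of materials and saving power.
- For more CPU-intensive workloads, HGX H100 4-GPU can pair with two CPU sockets, raising the CPU-to-GPU ratio for a more balanced system configuration.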

Accelerated computing for AI and HPC

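With the new HGX H100 compute and networking capabilities, NVIDIA is delivering a major leap in accelerated computing for AI and HPC in the data center. Researchers and scientists can use these platforms to solve the world's most important challenges faster.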
