Generative AI has advanced at a remarkable pace over the past few years and is being adopted across nearly every industry. Much of its value comes from the fact that a single model, pretrained on vast amounts of data, can be customized and reused for many different downstream tasks.
At the center of this progress are large language models (LLMs). LLMs are deep learning models trained on enormous text corpora, and they can perform a wide variety of language tasks with little or no task-specific training data. Representative use cases include translation, summarization, question answering, and text generation.
Training a state-of-the-art LLM from scratch requires datasets and compute budgets that few organizations can afford, so a far more practical way to build AI applications is to take an existing pretrained model and adapt it to a specific downstream task. This also substantially reduces the cost of developing, deploying, and maintaining AI applications.
Approaches to customizing LLMs
The simplest approach requires no training at all: the LLM is applied to a new task as-is, guided only by zero-shot or few-shot examples placed in the prompt. In few-shot prompting, a handful of input-output examples are included in the prompt to show the model what is expected. However, these prompting techniques do not always match the accuracy of trained approaches, and the results are sensitive to how the prompt is written.
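To make the idea concrete, here is a minimal, hypothetical few-shot prompt (the task and examples here are illustrative, not from the tutorial); the frozen model is steered entirely by the examples embedded in the prompt text:
few_shot_prompt = """Classify the sentiment of each review as positive or negative.

Review: The battery lasts all day. Sentiment: positive
Review: The screen cracked within a week. Sentiment: negative
Review: Setup was quick and painless. Sentiment:"""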
At the other extreme is fine-tuning a model such as GPT, which updates all of the pretrained model's parameters on task-specific data and can reach high accuracy on the target task. However, fine-tuning produces a separate full copy of the model for every task, which makes training and deployment considerably more expensive.
Prompt learning is a newer technique that combines the strengths of both approaches. The weights of the original model are left frozen, and only a small set of additional parameters is trained. As a result, a model can be adapted to new tasks with far fewer resources, while the generality of the base LLM is preserved.
In this post, we show how to use prompt learning techniques, prompt tuning and p-tuning, in the enterprise-ready, open-source NVIDIA NeMo framework to efficiently customize an LLM.
Prompt learning with NVIDIA NeMo
NeMo prompt learning supports two techniques that teach the model new tasks by training virtual token embeddings while leaving the base model's weights untouched. These are prompt tuning and the p-tuning method used in this post.
- In prompt tuning, the virtual prompt embeddings are initialized as a 2D matrix. Each task has its own unique 2D embedding matrix, and tasks do not share any parameters during training or inference. All base LLM parameters remain frozen, and only the embedding matrix for each task is updated during training. NeMo's prompt tuning implementation is based on the paper The Power of Scale for Parameter-Efficient Prompt Tuning.
- In p-tuning, an LSTM model or MLP, called the "prompt encoder", is used to predict the virtual token embeddings. The prompt encoder parameters are randomly initialized when p-tuning starts. All base LLM parameters are frozen, and only the prompt encoder weights are updated at each training step. The prompt encoder parameters are shared across all tasks p-tuned at the same time, but the encoder predicts a unique set of virtual token embeddings for each task. NeMo's p-tuning implementation is based on the paper GPT Understands, Too. (A simplified sketch of both techniques follows this list.)
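The following PyTorch sketch contrasts the two ideas; it is written for illustration only, and the shapes and layer choices are simplified assumptions rather than NeMo's actual internals:
import torch
import torch.nn as nn

# Prompt tuning: a directly trained 2D embedding matrix per task.
class PromptTable(nn.Module):
    def __init__(self, total_virtual_tokens: int, hidden_size: int):
        super().__init__()
        # The only trainable parameters; the base LLM stays frozen.
        self.prompt_embeddings = nn.Parameter(
            torch.randn(total_virtual_tokens, hidden_size) * 0.02
        )

    def forward(self) -> torch.Tensor:
        return self.prompt_embeddings  # [tokens, hidden]

# P-tuning: an LSTM "prompt encoder" predicts the virtual token embeddings.
class PromptEncoder(nn.Module):
    def __init__(self, total_virtual_tokens: int, hidden_size: int):
        super().__init__()
        self.input_embeds = nn.Parameter(
            torch.randn(total_virtual_tokens, hidden_size) * 0.02
        )
        self.lstm = nn.LSTM(hidden_size, hidden_size // 2, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(hidden_size, hidden_size),
                                 nn.ReLU(),
                                 nn.Linear(hidden_size, hidden_size))

    def forward(self) -> torch.Tensor:
        # Only these encoder weights are updated during p-tuning.
        out, _ = self.lstm(self.input_embeds.unsqueeze(0))
        return self.mlp(out).squeeze(0)  # [tokens, hidden]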
The code used in this post is based on the open-source version of NeMo available through the NeMo OSS repo.
A complete prompt learning example using the small GPT-3 345M model is provided in the NeMo multitask prompt and p-tuning tutorial notebook on GitHub. The tutorial walks through the entire workflow, including data download and preprocessing, model configuration, and prompt learning and inference across multiple downstream tasks.
The sections below follow the main steps of that tutorial. They then show how to adapt the NeMo code when moving to larger models and multi-GPU training.
Prerequisites
You can experience NeMo through the NeMo Docker container, which provides a self-contained environment with all the dependencies needed to experiment with NeMo. The NeMo multitask prompt and p-tuning tutorial was tested against the NeMo 23.02 container, but you can also try the same steps with later releases. Run and enter the container with the following script:
docker run -u $(id -u ${USER}):$(id -g ${USER}) --rm -it --net=host nvcr.io/nvidia/nemo:23.02 bash
Then, launch Jupyter Lab from the bash shell inside the container:
cd /workspace
jupyter lab --ip 0.0.0.0 --allow-root --port=8888
In Jupyter Lab, you can find a copy of this tutorial, along with the NeMo code, at /workspace/nemo/tutorials/nlp/Multitask_Prompt_and_PTuning.ipynb.
The 5B and 1.3B GPT-3 models used in the tutorial require a single GPU, while running the 20B model with 4-way tensor parallelism (TP) requires four NVIDIA Ampere architecture or NVIDIA Hopper architecture GPUs.
Data preparation
The tutorial shows how to download and preprocess the data for the SQuAD question answering task.
The dataset must be a .jsonl file containing one JSON object per line. Each JSON object must include a taskname field, which identifies the task the record belongs to, along with the text fields referenced by that task's prompt template.
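For example, a single SQuAD training record in this format might look like the following (the content here is illustrative; the actual records are produced by the tutorial's preprocessing script):
{"taskname": "squad", "context": "Paris has been the capital of France since the 10th century.", "question": "What is the capital of France?", "answer": "Paris"}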

Prompt template
When defining a task, you must also specify a prompt template. The template determines which fields from the data records are used and how they are arranged. The template used for SQuAD looks like this:
{
    "taskname": "squad",
    "prompt_template": "<|VIRTUAL_PROMPT_0|> Context: {context}\n\nQuestion: {question}\n\nAnswer:{answer}",
    "total_virtual_tokens": 10,
    "virtual_token_splits": [10],
    "truncate_field": "context",
    "answer_only_loss": True,
    "answer_field": "answer",
}
The template specifies that 10 virtual tokens are placed at the start of the prompt, followed by the context, the question, and the answer. The corresponding fields of each JSON data record are inserted into the template to form the full training prompt. NeMo also truncates the field designated by truncate_field so that the total sequence length does not exceed the model's limit (2,048 tokens for NeMo models, which use the HuggingFace GPT-2 tokenizer).
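To see how a record and the template combine, the following illustrative Python snippet mimics the substitution NeMo performs internally (NeMo additionally inserts the 10 trained virtual token embeddings at the <|VIRTUAL_PROMPT_0|> position, which plain string formatting cannot show):
record = {
    "taskname": "squad",
    "context": "Paris has been the capital of France since the 10th century.",
    "question": "What is the capital of France?",
    "answer": " Paris",
}
template = ("<|VIRTUAL_PROMPT_0|> Context: {context}\n\n"
            "Question: {question}\n\nAnswer:{answer}")
# str.format ignores the extra "taskname" key and fills the placeholders
print(template.format(**record))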
Training
NeMo ships with example config yaml files, which you can find in NVIDIA/NeMo on GitHub. The yaml config in the tutorial is set up by default for prompt learning with the 345M GPT model. NeMo p-tuning lets you learn multiple tasks at the same time from a single config. Because NeMo builds on the PyTorch Lightning framework, training the virtual token embeddings then comes down to a single trainer.fit(model) call.
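Condensed from the tutorial notebook, the training setup looks roughly like the following; this is a sketch of the NeMo 1.x-era API, so treat the exact import paths and strategy arguments as assumptions and refer to the notebook for the authoritative version:
import pytorch_lightning as pl
from omegaconf import OmegaConf
from nemo.collections.nlp.models.language_modeling.megatron_gpt_prompt_learning_model import MegatronGPTPromptLearningModel
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

# Load the prompt learning config described later in this post
cfg = OmegaConf.load("megatron_gpt_prompt_learning_squad.yaml")

# Megatron-aware DDP strategy used by the tutorial
strategy = NLPDDPStrategy(find_unused_parameters=False, no_ddp_communication_hook=True)
trainer = pl.Trainer(strategy=strategy, **cfg.trainer)

model = MegatronGPTPromptLearningModel(cfg=cfg.model, trainer=trainer)
trainer.fit(model)  # the entire training loop is this one call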
Inference
Finally, once the model is trained, it can predict answers for a set of test examples (with the "answer_field" omitted) through a single model.generate(inputs=test_examples) call.
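For example, inference on a held-out record can be sketched as follows, reusing the trained model object from above (the example content is illustrative, and the response format follows the tutorial's usage; verify it against your NeMo version):
# Test examples omit the field named by answer_field; the model completes it
test_examples = [
    {
        "taskname": "squad",
        "context": "The Amazon rainforest covers much of the Amazon basin of South America.",
        "question": "Which continent is the Amazon rainforest on?",
    },
]
response = model.generate(inputs=test_examples, length_params=None)
print(response["sentences"][0])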
Prompt learning with larger models
Instead of the 345M GPT-3 model used in the tutorial notebook, you can also use the larger NeMo GPT-3 models, such as 1.3B GPT-3 and 5B GPT-3. These models require GPUs with more memory, such as NVIDIA V100, NVIDIA A100, and NVIDIA H100. The notebook cell that downloads the model is shown below:
# Download the model from NGC
gpt_file_name = "megatron_gpt_345m.nemo"
!wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/nemo/megatron_gpt_345m/versions/1/files/megatron_gpt_345m.nemo
Instead of downloading the 345M GPT model from NGC, download the 1.3B GPT-3 or 5B GPT-3 model from HuggingFace, then point the gpt_file_name variable to the .nemo model file.
For the 5B model, there are two variants: one with TP=1 (nemo_gpt5B_fp16_tp1.nemo) and others with TP=2 (nemo_gpt5B_fp16_tp2.nemo, nemo_gpt5B_bf16_tp2.nemo). The notebook can only support the TP=1 variant. Once it is downloaded, the remaining notebook cells can be run as before, with everything else unchanged.
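For instance, a notebook cell fetching the TP=1 variant of the 5B model could look like the following; the URL pattern is an assumption based on the nvidia/nemo-megatron-gpt-5B model card on HuggingFace, so verify it before use:
# Download the TP=1 variant of the 5B model from HuggingFace (URL assumed; verify first)
gpt_file_name = "nemo_gpt5B_fp16_tp1.nemo"
!wget --content-disposition https://huggingface.co/nvidia/nemo-megatron-gpt-5B/resolve/main/nemo_gpt5B_fp16_tp1.nemo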
Prompt learning on multiple GPUs
To move beyond the Jupyter notebook environment, you can scale prompt learning to multiple GPUs. For models that require tensor parallelism (for example, TP=4 for the 20B GPT-3 model, or TP=2 for some variants of the 5B GPT-3 model), training must be launched across multiple GPUs with the NeMo prompt learning script rather than from the notebook. This section explains the script used in that workflow and the accompanying config changes.
Setup
This section shows how to download one of the larger models and run prompt learning with model parallelism across multiple GPUs, instead of the smaller model used in the tutorial notebook.
You can download either the 5B GPT model with TP=2 (nemo_gpt5B_fp16_tp2.nemo) or the 20B GPT-3 model with TP=4. Both models are packaged as .nemo zip archives. To speed up model loading significantly, extract the archive beforehand and use the extracted NeMo folder from then on. Run the following script:
mkdir nemo_gpt5B_fp16_tp2.nemo.extracted
tar -xvf nemo_gpt5B_fp16_tp2.nemo -C nemo_gpt5B_fp16_tp2.nemo.extracted
Then reference this extracted folder, nemo_gpt5B_fp16_tp2.nemo.extracted, in the NeMo config.
Config
Next, observe the config used for prompt learning (by both the notebook and the training script), which has the following structure:
name: megatron_virtual_prompt_gpt

trainer:
  devices: 2
  accelerator: gpu
  num_nodes: 1
  precision: 16
  logger: False # logger provided by exp_manager
  enable_checkpointing: False
  replace_sampler_ddp: False
  max_epochs: 25 # min 25 recommended
  max_steps: -1 # consumed_samples = global_step * micro_batch_size * data_parallel_size * accumulate_grad_batches
  log_every_n_steps: 10 # frequency with which training steps are logged
  val_check_interval: 1.0 # If is an int n > 1, will run val every n training steps, if a float 0.0 - 1.0 will run val every epoch fraction, e.g. 0.25 will run val every quarter epoch
  gradient_clip_val: 1.0
  resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
  benchmark: False

exp_manager:
  explicit_log_dir: null
  exp_dir: null
  name: ${name}
  create_wandb_logger: False
  wandb_logger_kwargs:
    project: null
    name: null
  resume_if_exists: True
  resume_ignore_no_checkpoint: True
  create_checkpoint_callback: True
  checkpoint_callback_params:
    monitor: val_loss
    save_top_k: 2
    mode: min
    save_nemo_on_train_end: False # Should be false, correct prompt learning model file is saved at model.nemo_path set below
    filename: 'megatron_gpt_prompt_tune--{val_loss:.3f}-{step}'
    model_parallel_size: ${model.tensor_model_parallel_size}
    save_best_model: True

model:
  seed: 1234
  nemo_path: ${name}.nemo # .nemo filename/absolute path to where the virtual prompt model parameters will be saved
  virtual_prompt_style: 'p-tuning' # one of 'prompt-tuning', 'p-tuning', or 'inference'
  tensor_model_parallel_size: 1 # intra-layer model parallelism
  pipeline_model_parallel_size: 1 # inter-layer model parallelism
  global_batch_size: 8
  micro_batch_size: 4
  restore_path: null # Path to an existing p-tuned/prompt tuned .nemo model you wish to add new tasks to or run inference with
  language_model_path: ??? # Path to the GPT language model .nemo file, always required
  save_nemo_on_validation_end: True # Saves an inference ready .nemo file every time a checkpoint is saved during training.
  existing_tasks: [] # List of tasks the model has already been p-tuned/prompt-tuned for, needed when a restore path is given
  new_tasks: ['squad'] # List of new tasknames to be prompt-tuned

  ## Sequence Parallelism
  # Makes tensor parallelism more memory efficient for LLMs (20B+) by parallelizing layer norms and dropout sequentially
  # See Reducing Activation Recomputation in Large Transformer Models: https://arxiv.org/abs/2205.05198 for more details.
  sequence_parallel: False

  ## Activation Checkpoint
  activations_checkpoint_granularity: null # 'selective' or 'full'
  activations_checkpoint_method: null # 'uniform', 'block', not used with 'selective'
  # 'uniform' divides the total number of transformer layers and checkpoints the input activation
  # of each chunk at the specified granularity
  # 'block' checkpoints the specified number of layers per pipeline stage at the specified granularity
  activations_checkpoint_num_layers: null # not used with 'selective'

  task_templates: # Add more/replace tasks as needed, these are just examples
    - taskname: "squad"
      prompt_template: "<|VIRTUAL_PROMPT_0|> Context: {context}\n\nQuestion: {question}\n\nAnswer:{answer}"
      total_virtual_tokens: 10
      virtual_token_splits: [10]
      truncate_field: null
      answer_only_loss: False
      answer_field: "answer"

  prompt_tuning: # Prompt tuning specific params
    new_prompt_init_methods: ['text'] # List of 'text' or 'random', should correspond to tasks listed in new tasks
    new_prompt_init_text: ['some init text goes here'] # some init text if init method is text, or None if init method is random

  p_tuning: # P-tuning specific params
    encoder_type: "tpmlp" # ['tpmlp', 'lstm', 'biglstm', 'mlp']
    dropout: 0.0
    num_layers: 2 # number of layers for MLP or LSTM layers. Note, it has no effect for tpmlp currently as it always assumes it is two layers.
    encoder_hidden: 2048 # encoder hidden for biglstm and tpmlp
    init_std: 0.023 # init std for tpmlp layers

  data:
    train_ds: ???
    validation_ds: ???
    add_eos: True
    shuffle: True
    num_workers: 8
    pin_memory: True
    train_cache_data_path: null # the path to the train cache data
    validation_cache_data_path: null # the path to the validation cache data
    test_cache_data_path: null # the path to the test cache data
    load_cache: False # whether to load from the cache data

  optim:
    name: fused_adam
    lr: 1e-4
    weight_decay: 0.01
    betas:
      - 0.9
      - 0.98
    sched:
      name: CosineAnnealing
      warmup_steps: 50
      min_lr: 0.0 # min_lr must be 0.0 for prompt learning when pipeline parallel > 1
      constant_steps: 0 # Constant steps should also be 0 when min_lr=0
      monitor: val_loss
      reduce_on_plateau: false
Using the Jupyter Lab interface, create a file with this content and save it at /workspace/nemo/examples/nlp/language_modeling/conf/megatron_gpt_prompt_learning_squad.yaml.
The most important part of this config file is the task template, shown below:
prompt_template: "<|VIRTUAL_PROMPT_0|> Context: {context}\n\nQuestion: {question}\n\nAnswer:{answer}"
total_virtual_tokens: 10
virtual_token_splits: [10]
truncate_field: null
answer_only_loss: False
"answer_field": "answer"
Here, 10 virtual tokens are used, together with the context and the question, to predict the answer.
Training
To start training, first open a terminal from the Jupyter Lab interface (File → New → Terminal). Then run the following bash command:
python /workspace/nemo/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py \
--config-name=megatron_gpt_prompt_learning_squad.yaml \
trainer.devices=2 \
trainer.num_nodes=1 \
trainer.max_epochs=25 \
trainer.precision=bf16 \
model.language_model_path=/workspace/nemo/tutorials/nlp/nemo-megatron-gpt-5B/nemo_gpt5B_fp16_tp2.nemo.extracted \
model.nemo_path=/workspace/nemo/examples/nlp/language_modeling/squad.nemo \
model.tensor_model_parallel_size=2 \
model.pipeline_model_parallel_size=1 \
model.global_batch_size=16 \
model.micro_batch_size=1 \
model.optim.lr=1e-4 \
model.data.train_ds=[/workspace/nemo/tutorials/nlp/data/SQuAD/squad_train.jsonl] \
model.data.validation_ds=[/workspace/nemo/tutorials/nlp/data/SQuAD/squad_val.jsonl]
Note the following:
- model.tensor_model_parallel_size should be set to 2 for the 5B GPT model (nemo_gpt5B_fp16_tp2.nemo) or 4 for the 20B GPT-3 model.
- trainer.devices should be set to a multiple of the TP value. For the 5B model, setting it to 4 trains two data-parallel replicas of the model, each occupying two GPUs.
- model.language_model_path should be set to the absolute path of the extracted model directory.
- model.data.train_ds and model.data.validation_ds should be set to the locations of the training and validation data files.
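As a quick sanity check on these values, the launcher derives the degree of data parallelism with standard Megatron-style arithmetic; the following small illustration uses the 5B, TP=2 example above:
devices, num_nodes = 4, 1        # trainer.devices, trainer.num_nodes
tp, pp = 2, 1                    # tensor / pipeline model parallel sizes
world_size = devices * num_nodes             # 4 GPUs in total
data_parallel_size = world_size // (tp * pp)
print(data_parallel_size)                    # -> 2 model replicas, each on 2 GPUs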
Inference
Finally, once the model is trained, run inference in NeMo using the following script:
python /workspace/nemo/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py \
virtual_prompt_model_file=/workspace/nemo/examples/nlp/language_modeling/squad.nemo \
gpt_model_file=/workspace/nemo/tutorials/nlp/nemo-megatron-gpt-5B/nemo_gpt5B_fp16_tp2.nemo.extracted \
inference.greedy=True \
inference.add_BOS=False \
inference.tokens_to_generate=128 \
trainer.devices=2 \
trainer.num_nodes=1 \
tensor_model_parallel_size=2 \
pipeline_model_parallel_size=1 \
data_paths=["/workspace/nemo/tutorials/nlp/data/SQuAD/squad_test.jsonl"] \
pred_file_path="test-results.txt"
Note the following:
- tensor_model_parallel_size should be set to 2 for the 5B GPT model (nemo_gpt5B_fp16_tp2.nemo) or 4 for the 20B GPT-3 model.
- trainer.devices should be set equal to the TP value.
- pred_file_path is the file where the test results are recorded, which you can inspect to verify the model's answers.
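Once the run completes, the predictions can be inspected directly; for example (the file name comes from the command above):
# Print the first few generated answers written by the eval script
with open("test-results.txt") as f:
    for line in f.readlines()[:5]:
        print(line.rstrip())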
Customize large language models with NeMo
This post covered how to customize an LLM for new tasks with prompt learning in NeMo, using far less compute than training a model from scratch or fine-tuning all of its parameters. With this parameter-efficient technique, enterprises can adapt a single pretrained model to many downstream NLP applications without maintaining separate fine-tuned copies.
To get started with LLM customization, visit NVIDIA/NeMo on GitHub and try the steps in this post for yourself.
Related resources
- GTC session: An Introduction to Developing with Project Mellon (Spring 2023)
- GTC session: Scaling Large Language Model Training with PAX on GPUs (Spring 2023)
- GTC session: Deep Learning, LLM’s & Generative Models for Computer Games and Creative Industries (Spring 2023)
- SDK: NeMo Megatron
- SDK: NeMo LLM Service
- SDK: Nsight Deep Learning Designer