Accelerating Ptychography Workflows with NVIDIA Holoscan at Diamond Light Source

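Diamond Light Source is a world-renowned synchrotron facility in the UK that gives scientists access to intense beams of x-rays, infrared, and other forms of light to study materials and biological structures. The facility has more than 30 experimental stations, or beamlines, and hosts some of the most advanced and complex scientific research projects in the world.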

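I08-1, a soft x-ray beamline at Diamond Light Source, offers an advanced high-resolution imaging technique called ptychography, which delivers images at nanoscale resolution. Ptychography is a computational imaging method: it reconstructs an image of the sample at nanoscale resolution from the measurements, or diffraction patterns, produced when the x-ray beam interacts with the sample.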

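This is critical for imaging nanoscale features in many biological structures, such as mitochondria and other organelles in cells, as well as internal structures and defects in materials science samples. The reconstruction process is very powerful, but it can leave a significant gap between measuring the data and viewing the image.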

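The I08-1 detector can process 25 frames per second, and detectors that can process thousands of frames per second are coming online soon. These sensor instruments require accelerated computing at the edge.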
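To make these frame rates concrete, the following back-of-the-envelope sketch computes the per-frame time budget and the sustained raw data rate. It uses the frame shape and type given later in this post for the I08-1 camera (2048 x 2048, uint16). Applying that same frame size to a 1000 Hz detector is purely an assumption for illustration, as is the choice of 1000 Hz as a stand-in for "thousands of frames per second".

```python
# Frame geometry reported in this post: 2048 x 2048 pixels, uint16 (2 bytes each).
FRAME_SHAPE = (2048, 2048)
BYTES_PER_PIXEL = 2


def frame_bytes(shape=FRAME_SHAPE, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of one raw frame in bytes."""
    rows, cols = shape
    return rows * cols * bytes_per_pixel


def time_budget_ms(frames_per_second):
    """Time available per frame, in milliseconds, without falling behind."""
    return 1000.0 / frames_per_second


def data_rate_mib_per_s(frames_per_second):
    """Sustained raw data rate in MiB/s at the given frame rate."""
    return frame_bytes() * frames_per_second / 2**20


# 25 Hz is the I08-1 rate; 1000 Hz is an assumed, illustrative faster detector.
for fps in (25, 1000):
    print(f"{fps:>5} Hz: {time_budget_ms(fps):6.2f} ms per frame, "
          f"{data_rate_mib_per_s(fps):8.1f} MiB/s")
```

Under these assumptions, each raw frame is 8 MiB, so 25 Hz leaves 40 ms per frame at 200 MiB/s, while a kilohertz detector with the same frame size would leave about 1 ms per frame.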

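Faster scans make it possible to study more dynamic processes. They also increase experimental throughput, and live processing gives users real-time feedback so that they can adjust the sample, tune the detector settings, and explore the sample for scientifically interesting results.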

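This post describes our work with I08-1 to speed up the live processing of beamline experimental data by refactoring the data analysis workflow. It also addresses key challenges, such as the fact that generic ptychography workflows currently run serially, with images written to disk at a frame rate of 25 Hz.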

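After the scan finishes, the live-processing pipeline is launched and the processing application (PtyPy) can process the full dataset. PtyPy has been optimized for GPU acceleration, but I/O communication remains the main bottleneck to achieving higher throughput.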

Ptychographic pipeline showing sensor, preprocessing, data loading and 2D reconstruction, and image display pipeline steps, with each step handing off to the next through a written image file. The delay and idle time grow with each step and are shown on the vertical axis. — *Figure 1. Serial, file-based ptychography pipeline before the introduction of NVIDIA Holoscan*

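To accelerate the ptychography workflow, we introduced NVIDIA Holoscan, an SDK designed for sensor processing. It makes it easier for scientists, researchers, and developers to optimize and scale their sensor processing workflows (Figure 2). For example, the JAX library is used inside a Holoscan operator to accelerate image preprocessing.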

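With Holoscan, researchers and developers can build high-performance, low-latency sensor processing applications that are easier to scale, starting from reference examples written in familiar languages.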

The Holoscan SDK diagram shows a block at the top for HoloHub containing reference applications and a block for Model Zoo containing NGC and MONAI. The Holoscan SDK block contains blocks for Python, JAX, CuPy, RAPIDS, C++, or Graph Composer. The Holoscan SDK block also contains operator blocks with reusable code segments with APIs for I/O, AI inference, visualization, and custom functions. The Holoscan software stack sits on top of blocks representing I/O libraries with DPDK and Rivermax, AI libraries such as TensorRT and Triton, and visualization libraries such as Vulkan. NVIDIA Acceleration Libraries is shown as the foundation block underlying everything comprising the Holoscan SDK to deliver accelerated computing. Icons at the bottom of the figure denote that it is available to run on appliances, workstations, servers, or the cloud. — *Figure 2. The NVIDIA Holoscan SDK is designed for sensor applications ranging from surgery to satellites*

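The soft x-ray ptychography instrument at I08-1 uses an sCMOS camera to collect diffraction data for ptychography experiments. The raw data arrives as frames of shape (2048, 2048) and type uint16. Before feeding this data into the iterative ptychography solver application (PtyPy), we perform the following routine preprocessing tasks on each raw frame: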

Subtraction of a background dark-current image
Cropping around the center and rebinning to reduce reconstruction time

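The first task is necessary to obtain clean diffraction images, while the other two operations, cropping and rebinning, provide levels of lossy compression that depend on the experiment. Ideally, all of these steps would be performed as close to the source as possible (for example, on-chip or with an FPGA). Unfortunately, neither option is available on the particular camera used here.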
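The preprocessing steps listed above can be sketched in NumPy as follows. This is a minimal illustration, not the beamline's actual script: the function name `preprocess`, the crop size, the binning factor, and the use of the geometric frame center (rather than a measured diffraction center) are all assumptions made for the example.

```python
import numpy as np


def preprocess(frame, dark, crop=1024, binning=2):
    """Dark-subtract, center-crop, and rebin one raw detector frame.

    The crop and binning defaults are hypothetical values for illustration.
    """
    if crop % binning:
        raise ValueError("crop must be divisible by binning")

    # Subtract in float32 so uint16 values cannot wrap around, then clip
    # counts that fall below the dark level to zero.
    data = frame.astype(np.float32) - dark.astype(np.float32)
    np.clip(data, 0, None, out=data)

    # Crop a (crop, crop) window around the geometric center of the frame.
    rows, cols = data.shape
    r0, c0 = (rows - crop) // 2, (cols - crop) // 2
    data = data[r0:r0 + crop, c0:c0 + crop]

    # Rebin by summing each binning x binning block of pixels.
    out = crop // binning
    return data.reshape(out, binning, out, binning).sum(axis=(1, 3))


# Synthetic stand-ins for a raw frame and a dark-current image.
rng = np.random.default_rng(0)
raw = rng.integers(0, 2**16, size=(2048, 2048), dtype=np.uint16)
dark = np.full((2048, 2048), 100, dtype=np.uint16)
print(preprocess(raw, dark).shape)  # (512, 512)
```

With these assumed settings, each frame shrinks from 2048 x 2048 to 512 x 512 pixels, a 16-fold reduction in pixel count before the data reaches the solver.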

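JAX was used to significantly speed up the single-threaded Python script that performs these tasks, with minimal changes to the code. Because the original frame-processing code was written in NumPy, the JAX JIT compiler was able to fuse the processing routines into a single GPU kernel. For a single image, this is more than 2,000x faster than the original NumPy version when the host-to-device data transfer is ignored. Even with the data transfer included, it is still more than 40x faster than the original CPU-based NumPy implementation.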
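The gap between the two speedup figures hints at where the time goes. As a rough worked calculation, taking the quoted lower bounds of 2,000x and 40x at face value and normalizing the CPU time to 1, the share of end-to-end GPU time that is not spent in the fused kernel can be estimated as follows. The helper name `transfer_share` is hypothetical, and attributing all of the non-kernel time to data transfer is a simplifying assumption.

```python
from fractions import Fraction


def transfer_share(kernel_speedup, end_to_end_speedup):
    """Fraction of end-to-end GPU time not spent in the compute kernel.

    Both speedups are relative to the same CPU baseline, whose time per
    frame is normalized to 1.
    """
    kernel_time = Fraction(1, kernel_speedup)
    end_to_end_time = Fraction(1, end_to_end_speedup)
    return (end_to_end_time - kernel_time) / end_to_end_time


share = transfer_share(2000, 40)
print(share, float(share))  # 49/50 0.98
```

Under these assumptions, roughly 98% of the per-frame GPU time would be data movement rather than computation, which is consistent with the I/O bottleneck discussed in this post.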

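While data acquisition is relatively fast, reconstructing the data into images that a researcher or scientist can interpret can easily take minutes or tens of minutes. This dead time between the scan and the displayed image is inefficient. It limits the researcher's ability to tell whether the instrument is set up correctly or whether the scanned region of the sample is interesting.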

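By structuring the ptychography application as a collection of application fragments and Holoscan operators, developers can combine the PtyPy ptychography code with Holoscan AI inference and Holoscan network operators to prototype and test new GPU-accelerated versions of the live-processing ptychography application relatively quickly.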

The figure shows an application made of two fragments, each composed of chained operators that are serially connected. One example shows two operators connected in parallel. — *Figure 3. A Holoscan application is a directed acyclic graph of operators*

An operator consists of input ports and output ports and contains reusable algorithm logic inside. — *Figure 4. An operator ingests input data, processes it, and publishes the result on its output ports*
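The idea of an application as a directed acyclic graph of operators can be illustrated with a small toy model in plain Python. This is not the Holoscan API: the classes `ToyOperator` and `ToyApplication` and their methods are hypothetical stand-ins, written only to show how operators connected by flows can be executed in dependency order.

```python
from graphlib import TopologicalSorter


class ToyOperator:
    """A named processing step that turns an input message into an output."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def compute(self, message):
        return self.func(message)


class ToyApplication:
    """Operators connected by flows; the flows must form an acyclic graph."""

    def __init__(self):
        self.operators = {}
        self.upstream = {}  # operator name -> set of upstream operator names

    def add_operator(self, op):
        self.operators[op.name] = op
        self.upstream.setdefault(op.name, set())
        return op

    def add_flow(self, src, dst):
        self.upstream[dst.name].add(src.name)

    def run(self, source_message):
        outputs = {}
        # static_order() yields upstream operators first and raises
        # graphlib.CycleError if the flows contain a cycle.
        for name in TopologicalSorter(self.upstream).static_order():
            parents = sorted(self.upstream[name])
            if not parents:
                message = source_message
            elif len(parents) == 1:
                message = outputs[parents[0]]
            else:
                message = [outputs[p] for p in parents]
            outputs[name] = self.operators[name].compute(message)
        return outputs


app = ToyApplication()
source = app.add_operator(ToyOperator("source", lambda frame: frame))
subtract = app.add_operator(
    ToyOperator("subtract_dark", lambda frame: [v - 1 for v in frame])
)
total = app.add_operator(ToyOperator("sum", sum))
app.add_flow(source, subtract)
app.add_flow(subtract, total)
print(app.run([3, 4, 5])["sum"])  # 9
```

Keeping each step behind a uniform input/output interface is what makes steps reusable and easy to rearrange; a real streaming framework would additionally run operators concurrently and move data between them without copies where possible.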

The core Holoscan operators are listed in a table under the categories of I/O, AI inference, and visualization. I/O operators are V4L2Source, AJASource, EmergentSource, BasicNetworkRx/Tx, and VideoStreamReplayer. AI inference operators are TensorRTInference and MultiAIInference. Visualization operators are Holoviz, OpenGLRenderer, and SegmentationVisualizer. — *Figure 5. Holoscan SDK reference applications and core operators facilitate streaming edge HPC development*

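The next part of the challenge is to speed up the live-processing frame rate of the ptychography workflow at beamline I08-1 so that it keeps pace with current and future sCMOS frame rates. By overlapping the serial workflow steps and using Holoscan, the beamline should be able to provide live processing that matches the frame rate of the sensor. This will enable beamline users to watch the ptychographically reconstructed image of the sample take shape in real time.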
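Overlapping the workflow steps means that each stage starts working on a frame as soon as the previous stage has finished with it, instead of waiting for the whole scan to be written to disk. A minimal sketch of that pattern using Python threads and bounded queues is shown below. The function names and stage bodies are hypothetical placeholders; the production workflow streams data between processes over sockets and runs the heavy stages on GPUs rather than in Python threads.

```python
import queue
import threading

SENTINEL = object()  # unique marker for the end of the stream


def stage(func, inbox, outbox):
    """Apply func to each item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown downstream
            return
        outbox.put(func(item))


def run_pipeline(frames, funcs, maxsize=8):
    """Stream frames through funcs, one thread per stage, preserving order."""
    queues = [queue.Queue(maxsize=maxsize) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]

    def feed():
        # Stand-in for the detector: push frames in as they are acquired.
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(SENTINEL)

    workers.append(threading.Thread(target=feed))
    for worker in workers:
        worker.start()

    # Collect results as they arrive, while upstream stages keep running.
    results = []
    while (item := queues[-1].get()) is not SENTINEL:
        results.append(item)

    for worker in workers:
        worker.join()
    return results


def preprocess(frame):
    return frame - 1  # placeholder for dark subtraction, crop, and rebin


def reconstruct(frame):
    return frame * 2  # placeholder for a reconstruction update


print(run_pipeline(range(5), [preprocess, reconstruct]))  # [-2, 0, 2, 4, 6]
```

The bounded queues provide back-pressure: if a downstream stage falls behind, upstream stages block rather than buffering an unbounded number of frames in memory.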

The ptychography workflow has been redone, replacing all of the file-based I/O stages between steps with streaming I/O. The steps are sensor processing with EPICS, preprocessing, which is a Python script, data loading and 2D reconstruction, which are embodied in the ptychography software, and image display, which provides the user interface. The total time to run is much lower than that of the original file-based workflow and is shown on the vertical axis with smaller and fewer idle time periods. — *Figure 6. Socket-based live ptychography reconstruction workflow after adopting Holoscan*

On the left are 16 out of 1257 diffraction patterns, which are ptychographically processed to reconstruct a 2D image of a butterfly wing. On the right is an image of a butterfly wing at the one-micrometer level of resolution. — *Figure 7. 2D reconstruction from 1257 diffraction patterns to create a high-resolution image of a butterfly wing*

Stage	Before	After
Data collection	57 sec	57 sec
Preprocessing	94 sec	58 sec
PtyPy data loaded	119 sec	61 sec
PtyPy data reconstructed	128 sec	72 sec
User waiting time	~71 sec	15 sec

Table 1. All times are relative to the start of the scan, except the user waiting time, which is relative to the end of the scan
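The user waiting times in Table 1 follow directly from the other rows: the wait is the time at which the reconstruction completes minus the time at which the scan ends. The short calculation below reproduces the figures from the table; the variable and function names are illustrative only.

```python
SCAN_END = 57  # seconds after scan start; data collection ends here in both runs

# Time at which the PtyPy reconstruction is complete, in seconds after scan start.
RECONSTRUCTION_DONE = {"before": 128, "after": 72}


def user_wait(reconstruction_done, scan_end=SCAN_END):
    """Seconds between the end of the scan and the reconstructed image."""
    return reconstruction_done - scan_end


waits = {name: user_wait(t) for name, t in RECONSTRUCTION_DONE.items()}
print(waits)  # {'before': 71, 'after': 15}
print(round(waits["before"] / waits["after"], 1))  # 4.7
```

In other words, for this scan the streaming workflow cuts the time a user waits after the scan ends by a factor of roughly 4.7.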

The figure shows the before and after times for the ptychography reconstruction pipeline. The before pipeline does not use Holoscan and has a file I/O stage between steps. The times match the numbers in Table 1 above. The after pipeline uses Holoscan for streaming network I/O between steps and reduces the user waiting time to only 15 seconds. — *Figure 8. Comparison of the file-based ptychography workflow and the streaming workflow that uses Holoscan*

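3D reconstruction poses a larger problem with the same live-processing requirements. Scaling out to multi-GPU and multi-node processing, so that many scans are processed in parallel and overlapped, may be one way to meet the compute and I/O demands of live processing.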

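Our collaboration set out to test various workflow configurations on a local edge server with two NVIDIA A2 GPUs, with preprocessing running on one GPU and image reconstruction on the other. This approach lets the team focus on the bespoke ptychography code while relying on edge network I/O operators and AI acceleration libraries that are easy to reuse and that can scale out to multiple nodes when production use requires it.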

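Holoscan made it possible to create an end-to-end data streaming workflow that performs live ptychographic image processing at the I08-1 beamline, significantly enriching the overall user interaction. As mentioned earlier, other Diamond Light Source beamlines operate in the kilohertz detection range, but none of them can yet perform live processing at that rate.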

On the left, the x-ray sensor scans all positions and produces diffraction data. Ptychography software runs 1 to N times to process scans to construct the sample image. — *Figure 9. Tomography, spectroscopy, or other multidimensional scans consist of 1 to N runs of traditional ptychography software*

Figure 10 shows the x-ray beamline instrument on the left. It is shown to generate diffraction patterns, and the raw data is shown transmitted to a GPU-accelerated edge inference server running a trained AI surrogate model, with many parts of the algorithm running in parallel to compute phase and amplitude information. A line shows that training data and image outputs are sent to the local supercomputing facility cluster that trains the ptychography reconstruction model. The edge inference server fine-tunes this AI surrogate model using the image estimates. — *Figure 10. Training an AI model at a supercomputing facility for tomography, spectroscopy, or other multidimensional scans*

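Training an AI model on scan data and then running inference with that model on GPUs at the beamline is a promising approach to achieving live processing at kilohertz rates.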

Summary

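Sensor processing pipelines, such as the ptychography pipeline described in this post, combine significant processing and I/O requirements in a single application. As sensor resolutions and refresh rates increase, file-based approaches are no longer feasible, which prompts a redesign of the processing around real-time streaming workflows.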

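This requires careful consideration of end-to-end performance, which immediately exposes the I/O bottlenecks throughout the pipeline. NVIDIA GPU edge systems use a combination of JAX, CuPy, and CUDA to accelerate the preprocessing and reconstruction operations and deliver the necessary compute performance. However, our end-to-end profiling shows that faster compute only amplifies the impact of the I/O bottlenecks.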

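Holoscan provides tools for building stream-processing software pipelines that can also take advantage of hardware capabilities. This includes operators that ingest data from the network directly into the GPU (and the reverse), to feed the GPU better and increase its utilization. That utilization can be particularly low during pure stream processing, such as the preprocessing step highlighted earlier. Because Holoscan is designed to be modular, it also supports additional capabilities, such as the deep learning and real-time visualization discussed in this post.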

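By combining GPU-accelerated computing with Holoscan, I08-1 can significantly reduce the time required to process x-ray microscopy data and increase the frame rate of image processing. The edge nodes are equipped with high-performance computing (HPC) hardware, including NVIDIA GPUs installed in CPU servers. These edge devices or servers are designed to accelerate image processing and machine learning.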

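To enable live processing, Diamond Light Source uses a distributed computing architecture that includes multiple edge nodes and a central data processing facility.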

The chart depicts a canonical architecture for edge processing. On the left, starting with data preparation, the data center supercomputer is employed to train the AI model, which can be deployed at scale to edge sites next to sensor instruments such as light sheet microscopes, x-ray beamlines, radio telescopes, and so on. The streaming data from the sensor instrument on the right is analyzed in edge servers that run AI at the edge. The edge server sends data back to the data center to assist with retraining the model. — *Figure 11. Data center-to-edge computing workflow that runs AI at the edge*

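The edge nodes are located near the x-ray beamlines and are responsible for processing the data that the beamlines generate. The processed data is then sent to the central data processing facility for further analysis and storage.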

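Edge HPC with NVIDIA GPUs for AI processing can accelerate x-ray microscopy data processing in several ways. GPUs are highly efficient at processing large amounts of image data in parallel. In x-ray microscopy, NVIDIA GPUs can be used to accelerate tasks such as denoising, image registration, and image segmentation.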

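Machine learning algorithms, such as deep neural networks, can be used to analyze x-ray microscopy data and extract meaningful information. NVIDIA GPUs are well suited to accelerating both the training and inference phases of these algorithms.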

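By using NVIDIA Holoscan in a streaming AI framework for image processing, machine learning, tomographic reconstruction, and data compression, edge AI processing can significantly reduce the time and resources required to process x-ray microscopy data and speed up scientific breakthroughs.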

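With HPC edge processing, Diamond Light Source is taking an important step toward democratizing science by giving scientists the tools they need to make real-time decisions and accelerate their research.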
