• <xmp id="om0om">
  • Data Science

    NVIDIA RAPIDS 24.10: Zero-Code-Change Accelerated NetworkX, UMAP, and cuDF-pandas Improvements

    Reading Time: 5 minutes

    The RAPIDS v24.10 release is another step toward a seamless accelerated computing experience for data scientists. This post highlights the release's new features and improvements:

    • Zero-code-change accelerated NetworkX: now generally available (GA)
    • Polars GPU engine: now in open beta
    • UMAP support for datasets larger than GPU memory
    • Improved cuDF pandas compatibility with NumPy and PyArrow
    • A guide to using GPUs in GitHub-managed CI
    • Expanded RAPIDS support for Python 3.12 and NumPy 2.x

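    Zero-code-change accelerated NetworkX

    Accelerated NetworkX, powered by cuGraph, is generally available (GA) as of the v24.10 release with NetworkX 3.4. This release adds GPU-accelerated graph creation, an expanded zero-code-change user experience, and significantly updated documentation.

    With zero-code-change acceleration, existing NetworkX workflows run unchanged, and the same user experience applies on both CPUs and GPUs.

    Set the NX_CUGRAPH_AUTOCONFIG environment variable to True to accelerate NetworkX code.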

    %env NX_CUGRAPH_AUTOCONFIG=True
     
    import pandas as pd
    import networkx as nx
     
    url = "https://data.rapids.ai/cugraph/datasets/cit-Patents.csv"
    df = pd.read_csv(url, sep=" ", names=["src", "dst"], dtype="int32")
    G = nx.from_pandas_edgelist(df, source="src", target="dst")
     
    %time result = nx.betweenness_centrality(G, k=10)
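
    Outside a notebook the `%env` magic is not available. A minimal sketch of the same setup in a plain Python script follows. It assumes the variable has to be set before `networkx` is imported, and it uses the small built-in karate club graph as a stand-in for a real dataset. The `find_spec` guard is only there so the sketch still runs where NetworkX is not installed. Without nx-cugraph and a GPU, the same code simply runs on the CPU.

```python
import importlib.util
import os

# Assumption: backend configuration is picked up when networkx is imported,
# so the variable is set first. setdefault keeps any value already exported.
os.environ.setdefault("NX_CUGRAPH_AUTOCONFIG", "True")

# Guard so the sketch still runs where NetworkX is not installed.
HAVE_NETWORKX = importlib.util.find_spec("networkx") is not None

if HAVE_NETWORKX:
    import networkx as nx

    # Small built-in graph (34 nodes), a stand-in for a real dataset.
    G = nx.karate_club_graph()

    # k=10 estimates centrality from 10 sampled nodes; seed fixes the sample.
    result = nx.betweenness_centrality(G, k=10, seed=42)

    top_node = max(result, key=result.get)
    print(f"most central node: {top_node}")
```

    From a shell, the equivalent is to export the variable when launching the script, for example `NX_CUGRAPH_AUTOCONFIG=True python my_script.py`, where `my_script.py` is a placeholder name.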

    Depending on the algorithm and the size of the graph, workflows such as betweenness centrality and PageRank can run 10x, 50x, or even 500x faster.

    Figure 1. The PageRank algorithm running on the U.S. patent citation graph (4 million nodes, 16 million edges) is 70x faster than NetworkX on CPU. SW: NetworkX 3.4.1, cuGraph/nx-cugraph 24.10; GPU: NVIDIA A100 80GB; CPU: Intel Xeon w9-3495X (56 cores), 250 GB
    Figure 2. The betweenness centrality algorithm running on the LiveJournal social network (5 million nodes, 69 million edges) with the number of samples (k) set to 100 is 485x faster than NetworkX on CPU. SW: NetworkX 3.4.1, cuGraph/nx-cugraph 24.10; GPU: NVIDIA A100 80GB; CPU: Intel Xeon w9-3495X (56 cores), 250 GB

    To learn more about NetworkX accelerated by cuGraph and how to get started, see the project documentation.

    Zero-code-change accelerated Polars in open beta

    The Polars GPU engine, powered by cuDF, entered open beta in September. It can run workflows up to 13x faster than the CPU-based engine.

    Figure 3. The four queries with the largest speedups among the 22 queries of the PDS-H benchmark. The Polars GPU engine powered by RAPIDS cuDF is up to 13x faster than the CPU on queries with many complex group-by and join operations.

    PDS-H benchmark | Scale factor 80 | GPU: NVIDIA H100 | CPU: Intel Xeon W9-3495X (Sapphire Rapids) | Storage: local NVMe. Note: PDS-H is derived from TPC-H, but these results are not comparable to TPC-H results.

    The GPU engine is built directly into the Polars Lazy API. To run a query on the GPU, pass the `engine` keyword to `collect` when materializing the result.

    import polars as pl
     
    df = pl.LazyFrame({"a": [1.242, 1.535]})
    q = df.select(pl.col("a").round(1))
    result = q.collect(engine="gpu")

    For more information, see the NVIDIA and Polars blog posts on the Polars GPU engine, or try it directly in a Google Colab notebook.

    UMAP support for datasets larger than GPU memory

    In v24.10, cuML's UMAP algorithm gains a new batched approximate nearest neighbor algorithm that supports datasets larger than GPU memory. The batched algorithm keeps the full dataset in CPU memory and processes subsets of the data on the GPU in batches to build the approximate KNN graph.

    To use the feature, set the `nnd_n_clusters` parameter to a value greater than 1, which is the default, and (optionally) call `fit` or `fit_transform` with `data_on_host=True`.

    from cuml.manifold import UMAP
    import numpy as np
     
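    # Illustrative sizes (assumed; the original snippet leaves these undefined)
    n_samples, n_features = 100_000, 64
     
    # Generate synthetic data using numpy (random float32 matrix)
    X = np.random.rand(n_samples, n_features).astype(np.float32)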
     
    # UMAP parameters
    num_clusters = 4  # Number of clusters for NN Descent batching, 1 means no clustering
    data_on_host = True  # Whether the data is stored on the host (CPU)
     
    # UMAP model configuration
    reducer = UMAP(
        n_neighbors=10,
        min_dist=0.01,
        build_algo="nn_descent",
        build_kwds={"nnd_n_clusters": num_clusters},
    )
     
    # Fit and transform the data
    embeddings = reducer.fit_transform(X, data_on_host=data_on_host)

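    In general, setting `nnd_n_clusters` to a larger value (for example, 4) lowers GPU memory usage, which allows larger datasets to be processed. Too many clusters can reduce the accuracy of the results, however, so raise the value only as far as needed for the data to fit in GPU memory.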
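
    The trade-off between partition count and accuracy can be seen in a toy, CPU-only sketch. This is not cuML's implementation: it swaps in exact brute-force search for NN Descent and a crude sort-and-slice split for real clustering, and every name in it (`knn_within`, `partitioned_knn`, `recall`) is hypothetical. It only illustrates that a KNN graph built one partition at a time needs less data resident at once, but can miss neighbors that fall in another partition.

```python
import heapq
import math
import random


def knn_within(points, ids, k):
    """Exact k nearest neighbors for each id, searching only inside `ids`."""
    result = {}
    for i in ids:
        candidates = ((math.dist(points[i], points[j]), j) for j in ids if j != i)
        result[i] = [j for _, j in heapq.nsmallest(k, candidates)]
    return result


def partitioned_knn(points, k, n_clusters):
    """Build a KNN graph one partition at a time.

    Only one partition would need to be resident on the device at once.
    Partitions are contiguous slices after sorting by the first coordinate,
    a crude stand-in for real clustering.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    size = math.ceil(len(points) / n_clusters)
    graph = {}
    for start in range(0, len(order), size):
        graph.update(knn_within(points, order[start:start + size], k))
    return graph


def recall(approx, exact):
    """Fraction of exact neighbors that the approximate graph recovered."""
    hits = sum(len(set(approx[i]) & set(exact[i])) for i in exact)
    return hits / sum(len(v) for v in exact.values())


random.seed(0)
data = [(random.random(), random.random()) for _ in range(300)]

# A single partition is an exact search and serves as the reference.
exact = partitioned_knn(data, k=5, n_clusters=1)

for n_clusters in (1, 2, 4, 8):
    approx = partitioned_knn(data, k=5, n_clusters=n_clusters)
    print(n_clusters, round(recall(approx, exact), 3))
```

    With one partition the recall is exactly 1.0 by construction. Points near a partition boundary can lose true neighbors as the count grows, which is the kind of accuracy cost a larger cluster count can bring.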

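    cuDF pandas compatibility improvements

    NumPy array compatibility improvements

    The cuDF pandas accelerator mode now works better with NumPy arrays. Previously, a Python isinstance check against a NumPy array returned False when cuDF pandas was in use, whereas it returned True with standard pandas. Checks of this kind are common in third-party libraries, so the mismatch could make code take a different path or fail.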
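
    Why an isinstance check can trip over a proxy object is easy to show with plain Python. The sketch below is an analogy only, using `list` in place of a NumPy array. `ListProxy`, `ListSubclass`, and `total_if_list` are hypothetical names and do not reflect how cudf.pandas is implemented.

```python
class ListProxy:
    """Toy proxy that forwards attribute access to a wrapped list."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself.
        return getattr(self._wrapped, name)

    def __len__(self):
        return len(self._wrapped)


class ListSubclass(list):
    """Toy alternative: a real subclass of the expected type."""


def total_if_list(values):
    """Mimics third-party code that branches on an isinstance check."""
    if isinstance(values, list):
        return sum(values)
    raise TypeError(f"expected list, got {type(values).__name__}")


proxy = ListProxy([1, 2, 3])
sub = ListSubclass([1, 2, 3])

print(isinstance(proxy, list))  # False: the proxy only looks like a list
print(isinstance(sub, list))    # True: a subclass passes the check
print(total_if_list(sub))       # 6
```

    Passing `proxy` to `total_if_list` raises a TypeError even though the proxy forwards every list method, which is the kind of failure a strict type check in a downstream library can cause.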

    As of v24.10, when a user converts a DataFrame or a column to an array, cudf.pandas produces a true NumPy array by default. For example:

    %load_ext cudf.pandas
    import pandas as pd
    import numpy as np
     
    arr = pd.Series([1, 2, 3]).values # now returns a true numpy array
    isinstance(arr, np.ndarray) # returns True

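    This change also makes it possible to use cuDF pandas with code that relies on the NumPy C API.

    Arrow compatibility improvements

    cuDF now supports a much wider range of PyArrow versions. Arrow compatibility is a core feature of the cuDF ecosystem. Previously, because cuDF used the Arrow C++ API and was subject to its ABI stability constraints, each cuDF release was tightly pinned to a specific Arrow release.

    Over the past few releases, this functionality was rewritten on top of the Arrow C Data Interface, which removes the tight dependency on Arrow C++. As a result, cuDF Python now supports any PyArrow version from 14 onward.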

    A guide to using GPUs in GitHub-managed CI

    Developers who want to test their projects on GPUs in GitHub-managed CI services now have new guidance. Based on experience working with projects such as scikit-learn, a page documenting best practices has been added to the RAPIDS Deployment documentation.

    GitHub Actions now supports hosted GPU runners. Any project on GitHub can therefore use NVIDIA GPUs in its own CI workflows. For projects that build on RAPIDS libraries, this makes it much easier to test their code on GPU runners.

    GPU hosted runners are not included in the GitHub Actions free tier. GPU-enabled runners are billed by the minute, and a project can set a spending limit to keep costs under control.

    To use a GPU runner, first create a new, larger GitHub Actions runner. Then select the NVIDIA partner image and change the size to a GPU-powered VM, which adds a GPU to the runner.

    Figure 4. The settings page for creating a new GitHub Actions GPU runner.

    After that, use the runs-on option to configure a workflow to run on the new runner.

    name: GitHub Actions GPU Demo
    run-name: ${{ github.actor }} is testing out GPU GitHub Actions
    on: [push]
    jobs:
      gpu-workflow:
        runs-on: linux-nvidia-gpu
        steps:
          - name: Check GPU is available
            run: nvidia-smi

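    For more details on setting up GitHub Actions GPU runners, see the RAPIDS Deployment documentation. The page also covers best practices for GPU CI.

    The scikit-learn project already uses GPU runners in GitHub Actions, triggering GPU workflows on PRs that carry a specific label. The project's blog post describes the experience in more detail.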

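    RAPIDS platform updates

    The October 2024 RAPIDS packages include a number of updates that add support for newer versions of core dependencies and make RAPIDS usable in more environments. This release supports Python 3.10-3.12 and NumPy 1.x and 2.x. It also supports fmt 11 and spdlog 1.14. These updates match the latest conda-forge pinnings. As part of these changes, this release drops support for Python 3.9 and for NCCL versions earlier than 2.19.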
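
    A quick way to see whether an environment falls inside the Python and NumPy ranges named above is a small standard-library check. This is a minimal sketch with hypothetical helper names. It looks only at the NumPy major version, so it does not capture any minimum 1.x release that RAPIDS may require, and it is no substitute for the official install instructions.

```python
import sys
from importlib import metadata


def python_supported(version_info):
    """True for Python 3.10 through 3.12, the range named for RAPIDS 24.10."""
    return (3, 10) <= tuple(version_info[:2]) <= (3, 12)


def numpy_supported(version):
    """True for NumPy 1.x or 2.x, judged by the major version only."""
    return version.split(".")[0] in {"1", "2"}


def installed_version(package):
    """Installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python OK:", python_supported(sys.version_info))

numpy_version = installed_version("numpy")
if numpy_version is None:
    print("NumPy not installed")
else:
    print("NumPy OK:", numpy_supported(numpy_version))
```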

    Conclusion

    The RAPIDS 24.10 release is another step toward NVIDIA's goal of making accelerated computing more accessible to data scientists and engineers.

    If you are new to RAPIDS, check out these resources to get started.
