To achieve state-of-the-art machine learning (ML) solutions, data scientists often build complex ML models. However, these techniques are computationally expensive and, until recently, required extensive background knowledge, experience, and human effort. At GTC 21, AWS Senior Data Scientist Nick Erickson gave a session sharing how the combination of AutoGluon, RAPIDS…
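As a rough illustration of the kind of workflow the session covers, the sketch below shows AutoGluon's tabular API training an ensemble on a tabular dataset with a few lines of code. The file path and label column name are hypothetical, and the session's exact setup (including its RAPIDS integration) may differ.

```python
# Minimal AutoGluon tabular sketch; "train.csv" and the "target" column are
# illustrative assumptions, not taken from the session.
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")                       # load a tabular training set
predictor = TabularPredictor(label="target").fit(train_data)   # train an ensemble of models
print(predictor.leaderboard())                                  # compare the trained models
```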
Recommender systems (RecSys) have become a key component in many online services, such as e-commerce, social media, news services, and online video streaming. However, with their growth in importance, the increasing scale of industry datasets, and more sophisticated models, the bar has been raised for the computational resources required by recommendation systems. To meet the computational demands…
Recommender systems (RecSys) have become a key component in many online services, such as e-commerce, social media, news services, and online video streaming. However, with their growth in importance, the increasing scale of industry datasets, and more sophisticated models, the bar has been raised for the computational resources required by recommendation systems. After NVIDIA introduced Merlin…
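For context, the sketch below shows a minimal preprocessing workflow with NVTabular, the feature-engineering component of NVIDIA Merlin, which runs recommender ETL on the GPU and out of core. The column names and file paths are hypothetical assumptions for illustration.

```python
# Minimal NVTabular (NVIDIA Merlin) preprocessing sketch; columns and paths are illustrative.
import nvtabular as nvt
from nvtabular import ops

cat_features = ["user_id", "item_id"] >> ops.Categorify()   # map IDs to contiguous integers
cont_features = ["price"] >> ops.Normalize()                 # standardize continuous columns

workflow = nvt.Workflow(cat_features + cont_features)
dataset = nvt.Dataset("interactions.parquet")                # GPU-backed, out-of-core dataset

workflow.fit(dataset)                                        # compute category maps and statistics
workflow.transform(dataset).to_parquet("processed/")         # write the transformed data
```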
With the growing interest in deep learning (DL), more and more users are running DL in production environments. Because DL requires intensive computational power, developers are leveraging GPUs to run their training and inference jobs. Recently, as part of a major Apache Spark initiative to better unify DL and data processing on Spark, GPUs became a schedulable resource in Apache Spark 3.
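To make this concrete, the sketch below requests GPUs as a schedulable resource when building a Spark 3 session. The amounts and the discovery-script path are illustrative assumptions; the actual values depend on the cluster manager and hardware.

```python
# Sketch of GPU scheduling in Spark 3; values and the discovery-script path are assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("gpu-scheduling-example")
    .config("spark.executor.resource.gpu.amount", "1")      # one GPU per executor
    .config("spark.task.resource.gpu.amount", "1")          # one GPU per task
    .config("spark.executor.resource.gpu.discoveryScript",
            "/opt/spark/getGpusResources.sh")               # script reporting GPUs visible to the executor
    .getOrCreate()
)
```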
Kaggle is an online community that allows data scientists and machine learning engineers to find and publish datasets, learn, explore, build models, and collaborate with their peers. Members also enter competitions to solve data science challenges. Kaggle members progress through the following performance tiers as they achieve more: Novice, Contributor, Expert, Master, and Grandmaster. The quality and quantity of…
Apache Spark has emerged as the standard framework for large-scale, distributed data analytics processing. NVIDIA worked with the Apache Spark community to accelerate the world’s most popular data analytics framework and to offer revolutionary GPU acceleration on several leading platforms, including Google Cloud, Databricks, and Cloudera. Now, Amazon EMR joins the list of leading platforms…
At GTC Spring 2020, Adobe, Verizon Media, and Uber each discussed how they used Spark 3.0 with GPUs to accelerate and scale ML big data preprocessing, training, and tuning pipelines. There are multiple challenges when it comes to the performance of large-scale machine learning (ML) solutions: huge datasets, complex data preprocessing and feature engineering pipelines…
Apache Spark continued the effort to analyze big data that Apache Hadoop started over 15 years ago and has become the leading framework for large-scale distributed data processing. Today, hundreds of thousands of data engineers and scientists are working with Spark across 16,000+ enterprises and organizations. One reason Spark has taken the torch from Hadoop is that it can process data…
Given the parallel nature of many data processing tasks, it’s only natural that the massively parallel architecture of a GPU should be able to parallelize and accelerate Apache Spark data processing queries, in the same way that a GPU accelerates deep learning (DL) in artificial intelligence (AI). NVIDIA has worked with the Apache Spark community to implement GPU acceleration through the…
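As a rough sketch of what this looks like in practice, the snippet below enables the RAPIDS Accelerator plugin for Apache Spark so that supported SQL and DataFrame operations run on the GPU. It assumes the RAPIDS Accelerator jar is already on the classpath; the dataset path and resource amounts are illustrative.

```python
# Sketch of enabling the RAPIDS Accelerator for Apache Spark; paths and values are assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rapids-accelerated-sql")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")   # load the RAPIDS SQL plugin
    .config("spark.rapids.sql.enabled", "true")              # run supported operators on the GPU
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.task.resource.gpu.amount", "1")
    .getOrCreate()
)

# Supported DataFrame/SQL operations in this session now execute on the GPU.
df = spark.read.parquet("events.parquet")                    # hypothetical dataset
df.groupBy("user_id").count().show()
```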