RAPIDS on the GPU
Nowadays, the growing volume of data has made ETL (Extract, Transform, Load), and data analysis and processing in general, more complex and time-consuming. To address this issue, NVIDIA created RAPIDS.
When it comes to data analysis and processing, we mostly think of Python, pandas, SQL, Spark, and so on. However, these tools share a major drawback: they typically run on the CPU, which can make data processing slow and uses computing resources inefficiently. This is why RAPIDS was developed.
What is RAPIDS?
RAPIDS is a suite of open-source software libraries and APIs that lets you run end-to-end data science, analytics, and machine learning pipelines entirely on the GPU.
One great thing is that its APIs closely mirror those of pandas, NumPy, scikit-learn, and others.
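To make the API similarity concrete, here is a minimal sketch that uses cuDF, the pandas-like RAPIDS library. It imports cuDF when it is available and falls back to pandas otherwise, so the same DataFrame code runs on either backend. The function name `total_sales_by_city` and the sample data are hypothetical, chosen only for illustration.

```python
# Use cuDF when it can be imported; otherwise fall back to pandas.
# The broad except is deliberate: importing cuDF on a machine without a
# usable GPU may fail with errors other than ImportError.
try:
    import cudf as xdf
    BACKEND = "cuDF (GPU)"
except Exception:
    import pandas as xdf
    BACKEND = "pandas (CPU)"


def total_sales_by_city(records):
    """Sum the 'sales' column per 'city' using whichever backend was imported."""
    df = xdf.DataFrame(records)
    totals = df.groupby("city")["sales"].sum().sort_index()
    # cuDF objects expose to_pandas(); pandas objects do not need converting.
    if hasattr(totals, "to_pandas"):
        totals = totals.to_pandas()
    return totals.to_dict()


# Hypothetical sample data for illustration only.
sample = {
    "city": ["Hanoi", "Hue", "Hanoi", "Hue"],
    "sales": [10, 20, 30, 40],
}

print(BACKEND, total_sales_by_city(sample))
```

Only the import line differs between the two backends; the DataFrame construction and the groupby call are written once.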
In RAPIDS, the main libraries include:
- cuDF: like pandas, but runs on the GPU
- cuML: like scikit-learn, but runs on the GPU
- cuGraph: like NetworkX, but runs on the GPU
- cuSpatial: geospatial (GIS) operations, but run on the GPU
As the images above show, RAPIDS is significantly faster than the comparable CPU-based libraries, and the great thing is that the accuracy remains unchanged.
Note that in the "GPU in AI" series, I will cover only the two main libraries: cuDF (pandas on the GPU) and cuML (scikit-learn on the GPU).
Setup
On a local machine
Here is the link to install RAPIDS
Select the options as shown above, but first check which version of the CUDA toolkit you have by running the following command:
$ nvcc -V
If you haven’t installed the CUDA toolkit yet, you can refer to the guide here.
One important note: at the time of writing, RAPIDS supports only Python versions 3.9, 3.10, and 3.11.
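The two prerequisites above (a supported Python version and an installed CUDA toolkit) can also be checked from a script. The sketch below is one possible way to do it using only the standard library. The helper names are hypothetical, the supported-version set is copied from the note above and will change between RAPIDS releases, and the parser assumes that `nvcc -V` prints a line containing `release X.Y`.

```python
import re
import shutil
import subprocess
import sys

# Assumption: copied from the note above; update it for your RAPIDS release.
SUPPORTED_PYTHONS = {(3, 9), (3, 10), (3, 11)}


def python_supported(version=None):
    """Return True if the (major, minor) Python version is in the supported set."""
    if version is None:
        version = sys.version_info
    return (version[0], version[1]) in SUPPORTED_PYTHONS


def parse_nvcc_release(output):
    """Extract 'X.Y' from nvcc output such as '... release 12.2, V12.2.140'."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def cuda_toolkit_release():
    """Run `nvcc -V` and return the release string, or None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "-V"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("CUDA toolkit release:", cuda_toolkit_release() or "nvcc not found")
```

If `nvcc` is not on the PATH, the script reports that instead of raising an error, which usually means the CUDA toolkit still needs to be installed.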
After installation, you can verify it by printing the version of each library:
import cudf
print(cudf.__version__)
import cuml
print(cuml.__version__)
import cugraph
print(cugraph.__version__)
import cuspatial
print(cuspatial.__version__)
import cuxfilter
print(cuxfilter.__version__)
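The commands above stop at the first import that fails. As an alternative, the sketch below tries each library in turn and reports which ones are available. The helper name `installed_versions` is hypothetical; the module names are the ones used above.

```python
import importlib

# Module names taken from the verification commands above.
RAPIDS_MODULES = ["cudf", "cuml", "cugraph", "cuspatial", "cuxfilter"]


def installed_versions(module_names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A missing package raises ImportError; a missing GPU or driver
            # may raise other errors, so every failure is treated the same.
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(RAPIDS_MODULES).items():
        print(f"{name}: {version or 'not available'}")
```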
In Google Colab
Change the runtime type from CPU to GPU.
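To confirm that the GPU runtime is actually active, one option is to ask `nvidia-smi` which GPUs it can see. This is a minimal sketch, assuming `nvidia-smi` is on the PATH whenever a GPU runtime is attached; the helper name `visible_gpus` is hypothetical.

```python
import shutil
import subprocess


def visible_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return []
    try:
        result = subprocess.run(
            [smi, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return []
    if result.returncode != 0:
        return []
    # One GPU name per non-empty output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = visible_gpus()
print(gpus if gpus else "No GPU visible; check the runtime type")
```

An empty list suggests the notebook is still on a CPU runtime.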
Then run these commands:
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/pip-install.py
After installation, you can verify it by printing the version of each library:
import cudf
print(cudf.__version__)
import cuml
print(cuml.__version__)
import cugraph
print(cugraph.__version__)
import cuspatial
print(cuspatial.__version__)
import cuxfilter
print(cuxfilter.__version__)