§ 2025-01-30
Setting up the NVIDIA Jetson Orin Nano
$ mkdir -p build/src && cd build/src
$ wget https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-aarch64.sh
$ bash Anaconda3-2024.10-1-Linux-aarch64.sh
- Set up the conda environment (remove a previous one first if needed):
$ conda env remove --name <env_name>
$ conda create --name RStudio -c conda-forge r-base=4.2.3 python=3.10.16
- Check that JetPack 6.2 was installed correctly:
(RStudio) alexlai@jetson:~/build/src$ apt-cache show nvidia-jetpack
Package: nvidia-jetpack
Source: nvidia-jetpack (6.2)
Version: 6.2+b77
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 6.2+b77), nvidia-jetpack-dev (= 6.2+b77)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_6.2+b77_arm64.deb
Size: 29298
SHA256: 70553d4b5a802057f9436677ef8ce255db386fd3b5d24ff2c0a8ec0e485c59cd
SHA1: 9deab64d12eef0e788471e05856c84bf2a0cf6e6
MD5sum: 4db65dc36434fe1f84176843384aee23
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8
Package: nvidia-jetpack
Source: nvidia-jetpack (6.1)
Version: 6.1+b123
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 6.1+b123), nvidia-jetpack-dev (= 6.1+b123)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_6.1+b123_arm64.deb
Size: 29312
SHA256: b6475a6108aeabc5b16af7c102162b7c46c36361239fef6293535d05ee2c2929
SHA1: f0984a6272c8f3a70ae14cb2ca6716b8c1a09543
MD5sum: a167745e1d88a8d7597454c8003fa9a4
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8
(RStudio) alexlai@jetson:~/build/src$ ncc --version
bash: ncc: command not found
(RStudio) alexlai@jetson:~/build/src$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:14:07_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0
$ dpkg-query -l | grep cudnn
ii libcudnn9-cuda-12 9.3.0.75-1 arm64 cuDNN runtime libraries for CUDA 12.6
ii libcudnn9-dev-cuda-12 9.3.0.75-1 arm64 cuDNN development headers and symlinks for CUDA 12.6
ii libcudnn9-samples 9.3.0.75-1 all cuDNN samples
$ dpkg-query -l | grep tensorrt
ii tensorrt 10.3.0.30-1+cuda12.5 arm64 Meta package for TensorRT
ii tensorrt-libs 10.3.0.30-1+cuda12.5 arm64 Meta package for TensorRT runtime libraries
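The TensorRT packages also ship Python bindings for the system Python (outside the conda env). A minimal sketch to confirm they load, assuming the JetPack python3-libnvinfer bindings are installed; constructing a builder is just a smoke test:

```python
# Hedged smoke test: assumes the JetPack python3-libnvinfer bindings are
# available to the system Python interpreter (not the conda env).
import tensorrt as trt

print("TensorRT version:", trt.__version__)   # expect 10.3.x on JetPack 6.2
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)                 # verifies the runtime initializes
print("Builder created:", builder is not None)
```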
$ nvidia-smi
Thu Jan 30 10:34:15 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.4.0 Driver Version: 540.4.0 CUDA Version: 12.6 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Orin (nvgpu) N/A | N/A N/A | N/A |
| N/A N/A N/A N/A / N/A | Not Supported | N/A N/A |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Output of nvidia-smi:
- Driver Version: 540.4.0
- CUDA Version: 12.6
- This confirms that the JetPack installation included the required NVIDIA driver and CUDA version. The output also shows no processes using the GPU, which is expected since no GPU-accelerated tasks are running yet.
- We will begin by checking CUDA:
$ cd ~/build
$ git clone https://github.com/NVIDIA/cuda-samples.git
$ cd cuda-samples/
$ make   # this produced an error, but I did not copy it
$ cd bin/aarch64/linux/release/
$ ls
alignedTypes cdpQuadtree cuDLAHybridMode freeImageInteropNPP memMapIpc_kernel64.ptx simpleCUFFT
bandwidthTest cdpSimplePrint cuDLALayerwiseStatsHybrid globalToShmemAsyncCopy MersenneTwisterGP11213 simpleCUFFT_2d_MGPU
batchCUBLAS cdpSimpleQuicksort cuSolverDn_LinearSolver graphConditionalNodes MonteCarloMultiGPU simpleCUFFT_callback
batchedLabelMarkersAndLabelCompressionNPP conjugateGradient cuSolverRf graphMemoryFootprint nbody_opengles simpleCUFFT_MGPU
bf16TensorCoreGemm conjugateGradientCudaGraphs cuSolverSp_LinearSolver graphMemoryNodes newdelete simpleGLES
binaryPartitionCG conjugateGradientMultiBlockCG cuSolverSp_LowlevelCholesky histEqualizationNPP NV12toBGRandResize simpleGLES_EGLOutput
binomialOptions conjugateGradientMultiDeviceCG cuSolverSp_LowlevelQR HSOpticalFlow p2pBandwidthLatencyTest SobolQRNG
binomialOptions_nvrtc conjugateGradientPrecond deviceQuery immaTensorCoreGemm ptxjit stereoDisparity
BlackScholes conjugateGradientUM deviceQueryDrv jacobiCudaGraphs ptxjit_kernel64.ptx tf32TensorCoreGemm
BlackScholes_nvrtc convolutionFFT2D dmmaTensorCoreGemm jitLto quasirandomGenerator transpose
boxFilterNPP cudaCompressibleMemory dxtc LargeKernelParameter simpleCUBLAS UnifiedMemoryPerf
cannyEdgeDetectorNPP cudaGraphsPerfScaling fastWalshTransform lineOfSight simpleCUBLAS_LU warpAggregatedAtomicsCG
cdpAdvancedQuicksort cudaTensorCoreGemm FDTD3d matrixMulCUBLAS simpleCUBLASXT watershedSegmentationNPP
cdpBezierTessellation cuDLAErrorReporting FilterBorderControlNPP memMapIPCDrv simpleCudaGraphs
$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Orin"
CUDA Driver Version / Runtime Version 12.6 / 12.6
CUDA Capability Major/Minor version number: 8.7
Total amount of global memory: 7620 MBytes (7989977088 bytes)
(008) Multiprocessors, (128) CUDA Cores/MP: 1024 CUDA Cores
GPU Max Clock rate: 1020 MHz (1.02 GHz)
Memory Clock rate: 918 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 167936 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.6, CUDA Runtime Version = 12.6, NumDevs = 1
Result = PASS
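As an extra check that is independent of the cuda-samples build, a minimal Python sketch can query the CUDA runtime directly through ctypes. This assumes libcudart.so.12 is resolvable by the dynamic loader (add /usr/local/cuda/lib64 to LD_LIBRARY_PATH if it is not):

```python
# Minimal sketch: query the CUDA runtime via ctypes (no extra packages needed).
import ctypes

cudart = ctypes.CDLL("libcudart.so.12")   # CUDA 12.x runtime library

count = ctypes.c_int()
version = ctypes.c_int()
# Both calls return cudaError_t; 0 means cudaSuccess.
assert cudart.cudaGetDeviceCount(ctypes.byref(count)) == 0
assert cudart.cudaRuntimeGetVersion(ctypes.byref(version)) == 0

print("CUDA devices:", count.value)        # expect 1 (the integrated Orin GPU)
print("Runtime version:", version.value)   # expect 12060 for CUDA 12.6
```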
- After checking CUDA, we will check cuDNN:
cuDNN (CUDA Deep Neural Network Library)
- A high-performance GPU-accelerated library specifically optimized for deep learning.
- Built on top of CUDA, providing highly optimized implementations of neural network operations.
- Used by deep learning frameworks like PyTorch, TensorFlow, and MXNet to optimize GPU performance.
- 🔹 Think of cuDNN as a deep learning-specific library that makes CUDA even faster for AI workloads.
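Before building the samples, a quick sanity check (my own sketch, not part of the cuDNN samples) is to load the library with ctypes and read its version. It assumes libcudnn.so.9 lives in /usr/lib/aarch64-linux-gnu, where the libcudnn9-cuda-12 package installs it:

```python
# Sanity check: load the cuDNN runtime and print its version.
import ctypes

cudnn = ctypes.CDLL("libcudnn.so.9")
cudnn.cudnnGetVersion.restype = ctypes.c_size_t   # cudnnGetVersion() returns size_t

print("cuDNN version:", cudnn.cudnnGetVersion())  # expect 90300 for cuDNN 9.3.0
```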
(RStudio) alexlai@jetson:~/build$ cp -rv /usr/src/cudnn_samples_v9/ ./
(RStudio) alexlai@jetson:~/build$ cd cudnn_samples_v9/
(RStudio) alexlai@jetson:~/build/cudnn_samples_v9$ ls
cmake CMakeLists.txt common conv_sample mnistCUDNN multiHeadAttention RNN_v8.0 samples_common.mk
$ sudo apt install libfreeimage3 libfreeimage-dev
$ mkdir build
(RStudio) alexlai@jetson:~/build/cudnn_samples_v9$ cd build/
(RStudio) alexlai@jetson:~/build/cudnn_samples_v9/build$ cmake ..
CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
CMake 3.27 or higher is required. You are running version 3.22.1
-- Configuring incomplete, errors occurred!
(RStudio) alexlai@jetson:~/build/cudnn_samples_v9/build$ cmake --version
cmake version 3.22.1
- Upgrade CMake (download and build from source):
$ sudo apt purge --autoremove cmake
$ wget https://github.com/Kitware/CMake/releases/download/v3.31.5/cmake-3.31.5.tar.gz
$ cd .. && tar xvf src/cmake-3.31.5.tar.gz && cd cmake-3.31.5
$ ./bootstrap
...
CMake Error at Utilities/cmcurl/CMakeLists.txt:772 (message):
Could not find OpenSSL. Install an OpenSSL development package or
configure CMake with -DCMAKE_USE_OPENSSL=OFF to build without OpenSSL.
-- Configuring incomplete, errors occurred!
---------------------------------------------
Error when bootstrapping CMake:
Problem while running initial CMake
---------------------------------------------
$ sudo apt install -y libssl-dev
$ ./bootstrap
...
-- Checking for curses support
-- Checking for curses support - Failed
-- Looking for a Fortran compiler
-- Looking for a Fortran compiler - /usr/bin/f95
-- Performing Test run_pic_test
-- Performing Test run_pic_test - Success
-- Performing Test run_inlines_hidden_test
-- Performing Test run_inlines_hidden_test - Success
-- Configuring done (73.6s)
-- Generating done (1.7s)
-- Build files have been written to: /home/alexlai/build/cmake-3.31.5
---------------------------------------------
CMake has bootstrapped. Now run gmake.
$ gmake -j 6
$ sudo make install
$ which cmake
/usr/local/bin/cmake
(base) alexlai@jetson:~$ cmake --version
cmake version 3.31.5
- With the newer CMake installed, try the cuDNN samples again, this time pointing CMake at the cuDNN headers and libraries:
$ export CUDNN_INCLUDE_DIR=/usr/include
$ export CUDNN_LIBRARY_DIR=/usr/lib/aarch64-linux-gnu
$ export LD_LIBRARY_PATH=$CUDNN_LIBRARY_DIR:$LD_LIBRARY_PATH
$ export CMAKE_PREFIX_PATH=$CUDNN_LIBRARY_DIR:$CMAKE_PREFIX_PATH
(RStudio) alexlai@jetson:~/build/cudnn_samples_v9/build$ cmake ..
-- Found cuDNN: /usr/lib/aarch64-linux-gnu
CMake Error at mnistCUDNN/CMakeLists.txt:1 (add_subdirectory):
add_subdirectory given source "FreeImage" which is not an existing
directory.
-- Configuring incomplete, errors occurred!
(RStudio) alexlai@jetson:~/build/cudnn_samples_v9$ cd multiHeadAttention/   # <-- the top-level CMake build fails on FreeImage, so go into each sample and build it individually with make
(RStudio) alexlai@jetson:~/build/cudnn_samples_v9/multiHeadAttention$ ls
attn_ref.py CMakeLists.txt fp16_emu.h Makefile multiHeadAttention.cpp multiHeadAttention.h README.txt run_ref.sh
(RStudio) alexlai@jetson:~/build/cudnn_samples_v9/multiHeadAttention$ make -j6
(RStudio) alexlai@jetson:~/build/cudnn_samples_v9$ cd mnistCUDNN/
(RStudio) alexlai@jetson:~/build/cudnn_samples_v9/mnistCUDNN$ make -j6
(RStudio) alexlai@jetson:~/build/cudnn_samples_v9/mnistCUDNN$ ./mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 90300 , CUDNN_VERSION from cudnn.h : 90300 (9.3.0)
Host compiler version : GCC 11.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 8 Capabilities 8.7, SmClock 1020.0 Mhz, MemSize (Mb) 7619, MemClock 918.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.057344 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.099168 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.106624 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.151552 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.487136 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 5.600992 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.195872 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.250912 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.522112 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.673504 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 1.148640 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 5.534592 time requiring 128848 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.043392 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.051936 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.091840 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.154528 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.227104 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.419616 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.186112 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.213440 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.247552 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.528576 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.568448 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 1.021696 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.099520 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.102912 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.108640 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.263328 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.268384 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.325536 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.212128 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.246432 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.250432 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.528640 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.604032 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 1.146752 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.088832 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.092736 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.096160 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.227008 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.257088 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.421952 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.221248 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.233888 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.351680 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.530528 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.531360 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 1.021088 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
- Monitoring the board: to monitor it, we will install jetson-stats:
$ sudo pip3 install -U jetson-stats
# --- reboot, then run jtop
jtop shows the board details, the versions of the installed NVIDIA software, the usage of the computing resources, and the power consumption. There are also Python scripts built on the jtop library; for example, jtop_properties.py is a quick way to monitor all of the above, as sketched below.
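A minimal sketch along those lines, using the jtop Python API that jetson-stats installs (the exact fields exposed by jetson.board and jetson.stats vary between jetson-stats releases):

```python
# Minimal jtop example: print board properties and one snapshot of live stats.
from jtop import jtop

with jtop() as jetson:
    print(jetson.board)          # hardware / platform / library information
    if jetson.ok():
        print(jetson.stats)      # CPU/GPU load, RAM, temperatures, power
```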
- Case
For my board, I bought the Yahboom CUBE nano case. On their page, there are also tutorials and code for setting up the case and configuring the OLED screen that comes with it. Finally, there is also a GitHub repo associated with the case.
- PyTorch requirements:
$ sudo apt-get install libopenblas-base libopenmpi-dev libomp-dev
$ sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev libopenblas-dev libavcodec-dev libavformat-dev libswscale-dev
$ conda activate RStudio
$ pip install Cython
Collecting Cython
Downloading Cython-3.0.11-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (3.2 kB)
Downloading Cython-3.0.11-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.5/3.5 MB 9.4 MB/s eta 0:00:00
Installing collected packages: Cython
Successfully installed Cython-3.0.11
$ cd build/src/
(RStudio) alexlai@jetson:~/build/src$ wget https://nvidia.box.com/shared/static/0h6tk4msrl9xz3evft9t0mpwwwkw7a32.whl -O torch-2.1.0-cp310-cp310-linux_aarch64.whl
$ ls
Anaconda3-2024.10-1-Linux-aarch64.sh cmake-3.31.5.tar.gz torch-2.1.0-cp310-cp310-linux_aarch64.whl
(RStudio) alexlai@jetson:~/build/src$ pip install numpy torch-2.1.0-cp310-cp310-linux_aarch64.whl
...
Installing collected packages: mpmath, typing-extensions, sympy, numpy, networkx, MarkupSafe, fsspec, filelock, jinja2, torch
Successfully installed MarkupSafe-3.0.2 filelock-3.17.0 fsspec-2024.12.0 jinja2-3.1.5 mpmath-1.3.0 networkx-3.4.2 numpy-2.2.2 sympy-1.13.3 torch-2.1.0 typing-extensions-4.12.2
- Finally, we install torchvision:
(RStudio) alexlai@jetson:~/build/src$ cd ~/build
(RStudio) alexlai@jetson:~/build$ git clone --branch v0.16.1 https://github.com/pytorch/vision torchvision
$ cd torchvision/
(RStudio) alexlai@jetson:~/build/torchvision$ export BUILD_VERSION=0.16.1
(RStudio) alexlai@jetson:~/build/torchvision$ python setup.py install --user
Traceback (most recent call last):
File "/home/alexlai/build/torchvision/setup.py", line 9, in <module>
import torch
File "/home/alexlai/anaconda3/envs/RStudio/lib/python3.10/site-packages/torch/__init__.py", line 235, in <module>
from torch._C import * # noqa: F403
ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory
- Workaround: install cuDNN (and torchvision) from conda-forge inside the environment, plus Pillow:
$ conda install conda-forge::cudnn
$ conda install conda-forge::torchvision
$ pip install Pillow
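A quick sanity check (my own snippet) that the conda-forge builds of torch and torchvision import together and report CUDA support:

```python
# Verify that torch and torchvision load and that CUDA is visible.
import torch
import torchvision

print("torch:", torch.__version__, "| torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```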
- To test PyTorch, save the following as test_PyTorch.py and run it:
import torch
print(torch.__version__)
print('CUDA available: ' + str(torch.cuda.is_available()))
print('cuDNN version: ' + str(torch.backends.cudnn.version()))
a = torch.cuda.FloatTensor(2).zero_()
print('Tensor a = ' + str(a))
b = torch.randn(2).cuda()
print('Tensor b = ' + str(b))
c = a + b
print('Tensor c = ' + str(c))
$ python3 test_PyTorch.py
2.5.1.post207
CUDA available: True
cuDNN version: 90300
/home/alexlai/build/src/test_PyTorch.py:7: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at /home/conda/feedstock_root/build_artifacts/libtorch_1735172299309/work/torch/csrc/tensor/python_tensor.cpp:78.)
a = torch.cuda.FloatTensor(2).zero_()
Traceback (most recent call last):
File "/home/alexlai/build/src/test_PyTorch.py", line 7, in <module>
a = torch.cuda.FloatTensor(2).zero_()
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
> `conda install pytorch=2.5.1 cudatoolkit=11.8 -c pytorch`
import torch
print(torch.__version__)          # Check PyTorch version
print(torch.cuda.is_available())  # Check if CUDA is available
print(torch.version.cuda)         # Check CUDA version PyTorch is using
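If a build with kernels for the Orin's sm_87 architecture is installed, the deprecated torch.cuda.FloatTensor constructor in test_PyTorch.py can be replaced with the recommended factory functions; a minimal revised sketch:

```python
# Revised GPU test: same checks as test_PyTorch.py, without the deprecated
# torch.cuda.FloatTensor constructor. Only succeeds if the installed PyTorch
# build ships kernels for this GPU (sm_87).
import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("cuDNN version:", torch.backends.cudnn.version())

a = torch.zeros(2, device="cuda")
b = torch.randn(2, device="cuda")
print("a + b =", a + b)
```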