The NVIDIA Jetson Nano is one of the best entry-level edge AI boards available — 128-core Maxwell GPU, 4 GB LPDDR4, and full CUDA support, all for under $100. Pair it with YOLOv8 exported to TensorRT and you get real-time object detection at 30 FPS without a cloud dependency. This guide covers the full pipeline from a fresh JetPack install to a live camera detection stream.
Prerequisites
You'll need the following hardware and software before starting:
- NVIDIA Jetson Nano Developer Kit B01 (2 GB or 4 GB)
- MicroSD card — 64 GB minimum (Class 10 / U3 recommended)
- USB webcam or Raspberry Pi Camera v2 (CSI)
- JetPack 4.6.x installed (includes CUDA 10.2, cuDNN 8, TensorRT 8)
- Python 3.8 environment
- Active internet connection for package installation
| Component | Version Used | Notes |
|---|---|---|
| JetPack | 4.6.4 | Latest stable for Nano B01 |
| TensorRT | 8.2.1 | Bundled with JetPack 4.6.x |
| Ultralytics | 8.0.x | pip install ultralytics |
| OpenCV | 4.5.x | Pre-installed with JetPack |
| PyTorch | 1.11 (aarch64) | Jetson-specific wheel |
Step 1 — Flash JetPack to MicroSD
Download the JetPack 4.6.4 image from the NVIDIA Developer site. Flash it to your microSD using Balena Etcher (free, cross-platform).
- Open Etcher → Select image (.zip is fine, Etcher extracts automatically)
- Select your MicroSD drive — double-check the target!
- Click Flash. The process takes 5–10 minutes.
- Insert the card into the Jetson Nano, connect a monitor, keyboard, and power.
- Complete the initial Ubuntu 18.04 setup wizard.
Step 2 — Install Python Dependencies
Once booted, open a terminal and install the required packages. Start with the Jetson-specific PyTorch wheel, then install Ultralytics.
// Install PyTorch for Jetson (aarch64)# Update system packages first
sudo apt-get update && sudo apt-get upgrade -y
# Install pip and virtualenv
sudo apt-get install -y python3-pip python3-venv libopenblas-dev
# Create virtual environment
python3 -m venv ~/yolo-env
source ~/yolo-env/bin/activate
# Install PyTorch wheel for JetPack 4.6 (aarch64 CUDA 10.2)
pip install --no-cache \
https://developer.download.nvidia.com/compute/redist/jp/v46/pytorch/torch-1.11.0a0+17540c5-cp36-cp36m-linux_aarch64.whl
# Verify CUDA is available
python3 -c "import torch; print(torch.cuda.is_available())"
# Expected output: True
// Install Ultralytics YOLOv8
pip install ultralytics
# Install torchvision from source (required for Jetson)
sudo apt-get install -y libjpeg-dev zlib1g-dev
git clone --branch v0.12.0 https://github.com/pytorch/vision
cd vision
python3 setup.py install --user
cd ..
Step 3 — Export YOLOv8 to TensorRT
Ultralytics handles the TensorRT export natively. We'll export yolov8n (nano — smallest and fastest) with INT8 quantization for maximum throughput on the Jetson GPU.
from ultralytics import YOLO
# Load pretrained YOLOv8 nano model
model = YOLO("yolov8n.pt")
# Export to TensorRT with INT8 quantization
# 'data' points to a calibration dataset (COCO val subset works well)
model.export(
format="engine",
imgsz=640,
half=False, # INT8 overrides FP16
int8=True,
data="coco128.yaml", # calibration dataset
workspace=4, # GB of GPU workspace
verbose=True,
)
print("TensorRT engine saved: yolov8n.engine")
.engine file.
Step 4 — Camera Setup
The Jetson Nano supports both USB webcams (plug-and-play via V4L2) and the Raspberry Pi Camera v2 (CSI, requires GStreamer pipeline).
// camera_utils.py — Camera capture helperimport cv2
def open_camera(cam_index=0, width=640, height=480, fps=30):
"""Open USB webcam via V4L2."""
cap = cv2.VideoCapture(cam_index)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
cap.set(cv2.CAP_PROP_FPS, fps)
return cap
def open_csi_camera(width=640, height=480, fps=30, flip=0):
"""Open CSI camera via GStreamer pipeline."""
pipeline = (
f"nvarguscamerasrc ! "
f"video/x-raw(memory:NVMM), width={width}, height={height}, "
f"format=NV12, framerate={fps}/1 ! "
f"nvvidconv flip-method={flip} ! "
f"video/x-raw, width={width}, height={height}, format=BGRx ! "
f"videoconvert ! "
f"video/x-raw, format=BGR ! appsink"
)
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
return cap
Step 5 — Real-time Inference Pipeline
With the engine exported and camera confirmed, here is the full inference loop. It reads frames, runs YOLOv8 inference, draws bounding boxes, and displays the annotated frame in a window.
// detect.py — Full real-time detection loopimport cv2
import time
from ultralytics import YOLO
from camera_utils import open_camera
# Load TensorRT engine (device-specific, runs on GPU)
model = YOLO("yolov8n.engine", task="detect")
cap = open_camera(cam_index=0, width=640, height=480, fps=30)
if not cap.isOpened():
raise RuntimeError("Cannot open camera")
# Colour palette for classes
COLOURS = {
"person": (0, 255, 136),
"car": (255, 107, 0),
"default": (0, 245, 255),
}
prev_time = time.time()
while True:
ret, frame = cap.read()
if not ret:
break
# Run inference (returns list of Results objects)
results = model(frame, conf=0.5, iou=0.45, verbose=False)
# Draw detections
for result in results:
for box in result.boxes:
x1, y1, x2, y2 = map(int, box.xyxy[0])
cls_name = model.names[int(box.cls[0])]
conf = float(box.conf[0])
color = COLOURS.get(cls_name, COLOURS["default"])
cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
label = f"{cls_name} {conf:.2f}"
cv2.rectangle(frame, (x1, y1 - 18), (x1 + len(label) * 9, y1), color, -1)
cv2.putText(frame, label, (x1 + 4, y1 - 5),
cv2.FONT_HERSHEY_PLAIN, 0.9, (0, 0, 0), 1)
# FPS counter
now = time.time()
fps = 1.0 / (now - prev_time)
prev_time = now
cv2.putText(frame, f"{fps:.1f} FPS", (frame.shape[1] - 100, 24),
cv2.FONT_HERSHEY_PLAIN, 1.2, (0, 255, 136), 2)
cv2.imshow("YOLOv8 — Jetson Nano", frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
cap.release()
cv2.destroyAllWindows()
Step 6 — Publish Detections via MQTT
For edge-to-cloud integration, we publish detection results as JSON to an MQTT broker. This adds only ~1 ms of overhead per frame.
// mqtt_publisher.py — Send detections to brokerimport json
import time
import paho.mqtt.client as mqtt
BROKER = "your-mqtt-broker.example.com"
PORT = 1883
TOPIC = "jetson/detections"
CLIENT_ID = "jetson-nano-01"
client = mqtt.Client(client_id=CLIENT_ID)
client.connect(BROKER, PORT, keepalive=60)
client.loop_start()
def publish_detections(results, model_names, fps):
"""Build JSON payload and publish."""
detections = []
for result in results:
for box in result.boxes:
detections.append({
"class": model_names[int(box.cls[0])],
"confidence": round(float(box.conf[0]), 3),
"bbox": [round(v, 1) for v in box.xyxy[0].tolist()],
})
payload = {
"timestamp": time.time(),
"fps": round(fps, 1),
"count": len(detections),
"detections": detections,
}
client.publish(TOPIC, json.dumps(payload), qos=0)
Performance Benchmarks
Measured on Jetson Nano 4 GB B01, JetPack 4.6.4, USB webcam at 640×480.
| Model | Format | FPS | Inference (ms) | mAP50 |
|---|---|---|---|---|
| YOLOv8n | PyTorch FP32 | 7–9 | 110–140 | 0.887 |
| YOLOv8n | TensorRT FP16 | 18–22 | 45–55 | 0.885 |
| YOLOv8n | TensorRT INT8 | 28–32 | 31–36 | 0.878 |
| YOLOv8s | TensorRT INT8 | 14–18 | 55–70 | 0.917 |
INT8 quantization delivers 4× the throughput of PyTorch FP32 with only a 1% drop in mAP50 — an excellent trade-off for edge deployment.
Troubleshooting
| Problem | Likely Cause | Fix |
|---|---|---|
torch.cuda.is_available() returns False |
Wrong PyTorch wheel or CUDA path | Re-install the aarch64 wheel matching your JetPack version |
| Export fails with OOM error | GPU workspace too large | Set workspace=2 (2 GB) in the export call |
| Camera shows green frame only | GStreamer pipeline mismatch | Use USB camera with open_camera() instead of CSI pipeline |
| FPS stuck at 7–9 | Running PyTorch model, not engine | Ensure you load yolov8n.engine not yolov8n.pt |
| Jetson freezes under load | Insufficient power supply | Use 5V/4A barrel jack, set J48 jumper, enable MAXN power mode |
Next Steps
With a working real-time detection pipeline, you can extend the project in several directions:
- Custom classes: Train YOLOv8 on your own labeled dataset using Roboflow, then export to TensorRT
- RTSP streaming: Replace
cv2.imshowwith a GStreamer RTSP sink to stream annotated video over the network - Multi-camera: Run two capture threads and merge detections before publishing
- AWS Rekognition fallback: Send low-confidence crops to the cloud for a second opinion
- DeepStream: NVIDIA's DeepStream SDK gives a production-grade multi-sensor analytics pipeline with built-in TensorRT acceleration