You’re happily training your deep learning model using PyTorch. Everything seems fine, your code runs on GPU, and you’re watching batch after batch go through. Then out of nowhere…
“RuntimeError: CUDA error: device-side assert triggered”
Oof. That one hurts. It’s one of those errors that looks scary, especially the first time you see it. But don’t panic! In this article, we’ll break it down, understand what’s causing it, and work through real solutions.
💡 What Does This Error Mean?
This error happens when something goes wrong on the GPU during kernel execution. It’s like PyTorch told the CUDA GPU to do something invalid, and CUDA raised a red flag.
The problem could be anything from wrong input values to out-of-bounds indexing to bad labels.
🎯 Common Causes
Let’s break down the most common villains behind this error:
- Incorrect class index in classification (like passing label 5 when your model only outputs 5 classes indexed 0-4)
- Bad tensor shapes or dimensions
- Invalid indexing into tensors
- Incorrect device placement
- Calling .item() or .numpy() on GPU tensors, which can misbehave or mask the real error during debugging
Let’s understand each one with examples and how to fix them.
🧟‍♂️ Cause #1: Wrong Class Labels
This is the number one reason for the device-side assert error. Let’s assume you’re doing classification with CrossEntropyLoss.
PyTorch’s CrossEntropyLoss expects class targets as integers from 0 to num_classes - 1. For example, with 5 classes, the valid labels are 0, 1, 2, 3, 4.
But imagine your labels look like this:
labels = torch.tensor([0, 1, 2, 5])
Oops. 5 is out of range. That’s when CUDA says, “Nope, not doing this!” and throws up the device-side assert.
Solution:
- Double-check your labels
- Print the unique values with print(torch.unique(labels))
- Make sure all values fall within [0, num_classes - 1]
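Here’s a minimal sketch of that check in context (the tensor values and num_classes are made up for illustration):

import torch
import torch.nn as nn

num_classes = 5
logits = torch.randn(4, num_classes)  # [batch_size, num_classes]
labels = torch.tensor([0, 1, 2, 5])   # 5 is out of range for 5 classes!

print(torch.unique(labels))  # tensor([0, 1, 2, 5])
assert labels.min() >= 0 and labels.max() < num_classes, "Labels out of range!"

loss = nn.CrossEntropyLoss()(logits, labels)  # never reached: the assert fires first

On the CPU this mistake raises a readable IndexError; on the GPU it becomes the device-side assert.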

🧪 Cause #2: Tensor Shape Mismatch
Sometimes, the input or output shapes are not what PyTorch expects. You might be passing your model’s outputs with the wrong shape into the loss function.
Example of correct shapes for classification:
- Output: [batch_size, num_classes]
- Targets: [batch_size]
But if you pass this:
outputs: [batch_size] (missing the class dimension)
labels: [batch_size]
Boom 💥 — device-side assert strikes again!
Solution:
- Print the shapes before feeding them to the loss: print(output.shape, labels.shape)
- Ensure the output shape is [B, C] and the label shape is [B]
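For reference, here’s a quick sanity check with random tensors standing in for a real model’s outputs:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
outputs = torch.randn(4, 5)         # [B, C] = [4, 5]
labels = torch.randint(0, 5, (4,))  # [B] = [4], values in [0, 4]

print(outputs.shape, labels.shape)  # torch.Size([4, 5]) torch.Size([4])
loss = criterion(outputs, labels)   # shapes line up, so this works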
⚙️ Cause #3: Invalid Indexing
Let’s say you’re trying to index a tensor using a variable that was accidentally set too high.
x = torch.randn(10)
print(x[15])  # IndexError on the CPU
On the GPU, the same mistake, especially when indexing with an index tensor (e.g. x[torch.tensor([15])]), surfaces as the device-side assert instead of a friendly IndexError.
Solution: Always validate your indices before using them.
- Check bounds with if 0 <= index < tensor.size(0)
- Add assertions to catch bad indices early
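A small sketch of both habits together (the index values here are invented):

import torch

x = torch.randn(10)
idx = torch.tensor([3, 7, 15])  # 15 is out of bounds for size 10

assert idx.min() >= 0 and idx.max() < x.size(0), f"Index out of range for size {x.size(0)}"
print(x[idx])  # never reached: the assert fires first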
🪛 Cause #4: .item() or .numpy() on GPU Tensors
Trying to convert a GPU tensor directly to a NumPy array raises its own (clearer) error. More subtly, calls like .item() force a GPU sync, so an earlier device-side assert often surfaces there, making the wrong line look guilty while you’re debugging a failing model.
value = torch.tensor([10.0], device="cuda")
value.numpy()  # TypeError: can't convert cuda:0 device type tensor to numpy
Solution: Always move tensors to CPU before calling numpy or item:
value = value.cpu().numpy()
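Both conversions follow the same move-to-CPU-first pattern:

value = torch.tensor([10.0], device="cuda")
scalar = value.cpu().item()           # Python float
array = value.detach().cpu().numpy()  # NumPy array (detach in case it requires grad)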
🦺 How to Debug Better
One of the hardest parts about this error is that the actual line causing it isn’t always clear.
CUDA kernels launch asynchronously, so once an assert fires on the GPU, PyTorch often reports the error at a later, unrelated line instead of the one that caused it.
But there’s a trick! Run your code on the CPU temporarily. (Setting the environment variable CUDA_LAUNCH_BLOCKING=1 also helps: kernels then run synchronously, so the traceback points at the real line, at the cost of speed.)
Step-by-step debug method:
- Set device = torch.device('cpu')
- Re-run your training step
- You’ll get a much clearer traceback
The CPU will helpfully tell you things like “expected class index in the range [0, num_classes - 1]” or exactly which shapes don’t match.
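Here’s a minimal sketch of the idea, with a stand-in linear model and a deliberately bad label:

import torch
import torch.nn as nn

device = torch.device('cpu')        # debug on the CPU first
model = nn.Linear(8, 5).to(device)  # stand-in for your real model
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(4, 8, device=device)
labels = torch.tensor([0, 1, 2, 5], device=device)  # label 5 is out of range

loss = criterion(model(inputs), labels)
# On the CPU this fails with a readable IndexError ("Target 5 is out of bounds")
# instead of the cryptic device-side assert you'd get on the GPU.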
🧼 Clean Coding Practices to Avoid This Error
Want to avoid this error entirely? Here are some habits that help:
- Validate your datasets before training
- Use asserts or try/except to test data shapes and ranges
- Log shapes at key steps in your model and training loop
- Don’t assume — print/check everything at least once!
- Start on CPU and move to GPU only after things run smoothly
🛡️ Utility Functions You Can Use
Here are some quick helpers to make your life easier:
import torch

def check_labels(labels, num_classes):
    # Move to CPU first so the check itself can't trigger a CUDA assert
    labels = labels.detach().cpu()
    unique_vals = torch.unique(labels)
    assert torch.all((unique_vals >= 0) & (unique_vals < num_classes)), "Labels out of range!"
You can call it in your training loop like this:
check_labels(labels, num_classes=5)
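You could pair it with a similar shape check (a hypothetical helper along the same lines, not from any library):

def check_shapes(outputs, labels):
    # For CrossEntropyLoss: outputs should be [B, C], labels [B], with matching B
    assert outputs.dim() == 2, f"Expected outputs [B, C], got {tuple(outputs.shape)}"
    assert labels.dim() == 1, f"Expected labels [B], got {tuple(labels.shape)}"
    assert outputs.size(0) == labels.size(0), "Batch sizes don't match!"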
It’s better to get a clear AssertionError than a mysterious CUDA crash!
🙌 Final Thoughts
This error may look intimidating, but it’s actually a helpful friend. It tells you something is off — usually your labels or shapes.
To summarize:
- Check your class labels — are they in range?
- Are your output and label shapes correct?
- Are you indexing tensors safely?
- Run your code on CPU to catch the real error
The next time you see RuntimeError: CUDA error: device-side assert triggered, smile. You’ve got this.
Keep calm, debug smart, and happy coding! 🚀