Visualizing ResNet18 Attention with Class Activation Mapping (CAM) in PyTorch

Understanding why deep learning models make certain predictions has become just as important as achieving high accuracy. As Computer Vision systems continue to influence decisions in healthcare, autonomous driving, security, and edge AI, the demand for transparency has never been greater. Yet, modern architectures like ResNet18—designed for depth, abstraction, and performance—often operate as opaque black boxes.

This “Black Box” challenge sits at the heart of Explainable AI (XAI). While traditional models such as Logistic Regression or Decision Trees expose their decision boundaries explicitly, Deep Neural Networks learn rich, hierarchical representations that are mathematically powerful but difficult to interpret. Convolutional Neural Networks, for instance, transform an image through multiple layers of filtering, downsampling, and nonlinear activation, projecting it into a high-dimensional feature space. We see the pixels going in and the class scores coming out—yet the internal reasoning remains hidden.

Why This Matters: The Risk of Spurious Correlations

Without interpretability, models can latch onto shortcuts rather than true semantic cues. This issue—known as spurious correlation or shortcut learning—poses a real risk in practical deployments.

A well-known example is the Wolf vs. Husky classification case. A model confidently labeled images as wolves not because it had learned animal morphology, but because most wolf images in the dataset contained snow. In effect, the model became a background classifier rather than an animal classifier.

To avoid such pitfalls, we need tools that let us peek inside the network’s decision-making process.

Enter CAM: A Window into Model Attention

Class Activation Mapping (CAM) provides exactly that. As a post-hoc interpretability technique, CAM projects the learned weights of the final classification layer back onto the convolutional feature maps. The result is a heatmap overlay highlighting where the model is focusing its attention for a given prediction.
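
Concretely, in the standard CAM formulation (the notation below is the usual one from the CAM literature, not taken from this post), the activation map for a class $c$ is a weighted sum over the channels of the last convolutional feature maps, with the weights taken from the final fully connected layer:

$$
M_c(x, y) = \sum_k w_k^{c} \, f_k(x, y)
$$

where $f_k(x, y)$ is the activation of the $k$-th feature map at spatial location $(x, y)$ and $w_k^{c}$ is the weight connecting channel $k$ to class $c$. Because ResNet18 already ends in global average pooling followed by a single fully connected layer, this formula applies directly, with no architectural changes.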

In practice, this means we can visualize which regions of the image contributed most to ResNet18’s output—revealing whether the model is truly looking at meaningful features or being misled by the background.

Thanks to PyTorch’s flexible API and modular model definitions, implementing CAM is remarkably straightforward, making it a powerful addition to any computer vision practitioner’s toolkit.
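
As a rough sketch of what this can look like (the placeholder input, preprocessing, and variable names here are illustrative assumptions, not code from the linked post), one can hook the output of ResNet18’s last convolutional block, run a forward pass, and combine the captured feature maps with the fully connected weights of the predicted class:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Pretrained ResNet18; eval() disables dropout and batch-norm updates.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.eval()

# Capture the output of the last convolutional block with a forward hook.
features = {}
model.layer4.register_forward_hook(
    lambda module, inputs, output: features.update(layer4=output.detach())
)

# Placeholder input: in practice, load an image and apply the ImageNet
# resize/normalize transforms so it becomes a [1, 3, 224, 224] tensor.
img = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    logits = model(img)
class_idx = logits.argmax(dim=1).item()

# CAM: weight each of the 512 feature maps (7x7 for a 224x224 input)
# by the fc weight of the predicted class, then sum over channels.
fc_weights = model.fc.weight[class_idx]        # shape [512]
fmap = features["layer4"].squeeze(0)           # shape [512, 7, 7]
cam = torch.einsum("c,chw->hw", fc_weights, fmap)

# Normalize to [0, 1] and upsample to the input resolution so the map
# can be overlaid on the original image as a heatmap.
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
cam = F.interpolate(cam[None, None], size=(224, 224),
                    mode="bilinear", align_corners=False)[0, 0]
```

From here, the normalized map can be converted to a color map (for example with matplotlib) and alpha-blended over the input image to produce the familiar heatmap overlay.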

👉 Read more: Visualizing ResNet18 Attention with CAM in PyTorch