Data Structures

In this section, we will look at CNN architectures that have significantly advanced the field of image classification. But first: what is a Convolutional Neural Network (CNN)?

Imagine you're looking at a massive Where's Waldo? picture. Your eyes don't take in the whole image at once; you scan small patches, looking for Waldo's striped shirt, glasses or hat. CNNs work on images in a similar way.

CNNs are a type of neural network especially good at understanding images. They do this by scanning small sections (patches) of an image, looking for patterns (like edges, colors or shapes), and then combining these patterns to recognize more complex features (like faces, cars or cats).

Think of a CNN as a team of detectives:

  • The first detective looks for simple clues (edges, lines).
  • The next detective uses those clues to find more complex features (corners, shapes).
  • Later detectives piece together those features to recognize objects (a cat's face, a car's wheel).

Why are CNNs special?

Traditional neural networks treat every pixel as equally important, like reading a book by looking at every letter individually, which is very inefficient! CNNs, on the other hand, focus on local patterns and then build up to the big picture, which is much closer to how humans see.
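The "scan small patches" idea can be sketched in a few lines of plain Python. The tiny image and the 2x2 filter below are made-up example values (not from any real network); the filter responds strongly where a dark column sits next to a bright one, i.e. a vertical edge.

```python
# Toy sketch of the sliding-window (convolution) idea.
# Image and kernel values are hypothetical, chosen so the edge
# between the dark and bright columns lights up in the output.

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A tiny filter that fires on a left-dark / right-bright edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

def convolve(image, kernel):
    """Slide the kernel over every patch and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        out.append(row)
    return out

feature_map = convolve(image, kernel)
print(feature_map)  # strongest responses line up with the vertical edge
```

Every row of the result is [0, 2, 0]: the filter is silent over the uniform regions and fires exactly where the image switches from dark to bright, which is the "looking for clues in small patches" behaviour described above.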

Let's compare some famous CNN models. Think of them as different "detective teams", each with their own style:

AlexNet

  • The Pioneer: Like the first detective to use a magnifying glass.
  • Why it matters: Won a big image recognition contest in 2012, proving CNNs could beat older methods.
  • How: Uses several layers to find patterns, with some tricks like “dropout” to avoid overfitting.

VGG

  • The Methodical Detective: Always uses the same small magnifying glass, but stacks many layers.
  • Why it matters: Showed that making networks deeper (more layers) could improve accuracy.
  • How: Uses lots of simple 3x3 filters (patches), making the network deep and uniform.

ResNet

  • The Detective with a Shortcut: Sometimes skips steps if the clues are already clear.
  • Why it matters: Solved the problem where deeper networks became harder to train (adding more layers made accuracy worse, not better).
  • How: Introduced “skip connections” or “residuals,” allowing the network to bypass some layers if needed.

MobileNet

  • The Detective on a Bicycle: Fast, lightweight, and efficient—perfect for mobile devices.
  • Why it matters: Designed to run on phones and embedded devices, not just big computers.
  • How: Uses clever tricks to reduce the number of calculations, making it small and fast.

| Model     | Key Feature            | Analogy                  | Best For                  |
| --------- | ---------------------- | ------------------------ | ------------------------- |
| AlexNet   | First deep CNN success | Magnifying glass pioneer | General image recognition |
| VGG       | Deep, uniform layers   | Methodical stacking      | High accuracy, research   |
| ResNet    | Skip connections       | Detective with shortcuts | Very deep networks        |
| MobileNet | Lightweight, efficient | Detective on a bicycle   | Phones, embedded devices  |

Why do you care about this in Python?

When you use Python libraries for AI (like TensorFlow or PyTorch), you can use these models as building blocks, just like picking the right detective team for your case!

Working with Dictionaries

In the dictionary below, how can we find the model with the highest accuracy and store the name in a variable called best_model?

models = {
    "ResNet": 0.91,
    "AlexNet": 0.85,
    "VGG": 0.88,
    "Inception": 0.92,
    "MobileNet": 0.89,
}

One solution might be a list comprehension; let's break down what that looks like and see whether it's the best way.

best_model = "".join( [k for k, v in models.items() if v == max(models.values())] )

[k for k, v in models.items() if v == max(models.values())] creates a list of all keys (model names) whose value equals the maximum value in the dictionary, and "".join([...]) concatenates these keys into a single string with no separator.

If two or more models have the same highest accuracy, we'll get their names fused together as one long string, like "ModelAModelB".
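Here's a quick demonstration of that pitfall, using a hypothetical dictionary with a deliberate tie (the accuracies below are made up for illustration):

```python
# Hypothetical accuracies with a deliberate tie at 0.91.
models = {"ResNet": 0.91, "VGG": 0.91, "AlexNet": 0.85}

best_model = "".join(
    [k for k, v in models.items() if v == max(models.values())]
)
print(best_model)  # "ResNetVGG" — two names fused into one string
```

The result is a single string that is the name of no actual model, which is exactly why this approach is fragile.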

It's also inefficient to call max(models.values()) inside the list comprehension for every item (we'll dig deeper on the next page).

A clearer and more efficient way is to use max() with the key argument:

best_model = max(models, key=models.get)

This returns the key (model name) with the highest value (accuracy). If there are ties, it returns the first one found.
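A short sketch of that tie behaviour, again with made-up accuracies:

```python
# Hypothetical accuracies: ResNet and VGG tie at 0.91.
models = {"ResNet": 0.91, "AlexNet": 0.85, "VGG": 0.91}

# max() keeps the first key whose value is not beaten by a later one,
# so on a tie the earlier key in iteration order wins.
best_model = max(models, key=models.get)
print(best_model)  # "ResNet"
```

Since Python 3.7, regular dictionaries preserve insertion order, so "first one found" means the key that was inserted earliest among the tied ones.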

Why does this work?

Think of your dictionary as a group of athletes (models) with their scores (accuracies). You want the name of the athlete with the highest score. max(models, key=models.get) says, "look at each athlete's score, and hand me the name of the one with the highest."

What if we have nested dictionaries? How do we find the model with the highest accuracy in the dictionary below?

models = {
    "ResNet": {
        "layers": 50,
        "accuracy": 0.91,
        "type": "CNN",
        "is_lightweight": False,
    },
    "MobileNet": {
        "layers": 28,
        "accuracy": 0.89,
        "type": "CNN",
        "is_lightweight": True,
    },
}

Here, max(models, key=models.get) won't work, because models.get returns the whole inner dictionary, not the accuracy itself (and Python can't compare two dictionaries with <, so it raises a TypeError).

We can think of the key function as a little robot we send in to fetch the value we want to compare. In this case, the robot has to go two steps deep: first grab the inner dictionary, then grab the "accuracy" value from it.

best_model = max(models, key=lambda model: models[model]["accuracy"])

Why does this work?

max() goes through each key (the model names). For each key, the lambda function looks up that model's inner dictionary and grabs its "accuracy" value. max() then picks the key for which this value is largest.

Imagine you have a row of boxes (the models), each with a smaller box inside (the dictionary), and inside that, a gold coin with a value. You send your robot to open each big box, look inside the small box and report the value on the coin. You want the box with the highest value.

If you want the actual dictionary for the best model, you can do: models[best_model].
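If you want both the name and its dictionary in one pass, you can also run max() over items(), which hands the key function (name, info) pairs. A sketch using the nested dictionary from above:

```python
models = {
    "ResNet": {"layers": 50, "accuracy": 0.91,
               "type": "CNN", "is_lightweight": False},
    "MobileNet": {"layers": 28, "accuracy": 0.89,
                  "type": "CNN", "is_lightweight": True},
}

# items() yields (name, info) pairs; item[1] is the inner dictionary,
# so item[1]["accuracy"] is the value we want to compare on.
best_model, best_info = max(
    models.items(), key=lambda item: item[1]["accuracy"]
)
print(best_model)             # "ResNet"
print(best_info["accuracy"])  # 0.91
```

This avoids the second lookup (models[best_model]) because max() returns the whole pair directly.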