Control Flow

Managing learning rates and updating model parameters are essential tasks in machine learning, especially when training neural networks. Let's clarify how learning rate decay works, using a simple example.
```python
# Initialize variables
initial_learning_rate = 0.1
decay_factor = 0.9
epochs = 5

current_lr = initial_learning_rate
current_epoch = 0

# Decay the learning rate once per epoch
while current_epoch < epochs:
    current_epoch += 1
    current_lr *= decay_factor
    print(current_lr)
```
The learning rate is usually decayed multiplicatively after each epoch. So, after each epoch, the current learning rate becomes the previous learning rate multiplied by the decay factor.
Here's how it plays out, step by step:
- Epoch 0 (start): LR = 0.1
- Epoch 1: LR = 0.1 × 0.9 = 0.09
- Epoch 2: LR = 0.09 × 0.9 = 0.081
- Epoch 3: LR = 0.081 × 0.9 = 0.0729
- Epoch 4: LR = 0.0729 × 0.9 = 0.06561
- Epoch 5: LR = 0.06561 × 0.9 = 0.059049
The learning rate after n epochs is: \( \text{LR } = \text{Initial LR } \times \text{DF}^n \)
So, after 5 epochs: \( \text{LR } = 0.1 \times 0.9^5 \approx 0.05905 \)
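The closed-form expression and the loop should agree; here's a minimal sketch that checks one against the other:

```python
initial_learning_rate = 0.1
decay_factor = 0.9
epochs = 5

# Closed form: LR after n epochs = initial LR * decay_factor**n
closed_form = initial_learning_rate * decay_factor ** epochs

# Loop form: decay once per epoch
lr = initial_learning_rate
for _ in range(epochs):
    lr *= decay_factor

print(closed_form, lr)  # both are approximately 0.059049
```

This is handy when you need the learning rate at an arbitrary epoch without replaying the whole schedule.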
Let's continue to apply gradient updates to model parameters, simulating the weight adjustment process that occurs during neural network training.
Why do we update parameters?
Think of training a neural network like hiking down a mountain in the fog, trying to reach the lowest point (the minimum of the loss function). The parameters are your current position, and the gradients are like a compass telling you which direction is uphill, so you want to step in the opposite direction, downhill.
The learning rate is how big a step you take each time. Too big, and you might overshoot the valley; too small, and you'll inch along forever.
List out what we have:
- Model Parameters = [0.5, 1.5, -0.5]
- Gradients = [0.1, -0.2, 0.05]
- Learning Rate = 0.01
Each parameter has a corresponding gradient, and the classic update rule for each parameter is:
\( \text{New Parameter} = \text{Current Parameter} - \text{Learning Rate} \times \text{Gradient} \)
Why the minus? Because the gradient points uphill, and you want to go downhill.
Imagine you're adjusting the knobs on a radio to tune into the clearest signal (the lowest loss). Each knob (parameter) has a hint (gradient) about which way to turn it, and you decide how much to turn (learning rate). You nudge each knob a little, based on the advice from the gradient and repeat until you get the best sound.
```python
# Model parameters
parameters = [0.5, 1.5, -0.5]
# Gradients
gradients = [0.1, -0.2, 0.05]
# Learning rate
learning_rate = 0.01

# Update each parameter: step against the gradient, scaled by the learning rate
for i in range(len(parameters)):
    parameters[i] -= learning_rate * gradients[i]

print(parameters)
# Output: [0.499, 1.502, -0.5005]
```
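If NumPy is available, the element-wise loop collapses into a single vectorized expression; a minimal sketch of the same update:

```python
import numpy as np

parameters = np.array([0.5, 1.5, -0.5])
gradients = np.array([0.1, -0.2, 0.05])
learning_rate = 0.01

# One vectorized step replaces the per-index loop
parameters -= learning_rate * gradients
print(parameters)
```

For real networks with millions of parameters, this vectorized form is what optimizers actually do under the hood.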
Before we continue with our models, we want to create a dictionary holding the count of Oscar nominations for each director in a nominations list. Let's explore how a solution might look, why it works, and what its trade-offs are. We'll start with a simple solution and then see how to make it more "Pythonic" and robust, which will be useful as our data and ambitions grow.
```python
nominated = {
    1938: [
        'Frank Capra',
        'Michael Curtiz',
        'Norman Taurog',
        'King Vidor',
        'Michael Curtiz'
    ],
    1939: [
        'Sam Wood',
        'Frank Capra',
        'John Ford',
        'William Wyler',
        'Victor Fleming'
    ],
    1940: [
        'John Ford',
        'Sam Wood',
        'William Wyler',
        'George Cukor',
        'Alfred Hitchcock'
    ]
}

# Initialize an empty dict
nom_count = {}

# Loop through the values (lists of names)
for names in nominated.values():
    for name in names:
        nom_count[name] = nom_count.get(name, 0) + 1
```
This iterates over every year's nominee list and counts each director's nominations using a dictionary. As a result, we get a dictionary mapping each director to their total nomination count.
Is the nested loop really a problem?
- Performance: For small data (like this), nested loops are fine. They're clear and explicit.
- Readability: For beginners, this is very readable.
But as you grow, you'll see that Python offers tools to make this kind of counting both faster to write and easier to read, especially for larger data.
Let's look at a way to elevate this solution using `collections.Counter`, which is the go-to tool for counting things in Python. It's fast, clear, and built for exactly this job.
```python
from collections import Counter

# Flatten all nominee lists into one big list
all_nominees = [name for names in nominated.values() for name in names]

# Count each director's nominations
nom_count_dict = dict(Counter(all_nominees))
```
Why is this better?
- Expressiveness: It says exactly what you mean: "count these things."
- Performance: Counter is optimized in C under the hood.
- Scalability: If you have thousands of years of data, this approach won't break a sweat.
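`Counter` also gives you ranking for free via `most_common`. A quick sketch, using a small sample in the same shape as the `nominated` dict:

```python
from collections import Counter

# Sample data in the same {year: [names]} shape as above
nominated = {
    1938: ['Frank Capra', 'Michael Curtiz', 'Norman Taurog',
           'King Vidor', 'Michael Curtiz'],
    1939: ['Sam Wood', 'Frank Capra', 'John Ford',
           'William Wyler', 'Victor Fleming'],
    1940: ['John Ford', 'Sam Wood', 'William Wyler',
           'George Cukor', 'Alfred Hitchcock'],
}

# Counter accepts any iterable, so a generator avoids building the flat list
counts = Counter(name for names in nominated.values() for name in names)

# most_common ranks directors by nomination count, highest first
print(counts.most_common(3))
```

Note that `Counter` behaves like a dict everywhere a dict is expected, so converting with `dict(...)` is only needed if you want a plain dict type.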
Since we're building our data science toolkit, let's see what a DataFrame approach might look like using `pandas`, for some data science fluency.
```python
import pandas as pd

# Build a flat list of (year, director) pairs
records = [(year, name) for year, names in nominated.items() for name in names]
df = pd.DataFrame(records, columns=['Year', 'Director'])

# Count nominations per director
nom_count_dict = df['Director'].value_counts().to_dict()
```
Why does this work well?
- Scalability: Easily expand to more columns (e.g., wins).
- Analysis: You can now group, filter, or visualize by year, director, etc.
- Real-World: This is how most finance/data science workflows operate.
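Once the data is in a DataFrame, richer questions cost about one line each. A sketch with a small made-up sample (the column names match those chosen above):

```python
import pandas as pd

# Sample (year, director) records in the same shape as above
records = [
    (1938, 'Frank Capra'), (1938, 'Michael Curtiz'), (1938, 'Michael Curtiz'),
    (1939, 'Frank Capra'), (1939, 'John Ford'),
    (1940, 'John Ford'), (1940, 'Alfred Hitchcock'),
]
df = pd.DataFrame(records, columns=['Year', 'Director'])

# Total nominations per director
totals = df['Director'].value_counts().to_dict()

# Nominations per director per year, via groupby
per_year = df.groupby(['Year', 'Director']).size()

print(totals)
print(per_year)
```

The `groupby` result is a Series indexed by (Year, Director) pairs, which you can slice, plot, or unstack into a year-by-director table.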
Think of our original nested loop as counting coins by hand, one by one. Using `Counter` is like dumping them into a coin-sorting machine: it's built for the task, faster and less error-prone.
Summary Table

| Approach | Readability | Performance | Scalability | Best For |
|---|---|---|---|---|
| Nested loops | Simple | Good | OK | Small, simple scripts |
| Counter | Excellent | Excellent | Excellent | Counting anything |
| Pandas | Excellent | Great | Excellent | Data science workflows |
Let's continue with another example. We have multiple models evaluated over several experiments, and we want to determine which model or models have the highest average performance.
The result is going to be a list with the best-performing model or models. It's a list because more than one model might be tied for the highest average performance.
```python
model_performance = {
    'Experiment 1': {
        'Model A': 0.85,
        'Model B': 0.9,
        'Model C': 0.88,
        'Model D': 0.92,
        'Model E': 0.87
    },
    'Experiment 2': {
        'Model A': 0.91,
        'Model B': 0.89,
        'Model C': 0.93,
        'Model D': 0.94,
        'Model E': 0.86
    },
    'Experiment 3': {
        'Model A': 0.87,
        'Model B': 0.9,
        'Model C': 0.86,
        'Model D': 0.95,
        'Model E': 0.84
    },
    'Experiment 4': {
        'Model A': 0.88,
        'Model B': 0.85,
        'Model C': 0.89,
        'Model D': 0.93,
        'Model E': 0.87
    },
    'Experiment 5': {
        'Model A': 0.89,
        'Model B': 0.88,
        'Model C': 0.91,
        'Model D': 0.92,
        'Model E': 0.85
    }
}
```
```python
# Initialize two dicts
total_perf_count = {}
num_tests = {}

# Sum performance per model and count how many experiments it appears in
for experiment, model_perf in model_performance.items():
    for model, perf in model_perf.items():
        total_perf_count[model] = total_perf_count.get(model, 0) + perf
        num_tests[model] = num_tests.get(model, 0) + 1

# Create the average performance dict
avg_perf_count = {
    model: perf / num_tests[model] for model, perf in total_perf_count.items()
}

# Get the highest average performance
highest_avg_perf = max(avg_perf_count.values())

# Create the best models list
best_models = [
    model for model, perf in avg_perf_count.items() if perf == highest_avg_perf
]
```
What if we used the condition `if perf == max(avg_perf_count.values())` inside the loop (instead of computing the max once, outside)? If there are n models, we would recompute the max n times, giving \(O(n^2)\) time for large datasets.
So, whenever you see a function call within a loop condition, ask yourself if it must be recalculated each time, or can you do it once, outside the loop?
Imagine you're sorting through a box of trophies to find the tallest one.
- Inefficient way: Every time you pick up a trophy, you measure the height of all the trophies again to check if it's the tallest.
- Efficient way: Measure all the trophies once, remember the tallest, and then just compare each trophy to that remembered height.
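The same principle in code, with made-up scores: compute the max once, then compare against the remembered value.

```python
# Hypothetical average scores; C and D are tied for the top spot
avg_perf = {'Model A': 0.88, 'Model B': 0.884, 'Model C': 0.932, 'Model D': 0.932}

# Inefficient: max() is re-evaluated on every iteration -> O(n^2) overall
best_slow = [m for m, p in avg_perf.items() if p == max(avg_perf.values())]

# Efficient: hoist max() out of the loop -> O(n) overall
highest = max(avg_perf.values())
best_fast = [m for m, p in avg_perf.items() if p == highest]

print(best_fast)  # ['Model C', 'Model D']
```

Both versions return the same answer; only the amount of repeated work differs.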
In this case, we computed `max(avg_perf_count.values())` once, outside the loop, and collected all the average performances in a single pass. It's clear, concise, and efficient: \(O(n)\) time for large datasets.