PyTorch freeze part of the layers

Jimmy (xiaoke) Shen
Jun 17, 2020

In PyTorch we can freeze a layer by setting the requires_grad attribute of its parameters to False. Freezing weights is helpful when we want to use a pretrained model.

Here I’d like to explore this process.

Build a toy model

import torch.nn as nn
from torch.autograd import Variable
import torch.optim as optim


class Net(nn.Module):

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 3)
        self.out = nn.Linear(3, 1)
        self.out_act = nn.Sigmoid()

    def forward(self, inputs):
        a1 = self.fc1(inputs)
        a2 = self.fc2(a1)
        a3 = self.out(a2)
        y = self.out_act(a3)
        return y

Explore in the terminal step by step

Define the model

>>> import torch.nn as nn
>>> from torch.autograd import Variable
>>> import torch.optim as optim
>>> class Net(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.fc1 = nn.Linear(2, 4)
...         self.fc2 = nn.Linear(4, 3)
...         self.out = nn.Linear(3, 1)
...         self.out_act = nn.Sigmoid()
...     def forward(self, inputs):
...         a1 = self.fc1(inputs)
...         a2 = self.fc2(a1)
...         a3 = self.out(a2)
...         y = self.out_act(a3)
...         return y
...

Output the parameters

>>> net = Net()
>>> for name, para in net.named_parameters():
...     print("-"*20)
...     print(f"name: {name}")
...     print("values: ")
...     print(para)
...
--------------------
name: fc1.weight
values:
Parameter containing:
tensor([[ 0.2131, 0.3480],
[ 0.2090, -0.5149],
[ 0.3874, 0.5557],
[ 0.5799, -0.1398]], requires_grad=True)
--------------------
name: fc1.bias
values:
Parameter containing:
tensor([ 0.5810, -0.6059, 0.5854, -0.3162], requires_grad=True)
--------------------
name: fc2.weight
values:
Parameter containing:
tensor([[ 0.0708, -0.0415, 0.3984, -0.1483],
[-0.2510, -0.0583, 0.4639, 0.0440],
[-0.3923, -0.1058, -0.2382, -0.0739]], requires_grad=True)
--------------------
name: fc2.bias
values:
Parameter containing:
tensor([ 0.2756, -0.0547, -0.4772], requires_grad=True)
--------------------
name: out.weight
values:
Parameter containing:
tensor([[ 0.4947, -0.5356, -0.5736]], requires_grad=True)
--------------------
name: out.bias
values:
Parameter containing:
tensor([-0.5268], requires_grad=True)

We can see that requires_grad is True for all of these parameters.
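
As a quick check (a small addition, not part of the original session), we can also count how many parameters are currently trainable:

>>> # total number of trainable parameters (12 + 15 + 4 for this toy net)
>>> sum(p.numel() for p in net.parameters() if p.requires_grad)
31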

Set requires_grad to False

>>> for name, para in net.named_parameters():
...     para.requires_grad = False
...     print("-"*20)
...     print(f"name: {name}")
...     print("values: ")
...     print(para)
...
--------------------
name: fc1.weight
values:
Parameter containing:
tensor([[ 0.2131, 0.3480],
[ 0.2090, -0.5149],
[ 0.3874, 0.5557],
[ 0.5799, -0.1398]])
--------------------
name: fc1.bias
values:
Parameter containing:
tensor([ 0.5810, -0.6059, 0.5854, -0.3162])
--------------------
name: fc2.weight
values:
Parameter containing:
tensor([[ 0.0708, -0.0415, 0.3984, -0.1483],
[-0.2510, -0.0583, 0.4639, 0.0440],
[-0.3923, -0.1058, -0.2382, -0.0739]])
--------------------
name: fc2.bias
values:
Parameter containing:
tensor([ 0.2756, -0.0547, -0.4772])
--------------------
name: out.weight
values:
Parameter containing:
tensor([[ 0.4947, -0.5356, -0.5736]])
--------------------
name: out.bias
values:
Parameter containing:
tensor([-0.5268])

We can see that when a parameter’s requires_grad is set to False, “requires_grad=True” no longer appears when the parameter is printed. This comes from PyTorch’s tensor printing: the flag is only shown when it is True, so a frozen parameter simply omits it.
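
Rather than reading the printed repr, we can also query the flag directly (a small check, not part of the original session):

>>> # the flag itself can be inspected instead of reading the printed repr
>>> net.fc1.weight.requires_grad
False
>>> any(p.requires_grad for p in net.parameters())
False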

Set requires_grad back to True

>>> for name, para in net.named_parameters():
...     para.requires_grad = True
...     print("-"*20)
...     print(f"name: {name}")
...     print("values: ")
...     print(para)
...
--------------------
name: fc1.weight
values:
Parameter containing:
tensor([[ 0.2131, 0.3480],
[ 0.2090, -0.5149],
[ 0.3874, 0.5557],
[ 0.5799, -0.1398]], requires_grad=True)
--------------------
name: fc1.bias
values:
Parameter containing:
tensor([ 0.5810, -0.6059, 0.5854, -0.3162], requires_grad=True)
--------------------
name: fc2.weight
values:
Parameter containing:
tensor([[ 0.0708, -0.0415, 0.3984, -0.1483],
[-0.2510, -0.0583, 0.4639, 0.0440],
[-0.3923, -0.1058, -0.2382, -0.0739]], requires_grad=True)
--------------------
name: fc2.bias
values:
Parameter containing:
tensor([ 0.2756, -0.0547, -0.4772], requires_grad=True)
--------------------
name: out.weight
values:
Parameter containing:
tensor([[ 0.4947, -0.5356, -0.5736]], requires_grad=True)
--------------------
name: out.bias
values:
Parameter containing:
tensor([-0.5268], requires_grad=True)

We can see that the parameter values do not change and “requires_grad=True” appears again when printing the parameters.

Freeze part of the parameters

For example, freeze only the fc1 layer.

  • Step 1: get the parameter keys
>>> params = net.state_dict()
>>> params.keys()
odict_keys(['fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias', 'out.weight', 'out.bias'])
  • Step 2: set the related layer’s requires_grad to False (a naive way)
>>> keys = list(params.keys())
>>> keys[0]
'fc1.weight'
>>> net.fc1.weight.requires_grad = False
>>> for name, para in net.named_parameters():
...     print("-"*20)
...     print(f"name: {name}")
...     print("values: ")
...     print(para)
...
--------------------
name: fc1.weight
values:
Parameter containing:
tensor([[ 0.2131, 0.3480],
[ 0.2090, -0.5149],
[ 0.3874, 0.5557],
[ 0.5799, -0.1398]])
--------------------
name: fc1.bias
values:
Parameter containing:
tensor([ 0.5810, -0.6059, 0.5854, -0.3162], requires_grad=True)
--------------------
name: fc2.weight
values:
Parameter containing:
tensor([[ 0.0708, -0.0415, 0.3984, -0.1483],
[-0.2510, -0.0583, 0.4639, 0.0440],
[-0.3923, -0.1058, -0.2382, -0.0739]], requires_grad=True)
--------------------
name: fc2.bias
values:
Parameter containing:
tensor([ 0.2756, -0.0547, -0.4772], requires_grad=True)
--------------------
name: out.weight
values:
Parameter containing:
tensor([[ 0.4947, -0.5356, -0.5736]], requires_grad=True)
--------------------
name: out.bias
values:
Parameter containing:
tensor([-0.5268], requires_grad=True)

We can see that fc1.weight’s requires_grad is now False, while fc1.bias is still trainable.
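
As an aside (not part of the running session), if we want to freeze every parameter of a submodule at once rather than one tensor at a time, we can loop over that submodule’s parameters, or use the in-place Module.requires_grad_ helper available in recent PyTorch versions:

>>> # freeze all of fc1's parameters in one go
>>> for p in net.fc1.parameters():
...     p.requires_grad = False
...
>>> # or, equivalently, via the in-place module helper
>>> net.fc1.requires_grad_(False)
Linear(in_features=2, out_features=4, bias=True)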

A better way to freeze

We can identify the parameters by name [2]: filter on the parameter names and set requires_grad only for the layers we want to freeze.

Suppose we want to freeze the layers whose names contain “fc1”. To keep the code short, we use a helper function that prints the network’s parameters.

def print_net_parameters(net):
    for name, para in net.named_parameters():
        print("-"*20)
        print(f"name: {name}")
        print("values: ")
        print(para)
>>> for name, param in net.named_parameters():
...     if param.requires_grad and 'fc1' in name:
...         param.requires_grad = False
...
>>> print_net_parameters(net)
--------------------
name: fc1.weight
values:
Parameter containing:
tensor([[ 0.2131, 0.3480],
[ 0.2090, -0.5149],
[ 0.3874, 0.5557],
[ 0.5799, -0.1398]])
--------------------
name: fc1.bias
values:
Parameter containing:
tensor([ 0.5810, -0.6059, 0.5854, -0.3162])
--------------------
name: fc2.weight
values:
Parameter containing:
tensor([[ 0.0708, -0.0415, 0.3984, -0.1483],
[-0.2510, -0.0583, 0.4639, 0.0440],
[-0.3923, -0.1058, -0.2382, -0.0739]], requires_grad=True)
--------------------
name: fc2.bias
values:
Parameter containing:
tensor([ 0.2756, -0.0547, -0.4772], requires_grad=True)
--------------------
name: out.weight
values:
Parameter containing:
tensor([[ 0.4947, -0.5356, -0.5736]], requires_grad=True)
--------------------
name: out.bias
values:
Parameter containing:
tensor([-0.5268], requires_grad=True)

We can see that requires_grad is now False for both fc1’s weight and bias.
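
To double-check that a frozen layer really stops receiving gradients, we can run a forward/backward pass (a small sketch, not in the original post; the input tensor here is made up):

>>> import torch
>>> x = torch.randn(5, 2)            # made-up batch of 5 samples with 2 features
>>> loss = net(x).sum()
>>> loss.backward()
>>> print(net.fc1.weight.grad)       # frozen: no gradient is accumulated
None
>>> net.fc2.weight.grad is not None  # trainable layers still get gradients
True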

One last step

We are not done yet: even though requires_grad is set to False, we can still update the weights by hand.

>>> net.fc1.weight -= 0.1*net.fc1.weight
>>> print_net_parameters(net)
--------------------
name: fc1.weight
values:
Parameter containing:
tensor([[ 0.1918, 0.3132],
[ 0.1881, -0.4634],
[ 0.3487, 0.5002],
[ 0.5220, -0.1258]])
--------------------
name: fc1.bias
values:
Parameter containing:
tensor([ 0.5810, -0.6059, 0.5854, -0.3162])
--------------------
name: fc2.weight
values:
Parameter containing:
tensor([[ 0.0708, -0.0415, 0.3984, -0.1483],
[-0.2510, -0.0583, 0.4639, 0.0440],
[-0.3923, -0.1058, -0.2382, -0.0739]], requires_grad=True)
--------------------
name: fc2.bias
values:
Parameter containing:
tensor([ 0.2756, -0.0547, -0.4772], requires_grad=True)
--------------------
name: out.weight
values:
Parameter containing:
tensor([[ 0.4947, -0.5356, -0.5736]], requires_grad=True)
--------------------
name: out.bias
values:
Parameter containing:
tensor([-0.5268], requires_grad=True)

We can see that the fc1 weights can still be updated, since the values have changed.

So, when constructing the optimizer, we should pass it only the parameters that still require gradients [1]:

>>> non_frozen_parameters = [p for p in net.parameters() if p.requires_grad]
>>> optimizer = optim.SGD(non_frozen_parameters, lr=0.1)

With that, we have finished freezing all of the “fc1” parameters.
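
As an end-to-end check (a sketch that is not part of the original post, using made-up data and assuming torch has already been imported as above), a single optimizer step should leave fc1 untouched while the trainable layers move:

>>> before = net.fc1.weight.clone()
>>> x, target = torch.randn(8, 2), torch.rand(8, 1)  # made-up batch and targets
>>> optimizer.zero_grad()
>>> loss = nn.functional.binary_cross_entropy(net(x), target)
>>> loss.backward()
>>> optimizer.step()
>>> torch.equal(net.fc1.weight, before)  # the frozen layer did not change
True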

Quick summary

We can use:

  • net.state_dict() to get the keys of all parameters; printing them helps us figure out which layers we want to freeze
  • once we know the target layers, we can freeze them by name

Key code, using “fc1” as the example:

for name, param in net.named_parameters():
    if param.requires_grad and 'fc1' in name:
        param.requires_grad = False
non_frozen_parameters = [p for p in net.parameters() if p.requires_grad]
optimizer = optim.SGD(non_frozen_parameters, lr=0.1)

Thanks for reading and I hope it helps.

An example of using the code in a research project

In the MoLGNN paper, we used this method to freeze part of the network’s layers. Here is the related code for reference:


# This is for a Graph Neural Network based on the GIN paper
_FREEZE_KEY = {'0': ['ginlayers.0', 'linears_prediction_classification.0'],
               '1': ['ginlayers.1', 'linears_prediction_classification.1'],
               '2': ['ginlayers.2', 'linears_prediction_classification.2'],
               '3': ['ginlayers.3', 'linears_prediction_classification.3'],
               '4': ['ginlayers.4', 'linears_prediction_classification.4'],
               }


def freeze_model_weights(model, freeze_key_id="0"):
    """
    Freeze the model weights based on the layer names.
    For example, if freeze_key_id is '0', every parameter whose name contains
    one of the strings in _FREEZE_KEY['0'] will be frozen.
    """
    print('Going to apply weight freezing')
    print('before freezing, parameter names that require grad:')
    for name, param in model.named_parameters():
        if param.requires_grad: print(name)
    freeze_keys = _FREEZE_KEY[freeze_key_id]
    print('freeze_keys', freeze_keys)
    for name, para in model.named_parameters():
        if para.requires_grad and any(key in name for key in freeze_keys):
            para.requires_grad = False
    print('after freezing, parameter names that require grad:')
    for name, para in model.named_parameters():
        if para.requires_grad: print(name)
    return model


model = freeze_model_weights(model, freeze_key_id="0")
non_frozen_parameters = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.Adam(non_frozen_parameters, lr=0.001)
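
If the frozen layers need to be trained again in a later stage, a mirror-image helper can flip the flag back. This is a hypothetical sketch in the same style, not code from the paper:

def unfreeze_model_weights(model, freeze_key_id="0"):
    """Hypothetical counterpart: re-enable gradients for the layers frozen above."""
    freeze_keys = _FREEZE_KEY[freeze_key_id]
    for name, para in model.named_parameters():
        if not para.requires_grad and any(key in name for key in freeze_keys):
            para.requires_grad = True
    return model

# remember to rebuild the optimizer (or add a parameter group) afterwards,
# so the newly trainable parameters are actually updated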

Reference

[1] https://discuss.pytorch.org/t/how-the-pytorch-freeze-network-in-some-layers-only-the-rest-of-the-training/7088

[2] https://discuss.pytorch.org/t/how-to-print-models-parameters-with-its-name-and-requires-grad-value/10778
