PyTorch freeze part of the layers

Jimmy (xiaoke) Shen
Jun 17, 2020

In PyTorch we can freeze a layer by setting the requires_grad attribute of its parameters to False. Freezing weights is helpful when we want to use a pretrained model.

Here I’d like to explore this process.

Build a toy model

import torch.nn as nn
from torch.autograd import Variable
import torch.optim as optim


class Net(nn.Module):

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 3)
        self.out = nn.Linear(3, 1)
        self.out_act = nn.Sigmoid()

    def forward(self, inputs):
        a1 = self.fc1(inputs)
        a2 = self.fc2(a1)
        a3 = self.out(a2)
        y = self.out_act(a3)
        return y

Explore in the terminal step by step

Define the model

>>> import torch.nn as nn
>>> from torch.autograd import Variable
>>> import torch.optim as optim
>>> class Net(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.fc1 = nn.Linear(2, 4)
...         self.fc2 = nn.Linear(4, 3)
...         self.out = nn.Linear(3, 1)
...         self.out_act = nn.Sigmoid()
...     def forward(self, inputs):
...         a1 = self.fc1(inputs)
...         a2 = self.fc2(a1)
...         a3 = self.out(a2)
...         y = self.out_act(a3)
...         return y
...

Output the parameters

>>> net = Net()
>>> for name, para in net.named_parameters():
...     print("-"*20)
...     print(f"name: {name}")
...     print("values: ")
...     print(para)
...
--------------------
name: fc1.weight
values:
Parameter containing:
tensor([[ 0.2131, 0.3480],
[ 0.2090, -0.5149],
[ 0.3874, 0.5557],
[ 0.5799, -0.1398]], requires_grad=True)
--------------------
name: fc1.bias
values:
Parameter containing:
tensor([ 0.5810, -0.6059, 0.5854, -0.3162], requires_grad=True)
--------------------
name: fc2.weight
values:
Parameter containing:
tensor([[ 0.0708, -0.0415, 0.3984, -0.1483],
[-0.2510, -0.0583, 0.4639, 0.0440],
[-0.3923, -0.1058, -0.2382, -0.0739]], requires_grad=True)
--------------------
name: fc2.bias
values:
Parameter containing:
tensor([ 0.2756, -0.0547, -0.4772], requires_grad=True)
--------------------
name: out.weight
values:
Parameter containing:
tensor([[ 0.4947, -0.5356, -0.5736]], requires_grad=True)
--------------------
name: out.bias
values:
Parameter containing:
tensor([-0.5268], requires_grad=True)

We can see that requires_grad is True for all of these parameters.
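
As a quick check (a small addition, not part of the original session), we can also count how many parameters are currently trainable:

>>> # total number of trainable parameters (12 + 15 + 4 for this toy net)
>>> sum(p.numel() for p in net.parameters() if p.requires_grad)
31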

Set requires_grad to False

>>> for name, para in net.named_parameters():
...     para.requires_grad = False
...     print("-"*20)
...     print(f"name: {name}")
...     print("values: ")
...     print(para)
...
--------------------
name: fc1.weight
values:
Parameter containing:
tensor([[ 0.2131, 0.3480],
[ 0.2090, -0.5149],
[ 0.3874, 0.5557],
[ 0.5799, -0.1398]])
--------------------
name: fc1.bias
values:
Parameter containing:
tensor([ 0.5810, -0.6059, 0.5854, -0.3162])
--------------------
name: fc2.weight
values:
Parameter containing:
tensor([[ 0.0708, -0.0415, 0.3984, -0.1483],
[-0.2510, -0.0583, 0.4639, 0.0440],
[-0.3923, -0.1058, -0.2382, -0.0739]])
--------------------
name: fc2.bias
values:
Parameter containing:
tensor([ 0.2756, -0.0547, -0.4772])
--------------------
name: out.weight
values:
Parameter containing:
tensor([[ 0.4947, -0.5356, -0.5736]])
--------------------
name: out.bias
values:
Parameter containing:
tensor([-0.5268])

We can see that when a parameter’s requires_grad is set to False, “requires_grad=True” no longer appears when the parameter is printed. This comes from PyTorch’s tensor printing: the flag is only shown when it is True, so a frozen parameter simply omits it.
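
Rather than reading the printed repr, we can also query the flag directly (a small check, not part of the original session):

>>> # the flag itself can be inspected instead of reading the printed repr
>>> net.fc1.weight.requires_grad
False
>>> any(p.requires_grad for p in net.parameters())
False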

Set requires_grad back to True

>>> for name, para in net.named_parameters():
...     para.requires_grad = True
...     print("-"*20)
...     print(f"name: {name}")
...     print("values: ")
...     print(para)
...
--------------------
name: fc1.weight
values:
Parameter containing:
tensor([[ 0.2131, 0.3480],
[ 0.2090, -0.5149],
[ 0.3874, 0.5557],
[ 0.5799, -0.1398]], requires_grad=True)
--------------------
name: fc1.bias
values:
Parameter containing:
tensor([ 0.5810, -0.6059, 0.5854, -0.3162], requires_grad=True)
--------------------
name: fc2.weight
values:
Parameter containing:
tensor([[ 0.0708, -0.0415, 0.3984, -0.1483],
[-0.2510, -0.0583, 0.4639, 0.0440],
[-0.3923, -0.1058, -0.2382, -0.0739]], requires_grad=True)
--------------------
name: fc2.bias
values:
Parameter containing:
tensor([ 0.2756, -0.0547, -0.4772], requires_grad=True)
--------------------
name: out.weight
values:
Parameter containing:
tensor([[ 0.4947, -0.5356, -0.5736]], requires_grad=True)
--------------------
name: out.bias
values:
Parameter containing:
tensor([-0.5268], requires_grad=True)

We can see that the parameter values do not change and “requires_grad=True” appears again when printing the parameters.

Freeze part of the parameters

For example, freeze only the fc1 layer.

  • Step 1: get the parameter keys
>>> params = net.state_dict()
>>> params.keys()
odict_keys(['fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias', 'out.weight', 'out.bias'])
  • Step 2: set the related layer’s requires_grad to False (a naive way)
>>> keys = list(params.keys())
>>> keys[0]
'fc1.weight'
>>> net.fc1.weight.requires_grad = False
>>> for name, para in net.named_parameters():
...     print("-"*20)
...     print(f"name: {name}")
...     print("values: ")
...     print(para)
...
--------------------
name: fc1.weight
values:
Parameter containing:
tensor([[ 0.2131, 0.3480],
[ 0.2090, -0.5149],
[ 0.3874, 0.5557],
[ 0.5799, -0.1398]])
--------------------
name: fc1.bias
values:
Parameter containing:
tensor([ 0.5810, -0.6059, 0.5854, -0.3162], requires_grad=True)
--------------------
name: fc2.weight
values:
Parameter containing:
tensor([[ 0.0708, -0.0415, 0.3984, -0.1483],
[-0.2510, -0.0583, 0.4639, 0.0440],
[-0.3923, -0.1058, -0.2382, -0.0739]], requires_grad=True)
--------------------
name: fc2.bias
values:
Parameter containing:
tensor([ 0.2756, -0.0547, -0.4772], requires_grad=True)
--------------------
name: out.weight
values:
Parameter containing:
tensor([[ 0.4947, -0.5356, -0.5736]], requires_grad=True)
--------------------
name: out.bias
values:
Parameter containing:
tensor([-0.5268], requires_grad=True)

We can see that fc1.weight’s requires_grad is now False, while fc1.bias is still trainable.
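
As an aside (not part of the running session), if we want to freeze every parameter of a submodule at once rather than one tensor at a time, we can loop over that submodule’s parameters, or use the in-place Module.requires_grad_ helper available in recent PyTorch versions:

>>> # freeze all of fc1's parameters in one go
>>> for p in net.fc1.parameters():
...     p.requires_grad = False
...
>>> # or, equivalently, via the in-place module helper
>>> net.fc1.requires_grad_(False)
Linear(in_features=2, out_features=4, bias=True)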

A better way to freeze

We can identify the parameters by name [2]: filter on the parameter names and set requires_grad only for the layers we want to freeze.

Suppose we want to freeze the layers whose names contain “fc1”. To keep the code short, we use a helper function that prints the network’s parameters.

def print_net_parameters(net):
    for name, para in net.named_parameters():
        print("-"*20)
        print(f"name: {name}")
        print("values: ")
        print(para)
>>> for name, param in net.named_parameters():
...     if param.requires_grad and 'fc1' in name:
...         param.requires_grad = False
...
>>> print_net_parameters(net)
--------------------
name: fc1.weight
values:
Parameter containing:
tensor([[ 0.2131, 0.3480],
[ 0.2090, -0.5149],
[ 0.3874, 0.5557],
[ 0.5799, -0.1398]])
--------------------
name: fc1.bias
values:
Parameter containing:
tensor([ 0.5810, -0.6059, 0.5854, -0.3162])
--------------------
name: fc2.weight
values:
Parameter containing:
tensor([[ 0.0708, -0.0415, 0.3984, -0.1483],
[-0.2510, -0.0583, 0.4639, 0.0440],
[-0.3923, -0.1058, -0.2382, -0.0739]], requires_grad=True)
--------------------
name: fc2.bias
values:
Parameter containing:
tensor([ 0.2756, -0.0547, -0.4772], requires_grad=True)
--------------------
name: out.weight
values:
Parameter containing:
tensor([[ 0.4947, -0.5356, -0.5736]], requires_grad=True)
--------------------
name: out.bias
values:
Parameter containing:
tensor([-0.5268], requires_grad=True)

We can see that requires_grad is now False for both fc1’s weight and bias.
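
To double-check that a frozen layer really stops receiving gradients, we can run a forward/backward pass (a small sketch, not in the original post; the input tensor here is made up):

>>> import torch
>>> x = torch.randn(5, 2)            # made-up batch of 5 samples with 2 features
>>> loss = net(x).sum()
>>> loss.backward()
>>> print(net.fc1.weight.grad)       # frozen: no gradient is accumulated
None
>>> net.fc2.weight.grad is not None  # trainable layers still get gradients
True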

One last step

We are not done yet: even though requires_grad is set to False, we can still update the weights by hand.

>>> net.fc1.weight -= 0.1*net.fc1.weight
>>> print_net_parameters(net)
--------------------
name: fc1.weight
values:
Parameter containing:
tensor([[ 0.1918, 0.3132],
[ 0.1881, -0.4634],
[ 0.3487, 0.5002],
[ 0.5220, -0.1258]])
--------------------
name: fc1.bias
values:
Parameter containing:
tensor([ 0.5810, -0.6059, 0.5854, -0.3162])
--------------------
name: fc2.weight
values:
Parameter containing:
tensor([[ 0.0708, -0.0415, 0.3984, -0.1483],
[-0.2510, -0.0583, 0.4639, 0.0440],
[-0.3923, -0.1058, -0.2382, -0.0739]], requires_grad=True)
--------------------
name: fc2.bias
values:
Parameter containing:
tensor([ 0.2756, -0.0547, -0.4772], requires_grad=True)
--------------------
name: out.weight
values:
Parameter containing:
tensor([[ 0.4947, -0.5356, -0.5736]], requires_grad=True)
--------------------
name: out.bias
values:
Parameter containing:
tensor([-0.5268], requires_grad=True)

We can see that the fc1 weights can still be updated, since the values have changed.

So, when constructing the optimizer, we should pass it only the parameters that still require gradients [1]:

>>> non_frozen_parameters = [p for p in net.parameters() if p.requires_grad]
>>> optimizer = optim.SGD(non_frozen_parameters, lr=0.1)

With that, we have finished freezing all of the “fc1” parameters.
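
As an end-to-end check (a sketch that is not part of the original post, using made-up data and assuming torch has already been imported as above), a single optimizer step should leave fc1 untouched while the trainable layers move:

>>> before = net.fc1.weight.clone()
>>> x, target = torch.randn(8, 2), torch.rand(8, 1)  # made-up batch and targets
>>> optimizer.zero_grad()
>>> loss = nn.functional.binary_cross_entropy(net(x), target)
>>> loss.backward()
>>> optimizer.step()
>>> torch.equal(net.fc1.weight, before)  # the frozen layer did not change
True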

Quick summary

We can use:

  • net.state_dict() to get the keys of all parameters; printing them helps us figure out which layers we want to freeze
  • once we know the target layers, we can freeze them by name

Key code, using “fc1” as the example:

for name, param in net.named_parameters():
    if param.requires_grad and 'fc1' in name:
        param.requires_grad = False
non_frozen_parameters = [p for p in net.parameters() if p.requires_grad]
optimizer = optim.SGD(non_frozen_parameters, lr=0.1)

Thanks for reading and I hope it helps.

An example of using the code in a research project

In the MoLGNN paper, we used this method to freeze part of the network’s layers. Here is the related code for reference:


# This is for a Graph Neural Network based on the GIN paper
_FREEZE_KEY = {'0': ['ginlayers.0', 'linears_prediction_classification.0'],
               '1': ['ginlayers.1', 'linears_prediction_classification.1'],
               '2': ['ginlayers.2', 'linears_prediction_classification.2'],
               '3': ['ginlayers.3', 'linears_prediction_classification.3'],
               '4': ['ginlayers.4', 'linears_prediction_classification.4'],
               }


def freeze_model_weights(model, freeze_key_id="0"):
    """
    Freeze the model weights based on the layer names.
    For example, if freeze_key_id is '0', every parameter whose name contains
    one of the strings in _FREEZE_KEY['0'] will be frozen.
    """
    print('Going to apply weight freezing')
    print('before freezing, parameter names that require grad:')
    for name, param in model.named_parameters():
        if param.requires_grad: print(name)
    freeze_keys = _FREEZE_KEY[freeze_key_id]
    print('freeze_keys', freeze_keys)
    for name, para in model.named_parameters():
        if para.requires_grad and any(key in name for key in freeze_keys):
            para.requires_grad = False
    print('after freezing, parameter names that require grad:')
    for name, para in model.named_parameters():
        if para.requires_grad: print(name)
    return model


model = freeze_model_weights(model, freeze_key_id="0")
non_frozen_parameters = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.Adam(non_frozen_parameters, lr=0.001)
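
If the frozen layers need to be trained again in a later stage, a mirror-image helper can flip the flag back. This is a hypothetical sketch in the same style, not code from the paper:

def unfreeze_model_weights(model, freeze_key_id="0"):
    """Hypothetical counterpart: re-enable gradients for the layers frozen above."""
    freeze_keys = _FREEZE_KEY[freeze_key_id]
    for name, para in model.named_parameters():
        if not para.requires_grad and any(key in name for key in freeze_keys):
            para.requires_grad = True
    return model

# remember to rebuild the optimizer (or add a parameter group) afterwards,
# so the newly trainable parameters are actually updated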

Reference

[1] https://discuss.pytorch.org/t/how-the-pytorch-freeze-network-in-some-layers-only-the-rest-of-the-training/7088

[2] https://discuss.pytorch.org/t/how-to-print-models-parameters-with-its-name-and-requires-grad-value/10778
