Tutorial of Graph Classification by DGL

Jimmy (xiaoke) Shen
Jun 8, 2020 · 4 min read

The official tutorial

The official tutorial can be found HERE.

Comment about the tutorial

The tutorial is pretty nice. After reading it, we can get pretty much all the details about the classification process. However, I was not satisfied with the level of detail about the input and output of the classifier g in the figure below.

Graph classification process from Here

What are the details before and after g?

The code for the classifier is shown here:

import dgl
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn.pytorch import GraphConv

class Classifier(nn.Module):
    def __init__(self, in_dim, hidden_dim, n_classes):
        super(Classifier, self).__init__()
        self.conv1 = GraphConv(in_dim, hidden_dim)
        self.conv2 = GraphConv(hidden_dim, hidden_dim)
        self.classify = nn.Linear(hidden_dim, n_classes)

    def forward(self, g):
        # Use node degree as the initial node feature. For undirected graphs,
        # the in-degree is the same as the out-degree.
        h = g.in_degrees().view(-1, 1).float()
        # Perform graph convolution and activation function.
        h = F.relu(self.conv1(g, h))
        h = F.relu(self.conv2(g, h))
        g.ndata['h'] = h
        # Calculate graph representation by averaging all the node representations.
        hg = dgl.mean_nodes(g, 'h')
        return self.classify(hg)
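Note that the g passed to forward is not a single graph but a batched graph built from many graphs. Here is a minimal sketch of how such a batch is typically assembled, following the tutorial's collate-plus-DataLoader approach (the MiniGCDataset sizes and the batch size of 32 are assumptions chosen to match the printed shapes below):

import dgl
import torch
from torch.utils.data import DataLoader
from dgl.data import MiniGCDataset

def collate(samples):
    # Each sample is a (graph, label) pair; merge all graphs into one batched graph.
    graphs, labels = map(list, zip(*samples))
    return dgl.batch(graphs), torch.tensor(labels)

trainset = MiniGCDataset(320, 10, 20)  # 320 synthetic graphs with 10-20 nodes each (assumed sizes)
data_loader = DataLoader(trainset, batch_size=32, shuffle=True, collate_fn=collate)

bg, labels = next(iter(data_loader))
print(bg.batch_size)  # 32 -> one batched graph containing 32 graphs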

If we print the sizes of h and hg using the modified code below:

class Classifier(nn.Module):
    def __init__(self, in_dim, hidden_dim, n_classes):
        super(Classifier, self).__init__()
        self.conv1 = GraphConv(in_dim, hidden_dim)
        self.conv2 = GraphConv(hidden_dim, hidden_dim)
        self.classify = nn.Linear(hidden_dim, n_classes)
        print(f"n_classes, {n_classes}")

    def forward(self, g):
        # Use node degree as the initial node feature. For undirected graphs,
        # the in-degree is the same as the out-degree.
        h = g.in_degrees().view(-1, 1).float()
        # Perform graph convolution and activation function.
        h = F.relu(self.conv1(g, h))
        h = F.relu(self.conv2(g, h))
        g.ndata['h'] = h
        print(f"h.size, {h.size()}")
        # Calculate graph representation by averaging all the node representations.
        hg = dgl.mean_nodes(g, 'h')
        print(f"hg.size, {hg.size()}")
        return self.classify(hg)

We can get output like this:

n_classes, 8
h.size, torch.Size([476, 256])
hg.size, torch.Size([32, 256])
h.size, torch.Size([420, 256])
hg.size, torch.Size([32, 256])
h.size, torch.Size([451, 256])
hg.size, torch.Size([32, 256])
h.size, torch.Size([463, 256])
hg.size, torch.Size([32, 256])
h.size, torch.Size([464, 256])
hg.size, torch.Size([32, 256])
h.size, torch.Size([450, 256])
hg.size, torch.Size([32, 256])
h.size, torch.Size([463, 256])
hg.size, torch.Size([32, 256])
h.size, torch.Size([448, 256])
hg.size, torch.Size([32, 256])
h.size, torch.Size([426, 256])
hg.size, torch.Size([32, 256])
h.size, torch.Size([452, 256])
hg.size, torch.Size([32, 256])
Epoch 0, loss 2.0047

What can we get from the output?

  • Each batch has 32 graphs, and since each graph may have a different number of nodes, h.size(0) is not always the same.
  • Each graph is aggregated into a 1-by-x vector; this step is often called READOUT. For example, if graph A has 10 nodes and the raw output of the graph network is 10 by 256, then after the readout the output will be 1 by 256.
  • Since the batch size is 32, we have 32 graphs in each batch. After the READOUT, we get a fixed output shape of 32 by 256 (see the sketch after this list).
  • This 32-by-256 tensor is the input to the classifier g. At this point, the input and output of the classifier g should be clear.
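To make the bullet points concrete, here is a minimal sketch of batching and READOUT on two made-up toy graphs (the graphs, the 256-dimensional random features, and the dgl.graph constructor of recent DGL versions are all assumptions for illustration):

import dgl
import torch

# Two toy graphs with 3 and 4 nodes respectively.
g1 = dgl.graph(([0, 1, 2], [1, 2, 0]))
g2 = dgl.graph(([0, 1, 2, 3], [1, 2, 3, 0]))

# Batching simply stacks the graphs: 3 + 4 = 7 nodes in total.
bg = dgl.batch([g1, g2])

# Pretend these are the node representations produced by conv2.
bg.ndata['h'] = torch.randn(bg.number_of_nodes(), 256)
print(bg.ndata['h'].size())   # torch.Size([7, 256]) -- analogous to h above

# READOUT: average the node representations per graph.
hg = dgl.mean_nodes(bg, 'h')
print(hg.size())              # torch.Size([2, 256]) -- one row per graph in the batch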

What is the output of the classifier g in the official tutorial?

From the code below:

self.classify = nn.Linear(hidden_dim, n_classes)

It is just the output of a fully connected (FC) layer. Since there is no activation function, each output value can be any real number from -inf to +inf.

loss_func = nn.CrossEntropyLoss()
prediction = model(bg)
loss = loss_func(prediction, label)

If you read this post, you will see that PyTorch's nn.CrossEntropyLoss first applies softmax to the input and then computes the cross-entropy.

In your example, you are treating output [0,0,0,1] as probabilities as required by the mathematical definition of cross-entropy. But PyTorch treats them as raw outputs that don't need to sum to 1 and first need to be converted into probabilities, for which it uses the softmax function.

So H(p,q) becomes:
H(p, softmax(output))

Translating the output [0,0,0,1] into probabilities:
softmax([0,0,0,1])= [0.1749,0.1749,0.1749,0.4754]

whence:
-log(0.4754) = 0.7437
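We can verify these numbers with a few lines of PyTorch (a quick sanity check using the same [0, 0, 0, 1] output and assuming the true class is the last one):

import torch
import torch.nn.functional as F

logits = torch.tensor([[0., 0., 0., 1.]])   # raw FC output, no softmax applied yet
target = torch.tensor([3])                  # index of the true class

print(torch.softmax(logits, dim=1))                    # [[0.1749, 0.1749, 0.1749, 0.4754]]
print(F.cross_entropy(logits, target))                 # tensor(0.7437)
print(-torch.log(torch.softmax(logits, dim=1)[0, 3]))  # tensor(0.7437) -- same value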

This also explains why in the evaluation process, we should apply softmax to the output as shown in the code here:

probs_Y = torch.softmax(model(test_bg), 1)
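To turn those probabilities into actual predictions, we can take the argmax of each row (a sketch; test_Y here is a hypothetical tensor holding the ground-truth labels of the test graphs):

pred_Y = torch.argmax(probs_Y, dim=1)                # predicted class for each graph
accuracy = (pred_Y == test_Y).float().mean().item()  # fraction of correct predictions
print(f"test accuracy: {accuracy:.4f}")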

The PyTorch nn.CrossEntropyLoss formula

PyTorch nn.CrossEntropyLoss formula
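In text form, the per-sample loss that nn.CrossEntropyLoss computes is:

loss(x, class) = -log( exp(x[class]) / sum_j exp(x[j]) )
               = -x[class] + log( sum_j exp(x[j]) )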

The readout process

The readout code is here:

hg = dgl.mean_nodes(g, 'h')

The API documentation of the dgl.mean_nodes function can be found here

Notes

Return a stacked tensor with an extra first dimension whose size equals batch size of the input graph. The i-th row of the stacked tensor contains the readout result of the i-th graph in the batch. If a graph has no nodes, a zero tensor with the same shape is returned at the corresponding row.

So far, the whole process should be clear.

Source code of a toy example

The source code is available on the official tutorial website, and the modified version for this post can be found on my GitHub.

Graph classification source code

Code that uses GIN for graph classification can be found here

It is based on the official DGL library from HERE.
