Summary of “GraphTER”
The Paper
This is a CVPR 2020 paper.
Who
Researchers from Peking University and Futurewei
Motivation
- The recent success of graph convolutional neural networks (GCNNs) motivated the authors to propose an unsupervised representation learning method based on GCNNs.
- The main framework is based on an auto-encoder.
Main Contribution
In the paper, three contributions are listed; I summarize two of them here:
- Propose Graph Transformation Equivariant Representation (GraphTER) learning to extract adequate graph signal feature representations in an unsupervised fashion
- outperforms the state-of-the-art methods in unsupervised graph feature learning
GraphTER architecture
The architecture is clearly illustrated. The upper part is an auto-encoder-like network; the bottom part handles the classification and part segmentation tasks. Instead of plain CNNs, EdgeConv is used (see the details about EdgeConv below).
Main steps
Graph Transformations
1. Basics
Recall the representation of a graph.
G = {V, E}, where V is the set of vertices and E the set of edges; the number of vertices is N.
Each node carries a C-dimensional graph signal; we denote this N × C data matrix as X.
To characterize the similarities (and thus the graph structure) among node signals, an adjacency matrix A is defined on G.
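As a concrete toy example of this setup (all values made up for illustration), here is a NumPy sketch that builds an N × C signal matrix X and a k-nearest-neighbor adjacency matrix A, a common choice for point clouds:

```python
import numpy as np

# Toy graph: N nodes, each with a C-dimensional signal
# (for point clouds, C = 3: the xyz coordinates).
N, C, k = 6, 3, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((N, C))  # graph signal matrix, N x C

# Build a k-nearest-neighbor adjacency matrix A from pairwise distances,
# so edges connect nodes with similar signals.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # N x N squared distances
np.fill_diagonal(d2, np.inf)                          # exclude self-loops
A = np.zeros((N, N))
for i in range(N):
    A[i, np.argsort(d2[i])[:k]] = 1.0                 # connect i to its k NNs
```

Each row of A then has exactly k nonzero entries, one per nearest neighbor.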
2. Graph Signal Transformation
The authors define a graph transformation on the signals X as node-wise filtering on X:
“The filter t is applied to each node individually, which can be either node-invariant or node-variant. We will call the graph transformation isotropic (anisotropic) if it is node-invariant (variant).”
3. Node-wise Graph Signal Transformation
In this paper, the authors focus on node-wise graph signal transformations: each node has its own transformation, applied either isotropically or anisotropically.
An example of the node-wise transformation is shown in the figure above.
From the paper, the benefits of node-wise transformations are:
- The node-wise transformations allow us to use node sampling to study different parts of graphs under various transformations.
- By decoding the node-wise transformations, we will be able to learn the representations of individual nodes. Moreover, these node-wise representations will not only capture the local graph structures under these transformations but also contain global information about the graph when these nodes are sampled into different groups over iterations during training.
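To make the isotropic-vs-anisotropic distinction concrete, here is a minimal NumPy sketch using only translations (the paper also considers other transformation types; the values here are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
X = rng.standard_normal((N, 3))  # a toy point cloud, one xyz signal per node

# Isotropic (node-invariant): the same transformation t for every node,
# e.g. one shared translation vector.
t_shared = np.array([0.1, 0.0, -0.2])
X_iso = X + t_shared

# Anisotropic (node-variant): each node gets its own transformation,
# e.g. an independent random translation per node. These per-node
# parameters are what the decoder must later predict.
t_per_node = rng.uniform(-0.2, 0.2, size=(N, 3))
X_aniso = X + t_per_node
```

Sampling different subsets of nodes and re-drawing `t_per_node` each training iteration is what lets the learned representations see many local parts of the graph under many transformations.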
Applying node-wise transformations to an auto-encoder-based GCNN
To learn the applied node-wise transformations, the authors design a fully graph-convolutional auto-encoder network, as illustrated above. Among various GCNN paradigms, EdgeConv is chosen, following the paper “Dynamic Graph CNN for Learning on Point Clouds”.
Encoder
- The encoder E takes the signals of an original graph X and the transformed counterpart X̃ as input. It encodes node-wise features of X and X̃ through a Siamese encoder network with shared weights.
- Multiple layers of regular edge convolutions are stacked to form the final encoder.
Decoder
Node-wise features of the original and transformed graphs are concatenated at each node and fed into the transformation decoder. The decoder consists of several EdgeConv blocks that aggregate the representations of both graphs to predict the node-wise transformations.
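A minimal sketch of how the decoder's input is assembled (the EdgeConv decoder blocks are replaced by a single random linear map here, purely for illustration; all shapes are made up):

```python
import numpy as np

N, F = 5, 8  # N nodes, F-dimensional per-node features from the encoder
rng = np.random.default_rng(2)
feat_orig = rng.standard_normal((N, F))   # encoder output for X
feat_trans = rng.standard_normal((N, F))  # encoder output for transformed X~

# Concatenate the two feature sets node by node, giving the decoder a
# 2F-dimensional input per node, from which it regresses the node-wise
# transformation parameters (here 3, e.g. a translation vector).
decoder_in = np.concatenate([feat_orig, feat_trans], axis=1)  # N x 2F
W = rng.standard_normal((2 * F, 3))  # stand-in for the EdgeConv decoder
t_pred = decoder_in @ W              # N x 3 predicted parameters
```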
Loss
The loss is the mean squared error (MSE) between the ground-truth and estimated transformation parameters at each sampled node.
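In other words, the loss averages the squared parameter error over the sampled nodes, which is what `nn.MSELoss` computes in PyTorch. A tiny NumPy sketch with made-up values:

```python
import numpy as np

# Ground-truth and predicted node-wise transformation parameters for
# M sampled nodes (e.g. a 3-vector per node for translation).
M = 4
rng = np.random.default_rng(3)
t_true = rng.standard_normal((M, 3))
t_pred = t_true + 0.1  # pretend prediction, off by a constant 0.1

# MSE averaged over sampled nodes and parameter dimensions.
loss = np.mean((t_pred - t_true) ** 2)  # every error is 0.1, so loss = 0.01
```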
Results
Classification
Part Segmentation
Details
What is EdgeConv?
EdgeConv is chosen following the paper “Dynamic Graph CNN for Learning on Point Clouds” (DGCNN).
The code of EdgeConv is from here:
```python
import torch
import torch.nn as nn
import utils  # the authors' repository provides get_edge_feature


class EdgeConvolution(nn.Module):
    def __init__(self, k, in_features, out_features):
        super(EdgeConvolution, self).__init__()
        self.k = k
        # 1x1 convolution acting as a shared MLP over edge features; the
        # input has 2 * in_features channels because each edge feature
        # concatenates the node signal with the neighbor offset.
        self.conv = nn.Conv2d(
            in_features * 2, out_features, kernel_size=1, bias=False
        )
        self.bn = nn.BatchNorm2d(out_features)
        self.relu = nn.LeakyReLU(negative_slope=0.2)

    def forward(self, x):
        # Build edge features over each node's k nearest neighbors.
        x = utils.get_edge_feature(x, k=self.k)
        x = self.relu(self.bn(self.conv(x)))
        # Max-pool over the neighbor dimension to aggregate per node.
        x = x.max(dim=-1, keepdim=False)[0]
        return x
```
I am not quite clear about the implementation of EdgeConv. From the code, it seems to have some connections to CNNs. I will come back when I understand the implementation better. TK
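Although the repository's `get_edge_feature` is not shown here, DGCNN-style implementations typically gather, for each node i, the features [x_i, x_j − x_i] over its k nearest neighbors j; the 1×1 `Conv2d` then acts like a shared MLP applied to every edge, which is where the CNN connection comes from. Below is a NumPy sketch of that idea (the real function operates on torch tensors of shape (B, C, N), so treat this only as a conceptual illustration):

```python
import numpy as np

def get_edge_feature_sketch(x, k):
    """NumPy sketch of DGCNN-style edge features.
    x: (N, C) node signals. Returns (N, k, 2C) edge features
    [x_i, x_j - x_i] for each node i and its k nearest neighbors j."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)  # pairwise distances
    np.fill_diagonal(d2, np.inf)                         # no self-loops
    idx = np.argsort(d2, axis=1)[:, :k]                  # k-NN indices
    neighbors = x[idx]                                   # (N, k, C)
    center = np.repeat(x[:, None, :], k, axis=1)         # (N, k, C)
    return np.concatenate([center, neighbors - center], axis=-1)

x = np.random.default_rng(4).standard_normal((6, 3))
edge_feat = get_edge_feature_sketch(x, k=2)  # shape (6, 2, 6)
```

The channel doubling explains `in_features * 2` in `EdgeConvolution.__init__`, and the final `max` over the neighbor dimension in `forward` mirrors the aggregation step after this gathering.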
Question about this paper
It is unclear what data size was used to train the network during the unsupervised stage. If a relatively larger dataset were used, the performance might be better.
Take away
By combining unsupervised learning with GCNNs, we can establish baselines for unsupervised + supervised learning on computer vision tasks. We can try to see whether it works on object detection.
Code
The code is based on PyTorch and can be found here.