How to read TUD datasets into DGL?
TUD Datasets
Apr 24, 2020
TUD datasets are ‘A collection of benchmark datasets for graph classification and regression’.
Raw data format
The file format follows [1]:
n = total number of nodes
m = total number of edges
N = number of graphs
DS_A.txt (m lines): sparse (block diagonal) adjacency matrix for all graphs. Each line corresponds to (row, col), i.e. (node_id, node_id). All graphs are undirected, so DS_A.txt contains two entries for each edge.
DS_graph_indicator.txt (n lines): column vector of graph identifiers for all nodes of all graphs. The value in the i-th line is the graph_id of the node with node_id i.
DS_graph_labels.txt (N lines): class labels for all graphs in the data set. The value in the i-th line is the class label of the graph with graph_id i.
DS_node_labels.txt (n lines): column vector of node labels. The value in the i-th line corresponds to the node with node_id i.
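To make the layout concrete, here is a minimal sketch that writes a tiny made-up dataset in this format and reads it back using only the Python standard library. The dataset contents and the helper names read_int_column and read_edge_list are hypothetical and exist only for illustration. The sketch also assumes that node ids and graph ids start at 1, and that the two values on each line of DS_A.txt are separated by a comma.

```python
import os
import tempfile


def read_int_column(path):
    """Read a file that holds one integer per line."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]


def read_edge_list(path):
    """Read DS_A.txt: one 'row, col' pair per line (assumed 1-based node ids)."""
    edges = []
    with open(path) as f:
        for line in f:
            if line.strip():
                row, col = line.split(",")
                edges.append((int(row), int(col)))
    return edges


# Toy dataset "DS" (made up): graph 1 is the path 1-2-3, graph 2 is the
# single edge 4-5. Every undirected edge is listed in both directions.
toy_files = {
    "DS_A.txt": "1, 2\n2, 1\n2, 3\n3, 2\n4, 5\n5, 4\n",
    "DS_graph_indicator.txt": "1\n1\n1\n2\n2\n",
    "DS_graph_labels.txt": "1\n-1\n",
    "DS_node_labels.txt": "0\n1\n0\n2\n2\n",
}

with tempfile.TemporaryDirectory() as root:
    # Write the four files, then parse them again.
    for name, text in toy_files.items():
        with open(os.path.join(root, name), "w") as f:
            f.write(text)
    edges = read_edge_list(os.path.join(root, "DS_A.txt"))
    graph_indicator = read_int_column(os.path.join(root, "DS_graph_indicator.txt"))
    graph_labels = read_int_column(os.path.join(root, "DS_graph_labels.txt"))
    node_labels = read_int_column(os.path.join(root, "DS_node_labels.txt"))

# m = len(edges), n = len(graph_indicator), N = len(graph_labels)
print(len(edges), len(graph_indicator), len(graph_labels))
```

Here m counts lines of DS_A.txt, so the three undirected edges of the toy dataset give m = 6.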
How to read the data into a graph neural network library such as DGL?
In this article, we will find a way to convert data stored in the TUD dataset format into graphs that DGL [2] can use.
Solution
I finally wrote Python code to feed the raw data to DGL, and the problem was solved.
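The original code is not reproduced here, so what follows is only a sketch of one possible approach, not the author's implementation. It assumes DGL 0.5 or later with the PyTorch backend, 1-based node and graph ids, and edge and indicator lists that have already been parsed from DS_A.txt and DS_graph_indicator.txt. The function names split_graphs and to_dgl_graphs are hypothetical. The grouping step is plain Python. Only to_dgl_graphs needs dgl and torch, and it is defined here but not called.

```python
def split_graphs(edges, graph_indicator):
    """Group the global edge list by graph and renumber nodes locally from 0.

    edges: list of (row, col) pairs with 1-based global node ids.
    graph_indicator: graph_indicator[i] is the 1-based graph_id of node i + 1.
    Returns one dict per graph with keys "src", "dst" and "nodes", where
    "nodes" lists the 0-based global indices of the graph's nodes.
    """
    num_graphs = max(graph_indicator)
    parts = [{"src": [], "dst": [], "nodes": []} for _ in range(num_graphs)]

    local_id = []  # local_id[i]: index of global node i + 1 inside its graph
    for i, gid in enumerate(graph_indicator):
        local_id.append(len(parts[gid - 1]["nodes"]))
        parts[gid - 1]["nodes"].append(i)

    for row, col in edges:
        # The adjacency matrix is block diagonal, so both endpoints of an
        # edge belong to the same graph.
        part = parts[graph_indicator[row - 1] - 1]
        part["src"].append(local_id[row - 1])
        part["dst"].append(local_id[col - 1])
    return parts


def to_dgl_graphs(edges, graph_indicator, node_labels=None):
    """Build one DGLGraph per graph (assumes DGL >= 0.5 and PyTorch)."""
    import dgl
    import torch

    graphs = []
    for part in split_graphs(edges, graph_indicator):
        src = torch.tensor(part["src"], dtype=torch.int64)
        dst = torch.tensor(part["dst"], dtype=torch.int64)
        # DS_A.txt already lists both directions of every edge, so no
        # reverse edges are added here.
        g = dgl.graph((src, dst), num_nodes=len(part["nodes"]))
        if node_labels is not None:
            g.ndata["label"] = torch.tensor([node_labels[i] for i in part["nodes"]])
        graphs.append(g)
    return graphs


# Toy input (made up): graph 1 is the path 1-2-3, graph 2 is the edge 4-5.
toy_edges = [(1, 2), (2, 1), (2, 3), (3, 2), (4, 5), (5, 4)]
toy_indicator = [1, 1, 1, 2, 2]
parts = split_graphs(toy_edges, toy_indicator)
```

Because the i-th line of DS_graph_labels.txt belongs to the graph with graph_id i, the returned list can be paired with the graph labels in order, for example with zip(graphs, graph_labels).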
Reference
[1] https://chrsmrrs.github.io/datasets/docs/format/
[2] DGL