How to read TUD Datasets into DGL?

TUD Datasets

Jimmy (xiaoke) Shen
Apr 24, 2020

TUD datasets are ‘A collection of benchmark datasets for graph classification and regression’.

Raw data format

The file format is as follows [1]:

  • n = total number of nodes
  • m = total number of edges
  • N = number of graphs
  1. DS_A.txt (m lines): sparse (block diagonal) adjacency matrix for all graphs, each line corresponds to (row, col) resp. (node_id, node_id). All graphs are undirected. Hence, DS_A.txt contains two entries for each edge.
  2. DS_graph_indicator.txt (n lines): column vector of graph identifiers for all nodes of all graphs, the value in the i-th line is the graph_id of the node with node_id i
  3. DS_graph_labels.txt (N lines): class labels for all graphs in the data set, the value in the i-th line is the class label of the graph with graph_id i
  4. DS_node_labels.txt (n lines): column vector of node labels, the value in the i-th line corresponds to the node with node_id i
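To make the layout concrete, here is a minimal sketch in plain Python that groups the four flat files into one record per graph. The function name `parse_tud` and the record keys are made up for illustration. The sketch assumes node ids and graph ids start at 1 and that labels are integers.

```python
from collections import defaultdict


def parse_tud(a_lines, indicator_lines, graph_label_lines, node_label_lines):
    """Group flat TUD-format lines into one dict per graph.

    Each argument is an iterable of text lines (an open file works).
    Assumption: node ids and graph ids are 1-based, labels are integers.
    """
    # graph_of[i] is the graph id of the node with node_id i + 1
    graph_of = [int(line) for line in indicator_lines if line.strip()]
    graph_labels = [int(line) for line in graph_label_lines if line.strip()]
    node_labels = [int(line) for line in node_label_lines if line.strip()]

    # Collect the global node ids that belong to each graph, in file order.
    nodes = defaultdict(list)
    for node_id, gid in enumerate(graph_of, start=1):
        nodes[gid].append(node_id)

    # Map every global node id to a 0-based index local to its graph.
    local = {}
    for members in nodes.values():
        for idx, node_id in enumerate(members):
            local[node_id] = idx

    # Each line of DS_A.txt is "row, col"; both endpoints share one graph.
    edges = defaultdict(list)
    for line in a_lines:
        if not line.strip():
            continue
        row, col = (int(x) for x in line.split(","))
        edges[graph_of[row - 1]].append((local[row], local[col]))

    graphs = []
    for gid in range(1, len(graph_labels) + 1):
        graphs.append({
            "label": graph_labels[gid - 1],
            "num_nodes": len(nodes[gid]),
            "edges": edges[gid],  # both directions, as stored in DS_A.txt
            "node_labels": [node_labels[n - 1] for n in nodes[gid]],
        })
    return graphs
```

Because every undirected edge appears twice in DS_A.txt, the `edges` list of each record already contains both directions.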

How to read the data into a graph neural network library such as DGL?

In this article, we will find a way to load data stored in the TUD dataset format into DGL.

Solution

In the end, I wrote Python code that feeds the raw data to DGL, which solved the problem.
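The original code is not included here, so the following is only a minimal sketch of what such a loader might look like. It assumes DGL 0.5 or later with the PyTorch backend, integer labels, 1-based ids, and that the nodes of each graph occupy a contiguous id range (which the block diagonal layout suggests). The function names and the `root` and `name` parameters are hypothetical.

```python
import os


def read_ints(path):
    """Read one integer per non-empty line."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]


def read_edges(path):
    """Read 'row, col' pairs from DS_A.txt as 1-based node ids."""
    edges = []
    with open(path) as f:
        for line in f:
            if line.strip():
                row, col = line.split(",")
                edges.append((int(row), int(col)))
    return edges


def split_by_graph(edges, graph_of):
    """Return (num_nodes, src, dst) per graph, using 0-based local ids.

    Assumption: each graph's nodes form a contiguous range of node ids.
    """
    num_graphs = max(graph_of)
    first = {}  # graph id -> smallest global node id in that graph
    count = [0] * num_graphs
    for node_id, gid in enumerate(graph_of, start=1):
        first.setdefault(gid, node_id)
        count[gid - 1] += 1
    src = [[] for _ in range(num_graphs)]
    dst = [[] for _ in range(num_graphs)]
    for row, col in edges:
        gid = graph_of[row - 1]
        src[gid - 1].append(row - first[gid])
        dst[gid - 1].append(col - first[gid])
    return [(count[i], src[i], dst[i]) for i in range(num_graphs)]


def load_tud_as_dgl(root, name):
    """Build one DGLGraph per graph; returns (graphs, graph_labels)."""
    import dgl  # imported here so the helpers above work without DGL
    import torch

    prefix = os.path.join(root, name)
    edges = read_edges(prefix + "_A.txt")
    graph_of = read_ints(prefix + "_graph_indicator.txt")
    labels = read_ints(prefix + "_graph_labels.txt")
    node_label_path = prefix + "_node_labels.txt"
    node_labels = None
    if os.path.exists(node_label_path):
        node_labels = read_ints(node_label_path)

    graphs, offset = [], 0
    for num_nodes, src, dst in split_by_graph(edges, graph_of):
        g = dgl.graph(
            (torch.tensor(src, dtype=torch.int64),
             torch.tensor(dst, dtype=torch.int64)),
            num_nodes=num_nodes,
        )
        if node_labels is not None:
            # Slice relies on the contiguous-id assumption stated above.
            g.ndata["label"] = torch.tensor(node_labels[offset:offset + num_nodes])
        offset += num_nodes
        graphs.append(g)
    return graphs, torch.tensor(labels)
```

With files such as `data/DS_A.txt` in place, a call like `load_tud_as_dgl("data", "DS")` would return a list of graphs that can be combined with `dgl.batch` for training. For the published benchmark datasets, recent DGL releases also provide `dgl.data.TUDataset`, which may make a custom loader unnecessary.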

Reference

[1] https://chrsmrrs.github.io/datasets/docs/format/

[2] DGL
