How train-test split works for Graph Neural Networks

Question

Sourajit

October 17, 2021, 17:22

I have recently started studying GNNs. I have covered GCN and GraphSAGE so far, but I am confused about what happens at test time.

Suppose that, in the graph above, I split the nodes into train and test sets as shown in the figure, and that I am using the GraphSAGE model for a supervised node-classification task. During training I provide the sub-graph with the blue nodes, and the weights (parameters) are learned using the neighbourhood information of the nodes within this blue sub-graph.
During testing I want to predict the labels of the green nodes. At that point, the forward propagation of GraphSAGE is performed using the weights learned during training and the neighbourhood information of the test nodes.
My doubt: during testing, does the algorithm treat only the green nodes (test set) as the neighbourhood, or does it also use the blue nodes' information (since they are connected, as can be seen in the figure) during the forward propagation step to compute the node embeddings?
Below is the forward propagation algorithm of GraphSAGE as given in the paper.

It might be a silly question, but since I am new to this I am having difficulty understanding how the neighbourhood is defined at train and test time. Do correct me if I have stated anything wrongly.

Topic graph-neural-network training inference

Category Data Science

Rafael · Accepted Answer · October 17, 2021, 17:22

"Does the algorithm consider as neighbourhood only the green nodes (test set), or does it also consider the blue nodes?"

It considers both the blue nodes and the green nodes.
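As a rough illustration of that point, here is a minimal sketch of the neighbour-averaging step for one test node. The toy graph, node ids, and feature values are made up for this example, and only the mean-aggregation step is shown (no learned weights), so it is not the full GraphSAGE forward pass.

```python
# Hypothetical toy graph: nodes 0-2 are "blue" (train), nodes 3-4 are "green" (test).
adjacency = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2, 4],  # test node 3 touches two train nodes and one test node
    4: [3],
}
features = {
    0: [1.0, 0.0],
    1: [0.0, 1.0],
    2: [1.0, 1.0],
    3: [0.5, 0.5],
    4: [2.0, 0.0],
}
train_nodes = {0, 1, 2}


def mean_aggregate(node, adjacency, features):
    """Average the feature vectors of every neighbour of `node`.

    The split is deliberately not passed in: aggregation follows the
    edges of the graph and ignores train/test membership.
    """
    neighbours = adjacency[node]
    dim = len(features[node])
    return [
        sum(features[n][d] for n in neighbours) / len(neighbours)
        for d in range(dim)
    ]


# Which neighbours of test node 3 come from which split?
blue_neighbours = [n for n in adjacency[3] if n in train_nodes]
green_neighbours = [n for n in adjacency[3] if n not in train_nodes]

# The aggregate for node 3 mixes information from both groups.
agg_3 = mean_aggregate(3, adjacency, features)
print(blue_neighbours, green_neighbours, agg_3)
```

Here node 3 is a test node, yet two of the three vectors averaged into its aggregate come from train nodes.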

Note that node classification with a GNN on a single graph is usually set up as transductive learning, where the test data (here, the test nodes) is seen during training, but without its labels. What you might have in mind is inductive learning, where the train set and test set are completely separated.
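To make the two settings concrete, the sketch below lists which edges would be visible during training in each case. The edge list, the split, and the helper `training_edges` are hypothetical and exist only for this illustration.

```python
# Hypothetical toy graph: nodes 0-2 are train (blue), nodes 3-4 are test (green).
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
train_nodes = {0, 1, 2}


def training_edges(edges, train_nodes, setting):
    """Return the edges the model can see while training."""
    if setting == "transductive":
        # Whole graph is visible; only the test labels are withheld.
        return list(edges)
    if setting == "inductive":
        # Only edges with both endpoints in the train set are visible.
        return [(u, v) for u, v in edges if u in train_nodes and v in train_nodes]
    raise ValueError(f"unknown setting: {setting}")


transductive_edges = training_edges(edges, train_nodes, "transductive")
inductive_edges = training_edges(edges, train_nodes, "inductive")
print(len(transductive_edges), len(inductive_edges))
```

In this toy graph the transductive setting keeps all five edges, while the inductive one keeps only the two edges that join train nodes to each other.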

"Suppose I am using the GraphSAGE model for a supervised node-classification task. During training I provide the sub-graph with the blue nodes, and the weights (parameters) are learned using the neighbourhood information of the nodes within this blue sub-graph."

This is not right for the transductive setting. During training you provide the whole graph (both blue and green nodes, and all edges), but you provide labels only for the nodes in the train set; the labels of the test nodes are unknown during training.
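One common way to express "whole graph in, train labels only" is a masked loss. The sketch below is an assumption-laden illustration: the logits are made-up numbers standing in for a model's output on every node of the graph, and `masked_cross_entropy` is a hypothetical helper, not a function from any library.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_entropy(logits, labels, train_mask):
    """Mean cross-entropy over the nodes where train_mask is True."""
    losses = []
    for node_logits, label, is_train in zip(logits, labels, train_mask):
        if is_train:
            losses.append(-math.log(softmax(node_logits)[label]))
    return sum(losses) / len(losses)


# The model produces logits for all 5 nodes (train and test alike).
logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.3, -0.1], [1.5, 0.2]]
# Test labels are unknown during training, so they are left as None.
labels = [0, 1, 0, None, None]
train_mask = [True, True, True, False, False]

loss = masked_cross_entropy(logits, labels, train_mask)
print(loss)
```

The test nodes still take part in the forward pass through their features and edges; they simply contribute no term to the loss.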
