This is my PyTorch implementation for the paper:
Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua (2019). Neural Graph Collaborative Filtering. In SIGIR'19, Paris, France, July 21-25, 2019.
The authors' TensorFlow implementation is here.
The code has been tested under Python 3.6.10. The required packages are as follows:
- pytorch == 1.5.0
- numpy == 1.18.1
- scipy == 1.4.1
- sklearn == 0.22.1
- tqdm == 4.45.0
The instructions for the commands are stated in the code (see the parser function in NGCF/utility/parser.py).
- Gowalla dataset

```shell
python NGCF_pytorch.py --dataset gowalla --regs [1e-5] --embed_size 64 --layer_size [64,64,64] --lr 0.0001 --save_flag 1 --pretrain 0 --batch_size 1024 --epoch 400 --verbose 1 --node_dropout [0.1] --mess_dropout [0.1,0.1,0.1]
```
- Amazon-book dataset

```shell
python NGCF_pytorch.py --dataset amazon-book --regs [1e-5] --embed_size 64 --layer_size [64,64,64] --lr 0.0005 --save_flag 1 --pretrain 0 --batch_size 1024 --epoch 200 --verbose 50 --node_dropout [0.1] --mess_dropout [0.1,0.1,0.1]
```
Some important arguments:
- `alg_type`: specifies the type of graph convolutional layer. Three options are provided:
  - `ngcf` (by default), proposed in Neural Graph Collaborative Filtering, SIGIR 2019. Usage: `--alg_type ngcf`.
  - `gcn`, proposed in Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017. Usage: `--alg_type gcn`.
  - `gcmc`, proposed in Graph Convolutional Matrix Completion, KDD 2018. Usage: `--alg_type gcmc`.
- `adj_type`: specifies the type of Laplacian matrix, where each entry defines the decay factor between two connected nodes. Four options are provided:
  - `ngcf` (by default), where each decay factor between two connected nodes is set as 1/(out-degree of the node), and each node is also assigned 1 for its self-connection. Usage: `--adj_type ngcf`.
  - `plain`, where each decay factor between two connected nodes is set as 1. No self-connections are considered. Usage: `--adj_type plain`.
  - `norm`, where each decay factor between two connected nodes is set as 1/(out-degree of the node + self-connection). Usage: `--adj_type norm`.
  - `gcmc`, where each decay factor between two connected nodes is set as 1/(out-degree of the node). No self-connections are considered. Usage: `--adj_type gcmc`.
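The four `adj_type` options differ only in how the adjacency matrix is normalized. The following is a minimal scipy sketch on a toy graph (illustrative only; the names here are not the repo's actual loader functions):

```python
import numpy as np
import scipy.sparse as sp

# Toy symmetric 0/1 adjacency of a 4-node graph (hypothetical example).
A = sp.csr_matrix(np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
], dtype=np.float32))

def normalize(adj, add_self_loops):
    """Row-normalize: each nonzero entry becomes 1/out-degree (D^-1 A)."""
    if add_self_loops:
        adj = adj + sp.eye(adj.shape[0], dtype=np.float32)
    deg = np.asarray(adj.sum(axis=1)).flatten()
    d_inv = np.where(deg > 0, 1.0 / deg, 0.0)
    return sp.diags(d_inv) @ adj

plain = A                                   # --adj_type plain: raw 0/1 entries, no self-loops
gcmc  = normalize(A, add_self_loops=False)  # --adj_type gcmc: 1/out-degree, no self-loops
norm  = normalize(A, add_self_loops=True)   # --adj_type norm: 1/(out-degree + self-connection)
ngcf  = normalize(A, add_self_loops=False) + sp.eye(A.shape[0])  # --adj_type ngcf: D^-1 A + I
```

Each row of `gcmc` and `norm` sums to 1, while each row of `ngcf` sums to 2 because of the unit self-connection.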
- `node_dropout`: the node dropout ratio, which randomly blocks a particular node and discards all its outgoing messages. Usage: `--node_dropout [0.1] --node_dropout_flag 1`. Note that the argument `node_dropout_flag` also needs to be set to 1, since node dropout incurs a higher computational cost than message dropout.
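A minimal numpy sketch of the node-dropout idea (illustrative only, not the repo's sparse implementation): a dropped node's whole row of outgoing messages is zeroed, and survivors are rescaled by 1/(1 - rate) to keep expectations unchanged:

```python
import numpy as np

def node_dropout(adj, rate, rng):
    """Block randomly chosen nodes: zero their entire row (all outgoing
    messages), then rescale surviving rows by 1/(1 - rate)."""
    n = adj.shape[0]
    keep = rng.random(n) >= rate            # True = node survives
    mask = keep.astype(adj.dtype)[:, None]  # broadcast over each node's row
    return adj * mask / (1.0 - rate)

rng = np.random.default_rng(0)
adj = np.ones((4, 4), dtype=np.float32)     # toy dense adjacency
dropped = node_dropout(adj, rate=0.5, rng=rng)
# Each row is either all zeros (blocked node) or all 2.0 (1 / (1 - 0.5)).
```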
- `mess_dropout`: the message dropout ratio, which randomly drops out the outgoing messages. Usage: `--mess_dropout [0.1,0.1,0.1]`.
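By contrast with node dropout, message dropout zeroes individual entries of the propagated messages rather than whole nodes. A numpy sketch of the idea (illustrative, not the repo's layer code):

```python
import numpy as np

def message_dropout(messages, rate, rng):
    """Drop individual entries of the propagated embeddings elementwise,
    rescaling survivors by 1/(1 - rate)."""
    mask = (rng.random(messages.shape) >= rate).astype(messages.dtype)
    return messages * mask / (1.0 - rate)

rng = np.random.default_rng(1)
msgs = np.ones((3, 4), dtype=np.float32)  # toy per-layer messages
out = message_dropout(msgs, rate=0.1, rng=rng)
```

With `--mess_dropout [0.1,0.1,0.1]`, one such ratio is applied per propagation layer.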
The authors provide two processed datasets: Gowalla and Amazon-book. You can find them here.
- `train.txt`: the train file.
  - Each line is a user with her/his positive interactions with items: `userID\t a list of itemID\n`.
- `test.txt`: the test file (positive instances).
  - Each line is a user with her/his positive interactions with items: `userID\t a list of itemID\n`.
  - Note that all unobserved interactions are treated as negative instances when reporting performance.
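Given the documented format, the train/test files can be parsed with plain Python. A sketch (`load_interactions` is a hypothetical helper, not part of the repo):

```python
from io import StringIO

def load_interactions(f):
    """Parse the train/test format: each line is a userID followed by a
    whitespace-separated list of itemIDs; returns user -> list of items."""
    interactions = {}
    for line in f:
        ids = line.split()
        if not ids:
            continue  # skip blank lines
        user, items = int(ids[0]), [int(i) for i in ids[1:]]
        interactions[user] = items
    return interactions

# Hypothetical two-user sample in the documented format.
sample = StringIO("0 104 207 9\n1 88\n")
data = load_interactions(sample)
# data[0] == [104, 207, 9]
```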
- `user_list.txt`: the user file.
  - Each line is a pair (org_id, remap_id) for one user, where org_id and remap_id represent the ID of the user in the original and our datasets, respectively.
- `item_list.txt`: the item file.
  - Each line is a pair (org_id, remap_id) for one item, where org_id and remap_id represent the ID of the item in the original and our datasets, respectively.
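The ID-mapping files follow the same simple line format, so loading them is a one-pass parse. A sketch (`load_id_map` is a hypothetical helper; the real files may carry a header row, which the parse skips):

```python
from io import StringIO

def load_id_map(f):
    """Parse user_list.txt / item_list.txt: each data line is
    `org_id remap_id`, mapping original IDs to remapped ones."""
    mapping = {}
    for line in f:
        parts = line.split()
        if len(parts) != 2 or parts[1] == "remap_id":
            continue  # skip blank lines and an optional header row
        org_id, remap_id = parts
        mapping[org_id] = int(remap_id)
    return mapping

# Hypothetical snippet; original IDs are kept as strings since their
# format depends on the source dataset.
sample = StringIO("org_id remap_id\n10531 0\n10763 1\n")
ids = load_id_map(sample)
# ids == {"10531": 0, "10763": 1}
```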