
Conversation

@1324fgg commented Apr 27, 2025

No description provided.

jethrocsau and others added 30 commits April 21, 2025 18:53
Jupyter notebook for arvix_data_loading_pipeline
Add an explanation of how to extract the normal embedding from the saved graph, and mention a problem that may occur when processing the mag dataset.
…00 nodes). The sampled mag graph has more nodes but is less connected than the arxiv dataset.
This is the sampled mag graph dataset. It contains only paper nodes and paper citation edges. "feat" is the 128-dimensional embedding of the title and abstract provided by the dataset, "_ID" is the node ID, and "y" is the class label. Its structure is:
Node data: dict_keys(['year', 'feat', '_ID', '_TYPE', 'y'])
Edge data: dict_keys(['reltype', '_ID', '_TYPE'])
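The layout above can be sanity-checked before training. The following is a minimal sketch using plain Python dictionaries as stand-ins for the graph's node and edge data; the helper name `check_schema` and the toy values are hypothetical and not taken from the notebook.

```python
# Hypothetical sanity check for the node/edge layout listed above.
# Plain dicts of lists stand in for the real graph's node and edge data.
EXPECTED_NODE_KEYS = {"year", "feat", "_ID", "_TYPE", "y"}
EXPECTED_EDGE_KEYS = {"reltype", "_ID", "_TYPE"}
FEAT_DIM = 128  # embedding size stated in the commit message


def check_schema(node_data, edge_data, feat_dim=FEAT_DIM):
    """Return a list of problems found; an empty list means the layout matches."""
    problems = []
    missing_node = EXPECTED_NODE_KEYS - set(node_data)
    missing_edge = EXPECTED_EDGE_KEYS - set(edge_data)
    if missing_node:
        problems.append(f"missing node fields: {sorted(missing_node)}")
    if missing_edge:
        problems.append(f"missing edge fields: {sorted(missing_edge)}")
    # Every feature vector should have the expected dimensionality.
    for i, vec in enumerate(node_data.get("feat", [])):
        if len(vec) != feat_dim:
            problems.append(f"node {i}: feat has length {len(vec)}, expected {feat_dim}")
    return problems


# Toy data with two paper nodes and one citation edge (made-up values).
node_data = {
    "year": [2012, 2016],
    "feat": [[0.0] * 128, [0.1] * 128],
    "_ID": [10, 42],
    "_TYPE": [0, 0],
    "y": [3, 7],
}
edge_data = {"reltype": [0], "_ID": [5], "_TYPE": [0]}

print(check_schema(node_data, edge_data))
```

With real graph data, the same check would be applied to whatever mapping the graph library exposes for node and edge fields.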
Here is an image describing the arxiv dataset, the mag dataset, and the sampled mag dataset.
This is a GraphSAGE model trained on the sampled mag dataset. I split the 2w (20,000) data points into training, validation, and test sets, separated by paper year (e.g. 2013, 2015). The accuracy (47% after 100 epochs of training) is higher than the 31.53% reported in the arXiv paper 2005.00687v7. My sampled dataset may be better connected, which would make the node labels easier to predict.
This is the first version. Training is much faster than I expected and takes only minutes. I will finish the other parts later.
Recommend changing the name for clarity
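A year-based split like the one described above could look roughly like this. This is a minimal sketch, not the notebook's code: the function name `split_by_year` is hypothetical, and treating 2013 and 2015 as inclusive upper bounds for the training and validation sets is an assumption.

```python
from typing import Dict, List, Sequence


def split_by_year(
    years: Sequence[int],
    train_until: int = 2013,  # assumed: papers up to and including 2013 go to train
    valid_until: int = 2015,  # assumed: papers from 2014 to 2015 go to validation
) -> Dict[str, List[int]]:
    """Assign node indices to train/valid/test based on publication year."""
    splits: Dict[str, List[int]] = {"train": [], "valid": [], "test": []}
    for idx, year in enumerate(years):
        if year <= train_until:
            splits["train"].append(idx)
        elif year <= valid_until:
            splits["valid"].append(idx)
        else:
            splits["test"].append(idx)
    return splits


# Toy example: one publication year per node (made-up values).
years = [2010, 2013, 2014, 2015, 2016, 2012]
splits = split_by_year(years)
print(splits)
```

Splitting by time rather than at random keeps later papers out of the training set, which is a common choice for citation graphs where the goal is to predict labels of newer papers.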

3 participants