This is the code repository for the paper "LinkLogic: A New Method and Benchmark for Explainable Knowledge Graph Predictions".
- The `FB13` dataset from the OpenKE benchmark is present at `dataset/FB13`
- `dataset/fb13_resplit` resplits the test data in the `FB13` dataset for the evaluation purposes mentioned in the paper.
- The new `FB14` dataset is present at `dataset/fb14`
To train the ComplEx decoder used in the paper, we used the DGL-KE library. To make the experiments in the paper easy to reproduce, we share the trained ComplEx embeddings for the `fb13_resplit` and `fb14` datasets. To download them, run the following commands from the root of the project:
    brew install git-lfs
    git lfs pull

- Setup environment

      conda create -n linklogic python=3.8
      conda activate linklogic
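Once the environment is active, it can be worth confirming that `git lfs pull` actually fetched the shared embeddings. The sketch below is not part of the repository; it relies only on the fact that a file not yet pulled is a small Git LFS pointer stub whose first line is `version https://git-lfs.github.com/spec/v1`. The helper names are hypothetical.

```python
from pathlib import Path

# First line of every Git LFS pointer file (per the Git LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is still an LFS pointer stub, not real content."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unpulled_files(root):
    """List files under `root` that still look like LFS pointer stubs."""
    return sorted(
        str(p) for p in Path(root).rglob("*") if p.is_file() and is_lfs_pointer(p)
    )
```

Calling `find_unpulled_files(...)` on the directory that holds the embeddings should return an empty list after a successful pull; any path it returns is a file that still needs to be fetched.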
- Update Parameters
  - To run the experiments, configure the params at `linklogic/params.json`
  - It contains two sets of params
    - `io_params`: Defines the params to read the dataset and save the outputs
    - `linklogic_params`: Defines the params used by linklogic to generate explanations
  - Description of the `io_params`
    - `data_path`: Path to the dataset.
    - `save_path`: Path to save the output of `run_linklogic.py`
  - Description of the `linklogic_params`
    - `dataset`: Name of the dataset. Currently supports `fb13_resplit`, `fb14`
    - `method`: Name of the KGE embedding strategy. Currently supports `ComplEx` and `TransE`.
    - `prob`: Boolean for sigmoid transformation on the knowledge graph embedding scores
    - `n_instances`: Number of neighbors to sample to calculate variance for perturbing query embeddings
    - `topk`: Number of paths to consider per relation type
    - `neighbor_sample_size`: Number of neighbors to sample to calculate variance for perturbing query embeddings
    - `var_scale_head`: Scale factor for the head embedding variance used for perturbation. Default value = 1
    - `var_scale_tail`: Scale factor for the tail embedding variance used for perturbation. Default value = 1
    - `seed`: Random seed to reproduce the results
    - `hop2_path_cal`: Strategy to compute the 2-hop path score. Valid options: `product` or `sqrt`
    - `logsum`: Boolean for log transformation of the features and the labels
    - `alpha`: Regularization constant for the surrogate model
    - `consider_child`: Boolean to remove direct inverse evidence for the parents benchmark
    - `benchmark`: Benchmark category. Currently supports `parents` or `location`
    - `benchmark_datatype`: Benchmark datatype. Currently supports `analysis` or `tuning`
    - `r1_name_list`: List of relations to consider for the 1st hop in creating 2-hop paths
    - `r2_name_list`: List of relations to consider for the 2nd hop in creating 2-hop paths
    - `feature_considerations`: Whether to consider only 1-hop, 2-hop or all features to train the surrogate model
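Since several parameters accept only a fixed set of values, a small pre-flight check can catch typos before a long run. The following is a minimal sketch, not code from the repository: `check_params` and `check_params_file` are hypothetical helpers, and the allowed values are copied from the descriptions above. Parameters whose valid values are not spelled out there are left unchecked.

```python
import json

# Allowed values, copied from the parameter descriptions above.
ALLOWED = {
    "dataset": {"fb13_resplit", "fb14"},
    "method": {"ComplEx", "TransE"},
    "hop2_path_cal": {"product", "sqrt"},
    "benchmark": {"parents", "location"},
    "benchmark_datatype": {"analysis", "tuning"},
}


def check_params(params):
    """Return a list of human-readable problems found in a params dict."""
    problems = []
    for section in ("io_params", "linklogic_params"):
        if section not in params:
            problems.append(f"missing section: {section}")
    for key in ("data_path", "save_path"):
        if key not in params.get("io_params", {}):
            problems.append(f"io_params is missing: {key}")
    ll_params = params.get("linklogic_params", {})
    for key, allowed in ALLOWED.items():
        # Only validate keys that are present; defaults are left to the tool.
        if key in ll_params and ll_params[key] not in allowed:
            problems.append(
                f"linklogic_params[{key!r}] = {ll_params[key]!r}, "
                f"expected one of {sorted(allowed)}"
            )
    return problems


def check_params_file(path):
    """Load a params JSON file and run check_params on it."""
    with open(path) as f:
        return check_params(json.load(f))
```

An empty list from `check_params_file("linklogic/params.json")` means none of the checked fields has an unsupported value.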
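To make the `prob` and `hop2_path_cal` options more concrete, here is an illustrative sketch of what they could compute. It is an assumption, not the repository's implementation: `prob` is read as applying the logistic sigmoid to a raw KGE score, `product` as multiplying the two per-hop scores, and `sqrt` as taking the square root of that product (the geometric mean). The function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid, mapping a raw score into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def two_hop_path_score(hop1_score, hop2_score, method="product"):
    """Combine two per-hop scores, assumed to lie in [0, 1], into one path score."""
    product = hop1_score * hop2_score
    if method == "product":
        return product
    if method == "sqrt":
        # Assumption: "sqrt" means the geometric mean of the two hop scores.
        return math.sqrt(product)
    raise ValueError(f"unknown method: {method!r}")
```

Under this reading, `sqrt` penalizes a path with one weak hop less severely than `product` does, which keeps 2-hop scores on a scale closer to 1-hop scores.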
- Run Linklogic

      cd linklogic
      python run_linklogic.py --params_file params.json

  - Generated results are stored in `io_params["save_path"]`
- Format of generated results
  - For each `run_linklogic.py` run, results are saved in `io_params["save_path"]` with the following file format:
    - `{dataset}_{benchmark}_{benchmark_datatype}_{method}_{feature_considerations}_child_{consider_child}.pickle`
    - Here, a new .pickle file is saved for each `feature_considerations` value.
  - Contents of the .pickle files saved:
    - Each file contains the metadata associated with the list of all the queries
    - Output Params:
      - `query_triple`: The query triple for which the linklogic explanations are desired.
      - `prob`: Boolean for sigmoid transformation on the knowledge graph embedding scores
      - `query_triple_kge_score`: The link prediction score as identified by the KGE method used. The default method is ComplEx, and the scores are between 0 and 1 if `prob` is `True`.
      - `final_columns`: List of all the triples identified as a feature for 1-hop, 2-hop and all.
      - `linklogic_features`: List of all the triples identified as a feature for 1-hop, 2-hop and all, along with other metadata, e.g. `kge_score`, which is a dictionary of `1st_hop_kge_score`, `path_score` and `path_score_method`; `coef`, which is the coefficient score from the surrogate model; and `split`, which indicates whether the triple is present in the `train`, `valid` or `test` split of the graph.
      - `linklogic_metrics`: Dictionary that contains two keys: `train_acc`, the training accuracy of the surrogate model, and `test_acc`, the test accuracy of the surrogate model.
      - `category`: Benchmark category. Currently supports `parents` or `location`
      - `split`: Information on whether the query triple belongs to the `train`, `test` or `valid` split.
      - `linklogic_explanations`: A subset of `linklogic_features` that are linklogic explanations, selected by a positive coefficient score from the surrogate model.
      - `linklogic_params`: A copy of the `params.json` file used to generate the explanations, for reproducibility.
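As a rough guide to working with these files, the sketch below builds the expected file name from a `linklogic_params` dict, loads a result file, and selects the features with a positive surrogate coefficient. It rests on assumptions not confirmed above: that each query record is a dict keyed by the output params, that `linklogic_features` is a list of dicts each carrying a numeric `coef`, and that a boolean `consider_child` appears in the file name as Python prints it. All helper names are hypothetical.

```python
import pickle

# File-name pattern as documented above.
FILENAME_TEMPLATE = (
    "{dataset}_{benchmark}_{benchmark_datatype}_{method}"
    "_{feature_considerations}_child_{consider_child}.pickle"
)


def result_filename(linklogic_params):
    """Fill the documented file-name pattern from a linklogic_params dict."""
    return FILENAME_TEMPLATE.format(**linklogic_params)


def load_results(path):
    """Load one result file. Only unpickle files you generated or trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def positive_coef_features(record):
    """Features of one query record whose surrogate coefficient is positive."""
    # Assumption: each feature is a dict with a numeric "coef" entry.
    return [feat for feat in record["linklogic_features"] if feat["coef"] > 0]
```

If the assumptions hold, `positive_coef_features(record)` should pick out the same features that the record lists under `linklogic_explanations`, which makes it a convenient sanity check.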
If you use LinkLogic in a scientific publication, we would appreciate citations to the following paper:
@article{kumar-singh2024linklogic,
title={LinkLogic: A New Method and Benchmark for Explainable Knowledge Graph Predictions},
author={Kumar-Singh, Niraj and Polleti, Gustavo and Paliwal, Saee and Hodos-Nkhereanye, Rachel},
journal={arXiv preprint arXiv:2406.00855},
year={2024}
}

This project is licensed under the MIT License.