# Self-Knowledge Guided Retrieval Augmentation for Large Language Models (EMNLP Findings 2023)
- The Temporal dataset we use is in the folder `data/`. Each example has three fields:
  - `question`: the question.
  - `gold answer`: the answer.
  - `passages`: the retrieved passages from Wikipedia.
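Assuming each dataset file is a JSON list of such examples (the file layout and the exact key names are assumptions; adjust them to the actual files in `data/`), loading and sanity-checking the data might look like:

```python
import json

def load_temporal(path):
    """Load a Temporal dataset file; a JSON list of examples is assumed."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # keep only examples carrying the three fields described above
    # (lower-cased key names here are an assumption about the files in data/)
    return [ex for ex in data
            if {"question", "answer", "passages"} <= set(ex)]
```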
- The CoT and retrieval-augmented CoT results are given in the folder `results/`, where `chain_of_thought_gpt3` indicates the responses.
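One way the two response files can be turned into a self-knowledge signal is to compare, per question, whether plain CoT and retrieval-augmented CoT answer correctly. The exact rule used by the repo's scripts may differ; this is only a sketch of the idea:

```python
def self_knowledge_label(cot_correct, rag_correct):
    """Derive a self-knowledge label from the two response files.

    Returns 1 when retrieval is (assumed) needed, 0 when the model already
    knows the answer, and None when the comparison is uninformative.
    The precise criterion in the repo's skr.py may differ.
    """
    if rag_correct and not cot_correct:
        return 1      # retrieval helped -> the model lacks this knowledge
    if cot_correct and not rag_correct:
        return 0      # retrieval hurt -> the model knows this already
    return None       # both correct or both wrong: no clear signal
```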
- For SKR_prompt and SKR_icl, we use the prompts shown in the paper to elicit the self-knowledge of the dev data directly.
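Eliciting self-knowledge by prompting might be sketched as below. The template wording here is purely illustrative; the actual prompts for SKR_prompt and SKR_icl are the ones given in the paper:

```python
# Illustrative template only -- NOT the verbatim prompt from the paper.
TEMPLATE = (
    "Do you need additional information to answer this question?\n"
    "Question: {question}\n"
    "Answer with Yes or No."
)

def build_skr_prompt(question):
    """Fill the (assumed) self-knowledge elicitation template."""
    return TEMPLATE.format(question=question)
```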
- For SKR_cls, we use the training data to train a BERT classifier that elicits the self-knowledge of the dev data. We use the settings `lr=2e-5` and `epochs=10`.
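A minimal sketch of such a classifier with Hugging Face `transformers` follows. Only `lr=2e-5` and `epochs=10` come from above; the base model name (`bert-base-uncased`) and batch size are assumptions:

```python
# lr and epochs follow the README; the model name is an assumption.
HPARAMS = {"lr": 2e-5, "epochs": 10, "model": "bert-base-uncased"}

def train_skr_classifier(train_texts, train_labels):
    """Fine-tune BERT to predict self-knowledge (needs retrieval or not).

    Sketch only; the repo's own training script may differ in details.
    """
    import torch
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    tok = AutoTokenizer.from_pretrained(HPARAMS["model"])
    enc = tok(train_texts, truncation=True, padding=True, return_tensors="pt")

    class SKRDataset(torch.utils.data.Dataset):
        def __len__(self):
            return len(train_labels)
        def __getitem__(self, i):
            item = {k: v[i] for k, v in enc.items()}
            item["labels"] = torch.tensor(train_labels[i])
            return item

    model = AutoModelForSequenceClassification.from_pretrained(
        HPARAMS["model"], num_labels=2)
    args = TrainingArguments(output_dir="skr_cls",
                             learning_rate=HPARAMS["lr"],
                             num_train_epochs=HPARAMS["epochs"],
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=SKRDataset()).train()
    return model, tok
```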
- For SKR_knn, the steps are as follows:
  - `cd source/` and collect the self-knowledge of the training data: run `skr.py` to get the `train_skr.json` file.
  - Run `knn.py` to apply the self-knowledge to the dev data and get the `dev_skr_knn.json` file.
  - Run `eval_skr.py` to evaluate the results.
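The steps above amount to labeling training questions with self-knowledge and propagating those labels to dev questions via nearest neighbors. A minimal sketch of the kNN step, assuming questions are already embedded as vectors and labels are binary (1 = needs retrieval); the embedding model and the value of k used by `knn.py` are not specified here:

```python
import numpy as np

def knn_self_knowledge(train_emb, train_labels, dev_emb, k=3):
    """Assign each dev question the majority self-knowledge label of its
    k nearest training questions under cosine similarity (sketch only)."""
    # normalize rows so that dot products equal cosine similarities
    tn = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    dn = dev_emb / np.linalg.norm(dev_emb, axis=1, keepdims=True)
    sims = dn @ tn.T                              # (n_dev, n_train)
    idx = np.argsort(-sims, axis=1)[:, :k]       # indices of k nearest
    labels = np.asarray(train_labels)[idx]       # (n_dev, k)
    # majority vote over the binary labels
    return (labels.mean(axis=1) >= 0.5).astype(int)
```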
```bibtex
@inproceedings{wang-etal-2023-self-knowledge,
    title = "Self-Knowledge Guided Retrieval Augmentation for Large Language Models",
    author = "Wang, Yile and Li, Peng and Sun, Maosong and Liu, Yang",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.691",
    pages = "10303--10315",
}
```