gzxiong/TruthHypo

Toward Reliable Scientific Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models

Preprint Dataset

News

  • Our paper has been accepted to IJCAI 2025!

Introduction

TruthHypo is a benchmark for assessing the capabilities of LLMs in generating truthful scientific hypotheses. This repo also contains the source code of KnowHD, a knowledge-based hallucination detector to evaluate how well hypotheses are grounded in existing knowledge. Our paper shows that LLMs struggle to generate truthful hypotheses. By analyzing hallucinations in reasoning steps, we demonstrate that the groundedness scores provided by KnowHD serve as an effective metric for filtering truthful hypotheses from the diverse outputs of LLMs.
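
As a rough illustration of the filtering idea described above, the sketch below picks the best-grounded hypothesis from a set of candidates. It is not KnowHD's API: the `ScoredHypothesis` container, the two helper functions, and the example scores are all hypothetical, and it assumes a higher groundedness score means a hypothesis is better supported by existing knowledge.

```python
from dataclasses import dataclass


@dataclass
class ScoredHypothesis:
    """A candidate hypothesis paired with a groundedness score (hypothetical container)."""
    text: str
    groundedness: float  # assumption: higher means better supported by existing knowledge


def select_most_grounded(candidates):
    """Return the candidate with the highest groundedness score."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=lambda c: c.groundedness)


def filter_by_threshold(candidates, threshold):
    """Keep only candidates whose groundedness is at least `threshold`."""
    return [c for c in candidates if c.groundedness >= threshold]


# Made-up scores for illustration only; these are not results from the paper.
candidates = [
    ScoredHypothesis("hypothesis A", 0.40),
    ScoredHypothesis("hypothesis B", 0.85),
    ScoredHypothesis("hypothesis C", 0.60),
]
best = select_most_grounded(candidates)
kept = filter_by_threshold(candidates, threshold=0.5)
```

In practice the scores would come from one of the verifiers in this repository rather than being set by hand, and a suitable threshold would depend on the model and knowledge source.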

Usage

The TruthHypo dataset is directly accessible via HuggingFace:

from datasets import load_dataset

data = load_dataset("TruthHypo/edges_test")

The processed knowledge sources for knowledge-enhanced hypothesis generation can be found at

Structure

The repository is organized as follows:

  • data: the data of the TruthHypo benchmark
    • edges_test.tsv: the test data used for LLM evaluation
  • src: the source code of agents and verifiers used in our experiments
    • agent: the LLM agents used to generate biomedical hypotheses
      • base.py: the base agent
      • cot.py: the agent using parametric knowledge only
      • kg.py: the agent using both parametric knowledge and information from knowledge graphs
      • rag.py: the agent using both parametric knowledge and information from scientific literature
      • rag_kg.py: the agent using parametric knowledge and information from both knowledge graphs and scientific literature
    • verifier: the LLM verifiers used to measure the groundedness of generated hypotheses
      • rag_verifier.py: the verifier with scientific literature as the supporting knowledge base
      • kg_verifier.py: the verifier with knowledge graphs as the supporting knowledge base
      • rag_kg_verifier.py: the verifier with both scientific literature and knowledge graphs as the supporting knowledge base
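
The test file listed above, data/edges_test.tsv, can also be inspected locally with the Python standard library. The helper below is a minimal sketch and not part of the repository's source code: `read_tsv` is a hypothetical name, and the code assumes the file is tab-separated with a header row, so it reports whatever column names it finds instead of hard-coding any.

```python
import csv


def read_tsv(path):
    """Read a tab-separated file and return (column_names, rows_as_dicts).

    Assumes the first line of the file is a header row.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        rows = list(reader)
        return reader.fieldnames, rows
```

Calling `read_tsv("data/edges_test.tsv")` from the repository root should return the column names together with one dictionary per test instance, which is a quick way to check the schema before running an agent.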

Citation

@inproceedings{xiong2025toward,
  title     = {Toward Reliable Scientific Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models},
  author    = {Xiong, Guangzhi and Xie, Eric and Williams, Corey and Kim, Myles and Shariatmadari, Amir Hassan and Guo, Sikun and Bekiranov, Stefan and Zhang, Aidong},
  booktitle = {Proceedings of the Thirty-Fourth International Joint Conference on
               Artificial Intelligence, {IJCAI-25}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {James Kwok},
  pages     = {7849--7857},
  year      = {2025},
  month     = {8},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2025/873},
  url       = {https://doi.org/10.24963/ijcai.2025/873},
}

About

Official Repository of Toward Reliable Scientific Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models (IJCAI 2025)
