Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection
Knowledge-Based Visual Question Answering (KBVQA) is a bi-modal task that requires external world knowledge
in order to correctly answer a text question about an associated image. Recent single-modality text work has
shown that knowledge injection into pre-trained language models, specifically entity-enhanced knowledge graph
embeddings, can improve performance on downstream entity-centric tasks. In this work, we empirically
study whether and how such methods, applied in a bi-modal setting, can improve an existing VQA system's
performance on the KBVQA task. We experiment with two large publicly available VQA datasets: (1) KVQA,
which contains mostly rare Wikipedia entities, and (2) OKVQA, which is less entity-centric and more aligned
with commonsense reasoning. Both lack explicit entity spans, and we study the effect of different weakly
supervised and manual methods for obtaining them. Additionally, we analyze how recently proposed bi-modal
and single-modal attention explanations are affected by the incorporation of such entity-enhanced
representations. Our results show substantially improved performance on the KBVQA task without the need
for additional costly pre-training, and we provide insights into when entity knowledge injection helps
improve a model's understanding.
install dependencies
1. create a conda or virtualenv environment for the project
2. install the dependencies:
pip install -r requirements.txt
download data/models
# as per LXMERT codebase (https://github.com/airsplay/lxmert) get their pre-trained model
mkdir -p snap/pretrained
wget https://nlp.cs.unc.edu/data/model_LXRT.pth -P snap/pretrained
# download data
See OKVQA and KVQA websites respectively
# OKVQA - https://okvqa.allenai.org/download.html
# KVQA - http://malllabiisc.github.io/resources/kvqa/
set paths for KVQA and OKVQA
for training models on KVQA, set the paths to the downloaded data in the following files:
1. src/tasks/kvqa_data.py ( KVQA_DATA_ROOT , KVQA_IMGFEAT_ROOT, abs_path )
2. src/pretrain/qa_answer_table.py ( abs_path )
3. src/lxrt/entry.py ( DATA_RESOURCE_DIR )
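These are plain module-level constants, so the edit amounts to pointing a couple of path variables at your local copies. A minimal sketch of what this might look like in src/tasks/kvqa_data.py (the values below are placeholders, not the repo's defaults; abs_path is typically the module's own directory and is left out here):

```python
import os

# Placeholder locations -- point these at your downloaded KVQA data.
KVQA_DATA_ROOT = "/data/kvqa"
KVQA_IMGFEAT_ROOT = os.path.join(KVQA_DATA_ROOT, "imgfeat")

print(KVQA_IMGFEAT_ROOT)  # /data/kvqa/imgfeat
```

The same pattern applies to the OKVQA constants listed further below.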
for evaluating models with the integrated Bi-Modal attention explanation system ( referred to below as the integrated way ), set the following paths:
1. Transformer-MM-Explainability-main/lxmert/lxmert/perturb_kvqa.py ( KVQA_VAL_PATH, KVQA_IMGFEAT_ROOT, KVQA_URL, load_lxmert_qa_hf)
2. Transformer-MM-Explainability-main/lxmert/lxmert/src/lxmert_lrp_ebert.py ( def load(cls, path), DATA_RESOURCE_DIR )
3. Transformer-MM-Explainability-main/lxmert/lxmert/src/tasks/kvqa_data.py ( KVQA_DATA_ROOT, KVQA_IMGFEAT_ROOT, abs_path )
for OKVQA, set the following:
1. src/tasks/okvqa_data.py ( OKVQA_DATA_ROOT , OKVQA_IMGFEAT_ROOT, abs_path )
2. src/pretrain/qa_answer_table.py ( abs_path )
3. src/lxrt/entry.py ( DATA_RESOURCE_DIR )
for evaluating models with the integrated Bi-Modal attention explanation system, set the following paths:
1. Transformer-MM-Explainability-main/lxmert/lxmert/perturb_okvqa.py ( OKVQA_VAL_PATH, OKVQA_IMGFEAT_ROOT, OKVQA_URL, load_lxmert_qa_hf)
2. Transformer-MM-Explainability-main/lxmert/lxmert/src/lxmert_lrp_ebert.py ( def load(cls, path), DATA_RESOURCE_DIR )
3. Transformer-MM-Explainability-main/lxmert/lxmert/src/tasks/okvqa_data.py ( OKVQA_DATA_ROOT, OKVQA_IMGFEAT_ROOT, abs_path )
to train / test models on KVQA
There are 5 types of input sets you can finetune LXMERT with:
1) plain Question,
2) Question with captions,
3) NERper ( yasu2 )
4) NERagro ( sep13_3few )
5) KVQAmeta ( oracle )
The final three use the input format of 2), but with additional E-BERT knowledge injection based on the entity sets provided.
You can also specify which type of entity-set linking to use ( "plain", "link", "noisy" ) for the E-BERT methods.
The --ent_set parameter defines which E-BERT model/linktype to use:
[ None, "oracle_links", "oracle_noisy",             <-- KVQAmeta (as is, links, noisy)
  "sep13_3", "sept13_fewkb_links", "sept13_fewkb",  <-- NERagro (as is, links, noisy)
  "yasu2", "yasu2links", "yasu2noisy" ]             <-- NERper (as is, links, noisy)
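The list above amounts to a small lookup table, which can be written out explicitly as a sanity check (illustrative only; the tuples are (entity set, link type) exactly as given above):

```python
# Maps each --ent_set value to its (entity set, link type), per the
# list above; None means KVQAmeta "as is" (the default).
ENT_SET_INFO = {
    None: ("KVQAmeta", "as is"),
    "oracle_links": ("KVQAmeta", "links"),
    "oracle_noisy": ("KVQAmeta", "noisy"),
    "sep13_3": ("NERagro", "as is"),
    "sept13_fewkb_links": ("NERagro", "links"),
    "sept13_fewkb": ("NERagro", "noisy"),
    "yasu2": ("NERper", "as is"),
    "yasu2links": ("NERper", "links"),
    "yasu2noisy": ("NERper", "noisy"),
}

print(ENT_SET_INFO["yasu2noisy"])  # ('NERper', 'noisy')
```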
KVQA has 5 data splits to choose from: [0 - 4].
# Example calls, run from the repo root:
# finetune on KVQA split 4 ( on GPU 1 ) plain Question
bash run/kvqa_finetune.bash 1 kvqa_plain_sp4 4
# and test model
bash run/kvqa_test.bash 1 kvqa_plain_sp4_results 4 --test test_kvqa --load snap/kvqa/kvqa_plain_sp4_4/BEST
# finetune on KVQA split 0 ( on GPU 3 ) Question with captions
bash run/kvqa_finetune.bash 3 kvqa_capt_sp0 0 --incl_caption --max_len 100
# and test model
bash run/kvqa_test.bash 3 kvqa_capt_sp0_results 0 --incl_caption --max_len 100 --test test_kvqa --load snap/kvqa/kvqa_capt_sp0_0/BEST
# finetune on KVQA split 3 ( on GPU 2 ) with EBERT ( defaults to KVQAmeta "as is" )
bash run/kvqa_finetune.bash 2 kvqa_ebert_sp3 3 --use_lm ebert --max_len 100
# and test model
bash run/kvqa_test.bash 2 kvqa_ebert_sp3_results 3 --use_lm ebert --max_len 100 --test test_kvqa --load snap/kvqa/kvqa_ebert_sp3_3/BEST
# finetune on KVQA split 3 ( on GPU 2 ) with EBERT NERper and the noisy linktype
bash run/kvqa_finetune.bash 2 kvqa_ebert_nerper_noisy_sp3 3 --use_lm ebert --ent_set yasu2noisy --max_len 100
# and test model
bash run/kvqa_test.bash 2 kvqa_ebert_nerper_noisy_sp3_results 3 --use_lm ebert --ent_set yasu2noisy --max_len 100 --test test_kvqa --load snap/kvqa/kvqa_ebert_nerper_noisy_sp3_3/BEST
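The finetune/test pairs above follow one fixed pattern per split, so generating the calls for all five splits is mechanical. A small sketch that only builds and prints the command strings (GPU id and experiment-name prefix are illustrative):

```python
# Build the finetune/test command pairs for every KVQA split (0-4),
# following the plain-question pattern shown above.
gpu, prefix = 0, "kvqa_plain_sp"
commands = []
for split in range(5):
    name = f"{prefix}{split}"
    commands.append(f"bash run/kvqa_finetune.bash {gpu} {name} {split}")
    commands.append(
        f"bash run/kvqa_test.bash {gpu} {name}_results {split} "
        f"--test test_kvqa --load snap/kvqa/{name}_{split}/BEST"
    )

print("\n".join(commands))
```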
for evaluating finetuned models with explanations from the integrated Bi-Modal attention explanation system
# the code base I expanded is originally from ( https://github.com/hila-chefer/Transformer-MM-Explainability )
# the parameters are mostly identical to those used for training
cd Transformer-MM-Explainability-main
# standard KVQA on data split 0
CUDA_VISIBLE_DEVICES=1 PYTHONPATH=`pwd` python lxmert/lxmert/perturbation_kvqa.py --split_num 0 --test test_kvqa --load /home/diego/adv_comp_viz21/lxmert/orig_code/lxmert/snap/kvqa/kvqa_plain_sp0_0/BEST --pred_out experiments/kvqa_plain_0_830.json
--> Done. Elapsed: 878.91 ( 15 minutes )
--> top1/top5 Acc/Raw: [47.27, 66.79, 8838, 12487, 18697]
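The three raw numbers in the Acc/Raw line are the top-1 correct count, top-5 correct count, and total examples, so the two accuracy figures can be re-derived directly (shown here for the split-0 plain-question run above):

```python
# Acc/Raw format: [top1_acc, top5_acc, top1_correct, top5_correct, n_total]
top1_correct, top5_correct, n_total = 8838, 12487, 18697

top1_acc = round(100 * top1_correct / n_total, 2)
top5_acc = round(100 * top5_correct / n_total, 2)
print(top1_acc, top5_acc)  # 47.27 66.79
```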
# KVQA data with captions ( split 0 )
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=`pwd` python lxmert/lxmert/perturbation_kvqa.py --num-samples=0 --incl_caption --method ours_no_lrp --is-text-pert true --is-positive-pert true --split_num 0 --test test_kvqa --load /home/diego/adv_comp_viz21/lxmert/orig_code/lxmert/snap/kvqa/kvqa_capt_sp0_0/BEST --pred_out experiments/kvqa_capt_0_830.json
--> Done. Elapsed: 1727.00 ( ~29 minutes )
--> top1/top5 Acc/Raw: [48.82, 67.36, 9128, 12595, 18697]
# KVQA data with entity enhanced EBERT reps using NERagro as is
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=`pwd` python lxmert/lxmert/perturbation_kvqa.py --use_lm ebert --ent_set sep13_3 --split_num 0 --test test_kvqa --load ../snap/kvqa/kvqa_sep13_3span_0v5_0/BEST --pred_out experiments/kvqa_ebert_0_sep13_3.json
Finetune/Test/Get Explanations for OKVQA
# Same params as KVQA, except ent_set is in [ None, 4k, 2p5k ] ( None defaults to the 13k entity set )
# Example finetune OKVQA on questions without knowledge injection, test and get explanations
bash run/okvqa_finetune.bash 3 okvqa_plain_0913_10epsR4 --max_len 50
bash run/okvqa_test.bash 3 okvqa_plain_0913_10eps_resR4 --max_len 50 --test test_tv --load snap/okvqa/okvqa_plain_0913_10epsR4/LAST
CUDA_VISIBLE_DEVICES=2 PYTHONPATH=`pwd` python lxmert/lxmert/perturbation_okvqa.py --test val --load /home/diego/adv_comp_viz21/lxmert/orig_code/lxmert_gen/snap/okvqa/okvqa_plain_0913_10epsR4/LAST --pred_out experiments/okvqa_ebert_0913_10epsR4.json
# Example: finetune OKVQA on GPU 3 using E-BERT and 13k ent set, test and get explanations
bash run/okvqa_finetune.bash 3 okvqa_ebert_13k_0918 --use_lm ebert --max_len 50
bash run/okvqa_test.bash 3 okvqa_ebert_13k_0918_results --use_lm ebert --max_len 50 --test test_tv --load snap/okvqa/okvqa_ebert_13k_0918/LAST
CUDA_VISIBLE_DEVICES=2 PYTHONPATH=`pwd` python lxmert/lxmert/perturbation_okvqa.py --use_lm ebert --test val --load /home/diego/adv_comp_viz21/lxmert/orig_code/lxmert_gen/snap/okvqa/okvqa_ebert_13k_0918/LAST --pred_out experiments/okvqa_ebert_13k_0918.json
# Example: finetune OKVQA on GPU 2 using E-BERT and 4k ent set, test and get explanations
bash run/okvqa_finetune.bash 2 okvqa_ebert_4k_0918 --use_lm ebert --ent_set 4k --max_len 50
bash run/okvqa_test.bash 2 okvqa_ebert_4k_0918_results --use_lm ebert --ent_set 4k --max_len 50 --test test_tv --load snap/okvqa/okvqa_ebert_4k_0918/LAST
CUDA_VISIBLE_DEVICES=2 PYTHONPATH=`pwd` python lxmert/lxmert/perturbation_okvqa.py --use_lm ebert --ent_set 4k --test val --load /home/diego/adv_comp_viz21/lxmert/orig_code/lxmert_gen/snap/okvqa/okvqa_ebert_4k_0918/LAST --pred_out experiments/okvqa_ebert_4k_0918.json
Two notebooks are used for analysis of the KVQA and OK-VQA tasks
# Note: these two notebooks still need to be cleaned up
notebooks/kvqa_streamed.ipynb
notebooks/okvqa_automated_metrics.ipynb
We will put up all data files soon!
About
code/figures for the ACM WWW 2022 work "Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection."