This is a model for dependency-based semantic role labeling, based on the syntax-agnostic model of Marcheggiani et al. (2017). See https://arxiv.org/abs/1701.02593 and https://github.com/diegma/neural-dep-srl. This model achieves state-of-the-art results by a) using a better-regularized LSTM with highway layers, as in He et al. (2017), and b) incorporating supertags extracted from dependency trees.
- Python 2.7
- TensorFlow 1.0
You need your own copy of the CoNLL-2009 dataset to train the model. Here are the steps to prepare the data for a given language (here, English):
- From the `tag_srl` directory, create the directory `data/eng/conll09/` ("eng" for English).
- Add the CoNLL text files to that directory and name them `train.txt`, `dev.txt`, `test.txt`, and `ood.txt` (if there is an out-of-domain dataset).
- Run `./scripts/preprocess.sh eng` to extract supertags from the data and generate vocab files.
- You also need a copy of the pre-trained word embeddings used in the paper. Download the `sskip.100.vectors` file from https://github.com/diegma/neural-dep-srl and put it in `data/eng/embeddings/sskip.100.vectors`. (Instructions for other languages forthcoming.)
- To replicate our results, you need to use predicted predicates and supertags, placed in `data/eng/conll09/pred/` with names like `dev_predicates.txt`, `dev_stags_model1.txt`, etc. (A small layout-check sketch follows this list.)
  a. We used mate-tools to get predicted predicates. You need to download the SRL pipeline (srl-4.31.tgz) and the English model file (linked from here).
  b. Contact me to get predicted supertags.
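If you want to sanity-check the layout before training, here is a minimal sketch of a checker script. It is not part of the repo; the paths come from the steps above, and the `pred/` filenames are just the examples mentioned there, so adjust them to whatever predicted files you actually have.

```python
# check_data_layout.py -- hypothetical helper, not part of the repo.
# Verifies the data layout described in the setup steps above.
import os

LANG = 'eng'
SPLITS = ['train.txt', 'dev.txt', 'test.txt', 'ood.txt']  # ood.txt is optional

def check_layout(root='data'):
    conll_dir = os.path.join(root, LANG, 'conll09')
    emb_file = os.path.join(root, LANG, 'embeddings', 'sskip.100.vectors')
    pred_dir = os.path.join(conll_dir, 'pred')

    for name in SPLITS:
        path = os.path.join(conll_dir, name)
        print('%-45s %s' % (path, 'OK' if os.path.exists(path) else 'MISSING'))
    print('%-45s %s' % (emb_file, 'OK' if os.path.exists(emb_file) else 'MISSING'))
    # Predicted predicates/supertags (filenames are the README's examples):
    for name in ['dev_predicates.txt', 'dev_stags_model1.txt']:
        path = os.path.join(pred_dir, name)
        print('%-45s %s' % (path, 'OK' if os.path.exists(path) else 'MISSING'))

if __name__ == '__main__':
    check_layout()
```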
Once everything is in place, you can train a model by running (from the root directory):
```
python model/train.py
```
See `python model/train.py --help` for all of the possible options. The default arguments are the same as the best hyperparameters from Marcheggiani et al. (2017), but I found that the model performs even better if you add dropout between layers of the bidirectional LSTM (`python model/train.py --dropout 0.5`).
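If you are unfamiliar with inter-layer dropout in recurrent networks, here is a minimal sketch of the idea using the stock TensorFlow 1.x cell wrappers. It only illustrates what `--dropout` does conceptually; the repo's `model/lstm.py` is a custom highway LSTM and does not necessarily look like this.

```python
# Sketch only: dropout applied between stacked LSTM layers with the stock
# TensorFlow 1.x API (not the repo's actual implementation).
import tensorflow as tf

def stacked_lstm(num_layers, hidden_size, keep_prob):
    cells = []
    for _ in range(num_layers):
        cell = tf.contrib.rnn.LSTMCell(hidden_size)
        # output_keep_prob drops units in each layer's output, i.e. on the
        # connections between consecutive layers of the stack.
        cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=keep_prob)
        cells.append(cell)
    return tf.contrib.rnn.MultiRNNCell(cells)

# For a bidirectional LSTM, you would build one such stack for the forward
# direction and one for the backward direction.
```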
Trained on one Nvidia Tesla K80 GPU with a batch size of 100, the model took around 13 minutes per epoch, and our best models converged after 3-6 hours of training.
Once you've trained a model, it will be saved in a directory under `output/models/`. To test the model, run
```
python model/test.py output/models/model_name test_split
```
`test_split` should be `dev`, `test`, `ood` (out-of-domain), or `train`.
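If you want to evaluate one trained model on several splits in a row, a small wrapper like the one below works. It just shells out to `model/test.py` with the command shown above; the model directory name `my_model` is made up for illustration, so substitute whatever directory your training run produced under `output/models/`.

```python
# Hypothetical convenience wrapper, not part of the repo. Run from the root
# directory; replace 'my_model' with your actual output/models/ directory.
import subprocess

MODEL_DIR = 'output/models/my_model'

for split in ['dev', 'test', 'ood']:
    print('Evaluating on %s...' % split)
    subprocess.check_call(['python', 'model/test.py', MODEL_DIR, split])
```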
The code for this project is organized as follows:
- `model/train.py` parses command line arguments, resolves filepaths, and then calls `utils/vocab.py` to get vocabulary objects for words, parts of speech, etc. A `Vocab` object has methods for mapping words to integer ids and back.
- Then `model/train.py` calls `model/srl.py` to initialize the `SRL_Model` object. `SRL_Model` builds a neural network using generic neural network functions from `model/layers.py` and `model/lstm.py`.
- `model/srl.py` uses functions from `util/conll_io.py` and `util/data_loader.py` to load CoNLL sentences into the model.
  a. `util/conll_io.py` defines methods for reading and writing CoNLL-09 format sentences and storing them in a convenient format, along with predicted predicates and supertags (a generic sketch of reading this format follows the list).
  b. `util/data_loader.py` converts lists of CoNLL sentences to batches of integer ids that can be fed to the neural network.
- The `SRL_Model` also has methods for running a training or testing epoch. In a testing epoch, the model writes its predictions to a file (usually `output/predictions/{model_name}.txt`) in CoNLL format, so it can be evaluated with the CoNLL evaluation script.
- After each epoch, `model/train.py` calls the CoNLL-provided perl evaluation script (from a python wrapper in `eval/eval.py`) and decides whether or not to stop early.
- `model/test.py` just rebuilds a trained model and runs a testing epoch with the specified data.
- The predicate disambiguation directory, `model/disamb/`, is organized in the same way, with one file, `model/disamb/disamb.py`, to build the network and run training and testing epochs, and one file each for training and testing the model.
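For readers new to the data format, here is a generic sketch of reading CoNLL-style files (one token per line, tab-separated columns, sentences separated by blank lines). This is not the repo's `util/conll_io.py`, and the function name is made up; it is just an illustration of the kind of parsing that module does.

```python
# Generic illustration of reading CoNLL-format files, not the repo's util/conll_io.py.
def read_conll_sentences(path):
    sentences = []
    current = []
    with open(path) as f:
        for line in f:
            line = line.rstrip('\n')
            if not line.strip():
                # A blank line ends the current sentence.
                if current:
                    sentences.append(current)
                    current = []
            else:
                # Each token line is a list of column values.
                current.append(line.split('\t'))
        if current:
            # Handle a file that does not end with a blank line.
            sentences.append(current)
    return sentences
```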