This repo contains the data for the paper "Machine Translation Between High-resource Languages in a Language Documentation Setting" (@FieldMatters2022).
Specifically, it contains English--Portuguese MT datasets for benchmarking models that translate transcriptions of informal, colloquial speech. Only development and test files are provided; training is expected to use other datasets.