This is about bulk-processing pages that contain text in arbitrary languages, where we want tesseract to "auto-discover" the languages actually used on each page.
Tangentially related to #15, but specific to using multiple language models.
1: check exactly how tesseract handles multiple language models right now
2: these multi-language OCR runs take a very long time, so the idea is to borrow the (hacky) way white text on a black background is currently handled: only when the result so far has a confidence below the 0.7 threshold is the image inverted and OCR attempted again. Can we speed up "unknown languages" OCR runs with a similar threshold heuristic? That is: when the OCR result so far has a confidence below threshold C, try the next language in the set, in order from most to least important/frequent.
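
A rough sketch of the heuristic in item 2, to make the proposal concrete. This is not how tesseract works today. `recognize` is a hypothetical callback standing in for one single-language OCR pass that returns the text plus a mean confidence in the 0..1 range. The function name and the default threshold of 0.7 (copied from the inversion heuristic) are assumptions too.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical result of one OCR pass: (recognized text, mean confidence in 0.0..1.0).
OcrResult = Tuple[str, float]


def ocr_with_language_fallback(
    image: object,
    languages: Iterable[str],
    recognize: Callable[[object, str], OcrResult],
    threshold: float = 0.7,  # assumption: same value as the inversion heuristic
) -> Tuple[Optional[str], str, float]:
    """Try languages in priority order and stop at the first confident result.

    Returns (language, text, confidence) for the best attempt seen so far.
    If no language reaches the threshold, the highest-confidence attempt wins.
    """
    best: Tuple[Optional[str], str, float] = (None, "", -1.0)
    for lang in languages:
        text, conf = recognize(image, lang)
        if conf > best[2]:
            best = (lang, text, conf)
        if conf >= threshold:
            break  # good enough: skip the remaining, less frequent languages
    return best
```

The speed-up depends entirely on the ordering: if the most frequent language is tried first and usually clears threshold C, most pages cost a single pass. Pages in rare languages still pay for every pass before theirs, and mixed-language pages may never clear C with any single model, so they would still need the combined run as a last resort.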