For me, the preserveLineBreaks behaviour isn't working quite as I expected (tested with the docx extractor).
I have text like this in a docx file:
2 downlighters; door to hall.
Hall
Double glazed window to front;
With preserveLineBreaks enabled I get this output, where the line break before "Hall" has been replaced by a space:
2 downlighters; door to hall. Hall
Double glazed window to front;
After logging the intermediate text to the console, I can see the newlines are there as expected right after extraction, but they get stripped out in a later step.
Looking at how preserveLineBreaks is implemented, I see it's a big, hairy regex, so it isn't clear at first glance what it is doing. From my naive point of view it would be nicer to get the raw text output; if I need to filter it further, I can decide that for myself. Alternatively, if there were a 'clean' function exposed as a configuration option, I could use it to override the default behaviour.
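
To illustrate the kind of override I have in mind, here is a rough sketch. The name cleanKeepingNewlines is made up for this example and is not something the library provides; it only assumes the extractor hands back the raw text as a string. It normalises line endings and collapses spaces and tabs within each line, but never removes a line break.

```typescript
// Hypothetical helper, not part of the library: tidy extracted text
// while keeping every line break that was in the raw output.
function cleanKeepingNewlines(raw: string): string {
  return raw
    .replace(/\r\n?/g, "\n") // normalise CRLF and lone CR to LF
    .split("\n")
    // collapse runs of spaces/tabs inside each line, then trim the line
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    .join("\n");
}

// Sample based on the docx text above (the extra space and CRLF endings are invented).
const sample =
  "2 downlighters;  door to hall.\r\nHall\r\nDouble glazed window to front;";
console.log(cleanKeepingNewlines(sample));
```

With something like this passed in as the cleaning step, "Hall" would stay on its own line, and anyone who prefers the current behaviour could keep the default.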