
Preserve newline behaviour #59

@robations

Description


For me, the preserveLineBreaks behaviour isn't working quite as I expected (tested with the docx extractor).

I have text like this in a docx file:

2 downlighters; door to hall.

Hall
Double glazed window to front;

With preserveLineBreaks enabled, I get this output:

2 downlighters; door to hall. Hall
Double glazed window to front;

After logging intermediate output to the console, I can see the newlines are there as expected, but they are then stripped out during the cleanup step.
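For comparison, a cleanup step that keeps line breaks could normalise only horizontal whitespace inside each line and leave the newlines, including the blank line between paragraphs, intact. The sketch below is a minimal TypeScript illustration under that assumption; `normalizeKeepingLineBreaks` is a hypothetical name and is not the library's actual implementation or regex.

```typescript
// Hypothetical helper (not the library's code): tidy whitespace within lines
// while keeping line breaks and paragraph gaps.
function normalizeKeepingLineBreaks(text: string): string {
  return text
    .replace(/\r\n?/g, "\n") // convert CRLF and lone CR to LF
    .split("\n")
    // collapse runs of spaces, tabs, form feeds, vertical tabs and
    // non-breaking spaces into a single space, then trim the line
    .map((line) => line.replace(/[ \t\f\v\u00a0]+/g, " ").trim())
    .join("\n")
    .replace(/\n{3,}/g, "\n\n"); // keep at most one blank line between paragraphs
}

const sample = "2 downlighters; door to hall.\n\nHall\nDouble glazed window to front;";
console.log(normalizeKeepingLineBreaks(sample)); // line structure is unchanged
```

Only the per-line whitespace is touched here, so the break between "door to hall." and "Hall" survives.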

Looking at how preserveLineBreaks is implemented, I see it's a big, hairy regex, so I'm not sure at first glance what it is doing. From my naive point of view, it would be nicer to get the raw text output; if I need to filter further, I can decide for myself. Alternatively, if there were a 'clean' function as a configuration option, I could use it to override the default behaviour.
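A configurable cleanup hook along those lines might look like the sketch below. Everything here is hypothetical: the `clean` option, the `ExtractOptions` shape and `applyCleanup` are illustrative names, not an existing API, and `defaultClean` is a simplified stand-in that collapses all whitespace, not the library's real regex.

```typescript
// Hypothetical: a user-supplied function that post-processes extracted text.
type Cleaner = (raw: string) => string;

interface ExtractOptions {
  preserveLineBreaks?: boolean; // existing option name, shown for context only
  clean?: Cleaner; // hypothetical override for the default cleanup step
}

// Simplified stand-in for a default cleanup: every whitespace run,
// newlines included, becomes a single space.
function defaultClean(raw: string): string {
  return raw.replace(/\s+/g, " ").trim();
}

// Use the caller's cleaner when one is given, otherwise fall back to the default.
function applyCleanup(raw: string, options: ExtractOptions = {}): string {
  return options.clean ? options.clean(raw) : defaultClean(raw);
}

const raw = "2 downlighters; door to hall.\n\nHall\nDouble glazed window to front;";
console.log(applyCleanup(raw)); // everything on one line
console.log(applyCleanup(raw, { clean: (s) => s })); // raw text, untouched
```

Passing an identity function returns the raw extracted text, and any custom filtering becomes the caller's decision rather than the library's.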
