fix: Handle UnicodeDecodeError in PlainTextConverter (#1505) #1540

s3ich4n · 2026-01-21T20:11:18Z

When charset detection samples only the first 4096 bytes and detects ascii, but the file contains UTF-8 characters beyond that point, decoding fails with UnicodeDecodeError.

Added a fallback to charset_normalizer when UnicodeDecodeError occurs, allowing proper handling of files with non-ASCII characters (Spanish, Korean, Japanese, Chinese, etc.) that appear after the 4096-byte sample.
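
To illustrate the failure mode and the fallback described above, here is a minimal sketch. It is not the code from this PR: `guess_charset_from_sample` and `decode_with_fallback` are hypothetical names, the sample-based detector is a simplified stand-in, and the final `errors="replace"` branch is an assumption added so the sketch still runs when charset_normalizer is not installed.

```python
try:
    # Third-party package; treated as optional in this sketch.
    from charset_normalizer import from_bytes
except ImportError:
    from_bytes = None

SAMPLE_SIZE = 4096  # detection only looks at this many leading bytes


def guess_charset_from_sample(data: bytes) -> str:
    """Hypothetical stand-in for a detector that inspects only the first 4096 bytes."""
    try:
        data[:SAMPLE_SIZE].decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        return "utf-8"


def decode_with_fallback(data: bytes, charset: str) -> str:
    """Decode with the detected charset; re-detect on the full content if that fails."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        if from_bytes is not None:
            best = from_bytes(data).best()  # analyses the whole byte string
            if best is not None:
                return str(best)
        # Assumption for this sketch only: last-resort decode that never raises.
        return data.decode("utf-8", errors="replace")


# ASCII-only prefix longer than the sample, followed by UTF-8 encoded Korean text.
data = b"a" * 5000 + "안녕하세요".encode("utf-8")
charset = guess_charset_from_sample(data)  # the sample is pure ASCII
text = decode_with_fallback(data, charset)
```

Decoding `data` with the detected charset alone raises UnicodeDecodeError, because the non-ASCII bytes start after the sampled region; the fallback path is what lets the conversion finish.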

fixes #1505


s3ich4n · 2026-01-21T20:14:34Z

@microsoft-github-policy-service agree


s3ich4n added 2 commits January 22, 2026 05:19

style: fix linting (microsoft#1505)

fb6596d

chore: Include charset fallback test in __main__ runner (microsoft#…

a3adc91
