Skip to content

Conversation

@vEnhance
Copy link
Contributor

@vEnhance vEnhance commented Jan 3, 2024

Starting on #1578, WIP

@DimitriPapadopoulos
Copy link
Collaborator

#3270 makes all typos (but not suggested fixes) lowercase. Does your implantation require #3270 as a prior?

@vEnhance
Copy link
Contributor Author

vEnhance commented Jan 3, 2024

#3270 makes all typos (but not suggested fixes) lowercase. Does your implantation require #3270 as a prior?

It should be independent. The implementation of this PR only affects the reading of -I and -L options and leaves everything else the same.

In fact, the build_dict function that parses the built-in dictionaries currently reads (this is not changed by this PR):

def build_dict(
    filename: str,
    misspellings: Dict[str, Misspelling],
    ignore_words: Set[str],
) -> None:
    with open(filename, encoding="utf-8") as f:
        translate_tables = [(x, str.maketrans(x, y)) for x, y in alt_chars]
        for line in f:
            [key, data] = line.split("->")
            # TODO for now, convert both to lower. Someday we can maybe add
            # support for fixing caps.
            key = key.lower()
            data = data.lower()
            if key not in ignore_words:
                add_misspelling(key, data, misspellings)
            # generate alternative misspellings/fixes
            for x, table in translate_tables:
                if x in key:
                    alt_key = key.translate(table)
                    alt_data = data.translate(table)
                    if alt_key not in ignore_words:
                        add_misspelling(alt_key, alt_data, misspellings)

that is, when reading the dictionaries included with codespell, actually everything is already converted into lowercase anyway.

@vEnhance vEnhance marked this pull request as ready for review January 3, 2024 19:10
DimitriPapadopoulos

This comment was marked as resolved.

@larson

This comment was marked as resolved.

@DimitriPapadopoulos
Copy link
Collaborator

@larsoner Would you like to have a look?

@larsoner
Copy link
Member

larsoner commented Jan 8, 2024

Code changes look reasonable. There is some risk that someone with custom dictionaries will get different behavior with this PR, though. Safest would be to add a switch to make this opt-in like --ignore-words-case-sensitive or similar. Less safe but maybe useful (or maybe YAGNI) would be to add a --ignore-words-case-insensitive that could get back the old behavior, but people can also get the old behavior just by modifying their dictionaries. I think I can live with the backward incompatible change but wanted to make sure we thought about it first!

@vEnhance
Copy link
Contributor Author

vEnhance commented Jan 8, 2024

There is some risk that someone with custom dictionaries will get different behavior with this PR, though.

So I believe the changes in this PR should only affect words passed by config or command line options, which is why I stated in #3272 (comment) this was independent of #3270. Similarly, there should be no backwards-incompatability with custom dictionaries either, because those are read using build_dict which has remained unchanged.

The only users who should see any changes are people who have specified, say, something like SEH in -I which, as mentioned previously (e.g. #1578 (comment)), actually did nothing. I can't actually imagine any situations where someone would add a word into an ignore list and want/expect it to have no effect.

Please let me know if I misunderstood.

@larsoner larsoner merged commit 87edf05 into codespell-project:master Jan 9, 2024
@larsoner
Copy link
Member

larsoner commented Jan 9, 2024

Thanks @vEnhance !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants