Regression in custom codecs + # coding: directives across 3.8, 3.9 and 3.10~3.11 #102353

@RocketRace

Bug report

The behavior of custom codecs named in a # coding: comment is surprising in Python 3.10 and 3.11. The behavior also differs between 3.8 and 3.9; of the four versions, 3.8 seems to have the most "acceptable" behavior.

In a.py:

import codecs

def encode(input, errors = "strict"):
    raise NotImplementedError

def decode(input, errors = "strict"):
    print("decoding", bytes(input))
    return "x = 42", len(input)

codecs.register({"test": codecs.CodecInfo(encode, decode)}.get)

import b
print("value is", b.x)

In b.py:

# coding: test

Upon running python3 a.py, I would expect the following result:

decoding b'# coding: test\n'
value is 42

That is, the decoder should be called once, and the result of the decoder ("x = 42") should be used as the source code for b.py, hence resulting in 42 as the value of b.x.
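As a rough illustration of the expectation above, the sketch below performs the same steps by hand, outside the import system: decode the raw bytes of b.py once with the registered codec, then compile and execute the resulting text. It only models the expected outcome and is not a description of how CPython's tokenizer actually drives the codec. The names raw, source and namespace are introduced here for illustration.

```python
import codecs

def encode(input, errors="strict"):
    raise NotImplementedError

def decode(input, errors="strict"):
    # Same decoder as in a.py: ignore the input bytes and always
    # return the text "x = 42", reporting all input as consumed.
    print("decoding", bytes(input))
    return "x = 42", len(input)

codecs.register({"test": codecs.CodecInfo(encode, decode)}.get)

# The bytes of b.py as they exist on disk.
raw = b"# coding: test\n"

# One decode call over the whole file.
source = codecs.decode(raw, "test")

# Treat the decoded text as the module's source code.
namespace = {}
exec(compile(source, "b.py", "exec"), namespace)
print("value is", namespace["x"])
```

Run on its own, this prints the "decoding" line once followed by "value is 42", which is the output expected from python3 a.py.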

The program has the following behavior in 3.10.10 and 3.11.2:

decoding b'# coding: test\n'
decoding b'# coding: test\n'
Traceback (most recent call last):
  File "/<path>/a.py", line 12, in <module>
    import b
  File "/<path>/b.py", line 1
    x = 42
SyntaxError: invalid syntax

The exception seems nonsensical, given x = 42 is valid syntax. Notice also that the decoder function is called twice, even though the first decode() call consumed the entire source file and the CodecInfo object has incremental decoding disabled (set to None by default).
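The remark about incremental decoding can be checked directly. A minimal sketch, assuming only the two-argument CodecInfo construction used in a.py: every optional component (incremental encoder and decoder, stream reader and writer) is left as None.

```python
import codecs

def encode(input, errors="strict"):
    raise NotImplementedError

def decode(input, errors="strict"):
    return "x = 42", len(input)

# Same construction as in a.py: only the stateless encode/decode pair is given.
info = codecs.CodecInfo(encode, decode)

# The optional components default to None when not supplied.
for attr in ("incrementalencoder", "incrementaldecoder",
             "streamreader", "streamwriter"):
    print(attr, "=", getattr(info, attr))
```

This shows only what the codec advertises. Which of these components the source reader tries to use in each Python version is not established here.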

In 3.9.16, the program behaves differently again:

decoding b'# coding: test\n'
Traceback (most recent call last):
  File "/<path>/a.py", line 12, in <module>
    import b
  File "/<path>/b.py", line 1
    # coding: test
        ^
SyntaxError: unexpected EOF while parsing

This is also surprising. The decoder is called once in this case.

Finally, it has the following behavior in 3.8.16, which matches the expected output:

decoding b'# coding: test\n'
value is 42

Note that the "decoding" line is not printed if b.py has already been compiled and its bytecode is cached in __pycache__. In that case, only "value is 42" is printed. Removing the cache directory causes the test codec to run again, along with its side effects. I believe this is intended behavior.
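For repeated runs of the reproducer, deleting the cached bytecode for b.py is enough to make the decoder run again. The helper below is a small sketch of that. The name drop_cached_bytecode is hypothetical; it relies on importlib.util.cache_from_source to locate the .pyc file for the running interpreter.

```python
import importlib.util
import os

def drop_cached_bytecode(source_path):
    """Delete the cached .pyc for source_path, if any, and return its path."""
    # Path of the bytecode file the running interpreter would use.
    pyc_path = importlib.util.cache_from_source(source_path)
    try:
        os.remove(pyc_path)
    except FileNotFoundError:
        pass  # nothing was cached, so there is nothing to remove
    return pyc_path

print(drop_cached_bytecode("b.py"))
```

The cache file name includes a tag for the interpreter version, so this only clears the cache for the Python that runs the helper.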

Your environment

Python versions used:

  • 3.11.2 from homebrew
  • 3.10.10 from homebrew
  • 3.9.16 from homebrew
  • 3.8.16 from homebrew
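To compare the versions listed above side by side, a small driver along these lines can help. It is only a sketch: the function name run_under_interpreters is hypothetical, and the executable names passed to it (for example python3.8) are assumptions that depend on how the interpreters are installed.

```python
import shutil
import subprocess

def run_under_interpreters(script, interpreters):
    """Run script with each interpreter found on PATH.

    Returns a dict mapping interpreter name to its combined stdout/stderr.
    """
    results = {}
    for name in interpreters:
        exe = shutil.which(name)
        if exe is None:
            continue  # interpreter not installed; skip it
        proc = subprocess.run(
            [exe, "-B", script],  # -B: do not write .pyc files
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        results[name] = proc.stdout
    return results
```

Calling run_under_interpreters("a.py", ["python3.8", "python3.9", "python3.10", "python3.11"]) from the directory containing a.py and b.py collects the output of each version. The -B flag keeps new bytecode from being written, but an existing __pycache__ entry would still be used, so remove it first.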

Operating system and architecture: macOS Ventura 13.0.1, running on arm64 (M2 chip), Darwin Olivias-MacBook-Pro.local 22.1.0 Darwin Kernel Version 22.1.0: Sun Oct 9 20:15:52 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T8112 arm64

Labels: stdlib (Standard Library Python modules in the Lib/ directory), type-bug (An unexpected behavior, bug, or error)