Skip to content

Conversation

@bdbaddog
Copy link
Contributor

@bdbaddog bdbaddog commented Jul 21, 2025

Issue #4746 User with system configured for chinese locale is experiencing failures compiling as the compiler expects the tempfile to be locale encoded.

Allow user to set env['TEMPFILEENCODING'] to an alternate encoding if necessary, this should allow a work around, if not an automatic fix to the issue.

Contributor Checklist:

  • I have created a new test or updated the unit tests to cover the new/changed functionality.
  • I have updated CHANGES.txt and RELEASE.txt (and read the README.rst).
  • I have updated the appropriate documentation

….getpreferredencoding(False) instead of hardcoded utf-8
@bdbaddog bdbaddog requested a review from mwichmann July 21, 2025 03:51
@bdbaddog bdbaddog added the TEMPFILE Long-command-line handling with TEMPFILE label Jul 21, 2025
@mwichmann
Copy link
Collaborator

I think it's better to use 'oem' codec on windows than try to query. utf-8-everywhere is becoming more pervasive, next year's python will have even getpreferredencoding start lying unless you start python with special options.

@bdbaddog bdbaddog changed the title [WIP] Fix Issue #4747 by changing encoding of tempfile's to use locale.getpeferredencoding(False) instead of hardcoded utf-8 [WIP] Fix Issue #4746 by changing encoding of tempfile's to use locale.getpeferredencoding(False) instead of hardcoded utf-8 Jul 26, 2025
@mwichmann
Copy link
Collaborator

I guess the question we have to resolve to progress this is whether to "try to guess" or whether to leave a setting other than UTF-8 entirely to the developer via a construction variable.

@mwichmann mwichmann force-pushed the fix_4746_TEMPFILE_ENC_ISSUE branch from 9590769 to 6f01e94 Compare July 28, 2025 15:36
Alternate approach to avoid decoding problems: don't try to guess how
to encode the TEMPFILE.  Leave it utf-8 unless TEMPFILEENCODING is set.

Add comment to TEMPFILEDIR doc on the topic as well.

Signed-off-by: Mats Wichmann <mats@linux.com>
@mwichmann mwichmann force-pushed the fix_4746_TEMPFILE_ENC_ISSUE branch from 6f01e94 to c34c4ab Compare July 28, 2025 15:45
@mwichmann
Copy link
Collaborator

Updated based on the latter idea: don't guess, just provide a way for the user to specify an encoding.

….getpreferredencoding(False) instead of hardcoded utf-8
@bdbaddog bdbaddog force-pushed the fix_4746_TEMPFILE_ENC_ISSUE branch from c34c4ab to 0e29e4f Compare July 31, 2025 19:59
@mwichmann
Copy link
Collaborator

???

@bdbaddog bdbaddog changed the title [WIP] Fix Issue #4746 by changing encoding of tempfile's to use locale.getpeferredencoding(False) instead of hardcoded utf-8 [WIP] Fix Issue #4746 by providing env var to allow user to specify non-default encoding Aug 2, 2025
@bdbaddog bdbaddog changed the title [WIP] Fix Issue #4746 by providing env var to allow user to specify non-default encoding Provide workaround for Issue #4746 by providing env var to allow user to specify non-default encoding Aug 3, 2025
@bdbaddog bdbaddog merged commit ef5ab0b into SCons:master Aug 3, 2025
8 of 9 checks passed
@jcbrill
Copy link
Contributor

jcbrill commented Aug 3, 2025

Prior to this PR, it was unlikely that the tempfile creation would fail when writing the contents as utf-8.

That is no longer the case. Possible exceptions now include UnicodeEncodeError, LookupError, and TypeError.

In simple (fast) tests, any of these errors can cause the removal of the tempfile to fail at exit and orphan the empty files at least on windows which produces additional error messages.

The tracebacks for these exceptions are lengthy and not particularly user-friendly when raised using compiler response files as the call chain is pretty deep. Perhaps an SCons error (without traceback?) might be more friendly.

Test cases are relatively straightforward.

The following SConstruct is offered as inspiration for how tests could be added to the test suite.

UnicodeEncodeError is trivial to produce when the encoding is ascii.

This was only tested on Windows.

SConstruct:

import os
import time
import traceback
import SCons.Platform

SLEEP_WHEN_DONE = 10

DefaultEnvironment()

print("\nBegin encoding tests...")

command = 'xyz "/test€/file.c"'

for cmd, encoding, exception in [

    # expect success
    (command, 'utf-8',      None),
    (command, 'utf-8-sig',  None),
    (command, 'utf-16',     None),

    # expect failure
    (command, 'ascii',      UnicodeEncodeError),
    (command, 'badecoding', LookupError),
    (command, '',           LookupError),
    (command, None,         TypeError),

    # display traceback
    (command, 'ascii',      None),  # UnicodeEncodeError
    (command, 'badecoding', None),  # LookupError
    (command, None,         None),  # TypeError

]:

    tfm = SCons.Platform.TempFileMunge(cmd)

    env = Environment(
        tools=[],
        MAXLINELENGTH=2,
        TEMPFILEENCODING=encoding,
    )

    exception_t = tuple([exception] if exception else [])

    print(f"\n===== test: cmd={cmd!r}, encoding={encoding!r}, exception={tuple([e.__name__ for e in exception_t])!r}")
    try:
        _ = tfm(None, None, env, 0)
        print("expected pass")
    except Exception as e:
        expected = type(e) in exception_t
        if expected:
            print(f"expected fail {type(e).__name__!r}: {str(e)!r}")
        else:
            print(f"UNEXPECTED FAIL {type(e).__name__!r}: {str(e)!r}")
            traceback.print_exc()
    print(f"-----")

print("\nEnd encoding tests.\n")

if SLEEP_WHEN_DONE > 0:
    print(f"Begin sleep({SLEEP_WHEN_DONE})...")
    time.sleep(SLEEP_WHEN_DONE)
    print("End sleep.\n")

Output log:

Python 3.13.2
scons: Reading SConscript files ...

Begin encoding tests...

===== test: cmd='xyz "/test€/file.c"', encoding='utf-8', exception=()
Using tempfile S:\SCons\test-cp936\scons-tempfile\tmp_xd4o6up.lnk for command line:
xyz "/test€/file.c"
expected pass
-----

===== test: cmd='xyz "/test€/file.c"', encoding='utf-8-sig', exception=()
Using tempfile S:\SCons\test-cp936\scons-tempfile\tmp7plvixoz.lnk for command line:
xyz "/test€/file.c"
expected pass
-----

===== test: cmd='xyz "/test€/file.c"', encoding='utf-16', exception=()
Using tempfile S:\SCons\test-cp936\scons-tempfile\tmp0uvrd00q.lnk for command line:
xyz "/test€/file.c"
expected pass
-----

===== test: cmd='xyz "/test€/file.c"', encoding='ascii', exception=('UnicodeEncodeError',)
expected fail 'UnicodeEncodeError': "'ascii' codec can't encode character '\\u20ac' in position 6: ordinal not in range(128)"
-----

===== test: cmd='xyz "/test€/file.c"', encoding='badecoding', exception=('LookupError',)
expected fail 'LookupError': 'unknown encoding: badecoding'
-----

===== test: cmd='xyz "/test€/file.c"', encoding='', exception=('LookupError',)
expected fail 'LookupError': 'unknown encoding: '
-----

===== test: cmd='xyz "/test€/file.c"', encoding=None, exception=('TypeError',)
expected fail 'TypeError': "bytes() argument 'encoding' must be str, not None"
-----

===== test: cmd='xyz "/test€/file.c"', encoding='ascii', exception=()
UNEXPECTED FAIL 'UnicodeEncodeError': "'ascii' codec can't encode character '\\u20ac' in position 6: ordinal not in range(128)"
Traceback (most recent call last):
  File "S:\SCons\test-cp936\scons-tempfile\SConstruct", line 46, in <module>
    _ = tfm(None, None, env, 0)
  File "S:\SCons\scons-master\scripts\..\SCons\Platform\__init__.py", line 288, in __call__
    os.write(fd, bytes(join_char.join(args) + "\n", encoding=encoding))
                 ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 6: ordinal not in range(128)
-----

===== test: cmd='xyz "/test€/file.c"', encoding='badecoding', exception=()
UNEXPECTED FAIL 'LookupError': 'unknown encoding: badecoding'
Traceback (most recent call last):
  File "S:\SCons\test-cp936\scons-tempfile\SConstruct", line 46, in <module>
    _ = tfm(None, None, env, 0)
  File "S:\SCons\scons-master\scripts\..\SCons\Platform\__init__.py", line 288, in __call__
    os.write(fd, bytes(join_char.join(args) + "\n", encoding=encoding))
                 ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
LookupError: unknown encoding: badecoding
-----

===== test: cmd='xyz "/test€/file.c"', encoding=None, exception=()
UNEXPECTED FAIL 'TypeError': "bytes() argument 'encoding' must be str, not None"
Traceback (most recent call last):
  File "S:\SCons\test-cp936\scons-tempfile\SConstruct", line 46, in <module>
    _ = tfm(None, None, env, 0)
  File "S:\SCons\scons-master\scripts\..\SCons\Platform\__init__.py", line 288, in __call__
    os.write(fd, bytes(join_char.join(args) + "\n", encoding=encoding))
                 ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: bytes() argument 'encoding' must be str, not None
-----

End encoding tests.

Begin sleep(10)...
End sleep.

scons: done reading SConscript files.
scons: Building targets ...
scons: `.' is up to date.
scons: done building targets.
Exception ignored in atexit callback <function TempFileMunge.__call__.<locals>.tmpfile_cleanup at 0x000001E6CEC0A3E0>:
Traceback (most recent call last):
  File "S:\SCons\scons-master\scripts\..\SCons\Platform\__init__.py", line 270, in tmpfile_cleanup
    os.remove(file)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'S:\\SCons\\test-cp936\\scons-tempfile\\tmp9ciqihzq.lnk'
Exception ignored in atexit callback <function TempFileMunge.__call__.<locals>.tmpfile_cleanup at 0x000001E6CEC09C60>:
Traceback (most recent call last):
  File "S:\SCons\scons-master\scripts\..\SCons\Platform\__init__.py", line 270, in tmpfile_cleanup
    os.remove(file)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'S:\\SCons\\test-cp936\\scons-tempfile\\tmpanfv_7pl.lnk'
Exception ignored in atexit callback <function TempFileMunge.__call__.<locals>.tmpfile_cleanup at 0x000001E6CEC09BC0>:
Traceback (most recent call last):
  File "S:\SCons\scons-master\scripts\..\SCons\Platform\__init__.py", line 270, in tmpfile_cleanup
    os.remove(file)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'S:\\SCons\\test-cp936\\scons-tempfile\\tmp3pp8gapx.lnk'
Exception ignored in atexit callback <function TempFileMunge.__call__.<locals>.tmpfile_cleanup at 0x000001E6CEC098A0>:
Traceback (most recent call last):
  File "S:\SCons\scons-master\scripts\..\SCons\Platform\__init__.py", line 270, in tmpfile_cleanup
    os.remove(file)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'S:\\SCons\\test-cp936\\scons-tempfile\\tmp15xvtb60.lnk'
Exception ignored in atexit callback <function TempFileMunge.__call__.<locals>.tmpfile_cleanup at 0x000001E6CEC09580>:
Traceback (most recent call last):
  File "S:\SCons\scons-master\scripts\..\SCons\Platform\__init__.py", line 270, in tmpfile_cleanup
    os.remove(file)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'S:\\SCons\\test-cp936\\scons-tempfile\\tmplju1e38d.lnk'
Exception ignored in atexit callback <function TempFileMunge.__call__.<locals>.tmpfile_cleanup at 0x000001E6CEC09260>:
Traceback (most recent call last):
  File "S:\SCons\scons-master\scripts\..\SCons\Platform\__init__.py", line 270, in tmpfile_cleanup
    os.remove(file)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'S:\\SCons\\test-cp936\\scons-tempfile\\tmpm9uqdvs1.lnk'
Exception ignored in atexit callback <function TempFileMunge.__call__.<locals>.tmpfile_cleanup at 0x000001E6CEC08E00>:
Traceback (most recent call last):
  File "S:\SCons\scons-master\scripts\..\SCons\Platform\__init__.py", line 270, in tmpfile_cleanup
    os.remove(file)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'S:\\SCons\\test-cp936\\scons-tempfile\\tmpqbjn9ykb.lnk'

@jcbrill
Copy link
Contributor

jcbrill commented Aug 3, 2025

P.S.: Reordering the code to convert the contents to bytes before creating the tempfile might alleviate the removal/orphan issue as any exceptions would be raised prior to creation.

@mwichmann
Copy link
Collaborator

encoding first seems okay. the encoding can also be done with an error handler (possibly backslahescape) to avoid exceptions. but not sure we're worse off than before. problems only happen if the encoding var is set, and at that point, up to the developer to get it right.

@bdbaddog
Copy link
Contributor Author

bdbaddog commented Aug 4, 2025

If we encode before and allow it to throw an exception, it could be handled and suggest the user set a proper TEMPFILEENCODING ? and exit gracefully?

@jcbrill
Copy link
Contributor

jcbrill commented Aug 4, 2025

If we encode before and allow it to throw an exception, it could be handled and suggest the user set a proper TEMPFILEENCODING ? and exit gracefully?

Exactly.

Prototype changes:

  • Move tempfile contents encoding before tempfile file creation and save tempfile contents in variable
  • Catch expected errors and raise SCons.Errors.UserError
  • write tempfile contents after file creation

SCons\Platform\__init__.py example:

...
        if node and hasattr(node.attributes, 'tempfile_cmdlist'):
            cmdlist = node.attributes.tempfile_cmdlist.get(cmdlist_key, None)
        if cmdlist is not None:
            return cmdlist

+       tempfile_esc_func = env.get('TEMPFILEARGESCFUNC', SCons.Subst.quote_spaces)
+       args = [tempfile_esc_func(arg) for arg in cmd[1:]]
+       join_char = env.get('TEMPFILEARGJOIN', ' ')
+       encoding = env.get('TEMPFILEENCODING', 'utf-8')
+
+       try:
+           tempfile_contents= bytes(join_char.join(args) + "\n", encoding=encoding) 
+       except (UnicodeEncodeError, LookupError, TypeError) as e:
+           errmsg = f"tempfile contents encoding error (TEMPFILEENCODING={encoding!r}):\n  {type(e).__name__}: {str(e)}"
+           raise SCons.Errors.UserError(errmsg) from None

        # Default to the .lnk suffix for the benefit of the Phar Lap
        # linkloc linker, which likes to append an .lnk suffix if
        # none is given.

...

        if 'TEMPFILEPREFIX' in env:
            prefix = env.subst('$TEMPFILEPREFIX')
        else:
            prefix = "@"

-       tempfile_esc_func = env.get('TEMPFILEARGESCFUNC', SCons.Subst.quote_spaces)
-       args = [tempfile_esc_func(arg) for arg in cmd[1:]]
-       join_char = env.get('TEMPFILEARGJOIN', ' ')
-       encoding = env.get('TEMPFILEENCODING', 'utf-8')
-       os.write(fd, bytes(join_char.join(args) + "\n", encoding=encoding))
+       os.write(fd, tempfile_contents)
        os.close(fd)

        # XXX Using the SCons.Action.print_actions value directly
        # like this is bogus, but expedient.  This class should
...

Modified SConstruct:

import traceback
import SCons.Platform
import SCons.Errors

DefaultEnvironment()

print("\nBegin encoding tests...")

command = 'xyz "/test€/file.c"'

for cmd, encoding in [

    # expect success
    (command, 'utf-8'),
    (command, 'utf-8-sig'),
    (command, 'utf-16'),

    # expect failure
    (command, 'ascii'),
    (command, 'badecoding'),
    (command, ''),
    (command, None),

]:

    tfm = SCons.Platform.TempFileMunge(cmd)

    env = Environment(
        tools=[],
        MAXLINELENGTH=2,
        TEMPFILEENCODING=encoding,
    )

    print(f"\n===== test: cmd={cmd!r}, encoding={encoding!r})")
    try:
        _ = tfm(None, None, env, 0)
    except SCons.Errors.UserError as e:
        print(f"{type(e).__name__}: {e.args[0]}")
    except Exception as e:
        print(f"{type(e).__name__}: {e.args[0]}")
        traceback.print_exc()
    print(f"-----")

print("\nEnd encoding tests.\n")

Modified Output:

Python 3.13.2
scons: Reading SConscript files ...

Begin encoding tests...

===== test: cmd='xyz "/test€/file.c"', encoding='utf-8')
Using tempfile S:\SCons\test-cp936\scons-tempfile\tmp541uz07a.lnk for command line:
xyz "/test€/file.c"
-----

===== test: cmd='xyz "/test€/file.c"', encoding='utf-8-sig')
Using tempfile S:\SCons\test-cp936\scons-tempfile\tmpe0lb9nzg.lnk for command line:
xyz "/test€/file.c"
-----

===== test: cmd='xyz "/test€/file.c"', encoding='utf-16')
Using tempfile S:\SCons\test-cp936\scons-tempfile\tmp1m85owmm.lnk for command line:
xyz "/test€/file.c"
-----

===== test: cmd='xyz "/test€/file.c"', encoding='ascii')
UserError: tempfile contents encoding error (TEMPFILEENCODING='ascii'):
  UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 6: ordinal not in range(128)
-----

===== test: cmd='xyz "/test€/file.c"', encoding='badecoding')
UserError: tempfile contents encoding error (TEMPFILEENCODING='badecoding'):
  LookupError: unknown encoding: badecoding
-----

===== test: cmd='xyz "/test€/file.c"', encoding='')
UserError: tempfile contents encoding error (TEMPFILEENCODING=''):
  LookupError: unknown encoding: 
-----

===== test: cmd='xyz "/test€/file.c"', encoding=None)
UserError: tempfile contents encoding error (TEMPFILEENCODING=None):
  TypeError: bytes() argument 'encoding' must be str, not None
-----

End encoding tests.

scons: done reading SConscript files.
scons: Building targets ...
scons: `.' is up to date.
scons: done building targets.

@mwichmann
Copy link
Collaborator

We should move this elsewhere, bad form to continue a discussion in a merged/closed PR.

I don't think we need to account for user stupidity setting an invalid codec - just look for the encoding error.

@jcbrill
Copy link
Contributor

jcbrill commented Aug 4, 2025

I don't think we need to account for user stupidity setting an invalid codec - just look for the encoding error.

LookupError can be caused by a legitimate codec name that happens to not be installed on the target machine. Not sure that qualifies user stupidity (e.g., third-party library).

The os.remove of the temporary file should probably suppress any exceptions.

With the exception of the temporary file removal errors, the remaining topics were mentioned ad nauseum in the issue thread.

We should move this elsewhere, bad form to continue a discussion in a merged/closed PR.

Fair enough.

All of this would/could have been discovered with unit tests which there are none. These one-line fixes never seem to be as clean as intended.

@mwichmann
Copy link
Collaborator

There are quite a few TEMPFILE end-to-end tests. Unit tests are trickier because, well, the effects of this function aren't standalone ("unit") - it's setup for something that happens elsewhere. Sure, you can mock some of it, as you have done. I always had the impression the whole thing was a kind of emergency hack from when Windows users started having their builds fail and wasn't architected as carefully. But whatever... that's long in the past. Yes, the removal should be done "safely" and not kill scons while it's shutting down.

@bdbaddog
Copy link
Contributor Author

bdbaddog commented Aug 4, 2025

Locking the PR please move discussion to #4752

@SCons SCons locked as resolved and limited conversation to collaborators Aug 4, 2025
@mwichmann mwichmann added this to 4.10 Aug 11, 2025
@mwichmann mwichmann moved this to Done in 4.10 Aug 11, 2025
@mwichmann mwichmann added this to the NextRelease milestone Aug 11, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

TEMPFILE Long-command-line handling with TEMPFILE

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

PlatformIO STM32 Build Fails on Windows Chinese Version: TMP File Encoding Issue

4 participants