-
-
Notifications
You must be signed in to change notification settings - Fork 340
Provide workaround for Issue #4746 by providing env var to allow user to specify non-default encoding #4749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
….getpreferredencoding(False) instead of hardcoded utf-8
|
I think it's better to use 'oem' codec on windows than try to query. utf-8-everywhere is becoming more pervasive, next year's python will have even getpreferredencoding start lying unless you start python with special options. |
|
I guess the question we have to resolve to progress this is whether to "try to guess" or whether to leave a setting other than UTF-8 entirely to the developer via a construction variable. |
9590769 to
6f01e94
Compare
Alternate approach to avoid decoding problems: don't try to guess how to encode the TEMPFILE. Leave it utf-8 unless TEMPFILEENCODING is set. Add comment to TEMPFILEDIR doc on the topic as well. Signed-off-by: Mats Wichmann <mats@linux.com>
6f01e94 to
c34c4ab
Compare
|
Updated based on the latter idea: don't guess, just provide a way for the user to specify an encoding. |
….getpreferredencoding(False) instead of hardcoded utf-8
c34c4ab to
0e29e4f
Compare
|
??? |
Add consvar for TEMPFILE encoding
|
Prior to this PR, it was unlikely that the tempfile creation would fail when writing the contents as That is no longer the case. Possible exceptions now include In simple (fast) tests, any of these errors can cause the removal of the tempfile to fail at exit and orphan the empty files at least on windows which produces additional error messages. The tracebacks for these exceptions are lengthy and not particularly user-friendly when raised using compiler response files as the call chain is pretty deep. Perhaps an SCons error (without traceback?) might be more friendly. Test cases are relatively straightforward. The following SConstruct is offered as inspiration for how tests could be added to the test suite.
This was only tested on Windows. SConstruct: import os
import time
import traceback
import SCons.Platform
SLEEP_WHEN_DONE = 10
DefaultEnvironment()
print("\nBegin encoding tests...")
command = 'xyz "/test€/file.c"'
for cmd, encoding, exception in [
# expect success
(command, 'utf-8', None),
(command, 'utf-8-sig', None),
(command, 'utf-16', None),
# expect failure
(command, 'ascii', UnicodeEncodeError),
(command, 'badecoding', LookupError),
(command, '', LookupError),
(command, None, TypeError),
# display traceback
(command, 'ascii', None), # UnicodeEncodeError
(command, 'badecoding', None), # LookupError
(command, None, None), # TypeError
]:
tfm = SCons.Platform.TempFileMunge(cmd)
env = Environment(
tools=[],
MAXLINELENGTH=2,
TEMPFILEENCODING=encoding,
)
exception_t = tuple([exception] if exception else [])
print(f"\n===== test: cmd={cmd!r}, encoding={encoding!r}, exception={tuple([e.__name__ for e in exception_t])!r}")
try:
_ = tfm(None, None, env, 0)
print("expected pass")
except Exception as e:
expected = type(e) in exception_t
if expected:
print(f"expected fail {type(e).__name__!r}: {str(e)!r}")
else:
print(f"UNEXPECTED FAIL {type(e).__name__!r}: {str(e)!r}")
traceback.print_exc()
print(f"-----")
print("\nEnd encoding tests.\n")
if SLEEP_WHEN_DONE > 0:
print(f"Begin sleep({SLEEP_WHEN_DONE})...")
time.sleep(SLEEP_WHEN_DONE)
print("End sleep.\n")Output log: |
|
P.S.: Reordering the code to convert the contents to bytes before creating the tempfile might alleviate the removal/orphan issue as any exceptions would be raised prior to creation. |
|
encoding first seems okay. the encoding can also be done with an error handler (possibly |
|
If we encode before and allow it to throw an exception, it could be handled and suggest the user set a proper TEMPFILEENCODING ? and exit gracefully? |
Exactly. Prototype changes:
...
if node and hasattr(node.attributes, 'tempfile_cmdlist'):
cmdlist = node.attributes.tempfile_cmdlist.get(cmdlist_key, None)
if cmdlist is not None:
return cmdlist
+ tempfile_esc_func = env.get('TEMPFILEARGESCFUNC', SCons.Subst.quote_spaces)
+ args = [tempfile_esc_func(arg) for arg in cmd[1:]]
+ join_char = env.get('TEMPFILEARGJOIN', ' ')
+ encoding = env.get('TEMPFILEENCODING', 'utf-8')
+
+ try:
+ tempfile_contents= bytes(join_char.join(args) + "\n", encoding=encoding)
+ except (UnicodeEncodeError, LookupError, TypeError) as e:
+ errmsg = f"tempfile contents encoding error (TEMPFILEENCODING={encoding!r}):\n {type(e).__name__}: {str(e)}"
+ raise SCons.Errors.UserError(errmsg) from None
# Default to the .lnk suffix for the benefit of the Phar Lap
# linkloc linker, which likes to append an .lnk suffix if
# none is given.
...
if 'TEMPFILEPREFIX' in env:
prefix = env.subst('$TEMPFILEPREFIX')
else:
prefix = "@"
- tempfile_esc_func = env.get('TEMPFILEARGESCFUNC', SCons.Subst.quote_spaces)
- args = [tempfile_esc_func(arg) for arg in cmd[1:]]
- join_char = env.get('TEMPFILEARGJOIN', ' ')
- encoding = env.get('TEMPFILEENCODING', 'utf-8')
- os.write(fd, bytes(join_char.join(args) + "\n", encoding=encoding))
+ os.write(fd, tempfile_contents)
os.close(fd)
# XXX Using the SCons.Action.print_actions value directly
# like this is bogus, but expedient. This class should
...Modified SConstruct: import traceback
import SCons.Platform
import SCons.Errors
DefaultEnvironment()
print("\nBegin encoding tests...")
command = 'xyz "/test€/file.c"'
for cmd, encoding in [
# expect success
(command, 'utf-8'),
(command, 'utf-8-sig'),
(command, 'utf-16'),
# expect failure
(command, 'ascii'),
(command, 'badecoding'),
(command, ''),
(command, None),
]:
tfm = SCons.Platform.TempFileMunge(cmd)
env = Environment(
tools=[],
MAXLINELENGTH=2,
TEMPFILEENCODING=encoding,
)
print(f"\n===== test: cmd={cmd!r}, encoding={encoding!r})")
try:
_ = tfm(None, None, env, 0)
except SCons.Errors.UserError as e:
print(f"{type(e).__name__}: {e.args[0]}")
except Exception as e:
print(f"{type(e).__name__}: {e.args[0]}")
traceback.print_exc()
print(f"-----")
print("\nEnd encoding tests.\n")Modified Output: |
|
We should move this elsewhere, bad form to continue a discussion in a merged/closed PR. I don't think we need to account for user stupidity setting an invalid codec - just look for the encoding error. |
LookupError can be caused by a legitimate codec name that happens to not be installed on the target machine. Not sure that qualifies user stupidity (e.g., third-party library). The With the exception of the temporary file removal errors, the remaining topics were mentioned ad nauseum in the issue thread.
Fair enough. All of this would/could have been discovered with unit tests which there are none. These one-line fixes never seem to be as clean as intended. |
|
There are quite a few TEMPFILE end-to-end tests. Unit tests are trickier because, well, the effects of this function aren't standalone ("unit") - it's setup for something that happens elsewhere. Sure, you can mock some of it, as you have done. I always had the impression the whole thing was a kind of emergency hack from when Windows users started having their builds fail and wasn't architected as carefully. But whatever... that's long in the past. Yes, the removal should be done "safely" and not kill scons while it's shutting down. |
|
Locking the PR please move discussion to #4752 |
Issue #4746 User with system configured for chinese locale is experiencing failures compiling as the compiler expects the tempfile to be locale encoded.
Allow user to set env['TEMPFILEENCODING'] to an alternate encoding if necessary, this should allow a work around, if not an automatic fix to the issue.
Contributor Checklist:
CHANGES.txtandRELEASE.txt(and read theREADME.rst).