Commit 91bde90

Use UTF-8 as default with ScriptEngine (#962)
* Use UTF-8 as default with ScriptEngine
* Restore changes in PythonContext
1 parent f2e8032 commit 91bde90

3 files changed: +15 -9 lines changed

Src/IronPythonTest/EngineTest.cs

Lines changed: 12 additions & 7 deletions
@@ -423,17 +423,22 @@ public void ScenarioCodePlex20472() {
 #if NETCOREAPP
             Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
 #endif
-
+            // This test file is encoded in Windows codepage 1251 (Cyrillic) but lacks a magic comment (PEP-263)
+            string fileName = Path.Combine(Path.Combine(Common.ScriptTestDirectory, "encoded_files"), "cp20472.py");
             try {
-                string fileName = Path.Combine(Path.Combine(Common.ScriptTestDirectory, "encoded_files"), "cp20472.py");
-                _pe.CreateScriptSourceFromFile(fileName, Encoding.GetEncoding(1251)).Compile();
+                _pe.CreateScriptSourceFromFile(fileName).Compile();
 
-                //Disabled. The line above should have thrown a syntax exception or an import error,
-                //but does not.
+                // The line above should have thrown a syntax exception or an import error
+                // because the default source file encoding is UTF-8 and there are decoding errors
 
-                //throw new Exception("ScenarioCodePlex20472");
+                throw new Exception("ScenarioCodePlex20472");
             }
-            catch (IronPython.Runtime.Exceptions.ImportException) { }
+            catch (SyntaxErrorException) { }
+
+            // Opening the file with explicitly specifying the correct encoding should work
+            CompiledCode prog = _pe.CreateScriptSourceFromFile(fileName, Encoding.GetEncoding(1251)).Compile();
+            prog.Execute();
+            Assert.AreEqual(prog.DefaultScope.GetVariable("s"), "\u041F\u0436\u0451");
         }
 
         [Test]
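
The updated test relies on the fact that the Windows-1251 bytes in cp20472.py are not valid UTF-8, so compiling with the default encoding must fail while an explicit codepage 1251 succeeds. The following is a minimal Python sketch of that decoding behaviour only, not of IronPython's implementation; the names `RAW`, `decode_source`, and `default_decoding_fails` are hypothetical and exist purely for illustration.

```python
# Bytes of the string the test expects ("\u041F\u0436\u0451"),
# stored the way cp20472.py stores them: in Windows codepage 1251.
RAW = "\u041F\u0436\u0451".encode("cp1251")


def decode_source(data: bytes, encoding: str = "utf-8") -> str:
    """Decode source bytes strictly; UTF-8 is the default, as in PEP 3120."""
    return data.decode(encoding, errors="strict")


def default_decoding_fails(data: bytes) -> bool:
    """Return True when the bytes cannot be decoded with the UTF-8 default."""
    try:
        decode_source(data)
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    # The default (UTF-8) path rejects the bytes.
    print(default_decoding_fails(RAW))
    # Naming the real encoding recovers the expected text.
    print(decode_source(RAW, "cp1251") == "\u041F\u0436\u0451")
```

In the C# test, the first case corresponds to the `catch (SyntaxErrorException)` branch and the second to the final `Assert.AreEqual` call.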

Tests/encoded_files/cp20472.py

Lines changed: 2 additions & 1 deletion
@@ -1,2 +1,3 @@
 pi = 3.14 #This module is completely broken because of: Ïæ¸
-#AKA encoding 1251
+#AKA encoding 1251
+s = "Ïæ¸"
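
The characters `Ïæ¸` shown in this file are the same three bytes that the test expects to read as `\u041F\u0436\u0451`. A small sketch of why the two renderings differ, assuming the diff viewer falls back to a Latin-1 style single-byte encoding (that fallback is an assumption, and `views` is a hypothetical helper):

```python
# The three non-ASCII bytes of the string literal in cp20472.py.
RAW = bytes([0xCF, 0xE6, 0xB8])


def views(data: bytes) -> dict:
    """Decode the same bytes under two single-byte encodings."""
    return {
        "latin-1": data.decode("latin-1"),  # what a Western viewer shows
        "cp1251": data.decode("cp1251"),    # what the file author intended
    }


if __name__ == "__main__":
    for name, text in views(RAW).items():
        # Print code points rather than glyphs to stay console-safe.
        print(name, [f"U+{ord(ch):04X}" for ch in text])
```

Both decodings succeed because every byte is valid in a single-byte codepage; only the strict UTF-8 default detects that something is wrong.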

WhatsNewInPython30.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ Text Vs. Data Instead Of Unicode Vs. 8-bit
 - [ ] Filenames are passed to and returned from APIs as (Unicode) strings. This can present platform-specific problems because on some platforms filenames are arbitrary byte strings. (On the other hand, on Windows filenames are natively stored as Unicode.) As a work-around, most APIs (e.g. `open()` and many functions in the `os` module) that take filenames accept bytes objects as well as strings, and a few APIs have a way to ask for a `bytes` return value. Thus, `os.listdir()` returns a list of `bytes` instances if the argument is a `bytes` instance, and `os.getcwdb()` returns the current working directory as a `bytes` instance. Note that when `os.listdir()` returns a list of strings, filenames that cannot be decoded properly are omitted rather than raising `UnicodeError`.
 - [ ] Some system APIs like `os.environ` and `sys.argv` can also present problems when the bytes made available by the system is not interpretable using the default encoding. Setting the `LANG` variable and rerunning the program is probably the best approach.
 - [x] [PEP 3138][]: The `repr()` of a string no longer escapes non-ASCII characters. It still escapes control characters and code points with non-printable status in the Unicode standard, however.
-- [ ] [PEP 3120][]: The default source encoding is now UTF-8.
+- [x] [PEP 3120][]: The default source encoding is now UTF-8.
 - [x] [PEP 3131][]: Non-ASCII letters are now allowed in identifiers. (However, the standard library remains ASCII-only with the exception of contributor names in comments.)
 - [x] The `StringIO` and `cStringIO` modules are gone. Instead, import the `io` module and use `io.StringIO` or `io.BytesIO` for text and data respectively.
 
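
For reference, CPython exposes the PEP 3120 default and the PEP 263 magic comment through `tokenize.detect_encoding`. The sketch below shows that rule in CPython only; it says nothing about how IronPython's `ScriptEngine` is implemented, and `source_encoding`, `NO_COOKIE`, and `WITH_COOKIE` are hypothetical names.

```python
import io
import tokenize


def source_encoding(data: bytes) -> str:
    """Return the encoding CPython would use to read this source."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


# No magic comment: the PEP 3120 default applies.
NO_COOKIE = b"pi = 3.14\n"

# A PEP 263 magic comment on the first line overrides the default,
# so the cp1251 bytes on the second line become legal source.
WITH_COOKIE = b"# -*- coding: cp1251 -*-\ns = '\xcf\xe6\xb8'\n"

if __name__ == "__main__":
    print(source_encoding(NO_COOKIE))
    print(source_encoding(WITH_COOKIE))
```

cp20472.py deliberately has no such comment, which is why the test has to pass `Encoding.GetEncoding(1251)` explicitly to load it.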
