* [PATCH v2] decodetree: Open files with encoding='utf-8'
From: Philippe Mathieu-Daudé @ 2021-01-08 18:09 UTC
To: qemu-devel
Cc: Peter Maydell, Eduardo Habkost, Richard Henderson,
Philippe Mathieu-Daudé,
Yonggang Luo, Cleber Rosa, Paolo Bonzini
When decodetree.py was added in commit 568ae7efae7, QEMU was
using Python 2, which happily reads UTF-8 files in text mode.
Python 3 requires either a UTF-8 locale or an explicit encoding
passed to open(). Now that Python 3 is required, explicitly use
UTF-8 encoding for decodetree source files.

To avoid further problems with the user locale, also explicitly
use UTF-8 encoding for the generated C files.

Explicitly mark both input and output as plain text by using the
't' mode.

This fixes:

  $ /usr/bin/python3 scripts/decodetree.py test.decode
  Traceback (most recent call last):
    File "scripts/decodetree.py", line 1397, in <module>
      main()
    File "scripts/decodetree.py", line 1308, in main
      parse_file(f, toppat)
    File "scripts/decodetree.py", line 994, in parse_file
      for line in f:
    File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
      return codecs.ascii_decode(input, self.errors)[0]
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80: ordinal not in range(128)

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
v2: utf-8 output too (Peter)
explicit default text mode.
---
scripts/decodetree.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 47aa9caf6d1..d3857066cfc 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -1304,7 +1304,7 @@ def main():
 
     for filename in args:
         input_file = filename
-        f = open(filename, 'r')
+        f = open(filename, 'rt', encoding='utf-8')
         parse_file(f, toppat)
         f.close()
 
@@ -1324,7 +1324,7 @@ def main():
     prop_size(stree)
 
     if output_file:
-        output_fd = open(output_file, 'w')
+        output_fd = open(output_file, 'wt', encoding='utf-8')
     else:
         output_fd = sys.stdout
--
2.26.2
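The failure in the traceback can be reproduced without changing the system locale by passing encoding='ascii' explicitly, which is what a Python 3.6 interpreter typically falls back to under a C/POSIX locale (the traceback above goes through encodings/ascii.py). This is a minimal sketch only: the file content and the helpers read_lines() and demo() are hypothetical, and the real parse_file() is not used.

```python
import os
import tempfile

# Hypothetical input: a comment line with a non-ASCII character.
# "é" is encoded as the two bytes 0xc3 0xa9 in UTF-8.
SOURCE = "# Author: Philippe Mathieu-Daudé\n# hypothetical test.decode\n"


def read_lines(path, encoding):
    """Iterate over a text file line by line, as parse_file() does with f."""
    with open(path, "rt", encoding=encoding) as f:
        return [line for line in f]


def demo():
    """Return (ascii_error, utf8_lines) for a UTF-8 encoded input file."""
    fd, path = tempfile.mkstemp(suffix=".decode")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            out.write(SOURCE)

        # Stand-in for the locale default under a C/POSIX locale on
        # Python 3.6: ASCII cannot decode the 0xc3 byte.
        ascii_error = None
        try:
            read_lines(path, "ascii")
        except UnicodeDecodeError as exc:
            ascii_error = exc

        # With an explicit encoding the result no longer depends on the locale.
        return ascii_error, read_lines(path, "utf-8")
    finally:
        os.remove(path)


if __name__ == "__main__":
    err, lines = demo()
    print(err)
    print(lines[0], end="")
```

The explicit encoding='utf-8' in the patch plays the role of the second read_lines() call: the same bytes decode identically whatever the user's locale is.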
* Re: [PATCH v2] decodetree: Open files with encoding='utf-8'
From: Eduardo Habkost @ 2021-01-08 18:58 UTC
To: Philippe Mathieu-Daudé
Cc: Peter Maydell, Richard Henderson, qemu-devel, Yonggang Luo,
Cleber Rosa, Paolo Bonzini
On Fri, Jan 08, 2021 at 07:09:52PM +0100, Philippe Mathieu-Daudé wrote:
> When decodetree.py was added in commit 568ae7efae7, QEMU was
> using Python 2 which happily reads UTF-8 files in text mode.
> Python 3 requires either UTF-8 locale or an explicit encoding
> passed to open(). Now that Python 3 is required, explicit
> UTF-8 encoding for decodetree source files.
>
> To avoid further problems with the user locale, also explicit
> UTF-8 encoding for the generated C files.
>
> Explicit both input/output are plain text by using the 't' mode.
I believe the 't' is unnecessary. But it's harmless and makes it
more explicit.
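The point that 't' is redundant can be checked directly: in Python 3, open() defaults to text mode, so 'r' and 'rt' produce the same kind of stream, and only an explicit 'b' switches to binary. A minimal sketch; open_modes_equivalent() is a hypothetical helper that uses a throwaway temporary file.

```python
import io
import os
import tempfile


def open_modes_equivalent():
    """Return True if 'r' and 'rt' yield the same kind of text stream."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "r", encoding="utf-8") as f_r, \
             open(path, "rt", encoding="utf-8") as f_rt:
            # Both are text streams (io.TextIOWrapper) decoding as UTF-8.
            same_type = type(f_r) is type(f_rt) is io.TextIOWrapper
            same_encoding = f_r.encoding == f_rt.encoding == "utf-8"
        with open(path, "rb") as f_b:
            # Only an explicit 'b' gives a binary stream instead.
            binary = isinstance(f_b, io.BufferedReader)
        return same_type and same_encoding and binary
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(open_modes_equivalent())
```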
>
> [...]
>
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
However:
> ---
> v2: utf-8 output too (Peter)
> explicit default text mode.
> ---
> scripts/decodetree.py | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> index 47aa9caf6d1..d3857066cfc 100644
> --- a/scripts/decodetree.py
> +++ b/scripts/decodetree.py
> @@ -1304,7 +1304,7 @@ def main():
>
> for filename in args:
> input_file = filename
> - f = open(filename, 'r')
> + f = open(filename, 'rt', encoding='utf-8')
> parse_file(f, toppat)
> f.close()
>
> @@ -1324,7 +1324,7 @@ def main():
> prop_size(stree)
>
> if output_file:
> - output_fd = open(output_file, 'w')
> + output_fd = open(output_file, 'wt', encoding='utf-8')
> else:
> output_fd = sys.stdout
This will still use the user locale encoding for sys.stdout. It
can be solved with:

    output_fd = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

(Based on a suggestion from Yonggang Luo)
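The suggestion can be checked without a real terminal by substituting, for sys.stdout, a TextIOWrapper over io.BytesIO configured with an ASCII encoding to mimic a C locale. This is a minimal sketch; utf8_writer() and demo() are hypothetical helpers, and like the suggestion it assumes the stream exposes a .buffer attribute (the regular sys.stdout does). Two caveats: the new wrapper shares the binary buffer, so closing or garbage-collecting it also closes sys.stdout.buffer (the sketch calls detach() to avoid that); and on Python 3.7+ sys.stdout.reconfigure(encoding='utf-8') changes the encoding in place without a second wrapper.

```python
import io
import sys


def utf8_writer(text_stream=None):
    """Return a UTF-8 text stream over the binary buffer of text_stream.

    Defaults to sys.stdout, so generated output no longer follows the
    locale's preferred encoding (hypothetical helper).
    """
    if text_stream is None:
        text_stream = sys.stdout
    text_stream.flush()  # push out anything already written via the old wrapper
    return io.TextIOWrapper(text_stream.buffer, encoding="utf-8")


def demo():
    """Write non-ASCII text through a stand-in 'stdout' whose encoding is ASCII."""
    raw = io.BytesIO()
    ascii_stdout = io.TextIOWrapper(raw, encoding="ascii")

    out = utf8_writer(ascii_stdout)
    out.write("/* generated for Daudé */\n")
    out.flush()
    data = raw.getvalue()
    out.detach()  # release raw without closing it
    return data


if __name__ == "__main__":
    print(demo())
```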
--
Eduardo
* Re: [PATCH v2] decodetree: Open files with encoding='utf-8'
From: 罗勇刚(Yonggang Luo) @ 2021-01-09 5:41 UTC
To: Eduardo Habkost
Cc: Peter Maydell, Richard Henderson, Philippe Mathieu-Daudé,
qemu-level, Cleber Rosa, Paolo Bonzini
On Fri, Jan 8, 2021 at 10:58 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
>
> On Fri, Jan 08, 2021 at 07:09:52PM +0100, Philippe Mathieu-Daudé wrote:
> > [...]
> > if output_file:
> > - output_fd = open(output_file, 'w')
> > + output_fd = open(output_file, 'wt', encoding='utf-8')
I misunderstood the cause; this is a better way.
> > else:
> > output_fd = sys.stdout
>
> This will still use the user locale encoding for sys.stdout. Can
> be solved with:
>
> output_fd = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
For output to a console/terminal, I suggest using:

    sys.stdout = io.TextIOWrapper(sys.stdout.buffer,
                                  encoding=sys.stdout.encoding, errors="ignore")

Then, when the console/terminal encoding cannot represent a character
in the decodetree file, the script still won't fail; that failure
cannot be fixed by other means. errors="ignore" is important: in my
experience, there can even be characters that cannot be represented
in UTF-8.
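This suggestion relies on the codec error handlers. The sketch below (write_through() and demo() are hypothetical helpers) pushes text through a TextIOWrapper over io.BytesIO, with ASCII standing in for a limited console encoding, to show what each handler does with an unrepresentable character: 'strict' raises UnicodeEncodeError, 'ignore' silently drops the character, and 'replace' substitutes '?'. The final call illustrates the UTF-8 remark: a lone surrogate such as '\udc80' cannot be encoded in UTF-8, and 'ignore' drops it.

```python
import io


def write_through(text, encoding, errors):
    """Return the bytes a TextIOWrapper(encoding, errors) emits for text."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding=encoding, errors=errors,
                               newline="")  # no newline translation
    try:
        wrapper.write(text)
        wrapper.flush()
        return raw.getvalue()
    finally:
        wrapper.detach()  # leave raw open; it is discarded anyway


def demo(text="Daudé", encoding="ascii"):
    """Map each error handler to its output, or None if it raised."""
    results = {}
    for errors in ("strict", "ignore", "replace"):
        try:
            results[errors] = write_through(text, encoding, errors)
        except UnicodeEncodeError:
            results[errors] = None  # the script would abort here
    return results


if __name__ == "__main__":
    print(demo())
    print(write_through("a\udc80b", "utf-8", "ignore"))
```

'replace' is a middle ground worth considering: the script still never fails, but the lost character remains visible as '?' in the output instead of disappearing.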
>
> (Based on a suggestion from Yonggang Luo)
>
> --
> Eduardo
>
--
Best regards,
罗勇刚

Yours sincerely,
Yonggang Luo