* [Qemu-devel] [PATCH v2 0/2] Fix compilation with python-3 if en_US.UTF-8 is unavailable
@ 2018-06-15  4:40 Matthias Maier
  2018-06-15  4:40 ` [Qemu-devel] [PATCH v2 1/2] Partially revert commit d4e5ec877ca Matthias Maier
  2018-06-15  4:40 ` [Qemu-devel] [PATCH v2 2/2] qapi: open files in binary mode and use explicit decoding/encoding in common.py Matthias Maier
  0 siblings, 2 replies; 10+ messages in thread
From: Matthias Maier @ 2018-06-15  4:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel P . Berrange, Eduardo Habkost

Hi,

This new version of the patch is now also fully python2 compatible...

Original message:

  This patch series,
   - removes the PYTHON_UTF8 workaround introduced in d4e5ec877ca
   - adds a different workaround that avoids the locale problem altogether by
     opening files in binary read/write mode and setting encoding/decoding
     (in utf-8) explicitly

  The problem with setting

    LC_ALL= LANG=C LC_CTYPE=en_US.UTF-8

  is that the en_US.UTF-8 locale might not be available. In this case, setting
  the locales above results in build errors even though another UTF-8 locale
  was originally set [1].

  We propose a different approach to fix the locale dependent encode/decode
  problem in common.py utilizing the binary read/write mode [2,3] and
  decode/encode with explicit UTF-8 encoding arguments [4].

  This approach is preferred over the fix in commit d4e5ec877ca because it
  (a) is locale independent, and (b) does not depend on the en_US.UTF-8 locale
  being available.

  Best,
  Matthias and Arfrever


[1] https://bugs.gentoo.org/657766
[2] https://docs.python.org/3.6/library/stdtypes.html#bytes.decode
[3] https://docs.python.org/3.6/library/stdtypes.html#str.encode
[4] https://docs.python.org/3/howto/unicode.html
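
The approach described in the cover letter can be sketched in isolation as
follows. This is a minimal illustration, not the code from the series: the
helper name read_text_locale_independent is hypothetical, and the
getpreferredencoding() line is only there to show which encoding a text-mode
open() without an explicit encoding would pick up on Python 3.

```python
import locale
import sys


def read_text_locale_independent(path):
    """Hypothetical helper: read a file as text without consulting the locale."""
    # Binary mode returns the raw bytes, so no implicit decoding takes place.
    with open(path, 'rb') as fp:
        data = fp.read()
    # On Python 3, decode explicitly; on Python 2 the data stays a byte string,
    # which mirrors the version check used in the patch.
    if sys.version_info[0] >= 3:
        data = data.decode('utf-8')
    return data


# For comparison: the encoding Python 3 uses for text-mode open() when no
# encoding is given. It depends on the locale settings.
locale_encoding = locale.getpreferredencoding(False)
```

The result of read_text_locale_independent() is the same whatever
LC_ALL, LC_CTYPE, or LANG are set to, while locale_encoding varies with them.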


* [Qemu-devel] [PATCH v2 1/2] Partially revert commit d4e5ec877ca
  2018-06-15  4:40 [Qemu-devel] [PATCH v2 0/2] Fix compilation with python-3 if en_US.UTF-8 is unavailable Matthias Maier
@ 2018-06-15  4:40 ` Matthias Maier
  2018-06-15  9:42   ` Daniel P. Berrangé
  2018-06-15  4:40 ` [Qemu-devel] [PATCH v2 2/2] qapi: open files in binary mode and use explicit decoding/encoding in common.py Matthias Maier
  1 sibling, 1 reply; 10+ messages in thread
From: Matthias Maier @ 2018-06-15  4:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Eduardo Habkost, Matthias Maier,
	Arfrever Frehtes Taifersar Arahesis

This commit removes the PYTHON_UTF8 workaround. The problem with setting

  LC_ALL= LANG=C LC_CTYPE=en_US.UTF-8

is that the en_US.UTF-8 locale might not be available. In this case,
setting the locales above results in build errors even though another UTF-8
locale was originally set [1]. The only stable way of fixing the
encoding problem is by explicitly annotating encoding/decoding in the
Python script.

[1] https://bugs.gentoo.org/657766

Signed-off-by: Arfrever Frehtes Taifersar Arahesis <arfrever.fta@gmail.com>
Signed-off-by: Matthias Maier <tamiko@43-1.org>
---
 Makefile               | 6 ++----
 tests/Makefile.include | 6 +++---
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index e46f2b625a..7ed9cc4a21 100644
--- a/Makefile
+++ b/Makefile
@@ -20,8 +20,6 @@ ifneq ($(wildcard config-host.mak),)
 all:
 include config-host.mak
 
-PYTHON_UTF8 = LC_ALL= LANG=C LC_CTYPE=en_US.UTF-8 $(PYTHON)
-
 git-submodule-update:
 
 .PHONY: git-submodule-update
@@ -576,7 +574,7 @@ qga/qapi-generated/qga-qapi-commands.h qga/qapi-generated/qga-qapi-commands.c \
 qga/qapi-generated/qga-qapi-doc.texi: \
 qga/qapi-generated/qapi-gen-timestamp ;
 qga/qapi-generated/qapi-gen-timestamp: $(SRC_PATH)/qga/qapi-schema.json $(qapi-py)
-	$(call quiet-command,$(PYTHON_UTF8) $(SRC_PATH)/scripts/qapi-gen.py \
+	$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-gen.py \
 		-o qga/qapi-generated -p "qga-" $<, \
 		"GEN","$(@:%-timestamp=%)")
 	@>$@
@@ -676,7 +674,7 @@ qapi/qapi-introspect.h qapi/qapi-introspect.c \
 qapi/qapi-doc.texi: \
 qapi-gen-timestamp ;
 qapi-gen-timestamp: $(qapi-modules) $(qapi-py)
-	$(call quiet-command,$(PYTHON_UTF8) $(SRC_PATH)/scripts/qapi-gen.py \
+	$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-gen.py \
 		-o "qapi" -b $<, \
 		"GEN","$(@:%-timestamp=%)")
 	@>$@
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 607afe5bed..b5121f2851 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -674,13 +674,13 @@ tests/test-qapi-events.c tests/test-qapi-events.h \
 tests/test-qapi-introspect.c tests/test-qapi-introspect.h: \
 tests/test-qapi-gen-timestamp ;
 tests/test-qapi-gen-timestamp: $(SRC_PATH)/tests/qapi-schema/qapi-schema-test.json $(qapi-py)
-	$(call quiet-command,$(PYTHON_UTF8) $(SRC_PATH)/scripts/qapi-gen.py \
+	$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-gen.py \
 		-o tests -p "test-" $<, \
 		"GEN","$(@:%-timestamp=%)")
 	@>$@
 
 tests/qapi-schema/doc-good.test.texi: $(SRC_PATH)/tests/qapi-schema/doc-good.json $(qapi-py)
-	$(call quiet-command,$(PYTHON_UTF8) $(SRC_PATH)/scripts/qapi-gen.py \
+	$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-gen.py \
 		-o tests/qapi-schema -p "doc-good-" $<, \
 		"GEN","$@")
 	@mv tests/qapi-schema/doc-good-qapi-doc.texi $@
@@ -938,7 +938,7 @@ check-tests/qemu-iotests-quick.sh: tests/qemu-iotests-quick.sh qemu-img$(EXESUF)
 .PHONY: $(patsubst %, check-%, $(check-qapi-schema-y))
 $(patsubst %, check-%, $(check-qapi-schema-y)): check-%.json: $(SRC_PATH)/%.json
 	$(call quiet-command, PYTHONPATH=$(SRC_PATH)/scripts \
-		$(PYTHON_UTF8) $(SRC_PATH)/tests/qapi-schema/test-qapi.py \
+		$(PYTHON) $(SRC_PATH)/tests/qapi-schema/test-qapi.py \
 		$^ >$*.test.out 2>$*.test.err; \
 		echo $$? >$*.test.exit, \
 		"TEST","$*.out")
-- 
2.16.4


* [Qemu-devel] [PATCH v2 2/2] qapi: open files in binary mode and use explicit decoding/encoding in common.py
  2018-06-15  4:40 [Qemu-devel] [PATCH v2 0/2] Fix compilation with python-3 if en_US.UTF-8 is unavailable Matthias Maier
  2018-06-15  4:40 ` [Qemu-devel] [PATCH v2 1/2] Partially revert commit d4e5ec877ca Matthias Maier
@ 2018-06-15  4:40 ` Matthias Maier
  2018-06-15 15:31   ` Markus Armbruster
  1 sibling, 1 reply; 10+ messages in thread
From: Matthias Maier @ 2018-06-15  4:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Eduardo Habkost, Matthias Maier,
	Arfrever Frehtes Taifersar Arahesis

This is a different approach to fix the locale dependent encode/decode
problem in common.py utilizing the binary read/write mode [1,2] and
decode/encode with explicit UTF-8 encoding arguments [3].

This approach is preferred over the fix in commit d4e5ec877ca because it
(a) is locale independent, and (b) does not depend on the en_US.UTF-8
locale being available.

[1] https://docs.python.org/3.6/library/stdtypes.html#bytes.decode
[2] https://docs.python.org/3.6/library/stdtypes.html#str.encode
[3] https://docs.python.org/3/howto/unicode.html

Signed-off-by: Arfrever Frehtes Taifersar Arahesis <arfrever.fta@gmail.com>
Signed-off-by: Matthias Maier <tamiko@43-1.org>
---
 scripts/qapi/common.py | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
index 2462fc0291..44270cd703 100644
--- a/scripts/qapi/common.py
+++ b/scripts/qapi/common.py
@@ -16,6 +16,7 @@ import errno
 import os
 import re
 import string
+import sys
 from collections import OrderedDict
 
 builtin_types = {
@@ -259,6 +260,8 @@ class QAPISchemaParser(object):
         previously_included.append(os.path.abspath(fp.name))
         self.incl_info = incl_info
         self.src = fp.read()
+        if sys.version_info[0] >= 3:
+            self.src = self.src.decode("UTF-8")
         if self.src == '' or self.src[-1] != '\n':
             self.src += '\n'
         self.cursor = 0
@@ -340,7 +343,7 @@ class QAPISchemaParser(object):
             return None
 
         try:
-            fobj = open(incl_fname, 'r')
+            fobj = open(incl_fname, 'rb')
         except IOError as e:
             raise QAPISemError(info, '%s: %s' % (e.strerror, incl_fname))
         return QAPISchemaParser(fobj, previously_included, info)
@@ -1492,7 +1495,7 @@ class QAPISchemaEvent(QAPISchemaEntity):
 class QAPISchema(object):
     def __init__(self, fname):
         self._fname = fname
-        parser = QAPISchemaParser(open(fname, 'r'))
+        parser = QAPISchemaParser(open(fname, 'rb'))
         exprs = check_exprs(parser.exprs)
         self.docs = parser.docs
         self._entity_list = []
@@ -2006,9 +2009,11 @@ class QAPIGen(object):
                 if e.errno != errno.EEXIST:
                     raise
         fd = os.open(pathname, os.O_RDWR | os.O_CREAT, 0o666)
-        f = os.fdopen(fd, 'r+')
+        f = os.fdopen(fd, 'r+b')
         text = (self._top(fname) + self._preamble + self._body
                 + self._bottom(fname))
+        if sys.version_info[0] >= 3:
+            text = text.encode("UTF-8")
         oldtext = f.read(len(text) + 1)
         if text != oldtext:
             f.seek(0)
-- 
2.16.4
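
The last hunk above compares the generated text with the existing file
contents, which only works if both sides are bytes once the file is opened in
binary mode. A self-contained sketch of that "rewrite only if changed" pattern
is shown below. The function name write_if_changed is hypothetical, the part
after f.seek(0) is an assumption (the hunk is cut off there), and the sketch
assumes text is a Unicode string as on Python 3.

```python
import os


def write_if_changed(pathname, text):
    """Hypothetical helper: write text as UTF-8, leaving the file untouched if
    it already has the same contents. Returns True if the file was rewritten."""
    # Encode explicitly so the comparison below is bytes against bytes.
    data = text.encode('utf-8')
    fd = os.open(pathname, os.O_RDWR | os.O_CREAT, 0o666)
    with os.fdopen(fd, 'r+b') as f:
        # Read one byte more than needed so a longer old file compares unequal.
        olddata = f.read(len(data) + 1)
        if data == olddata:
            return False
        # Assumed continuation: rewind, drop the old contents, write the new.
        f.seek(0)
        f.truncate(0)
        f.write(data)
    return True
```

Leaving an unchanged file alone keeps its timestamp, so make does not rebuild
everything that depends on it.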


* Re: [Qemu-devel] [PATCH v2 1/2] Partially revert commit d4e5ec877ca
  2018-06-15  4:40 ` [Qemu-devel] [PATCH v2 1/2] Partially revert commit d4e5ec877ca Matthias Maier
@ 2018-06-15  9:42   ` Daniel P. Berrangé
  2018-06-15 13:20     ` Matthias Maier
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel P. Berrangé @ 2018-06-15  9:42 UTC (permalink / raw)
  To: Matthias Maier
  Cc: qemu-devel, Eduardo Habkost, Arfrever Frehtes Taifersar Arahesis

On Thu, Jun 14, 2018 at 11:40:41PM -0500, Matthias Maier wrote:
> This commit removes the PYTHON_UTF8 workaround. The problem with setting
> 
>   LC_ALL= LANG=C LC_CTYPE=en_US.UTF-8
> 
> is that the en_US.UTF-8 locale might not be available. In this case

What platform are you using where a UTF-8 locale is not available?

Indeed, I would ideally like to make the entire QEMU build run with an
explicit en_US.UTF-8 or C.UTF-8 locale, to ensure that we get reliably
reproducible builds, as locale differences have been known to affect the
output of many tools, not just Python.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [PATCH v2 1/2] Partially revert commit d4e5ec877ca
  2018-06-15  9:42   ` Daniel P. Berrangé
@ 2018-06-15 13:20     ` Matthias Maier
  2018-06-15 15:17       ` Markus Armbruster
  2018-06-15 15:20       ` Daniel P. Berrangé
  0 siblings, 2 replies; 10+ messages in thread
From: Matthias Maier @ 2018-06-15 13:20 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Eduardo Habkost, Arfrever Frehtes Taifersar Arahesis


On Fri, Jun 15, 2018, at 04:42 CDT, Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Thu, Jun 14, 2018 at 11:40:41PM -0500, Matthias Maier wrote:
>> This commit removes the PYTHON_UTF8 workaround. The problem with setting
>> 
>>   LC_ALL= LANG=C LC_CTYPE=en_US.UTF-8
>> 
>> is that the en_US.UTF-8 locale might not be available. In this case
>
> What platform are you using where  UTF8 locale is not available ?

For example, neither Debian (and for that matter Ubuntu) nor Gentoo
guarantee that the en_US.UTF-8 locale is available.

In particular, we encounter build problems on Gentoo when users have
enabled only very specific, non-en_US locales, for example de_DE.UTF-8 (or
similar).

> Indeed I would ideally like to make the entire of QEMU build with an
> explicit en_US.UTF-8 or C.UTF-8 locale, to ensure that we get reliably
> reproducible builds, as locale differences have been known to impact
> output of many tools not just python.

We face the same problem in Gentoo and usually advise users to set
LC_ALL=C when submitting bug reports. (It is frustrating that glibc
upstream doesn't get their act together fixing and merging the current
C.UTF-8 proposal.)

So what about making the build system more robust (by merging the
patches, or a variant) and either setting C.UTF-8, or C globally
(depending on availability)?

Best,
Matthias


* Re: [Qemu-devel] [PATCH v2 1/2] Partially revert commit d4e5ec877ca
  2018-06-15 13:20     ` Matthias Maier
@ 2018-06-15 15:17       ` Markus Armbruster
  2018-06-15 15:51         ` Matthias Maier
  2018-06-15 15:20       ` Daniel P. Berrangé
  1 sibling, 1 reply; 10+ messages in thread
From: Markus Armbruster @ 2018-06-15 15:17 UTC (permalink / raw)
  To: Matthias Maier
  Cc: Daniel P. Berrangé,
	Arfrever Frehtes Taifersar Arahesis, qemu-devel, Eduardo Habkost

"Partially revert"?  Which part isn't reverted?

Matthias Maier <tamiko@43-1.org> writes:

> On Fri, Jun 15, 2018, at 04:42 CDT, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
>> On Thu, Jun 14, 2018 at 11:40:41PM -0500, Matthias Maier wrote:
>>> This commit removes the PYTHON_UTF8 workaround. The problem with setting
>>> 
>>>   LC_ALL= LANG=C LC_CTYPE=en_US.UTF-8
>>> 
>>> is that the en_US.UTF-8 locale might not be available. In this case

The workaround is from

commit d4e5ec877ca698a87dabe68814c6f93668f50c60
Author: Daniel P. Berrange <berrange@redhat.com>
Date:   Tue Jan 16 13:42:11 2018 +0000

    qapi: force a UTF-8 locale for running Python
    
    Python2 did not validate locale correctness when reading input data, so
    would happily read UTF-8 data in non-UTF-8 locales. Python3 is strict so
    if you try to read UTF-8 data in the C locale, it will raise an error
    for any UTF-8 bytes that aren't representable in 7-bit ascii encoding.
    e.g.

    More background on this can be seen in
    
      https://www.python.org/dev/peps/pep-0538/
    
    Many distros support a new C.UTF-8 locale that is like the C locale,
    but with UTF-8 instead of 7-bit ASCII. That is not entirely portable
    though. This patch thus sets the LANG to "C", but overrides LC_CTYPE
    to be en_US.UTF-8 locale. This gets us pretty close to C.UTF-8, but
    in a way that should be portable to everywhere QEMU builds.
    
    This patch only forces UTF-8 for QAPI scripts, since that is the one
    showing the immediate error under Python3 with C locale, but potentially
    we ought to force this for all python scripts used in the build process.

It's still used only for running QAPI generators.

As far as I can tell, the only non-ASCII input characters are:

* qapi/trace.json

  # Copyright (C) 2011-2016 Lluís Vilanova <vilanova@ac.upc.edu>

* tests/qapi-schema/escape-too-big.json

  # { 'command': 'é' }

* tests/qapi-schema/unicode-str.json

  { 'command': 'é' }

I believe these characters made Dan put in the workaround.  We could get
rid of them if the fix is too onerous (I haven't really looked, yet).
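
To illustrate why these few characters matter, here is a small sketch. It
does not use the actual schema files; the sample line is modeled on
unicode-str.json. It assumes the situation described in the commit message
above: Python 3 decoding input with an ASCII codec, which is typically what
the C locale selects on the Python 3 versions discussed here.

```python
# A line with one non-ASCII character, written with an escape so the sketch
# does not depend on the encoding of its own source file.
line = u"{ 'command': '\u00e9' }\n"

# What ends up on disk when the schema is stored as UTF-8.
raw = line.encode('utf-8')

# Decoding with the ASCII codec fails on the first byte >= 0x80.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with an explicit UTF-8 codec works regardless of the locale.
decoded = raw.decode('utf-8')
```

Here ascii_ok ends up False, while decoded equals the original line.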

>> What platform are you using where  UTF8 locale is not available ?
>
> For example, neither Debian (and for that matter Ubuntu) nor Gentoo
> guarantee that the en_US.UTF-8 locale is available.
>
> We in particular encounter build problems on Gentoo when users have only
> set very specific, non en_US locales, for example de_DE.UTF-8 (or
> similar).
>
>> Indeed I would ideally like to make the entire of QEMU build with an
>> explicit en_US.UTF-8 or C.UTF-8 locale, to ensure that we get reliably
>> reproducible builds, as locale differences have been known to impact
>> output of many tools not just python.
>
> We face the same problem in Gentoo and usually advice users to set
> LC_ALL=C when submitting bug reports. (It is frustrating that glibc
> upstream doesn't get their act together fixing and merging the current
> C.UTF-8 proposal.)
>
> So what about making the build system more robust (by merging the
> patches, or a variant) and either setting C.UTF-8, or C globally
> (depending on availability)?

"git-grep -Fi .utf-8" coughs up

    ui/gtk.c:    setlocale(LC_CTYPE, "C.UTF-8");

This runs whenever you pick -display gtk.  Errors are ignored, though.

commit 27b224a61f97faabbd20bdf72c0c1a3dbe400cd1
Author: Kevin Wolf <kwolf@redhat.com>
Date:   Tue Jan 31 11:09:45 2017 +0100

    gtk: Hardcode LC_CTYPE as C.utf-8
    
    Commit 2cb5d2a4 removed setlocale() for everything except LC_MESSAGES in
    order to avoid unwanted side effects such as using the wrong decimal
    separator in generated JSON objects. However, the problem that unsetting
    LC_CTYPE caused is that non-ASCII characters are considered
    non-printable now and therefore the GTK menus display question marks for
    accented letters, Chinese characters etc.
    
    A first attempt to fix this [1] was rejected because even just setting
    LC_CTYPE to the user's locale (and thereby modifying the semantics of
    the ctype.h functions) could have unwanted effects that we're not aware
    of yet.
    
    Recently, however, glibc introduced a new locale "C.utf-8" that just
    uses UTF-8 as its charset, but otherwise leaves the semantics alone.
    Just setting the right character set is enough for our use case, so we
    can just hardcode this one without having to be afraid of nasty side
    effects.
    
    Older systems that don't have the new locale will continue displaying
    question marks, but this should fix the problem for most users.
    
    [1] https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg03591.html
        ('Re: gtk: use setlocale() for LC_MESSAGES only')
    
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Message-id: 20170131100945.8189-1-kwolf@redhat.com
    
    [ kraxel: change C.utf-8 to C.UTF-8 ]
    
    Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Looks like your frustration about upstream glibc is quite stale :)


* Re: [Qemu-devel] [PATCH v2 1/2] Partially revert commit d4e5ec877ca
  2018-06-15 13:20     ` Matthias Maier
  2018-06-15 15:17       ` Markus Armbruster
@ 2018-06-15 15:20       ` Daniel P. Berrangé
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel P. Berrangé @ 2018-06-15 15:20 UTC (permalink / raw)
  To: Matthias Maier
  Cc: qemu-devel, Eduardo Habkost, Arfrever Frehtes Taifersar Arahesis

On Fri, Jun 15, 2018 at 08:20:36AM -0500, Matthias Maier wrote:
> 
> On Fri, Jun 15, 2018, at 04:42 CDT, Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Thu, Jun 14, 2018 at 11:40:41PM -0500, Matthias Maier wrote:
> >> This commit removes the PYTHON_UTF8 workaround. The problem with setting
> >> 
> >>   LC_ALL= LANG=C LC_CTYPE=en_US.UTF-8
> >> 
> >> is that the en_US.UTF-8 locale might not be available. In this case
> >
> > What platform are you using where  UTF8 locale is not available ?
> 
> For example, neither Debian (and for that matter Ubuntu) nor Gentoo
> guarantee that the en_US.UTF-8 locale is available.
> 
> We in particular encounter build problems on Gentoo when users have only
> set very specific, non en_US locales, for example de_DE.UTF-8 (or
> similar).
> 
> > Indeed I would ideally like to make the entire of QEMU build with an
> > explicit en_US.UTF-8 or C.UTF-8 locale, to ensure that we get reliably
> > reproducible builds, as locale differences have been known to impact
> > output of many tools not just python.
> 
> We face the same problem in Gentoo and usually advice users to set
> LC_ALL=C when submitting bug reports. (It is frustrating that glibc
> upstream doesn't get their act together fixing and merging the current
> C.UTF-8 proposal.)
> 
> So what about making the build system more robust (by merging the
> patches, or a variant) and either setting C.UTF-8, or C globally
> (depending on availability)?

Yes, if we could figure out a way to check for the existence of locales, we
could make configure check for C.UTF-8, en_US.UTF-8, and C, in that order,
using the first one that works. Fun fact: on macOS 'C' is always
UTF-8, so it's valid to use just 'C'.
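
One way such a probe could look is sketched below in Python. This is only
an illustration under assumptions: QEMU's configure is a shell script, no such
check is part of this thread, and the function name first_available_locale is
hypothetical. It relies on locale.setlocale() raising locale.Error for a
locale the C library does not know.

```python
import locale


def first_available_locale(candidates=("C.UTF-8", "en_US.UTF-8", "C")):
    """Hypothetical probe: return the first candidate that setlocale() accepts."""
    # Calling setlocale() without a locale argument only queries the setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                # Not installed on this system; try the next candidate.
                continue
            return name
        return None
    finally:
        # Restore the original setting so the probe has no side effects.
        locale.setlocale(locale.LC_CTYPE, saved)
```

Since the plain C locale is always available, the default candidate list
always yields a result; which one depends on the locales installed.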

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [PATCH v2 2/2] qapi: open files in binary mode and use explicit decoding/encoding in common.py
  2018-06-15  4:40 ` [Qemu-devel] [PATCH v2 2/2] qapi: open files in binary mode and use explicit decoding/encoding in common.py Matthias Maier
@ 2018-06-15 15:31   ` Markus Armbruster
  2018-06-15 21:55     ` Matthias Maier
  0 siblings, 1 reply; 10+ messages in thread
From: Markus Armbruster @ 2018-06-15 15:31 UTC (permalink / raw)
  To: Matthias Maier
  Cc: qemu-devel, Eduardo Habkost, Arfrever Frehtes Taifersar Arahesis

Matthias Maier <tamiko@43-1.org> writes:

> This is a different approach to fix the locale dependent encode/decode
> problem in common.py utilizing the binary read/write mode [1,2] and
> decode/encode with explicit UTF-8 encoding arguments [3].

Why can't we simply pass encoding='utf-8' to open()?

> This approach is preferred over the fix in commit d4e5ec877ca because it
> is (a) locale independent, and (b) does not depend on the en_US.UTF_8
> locale to be available.
>
> [1] https://docs.python.org/3.6/library/stdtypes.html#bytes.decode
> [2] https://docs.python.org/3.6/library/stdtypes.html#str.encode
> [3] https://docs.python.org/3/howto/unicode.html
>
> Signed-off-by: Arfrever Frehtes Taifersar Arahesis <arfrever.fta@gmail.com>
> Signed-off-by: Matthias Maier <tamiko@43-1.org>
> ---
>  scripts/qapi/common.py | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
> index 2462fc0291..44270cd703 100644
> --- a/scripts/qapi/common.py
> +++ b/scripts/qapi/common.py
> @@ -16,6 +16,7 @@ import errno
>  import os
>  import re
>  import string
> +import sys
>  from collections import OrderedDict
>  
>  builtin_types = {
> @@ -259,6 +260,8 @@ class QAPISchemaParser(object):
>          previously_included.append(os.path.abspath(fp.name))
>          self.incl_info = incl_info
>          self.src = fp.read()
> +        if sys.version_info[0] >= 3:
> +            self.src = self.src.decode("UTF-8")

If I understand 7.2.3. Standard Encodings[*] correctly, the canonical
name is "utf-8".  Let's use that.  Wait, it's the default, no need to
pass an argument.

>          if self.src == '' or self.src[-1] != '\n':
>              self.src += '\n'
>          self.cursor = 0
> @@ -340,7 +343,7 @@ class QAPISchemaParser(object):
>              return None
>  
>          try:
> -            fobj = open(incl_fname, 'r')
> +            fobj = open(incl_fname, 'rb')
>          except IOError as e:
>              raise QAPISemError(info, '%s: %s' % (e.strerror, incl_fname))
>          return QAPISchemaParser(fobj, previously_included, info)
> @@ -1492,7 +1495,7 @@ class QAPISchemaEvent(QAPISchemaEntity):
>  class QAPISchema(object):
>      def __init__(self, fname):
>          self._fname = fname
> -        parser = QAPISchemaParser(open(fname, 'r'))
> +        parser = QAPISchemaParser(open(fname, 'rb'))
>          exprs = check_exprs(parser.exprs)
>          self.docs = parser.docs
>          self._entity_list = []
> @@ -2006,9 +2009,11 @@ class QAPIGen(object):
>                  if e.errno != errno.EEXIST:
>                      raise
>          fd = os.open(pathname, os.O_RDWR | os.O_CREAT, 0o666)
> -        f = os.fdopen(fd, 'r+')
> +        f = os.fdopen(fd, 'r+b')
>          text = (self._top(fname) + self._preamble + self._body
>                  + self._bottom(fname))
> +        if sys.version_info[0] >= 3:
> +            text = text.encode("UTF-8")

Likewise.

>          oldtext = f.read(len(text) + 1)
>          if text != oldtext:
>              f.seek(0)

[*] https://docs.python.org/3/library/codecs.html#standard-encodings
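
For reference, the two points raised above can be sketched as follows. The
builtin open() on Python 2 has no encoding parameter, but io.open() accepts
one on Python 2.6 and later and is the same function as the builtin on
Python 3. The helper name read_schema_text is hypothetical, and whether the
rest of common.py copes with the unicode objects this returns on Python 2 is
not established in this thread.

```python
import io
import sys


def read_schema_text(path):
    """Hypothetical helper: read a file as text with an explicit encoding."""
    # Text mode with encoding= decodes as UTF-8 irrespective of the locale.
    with io.open(path, 'r', encoding='utf-8') as fp:
        return fp.read()


if sys.version_info[0] >= 3:
    # On Python 3, bytes.decode() and str.encode() default to UTF-8, so the
    # codec name can be left out.
    assert b'\xc3\xa9'.decode() == b'\xc3\xa9'.decode('utf-8')
    assert u'\u00e9'.encode() == u'\u00e9'.encode('utf-8')
```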


* Re: [Qemu-devel] [PATCH v2 1/2] Partially revert commit d4e5ec877ca
  2018-06-15 15:17       ` Markus Armbruster
@ 2018-06-15 15:51         ` Matthias Maier
  0 siblings, 0 replies; 10+ messages in thread
From: Matthias Maier @ 2018-06-15 15:51 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Daniel P. Berrangé,
	Arfrever Frehtes Taifersar Arahesis, qemu-devel, Eduardo Habkost


On Fri, Jun 15, 2018, at 10:17 CDT, Markus Armbruster <armbru@redhat.com> wrote:

> "Partially revert"?  Which part isn't reverted?

Yes, it ended up being a full revert of the commit in question. I am
sorry for the sloppy wording.

> [...]
>>    Recently, however, glibc introduced a new locale "C.utf-8" that just
>>    uses UTF-8 as its charset, but otherwise leaves the semantics alone.
>>    Just setting the right character set is enough for our use case, so we
>>    can just hardcode this one without having to be afraid of nasty side
>>    effects.

> Looks like your frustration about upstream glibc is quite stale :)

Unfortunately, this statement is not correct. The corresponding glibc
bug report summarizes the current situation [1]. The fact is that a lot of
distributions ship a custom C.UTF-8 locale, for example Debian [2] (for
the current glibc-2.27 release). Unfortunately, not everyone applies
such custom patches. :-/

Best,
Matthias

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=17318
[2] https://sources.debian.org/patches/glibc/2.27-3/localedata/locale-C.diff/


* Re: [Qemu-devel] [PATCH v2 2/2] qapi: open files in binary mode and use explicit decoding/encoding in common.py
  2018-06-15 15:31   ` Markus Armbruster
@ 2018-06-15 21:55     ` Matthias Maier
  0 siblings, 0 replies; 10+ messages in thread
From: Matthias Maier @ 2018-06-15 21:55 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Eduardo Habkost, Arfrever Frehtes Taifersar Arahesis


On Fri, Jun 15, 2018, at 10:31 CDT, Markus Armbruster <armbru@redhat.com> wrote:

> If I understand 7.2.3. Standard Encodings[*] correctly, the canonical
> name is "utf-8".  Let's use that.  Wait, it's the default, no need to
> pass an argument.

Roger. I will change this in v3.


end of thread, other threads:[~2018-06-15 21:55 UTC | newest]

Thread overview: 10+ messages
2018-06-15  4:40 [Qemu-devel] [PATCH v2 0/2] Fix compilation with python-3 if en_US.UTF-8 is unavailable Matthias Maier
2018-06-15  4:40 ` [Qemu-devel] [PATCH v2 1/2] Partially revert commit d4e5ec877ca Matthias Maier
2018-06-15  9:42   ` Daniel P. Berrangé
2018-06-15 13:20     ` Matthias Maier
2018-06-15 15:17       ` Markus Armbruster
2018-06-15 15:51         ` Matthias Maier
2018-06-15 15:20       ` Daniel P. Berrangé
2018-06-15  4:40 ` [Qemu-devel] [PATCH v2 2/2] qapi: open files in binary mode and use explicit decoding/encoding in common.py Matthias Maier
2018-06-15 15:31   ` Markus Armbruster
2018-06-15 21:55     ` Matthias Maier
