* [Buildroot] [PATCH v2 1/1] utils/getdeveloperlib.py: explicitly set devs document encoding
From: James Knight @ 2021-09-05  1:35 UTC (permalink / raw)
  To: buildroot; +Cc: James Knight

Explicitly indicate the file encoding to UTF-8 for the DEVELOPERS
document. This prevents Unicode decoding errors when printing E-Mail
entries with Unicode characters on systems using an alternative default
encoding (e.g. 'CP1252').

This corrects the following observed error:

    $ ./utils/get-developers outgoing/*
    Traceback (most recent call last):
      File "utils\get-developers", line 105, in <module>
        __main__()
      File "utils\get-developers", line 47, in __main__
        devs = getdeveloperlib.parse_developers()
      File "...\buildroot\utils\getdeveloperlib.py", line 239, in parse_developers
        for line in f:
      File "...\Python<ver>\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6659: character maps to <undefined>

Signed-off-by: James Knight <james.d.knight@live.com>
---
 utils/getdeveloperlib.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/utils/getdeveloperlib.py b/utils/getdeveloperlib.py
index b205817033d1d488c13e88f3b1bdbf82aa6b1224..08abcfed545034cb68ce708bd9aef772484238cb 100644
--- a/utils/getdeveloperlib.py
+++ b/utils/getdeveloperlib.py
@@ -1,4 +1,5 @@
 from __future__ import print_function
+from io import open
 import os
 import re
 import glob
@@ -231,7 +232,8 @@ def parse_developers():
     linen = 0
     global unittests
     unittests = list_unittests()
-    with open(os.path.join(brpath, "DEVELOPERS"), "r") as f:
+    developers_fname = os.path.join(brpath, 'DEVELOPERS')
+    with open(developers_fname, mode='r', encoding='utf_8') as f:
         files = []
         name = None
         for line in f:
-- 
2.28.0.windows.1
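
For readers who want to reproduce the failure outside Buildroot, here is
a minimal sketch. It assumes a temporary stand-in for the DEVELOPERS
file; the developer entry, the path and the variable names are made up
for illustration, and CP1252 is passed explicitly to mimic a Windows
locale default.

```python
import io
import os
import tempfile

# Hypothetical entry: "\u00c1" (A with acute) is encoded as bytes C3 81 in
# UTF-8, and byte 0x81 has no mapping in Python's CP1252 codec.
sample = u"N:\t\u00c1lvaro Example <alvaro@example.com>\n"

# Write a stand-in DEVELOPERS file as UTF-8 in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "DEVELOPERS")
with io.open(path, mode="w", encoding="utf-8") as f:
    f.write(sample)

# Reading with CP1252 stands in for relying on a Windows locale default.
try:
    with io.open(path, mode="r", encoding="cp1252") as f:
        f.read()
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Reading with an explicit UTF-8 encoding, as the patch does.
with io.open(path, mode="r", encoding="utf-8") as f:
    decoded = f.read()
```

With the explicit encoding the result no longer depends on the locale,
but on Python 2 io.open() also returns unicode strings rather than byte
strings.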

_______________________________________________
buildroot mailing list
buildroot@lists.buildroot.org
https://lists.buildroot.org/mailman/listinfo/buildroot

* Re: [Buildroot] [PATCH v2 1/1] utils/getdeveloperlib.py: explicitly set devs document encoding
From: Thomas Petazzoni @ 2021-09-05 14:08 UTC (permalink / raw)
  To: James Knight; +Cc: buildroot

On Sat,  4 Sep 2021 21:35:19 -0400
James Knight <james.d.knight@live.com> wrote:

> Explicitly indicate the file encoding to UTF-8 for the DEVELOPERS
> document. This prevents Unicode decoding errors when printing E-Mail
> entries with Unicode characters on systems using an alternative default
> encoding (e.g. 'CP1252').
> 
> This corrects the following observed error:
> 
>     $ ./utils/get-developers outgoing/*
>     Traceback (most recent call last):
>       File "utils\get-developers", line 105, in <module>
>         __main__()
>       File "utils\get-developers", line 47, in __main__
>         devs = getdeveloperlib.parse_developers()
>       File "...\buildroot\utils\getdeveloperlib.py", line 239, in parse_developers
>         for line in f:
>       File "...\Python<ver>\lib\encodings\cp1252.py", line 23, in decode
>         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
>     UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6659: character maps to <undefined>
> 
> Signed-off-by: James Knight <james.d.knight@live.com>
> ---
>  utils/getdeveloperlib.py | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Applied to master, thanks.

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

* Re: [Buildroot] [PATCH v2 1/1] utils/getdeveloperlib.py: explicitly set devs document encoding
From: Peter Korsgaard @ 2021-09-10 11:34 UTC (permalink / raw)
  To: James Knight; +Cc: buildroot

>>>>> "James" == James Knight <james.d.knight@live.com> writes:

 > Explicitly indicate the file encoding to UTF-8 for the DEVELOPERS
 > document. This prevents Unicode decoding errors when printing E-Mail
 > entries with Unicode characters on systems using an alternative default
 > encoding (e.g. 'CP1252').

 > This corrects the following observed error:

 >     $ ./utils/get-developers outgoing/*
 >     Traceback (most recent call last):
 >       File "utils\get-developers", line 105, in <module>
 >         __main__()
 >       File "utils\get-developers", line 47, in __main__
 >         devs = getdeveloperlib.parse_developers()
 >       File "...\buildroot\utils\getdeveloperlib.py", line 239, in parse_developers
 >         for line in f:
 >       File "...\Python<ver>\lib\encodings\cp1252.py", line 23, in decode
 >         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
 >     UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6659: character maps to <undefined>

 > Signed-off-by: James Knight <james.d.knight@live.com>

Committed to 2021.02.x, 2021.05.x and 2021.08.x, thanks.

-- 
Bye, Peter Korsgaard

* Re: [Buildroot] [PATCH v2 1/1] utils/getdeveloperlib.py: explicitly set devs document encoding
From: Baruch Siach @ 2021-09-10 14:39 UTC (permalink / raw)
  To: Peter Korsgaard; +Cc: James Knight, buildroot

Hi Peter,

On Fri, Sep 10 2021, Peter Korsgaard wrote:
>>>>>> "James" == James Knight <james.d.knight@live.com> writes:
>
>  > Explicitly indicate the file encoding to UTF-8 for the DEVELOPERS
>  > document. This prevents Unicode decoding errors when printing E-Mail
>  > entries with Unicode characters on systems using an alternative default
>  > encoding (e.g. 'CP1252').
>
>  > This corrects the following observed error:
>
>  >     $ ./utils/get-developers outgoing/*
>  >     Traceback (most recent call last):
>  >       File "utils\get-developers", line 105, in <module>
>  >         __main__()
>  >       File "utils\get-developers", line 47, in __main__
>  >         devs = getdeveloperlib.parse_developers()
>  >       File "...\buildroot\utils\getdeveloperlib.py", line 239, in parse_developers
>  >         for line in f:
>  >       File "...\Python<ver>\lib\encodings\cp1252.py", line 23, in decode
>  >         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
>  >     UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6659: character maps to <undefined>
>
>  > Signed-off-by: James Knight <james.d.knight@live.com>
>
> Committed to 2021.02.x, 2021.05.x and 2021.08.x, thanks.

Not in 2021.08.x as of 1279d2b13284 ("package/go: security bump version
to 1.16.8").

baruch

-- 
                                                     ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -

* Re: [Buildroot] [PATCH v2 1/1] utils/getdeveloperlib.py: explicitly set devs document encoding
From: Peter Korsgaard @ 2021-09-10 14:44 UTC (permalink / raw)
  To: Baruch Siach; +Cc: James Knight, buildroot

>>>>> "Baruch" == Baruch Siach <baruch@tkos.co.il> writes:

 > Hi Peter,
 > On Fri, Sep 10 2021, Peter Korsgaard wrote:
 >>>>>>> "James" == James Knight <james.d.knight@live.com> writes:
 >> 
 >> > Explicitly indicate the file encoding to UTF-8 for the DEVELOPERS
 >> > document. This prevents Unicode decoding errors when printing E-Mail
 >> > entries with Unicode characters on systems using an alternative default
 >> > encoding (e.g. 'CP1252').
 >> 
 >> > This corrects the following observed error:
 >> 
 >> >     $ ./utils/get-developers outgoing/*
 >> >     Traceback (most recent call last):
 >> >       File "utils\get-developers", line 105, in <module>
 >> >         __main__()
 >> >       File "utils\get-developers", line 47, in __main__
 >> >         devs = getdeveloperlib.parse_developers()
 >> >       File "...\buildroot\utils\getdeveloperlib.py", line 239, in parse_developers
 >> >         for line in f:
 >> >       File "...\Python<ver>\lib\encodings\cp1252.py", line 23, in decode
 >> >         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
 >> >     UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6659: character maps to <undefined>
 >> 
 >> > Signed-off-by: James Knight <james.d.knight@live.com>
 >> 
 >> Committed to 2021.02.x, 2021.05.x and 2021.08.x, thanks.

 > Not in 2021.08.x as of 1279d2b13284 ("package/go: security bump version
 > to 1.16.8").

Oops, too many branches. Fixed, thanks!

-- 
Bye, Peter Korsgaard

* Re: [Buildroot] [PATCH v2 1/1] utils/getdeveloperlib.py: explicitly set devs document encoding
From: Peter Korsgaard @ 2021-09-18 21:17 UTC (permalink / raw)
  To: James Knight; +Cc: buildroot

>>>>> "James" == James Knight <james.d.knight@live.com> writes:

 > Explicitly indicate the file encoding to UTF-8 for the DEVELOPERS
 > document. This prevents Unicode decoding errors when printing E-Mail
 > entries with Unicode characters on systems using an alternative default
 > encoding (e.g. 'CP1252').

 > This corrects the following observed error:

 >     $ ./utils/get-developers outgoing/*
 >     Traceback (most recent call last):
 >       File "utils\get-developers", line 105, in <module>
 >         __main__()
 >       File "utils\get-developers", line 47, in __main__
 >         devs = getdeveloperlib.parse_developers()
 >       File "...\buildroot\utils\getdeveloperlib.py", line 239, in parse_developers
 >         for line in f:
 >       File "...\Python<ver>\lib\encodings\cp1252.py", line 23, in decode
 >         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
 >     UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6659: character maps to <undefined>

 > Signed-off-by: James Knight <james.d.knight@live.com>

Hmm, this doesn't quite seem to work when stdout is not a UTF-8 console
(e.g. when used as git send-email --cc-cmd):

echo $LANG
en_DK.UTF-8

./utils/get-developers -p libyang
Heiko Thiery <heiko.thiery@gmail.com>
Jan Kundrát <jan.kundrat@cesnet.cz>

LANG=C ./utils/get-developers -p libyang
Heiko Thiery <heiko.thiery@gmail.com>
Traceback (most recent call last):
  File "./utils/get-developers", line 105, in <module>
    __main__()
  File "./utils/get-developers", line 68, in __main__
    print(dev.name)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 9: ordinal not in range(128)

./utils/get-developers -p libyang | cat
Traceback (most recent call last):
  File "./utils/get-developers", line 105, in <module>
    __main__()
  File "./utils/get-developers", line 68, in __main__
    Heiko Thiery <heiko.thiery@gmail.com>
print(dev.name)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 9: ordinal not in range(128)


Reverting 9f127cc420884ad fixes it:

./utils/get-developers -p libyang
Heiko Thiery <heiko.thiery@gmail.com>
Jan Kundrát <jan.kundrat@cesnet.cz>

LANG=C ./utils/get-developers -p libyang
Heiko Thiery <heiko.thiery@gmail.com>
Jan Kundrát <jan.kundrat@cesnet.cz>

./utils/get-developers -p libyang | cat
Heiko Thiery <heiko.thiery@gmail.com>
Jan Kundrát <jan.kundrat@cesnet.cz>


Any idea about how to fix this, or should it just be reverted?

-- 
Bye, Peter Korsgaard

* Re: [Buildroot] [PATCH v2 1/1] utils/getdeveloperlib.py: explicitly set devs document encoding
From: James Knight @ 2021-09-19  2:37 UTC (permalink / raw)
  To: Peter Korsgaard; +Cc: James Knight, buildroot

Peter,

On Sat, Sep 18, 2021 at 5:17 PM Peter Korsgaard <peter@korsgaard.com> wrote:
> Hmm, this doesn't quite seem to work when stdout is not a UTF-8 console
> ...
>
> ./utils/get-developers -p libyang
> Heiko Thiery <heiko.thiery@gmail.com>
> Jan Kundrát <jan.kundrat@cesnet.cz>
>
> LANG=C ./utils/get-developers -p libyang
> Heiko Thiery <heiko.thiery@gmail.com>
> ...
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 9: ordinal not in range(128)
>
> ./utils/get-developers -p libyang | cat
> ...
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 9: ordinal not in range(128)
>
> Reverting 9f127cc420884ad fixes it:
>
> ./utils/get-developers -p libyang
> Heiko Thiery <heiko.thiery@gmail.com>
> Jan Kundrát <jan.kundrat@cesnet.cz>
>
> LANG=C ./utils/get-developers -p libyang
> Heiko Thiery <heiko.thiery@gmail.com>
> Jan Kundrát <jan.kundrat@cesnet.cz>
>
> ./utils/get-developers -p libyang | cat
> Heiko Thiery <heiko.thiery@gmail.com>
> Jan Kundrát <jan.kundrat@cesnet.cz>
>
> Any idea about how to fix this, or should it just be reverted?

I have no problem with reverting if this is causing issues.

From my (limited) understanding of how Python and shells deal with
encodings, I think I understand the issue here (feel free to correct me
if I am wrong on any of this). I assume that the Python interpreter
being used here is a Python 2.x version. In the example above, the name
"Kundrát" includes a non-ASCII character which cannot be rendered on an
ASCII-only terminal. For the second command (the explicit "LANG=C"),
the Python interpreter assumes an ASCII terminal, attempts to convert a
unicode string to ASCII and raises the observed exception. This did not
fail before this commit because the interpreter processed the name as a
byte string (i.e. not a unicode string): it just wrote the raw bytes to
the output stream, and the UTF-8 console rendered them as expected. If
these raw bytes are instead written to a terminal that uses ASCII or
another non-UTF-8 encoding, the rendered output may not be the expected
one (e.g. "KundrÃ¡t" on a CP1252 terminal).

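To make the byte string versus unicode string distinction concrete, here
is a small sketch. It is written so that it also runs on Python 3; the
variable names are illustrative only, and CP1252 is just one example of
a non-UTF-8 terminal encoding.

```python
# The name from the example above, as a unicode string.
name = u"Jan Kundr\u00e1t"

# What the DEVELOPERS file holds on disk: the UTF-8 encoded bytes.
raw = name.encode("utf-8")

# Converting the unicode string to ASCII fails, as in the traceback.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    failing_position = exc.start  # index of the first unencodable character

# The raw UTF-8 bytes rendered by a CP1252 terminal come out as mojibake.
mojibake = raw.decode("cp1252")
```

Here failing_position is 9, which matches "position 9" in the reported
UnicodeEncodeError, and mojibake is "Jan KundrÃ¡t".
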
I cannot say I understand the output of the third command (I could not
reproduce it with the environments I have set up). My guess is that when
the output goes through a pipe, Python may be treating the output stream
as ASCII-only.
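
One possible explanation for the third command, assuming Python 2 as
above: when stdout is a pipe rather than a terminal, Python 2 leaves
sys.stdout.encoding unset (None) and falls back to ASCII when a unicode
string is printed, whatever LANG says. The sketch below only mimics that
fallback so it can run on any Python version; FakeStream and
effective_encoding are hypothetical names, not interpreter internals.

```python
class FakeStream(object):
    """Hypothetical stand-in for an output stream with a given encoding."""

    def __init__(self, encoding):
        self.encoding = encoding


def effective_encoding(stream, fallback="ascii"):
    # Use the stream's declared encoding, or the fallback when the stream
    # declares none (as Python 2 does for a piped stdout).
    return getattr(stream, "encoding", None) or fallback


piped = effective_encoding(FakeStream(None))        # like "... | cat"
utf8_tty = effective_encoding(FakeStream("UTF-8"))  # like a UTF-8 console
```

On a real system the equivalent check is to print sys.stdout.encoding
and sys.stdout.isatty() with and without the pipe.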

My initial impression is that the behavior may be better with this
commit, since raising the exception helps guarantee that the output
entries are renderable on the active terminal. However, if a user is
piping the output to another command that understands UTF-8, it would be
better for the Python interpreter not to raise an exception here and to
forward the raw byte strings instead (and I imagine that having to force
"PYTHONIOENCODING" in this case would get annoying over time), so maybe
reverting is the best option here.
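
For completeness, a third option besides reverting or setting
PYTHONIOENCODING would be to always emit UTF-8 bytes, whatever the
locale says. The sketch below is only an illustration of that idea and
not something proposed or applied in this thread; print_utf8 is a
hypothetical helper name.

```python
import io
import sys


def print_utf8(text, stream=None):
    """Write text plus a newline to stream as UTF-8 bytes."""
    stream = sys.stdout if stream is None else stream
    data = (text + u"\n").encode("utf-8")
    # Flush pending text first so the output order is preserved.
    stream.flush()
    # Python 3 text streams expose the underlying byte stream as .buffer;
    # Python 2's sys.stdout accepts byte strings directly.
    target = getattr(stream, "buffer", stream)
    target.write(data)
    target.flush()


# Demonstration against an ASCII-configured text stream.
backing = io.BytesIO()
ascii_stream = io.TextIOWrapper(backing, encoding="ascii")
print_utf8(u"Jan Kundr\u00e1t <jan.kundrat@cesnet.cz>", ascii_stream)
written = backing.getvalue()
```

The trade-off is the one described above: the bytes always get through,
but a non-UTF-8 terminal may render them incorrectly.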
