From: Markus Heiser <markus.heiser@darmarit.de>
To: "Michal Suchánek" <msuchanek@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>,
	linux-doc@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>
Subject: Re: Sphinx parallel build error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)
Date: Thu, 6 May 2021 19:59:18 +0200
Message-ID: <493bb1b6-9fc8-fce8-67f2-f6d2e86a07f3@darmarit.de>
In-Reply-To: <20210506174849.GH6564@kitsune.suse.cz>

On 06.05.21 at 19:48, Michal Suchánek wrote:
> On Thu, May 06, 2021 at 07:04:44PM +0200, Markus Heiser wrote:
>> On 06.05.21 at 18:46, Mauro Carvalho Chehab wrote:
>>> On Thu, 6 May 2021 17:57:15 +0200
>>> Markus Heiser <markus.heiser@darmarit.de> wrote:
>>>
>>>> On 06.05.21 at 12:39, Michal Suchánek wrote:
>>>>> When building HTML documentation I get this output:
>>>> ...
>>>>> [  412s] UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)
>>>> ...
>>>>>
>>>>> It does not say which input file contains the offending character so I can't tell which file is broken.
>>>>>
>>>>> Any idea how to debug?
>>>>
>>>> I guess the build host is a very minimal container. What do
>>>>
>>>>      echo $LC_ALL
>>>>      echo $LANG
> It's actually set to en_US just before the build.
>>>>
>>>> print?  If it is a Latin-1 locale, change it to one that uses UTF-8
>>>> (I recommend 'en_US.utf8').
>>>>
>>>> A UnicodeEncodeError can occur anywhere characters are
>>>> encoded from (internal) Unicode to the encoding of the stream.
>>>>
>>>> For example:
>>>>
>>>> A print or log statement that writes to stdout needs to encode
>>>> from Unicode to stdout's encoding.  If there is a single Unicode
>>>> character that cannot be encoded in the stream's encoding, a
>>>> UnicodeEncodeError is raised.
>>>
>>> Hi Markus,
>>>
>>> The builder's locale shouldn't matter when building the Kernel
>>> documentation (or any other documents built from other git trees
>>> of other open source projects), as the Kernel's *.rpm document charset
>>> won't change, no matter where in the world it was built.
>>>
>>> I vaguely remember a change we made a couple of years ago
>>> to address this issue.
>>
>> Hi Mauro :)
>>
>> Sure? What if the logger wants to log some characters from the
>> Chinese-translated parts to stdout and the encoding of stdout is
>> Latin-1?
> 
> [  127s] + cd linux-5.12-next-20210506
> [  127s] + export LANG=en_US
> [  127s] + LANG=en_US
> [  127s] + mkdir -p html
> [  127s] + python3 -c 'print("↑ᛏ个")'
> [  127s] ↑ᛏ个
> [  127s] + echo 'print("↑ᛏ个")'
> [  127s] + python3 test.py
> [  127s] Traceback (most recent call last):
> [  127s]   File "test.py", line 1, in <module>
> [  127s]     print("\u2191\u16cf\u4e2a\uf8f9")
> [  127s] UnicodeEncodeError: 'latin-1' codec can't encode characters in
> position 0-3: ordinal not in range(256)
> 
> It certainly does not look like python can print unicode in this
> environment. It tells me where the problem is, though.
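
For what it's worth, the failure in the test above can be reproduced
independently of which locales are installed in the container.  The sketch
below is only an illustration, not something taken from the build log: it
starts a child interpreter with the PYTHONIOENCODING environment variable
set, which overrides the encoding Python would otherwise derive from the
locale, and reports the encoding the child saw plus its exit status.  The
helper name run_with_io_encoding and the CHILD snippet are made up for this
example.

```python
import codecs
import os
import subprocess
import sys

# Child program: report stdout's encoding, then print the three test characters.
# Escapes are used so no non-ASCII text has to pass through argv.
CHILD = r'import sys; print(sys.stdout.encoding); print("\u2191\u16cf\u4e2a")'

def run_with_io_encoding(encoding):
    """Run CHILD with PYTHONIOENCODING set; return (codec name, exit status)."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    proc = subprocess.run([sys.executable, "-c", CHILD], env=env,
                          capture_output=True)
    # The first output line is the encoding name as the child reported it.
    reported = proc.stdout.decode(encoding).splitlines()[0]
    # Normalize aliases such as "latin-1" to the codec's canonical name.
    return codecs.lookup(reported).name, proc.returncode

latin1_result = run_with_io_encoding("latin-1")
utf8_result = run_with_io_encoding("utf-8")
print("latin-1:", latin1_result)
print("utf-8:  ", utf8_result)
```

With latin-1 the child should exit with a non-zero status after raising the
same UnicodeEncodeError; with utf-8 it should exit with status 0.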

I can't speak for your container image; you may need to install
some UTF-8 locale packages, but in most cases

   export LANG=en_US.UTF-8
   export LC_ALL=en_US.UTF-8

should help.
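
As a minimal sketch of why this helps: print() has to encode the text to the
encoding of the stream it writes to, and Python normally derives stdout's
encoding from the locale.  The example below does not touch the real stdout;
it assumes an in-memory stream (io.TextIOWrapper over io.BytesIO) as a
stand-in, and try_print is a hypothetical helper name.

```python
import io

def try_print(text, encoding):
    """Print text to an in-memory stream that uses the given encoding."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        # Raised when a character has no representation in the encoding.
        return "failed: " + exc.reason
    return "ok"

sample = "\u2191\u16cf\u4e2a"  # the three characters from the test above
result_latin1 = try_print(sample, "latin-1")
result_utf8 = try_print(sample, "utf-8")
print("latin-1:", result_latin1)
print("utf-8:  ", result_utf8)
```

The latin-1 stream should fail with "ordinal not in range(256)", while the
utf-8 stream accepts the same text.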

   -- Markus --

> 
> Thanks
> 
> Michal
> 
> [  127s] + :
> [  127s] + locale
> [  128s] LANG=en_US
> [  128s] LC_CTYPE="en_US"
> [  128s] LC_NUMERIC="en_US"
> [  128s] LC_TIME="en_US"
> [  128s] LC_COLLATE="en_US"
> [  128s] LC_MONETARY="en_US"
> [  128s] LC_MESSAGES="en_US"
> [  128s] LC_PAPER="en_US"
> [  128s] LC_NAME="en_US"
> [  128s] LC_ADDRESS="en_US"
> [  128s] LC_TELEPHONE="en_US"
> [  128s] LC_MEASUREMENT="en_US"
> [  128s] LC_IDENTIFICATION="en_US"
> [  128s] LC_ALL=
> [  128s] + echo LC_ALL=
> [  128s] LC_ALL=
> [  128s] + echo LANG=en_US
> [  128s] LANG=en_US
> 
