linux-doc.vger.kernel.org archive mirror
From: "Michal Suchánek" <msuchanek@suse.de>
To: Markus Heiser <markus.heiser@darmarit.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>,
	linux-doc@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>
Subject: Re: Sphinx parallel build error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)
Date: Thu, 6 May 2021 20:16:25 +0200
Message-ID: <20210506181625.GJ6564@kitsune.suse.cz>
In-Reply-To: <493bb1b6-9fc8-fce8-67f2-f6d2e86a07f3@darmarit.de>

On Thu, May 06, 2021 at 07:59:18PM +0200, Markus Heiser wrote:
> On 06.05.21 at 19:48, Michal Suchánek wrote:
> > On Thu, May 06, 2021 at 07:04:44PM +0200, Markus Heiser wrote:
> > > On 06.05.21 at 18:46, Mauro Carvalho Chehab wrote:
> > > > On Thu, 6 May 2021 17:57:15 +0200
> > > > Markus Heiser <markus.heiser@darmarit.de> wrote:
> > > > 
> > > > > On 06.05.21 at 12:39, Michal Suchánek wrote:
> > > > > > When building HTML documentation I get this output:
> > > > > ...
> > > > > > [  412s] UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)
> > > > > ...
> > > > > > 
> > > > > > It does not say which input file contains the offending character so I can't tell which file is broken.
> > > > > > 
> > > > > > Any idea how to debug?
> > > > > 
> > > > > I guess the build host is a very simple container, what does
> > > > > 
> > > > >      echo $LC_ALL
> > > > >      echo $LANG
> > It's actually set to en_US just before the build.
> > > > > 
> > > > > print?  If it is a Latin-1 locale, change it to one that uses UTF-8 (I recommend
> > > > > 'en_US.utf8').
> > > > > 
> > > > > A UnicodeEncodeError can occur anywhere characters are
> > > > > encoded from (internal) Unicode to the encoding of the stream.
> > > > > 
> > > > > For example:
> > > > > 
> > > > > A print or log statement which streams to stdout needs to encode
> > > > > from Unicode to stdout's encoding.  If there is even one Unicode symbol
> > > > > which cannot be encoded in the stream's encoding, a UnicodeEncodeError
> > > > > is raised.
> > > > 
> > > > Hi Markus,
> > > > 
> > > > The builder's locale shouldn't matter when building the Kernel
> > > > documentation (or any other documents built from other git trees
> > > > on other open source projects), as the charset of the Kernel's *.rst
> > > > documents won't change, no matter in which part of the globe they are built.
> > > > 
> > > > I vaguely remember a change we made a couple of years ago
> > > > in order to address this issue.
> > > 
> > > Hi Mauro :)
> > > 
> > > Are you sure? What if the logger wants to log some symbols from the
> > > Chinese-translated parts to stdout and the encoding of stdout is
> > > Latin-1?
> > 
> > [  127s] + cd linux-5.12-next-20210506
> > [  127s] + export LANG=en_US
> > [  127s] + LANG=en_US
> > [  127s] + mkdir -p html
> > [  127s] + python3 -c 'print("↑ᛏ个")'
> > [  127s] ↑ᛏ个
> > [  127s] + echo 'print("↑ᛏ个")'
> > [  127s] + python3 test.py
> > [  127s] Traceback (most recent call last):
> > [  127s]   File "test.py", line 1, in <module>
> > [  127s]     print("\u2191\u16cf\u4e2a\uf8f9")
> > [  127s] UnicodeEncodeError: 'latin-1' codec can't encode characters in
> > position 0-3: ordinal not in range(256)
> > 
> > It certainly does not look like Python can print Unicode in this
> > environment. It tells me where the problem is, though.
> 
> Can't speak for the image of your container; maybe you need to install
> some UTF-8 packages, but in most cases
> 
>   export LANG=en_US.UTF-8

Yes, in this case export LANG=en_US.utf8 is an easy workaround.
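
For the record, here is a minimal sketch of the mechanism Markus describes:
writing to a text stream encodes the string with that stream's encoding, so
the same text fails on a latin-1 stream and goes through on a UTF-8 one.
The helper name write_with_encoding is made up for illustration, and the
in-memory stream only stands in for stdout; this is not what Sphinx does
internally.

```python
import io

# The three visible characters from the test above.
TEXT = "\u2191\u16cf\u4e2a"


def write_with_encoding(text, encoding):
    """Write text to an in-memory text stream that uses the given encoding.

    Returns the encoded bytes, or None if the encoding cannot represent
    the text (the UnicodeEncodeError case).
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        return None
    return raw.getvalue()
```

Under these assumptions write_with_encoding(TEXT, "latin-1") returns None,
while the same call with "utf-8" returns the encoded bytes, which is
consistent with the workaround above.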

The UTF-8 locale is already included in the build environment by
default.
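
For the original question of which input file contains characters that a
latin-1 stream cannot encode, something along these lines could narrow it
down. It is only a sketch: find_unencodable is a hypothetical helper, and it
assumes the sources are UTF-8 encoded .rst files, which may not cover
everything that ends up being logged during the build.

```python
import os


def find_unencodable(root, encoding="latin-1", suffixes=(".rst",)):
    """Yield (path, line_number, char) for lines that the encoding cannot represent.

    Only the first offending character of each line is reported.
    Files are read as UTF-8 (an assumption about the source tree).
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    for char in line:
                        try:
                            char.encode(encoding)
                        except UnicodeEncodeError:
                            yield path, lineno, char
                            break
```

Calling list(find_unencodable("Documentation")) from the top of the tree
would then list candidate files and line numbers to look at; the directory
name here is just the usual location of the kernel docs.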

Thanks

Michal
