From: Markus Heiser <markus.heiser@darmarit.de>
To: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "Michal Suchánek" <msuchanek@suse.de>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	linux-doc@vger.kernel.org, "Jonathan Corbet" <corbet@lwn.net>
Subject: Re: Sphinx parallel build error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)
Date: Fri, 7 May 2021 11:51:47 +0200
Message-ID: <43583d9c-bfc4-e3c2-96d9-7cffec9e2909@darmarit.de>
In-Reply-To: <20210507111451.36f063bb@coco.lan>

On 07.05.21 at 11:14, Mauro Carvalho Chehab wrote:
> On Fri, 7 May 2021 10:56:39 +0200
> Markus Heiser <markus.heiser@darmarit.de> wrote:
> 
>> On 07.05.21 at 10:35, Michal Suchánek wrote:
>>> So the bottom line is that UTF-8 in the files will stay, and Sphinx
>>> cannot handle UTF-8 when the locale is not UTF-8.
>>>
>>> In the long run it might be nice to fix Sphinx to properly set the
>>> encoding of the files it reads and writes. Or maybe there is some
>>> parameter that specifies it?
>>
>> Let's not mix things up. The Unicode error is neither related nor
>> limited to the log or to Sphinx; it stems from the fact that we (you)
>> try to run a UTF-8 application in an environment which is not fully
>> UTF-8 capable.
> 
> No. The application itself is not UTF-8. The application input files are.

Maybe we have a different view on this. For me, an application which
reads UTF-8 in and spits UTF-8 out is a UTF-8 application.

Hint: HTML is just one Sphinx writer; other writers exist as well,
e.g. LaTeX.
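
To illustrate the kind of error named in the subject line, here is a
minimal Python sketch. The helper name try_encode and the sample string
are my own assumptions for illustration; nothing here is taken from
Sphinx or the kernel tree. It only shows that the same text encodes
fine as UTF-8 but fails once the target encoding is latin-1.

```python
def try_encode(text: str, encoding: str) -> str:
    """Return 'ok' if text can be encoded, else a short failure description."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.reason carries the codec's explanation of the failure.
        return f"{exc.encoding}: {exc.reason}"
    return "ok"


# Hypothetical sample: a Latin-1 letter, an arrow, and two CJK characters.
sample = "Michal Such\u00e1nek \u2192 \u4e2d\u6587"

print(try_encode(sample, "utf-8"))
print(try_encode(sample, "latin-1"))
```

The text itself is the same in both calls; only the encoding assumed
for the output differs.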

> The big issue with the way python works with charsets is due to that:
> it does a very poor job with regards to that.

This is your POV; the Python developers have a different view on
handling strings. There are epic discussions about it.

But all these discussions won't help, since we can't change the
principles of Python.

Personally, I think I can't ignore the principles of a language,
and I am comfortable with setting up a UTF-8 environment.

> I remember that in the past I had to use this quite often
> (before UTF-8 became the default on the distros I was using at that time):
> 
> 	LANG=C <some_python_script>
> 
> Just to keep them from crashing.
> 
> If I'm not mistaken, older Fedora/Mandrake distros had some bugs with
> python-written scripts: if the machine's language was not English,
> such scripts crashed, as the i18n translated messages were in a
> different charset than the one the python script was expecting.

For me, "i18n translated messages" are a good example that my view is
not wrong. This is not true for all devices, but on those devices you
won't run an application like Sphinx.

>>> For the short term I think it is reasonable to run a python test script
>>> that prints fancy unicode characters before running Sphinx and bail if
>>> the test script fails.
>>
>> To be sure, I recommend setting a UTF-8 locale environment in the
>> Makefile.
>>
>> My experience shows that this is the default with almost all
>> containers (images); there are only a few where this is not the
>> case (maybe SUSE?).
> 
> That may not be true on certain parts of the globe.

Sorry, I was speaking about common LXC images.
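
The pre-flight check suggested in the quoted text above (print fancy
Unicode characters, bail out if that fails) could look roughly like the
following Python sketch. The names can_write and preflight and the
probe string are hypothetical, chosen only for illustration; this is
not what the kernel Makefile does. The encoding is passed in explicitly
so the check can be tried against any encoding, not only the one of the
current terminal.

```python
import io


def can_write(text: str, encoding: str) -> bool:
    """Return True if text can be written to a stream using encoding."""
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding, errors="strict")
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        return False
    return True


def preflight(encoding: str) -> int:
    """Return 0 if the probe text is writable with encoding, 1 otherwise."""
    # Hypothetical probe: a check mark, a Latin-1 letter, a CJK character.
    probe = "\u2713 \u00e1 \u4e2d"
    return 0 if can_write(probe, encoding) else 1


for name in ("utf-8", "latin-1", "ascii"):
    print(name, "ok" if preflight(name) == 0 else "cannot encode probe text")
```

A build wrapper could pass sys.stdout.encoding to preflight() and use
the return value as its exit status before starting Sphinx.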

> I've no idea what charsets the most-used distributions in Asian
> countries use ;-)

I guess these days they will most often use UTF-8, since ASCII
has not been of much help since the 80s ;-)

   -- Markus --

