linux-doc.vger.kernel.org archive mirror
From: Mauro Carvalho Chehab <mchehab@kernel.org>
To: "Michal Suchánek" <msuchanek@suse.de>
Cc: Markus Heiser <markus.heiser@darmarit.de>,
	linux-doc@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>
Subject: Re: Sphinx parallel build error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)
Date: Fri, 7 May 2021 10:52:15 +0200
Message-ID: <20210507105215.0902461d@coco.lan>
In-Reply-To: <20210506180625.GI6564@kitsune.suse.cz>

On Thu, 6 May 2021 20:06:25 +0200
Michal Suchánek <msuchanek@suse.de> wrote:

> On Thu, May 06, 2021 at 07:53:25PM +0200, Markus Heiser wrote:

> > Hi Mauro,
> > 
> > it is not comfortable but is it mad? ..
> > 
> > Most languages (or applications) do not handle the encoding
> > of strings; they just pipe a binary stream, while Python
> > decodes / encodes strings.
> > 
> > "The Zen of Python" [1] says
> > 
> >    Explicit is better than implicit.

This was taken to an extreme with regard to charsets:

	 "better" should never be translated to "crash" ;-)

> > If a stream can't encode symbols and these symbols should be ignored,
> > you have to set the encoding of the stream explicitly to ignore
> > such symbols.
> 
> The problem is this part never happened. Loggers are supposed to tell
> you about the error in your application, not crash it.

It is insane to crash the error log due to a charset issue ;-)
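
To make the point concrete, here is a minimal Python sketch (not Sphinx's
actual logging code) of a latin-1 text stream receiving a message with
characters it cannot encode. The helper name write_log_line and the sample
message are made up for illustration; io.TextIOWrapper and its errors=
argument are standard library.

```python
import io


def write_log_line(message: str, encoding: str, errors: str) -> bytes:
    """Write one line to an in-memory text stream and return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="\n")
    stream.write(message + "\n")
    stream.flush()
    return raw.getvalue()


# Curly quotes (U+201C, U+201D) do not exist in latin-1.
message = "WARNING: cannot parse \u201cfoo\u201d"

# With the default-style strict handler, the write itself blows up.
try:
    write_log_line(message, "latin-1", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With an explicit error handler, the message still gets out.
replaced = write_log_line(message, "latin-1", "replace")
escaped = write_log_line(message, "latin-1", "backslashreplace")

print(strict_failed)
print(replaced)
print(escaped)
```

On Python 3.7 or later the same error handler can be applied to an already
open stream with sys.stdout.reconfigure(errors="backslashreplace"), or from
outside the program through the PYTHONIOENCODING environment variable.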

> But the problem with Sphinx may be that the output file is also assumed
> to be in the locale encoding, and the output encoding is never set. It's
> HTML so it could be encoded with entities, too.
> 
> The idea of handling encoding precisely is not mad in itself, but then
> everybody working with just ASCII and never testing that their software
> works in the cases where explicit handling is needed is the mad part.

True. The machine's locale shouldn't affect the produced documents
*at all*. See, there's a whole set of non-Latin charset families
supported on Linux:

	https://man7.org/linux/man-pages/man7/charsets.7.html

Nothing prevents someone using a machine whose default encoding is
KOI8-R/BIG-5/GB 2312/JIS X 0208/... from using Sphinx to produce
UTF-8 [1] documents.

[1] or whatever other output encoding
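
As a sketch of what "locale-independent output" means in Python terms (this
is an illustration, not a claim about how Sphinx writes its files): passing
encoding= explicitly to open() means the locale's preferred encoding is
never consulted. The function name write_document and the file name are
hypothetical. The last lines show the HTML-entities alternative mentioned
above, using the standard "xmlcharrefreplace" error handler.

```python
import os
import tempfile


def write_document(path: str, text: str, encoding: str = "utf-8") -> None:
    """Write text with an explicit encoding, ignoring LANG/LC_* entirely."""
    with open(path, "w", encoding=encoding, newline="\n") as out:
        out.write(text)


text = "Привет, 世界\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.txt")
    write_document(path, text)
    # Read the file back as bytes to check what actually hit the disk.
    with open(path, "rb") as raw:
        data = raw.read()

print(data == text.encode("utf-8"))

# For HTML, an ASCII-only output with character references is also possible.
html_ascii = "Suchánek".encode("ascii", errors="xmlcharrefreplace")
print(html_ascii)
```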

Ok, the logger may not be able to correctly display certain
chars, but it would be perfectly fine and sane to use //TRANSLIT (or
something similar) in order to do a charset conversion.

Even just printing a <?> for every char that isn't printable on
the logger's output in the charset set by LANG/LC_* is
better/saner than crashing.
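
Both fallbacks can be sketched in a few lines of Python. This is only a
rough approximation: iconv's //TRANSLIT uses locale-specific tables, while
the hypothetical rough_translit helper below merely strips combining accents
after Unicode decomposition. The helper names are made up; the codecs error
handlers and unicodedata calls are standard library.

```python
import unicodedata


def to_printable(text: str, encoding: str) -> str:
    """Replace every character the target charset cannot represent with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


def rough_translit(text: str) -> str:
    """Crude ASCII fallback: drop combining accents, '?' for everything else."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", errors="replace").decode("ascii")


# latin-1 can represent the accented letter but not the CJK characters.
printable = to_printable("Suchánek 世界", "latin-1")
# Pure ASCII cannot represent either, so the accent is stripped instead.
translit = rough_translit("Suchánek")
unknown = rough_translit("世界")

print(printable)
print(translit)
print(unknown)
```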

Thanks,
Mauro

