linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Edward Cree <ecree.xilinx@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	David Woodhouse <dwmw2@infradead.org>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>,
	linux-kernel@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	alsa-devel@alsa-project.org, coresight@lists.linaro.org,
	dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	intel-wired-lan@lists.osuosl.org, keyrings@vger.kernel.org,
	kvm@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org,
	linux-ext4@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-fpga@vger.kernel.org, linux-hwmon@vger.kernel.org,
	linux-iio@vger.kernel.org, linux-input@vger.kernel.org,
	linux-integrity@vger.kernel.org, linux-media@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-riscv@lists.infradead.org,
	linux-sgx@vger.kernel.org, linux-usb@vger.kernel.org,
	mjpeg-users@lists.sourceforge.net, netdev@vger.kernel.org,
	rcu@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII
Date: Tue, 11 May 2021 11:00:02 +0200	[thread overview]
Message-ID: <20210511110002.2f187f01@coco.lan> (raw)
In-Reply-To: <ed65025c-1087-9672-7451-6d28e7ab8f92@gmail.com>

Em Mon, 10 May 2021 15:33:47 +0100
Edward Cree <ecree.xilinx@gmail.com> escreveu:

> On 10/05/2021 14:59, Matthew Wilcox wrote:
> > Most of these
> > UTF-8 characters come from latex conversions and really aren't
> > necessary (and are being used incorrectly).  
> I fully agree with fixing those.
> The cover-letter, however, gave the impression that that was not the
>  main purpose of this series; just, perhaps, a happy side-effect.

Sorry for the mess. The main reason why I wrote this series is because
there are lots of UTF-8 left-over chars from the ReST conversion.
See:
  - https://lore.kernel.org/linux-doc/20210507100435.3095f924@coco.lan/

A large set of the UTF-8 letf-over chars were due to my conversion work,
so I feel personally responsible to fix those ;-)

Yet, this series has two positive side effects:

 - it helps people needing to touch the documents using non-utf8 locales[1];
 - it makes easier to grep for a text;

[1] There are still some widely used distros nowadays (LTS ones?) that
    don't set UTF-8 as default. Last time I installed a Debian machine
    I had to explicitly set UTF-8 charset after install as the default
    were using ASCII encoding (can't remember if it was Debian 10 or an
    older version).

Unintentionally, I ended by giving emphasis to the non-utf8 instead of
giving emphasis to the conversion left-overs.

FYI, this patch series originated from a discussion at linux-doc,
reporting that Sphinx breaks when LANG is not set to utf-8[2]. That's
why I probably ended giving the wrong emphasis at the cover letter.

[2] See https://lore.kernel.org/linux-doc/20210506103913.GE6564@kitsune.suse.cz/
    for the original report. I strongly suspect that the VM set by Michal 
    to build the docs was using a distro that doesn't set UTF-8 as default.

    PS.: 
      I intend to prepare afterwards a separate fix to avoid Sphinx
      logger to crash during Kernel doc builds when the locale charset
      is not UTF-8, but I'm not too fluent in python. So, I need some
      time to check if are there a way to just avoid python log crashes
      without touching Sphinx code and without needing to trick it to 
      think that the machine's locale is UTF-8.

See: while there was just a single document originally stored at the
Kernel tree as a LaTeX document during the time we did the conversion
(cdrom-standard.tex), there are several other documents stored as 
text that seemed to be generated by some tool like LaTeX, whose the
original version were not preserved. 

Also, there were other documents using different markdown dialects 
that were converted via pandoc (and/or other similar tools). That's 
not to mention the ones that were converted from DocBook. Such
tools tend to use some logic to use "neat" versions of some ASCII
characters, like what this tool does:

	https://daringfireball.net/projects/smartypants/

(Sphinx itself seemed to use this tool on its early versions)

All tool-converted documents can carry UTF-8 on unexpected places. See,
on this series, a large amount of patches deal with U+A0 (NO-BREAK SPACE)
chars. I can't see why someone writing a plain text document (or a ReST
one) would type a NO-BREAK SPACE instead of a normal white space.

The same applies, up to some sort, to curly commas: usually people just 
write ASCII "commas" on their documents, and use some tool like LaTeX
or a text editor like libreoffice in order to convert them into
 “utf-8 curly commas”[3].

[3] Sphinx will do such things at the produced output, doing something 
    similar to what smartypants does, nowadays using this:

	https://docutils.sourceforge.io/docs/user/smartquotes.html

    E. g.:
      - Straight quotes (" and ') turned into "curly" quote characters;
      - dashes (-- and ---) turned into en- and em-dash entities;
      - three consecutive dots (... or . . .) turned into an ellipsis char.

> > You seem quite knowedgeable about the various differences.  Perhaps
> > you'd be willing to write a document for Documentation/doc-guide/
> > that provides guidance for when to use which kinds of horizontal
> > line?
> I have Opinions about the proper usage of punctuation, but I also know  
>  that other people have differing opinions.  For instance, I place
>  spaces around an em dash, which is nonstandard according to most
>  style guides.  Really this is an individual enough thing that I'm not
>  sure we could have a "kernel style guide" that would be more useful
>  than general-purpose guidance like the page you linked.

> Moreover, such a guide could make non-native speakers needlessly self-
>  conscious about their writing and discourage them from contributing
>  documentation at all.

I don't think so. In a matter of fact, as a non-native speaker, I guess
this can actually help people willing to write documents.

>  I'm not advocating here for trying to push
>  kernel developers towards an eats-shoots-and-leaves level of
>  linguistic pedantry; rather, I merely think that existing correct
>  usages should be left intact (and therefore, excising incorrect usage
>  should only be attempted by someone with both the expertise and time
>  to check each case).
> 
> But if you really want such a doc I wouldn't mind contributing to it.

IMO, a document like that can be helpful. I can help reviewing it.

Thanks,
Mauro

  reply	other threads:[~2021-05-11  9:00 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-10 10:26 [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 01/53] docs: cdrom-standard.rst: get rid of uneeded UTF-8 chars Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 02/53] docs: ABI: remove a meaningless UTF-8 character Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 03/53] docs: ABI: remove some spurious characters Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 04/53] docs: index.rst: avoid using UTF-8 chars Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 05/53] docs: hwmon: " Mauro Carvalho Chehab
2021-05-10 13:30   ` Guenter Roeck
2021-05-10 10:26 ` [PATCH 06/53] docs: admin-guide: " Mauro Carvalho Chehab
2021-05-10 18:40   ` Gabriel Krisman Bertazi
2021-05-12  8:44     ` Mauro Carvalho Chehab
2021-05-12  9:25       ` David Woodhouse
2021-05-12 10:22         ` Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 07/53] docs: admin-guide: media: ipu3.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 08/53] docs: admin-guide: sysctl: kernel.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 09/53] docs: admin-guide: perf: imx-ddr.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 10/53] docs: admin-guide: pm: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 11/53] docs: trace: coresight: coresight-etm4x-reference.rst: " Mauro Carvalho Chehab
2021-05-10 19:28   ` Mathieu Poirier
2021-05-10 10:26 ` [PATCH 12/53] docs: driver-api: " Mauro Carvalho Chehab
     [not found]   ` <CAHp75Vegsb-+fVppv3C7Jp0a=mEGAh2pchX=Cr5ZvOMFt+G73Q@mail.gmail.com>
2021-05-12  8:49     ` Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 13/53] docs: driver-api: fpga: " Mauro Carvalho Chehab
2021-05-10 17:48   ` Moritz Fischer
2021-05-10 10:26 ` [PATCH 14/53] docs: driver-api: iio: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 15/53] docs: driver-api: thermal: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 16/53] docs: driver-api: media: drivers: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 17/53] docs: driver-api: firmware: other_interfaces.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 18/53] docs: driver-api: nvdimm: btt.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 19/53] docs: fault-injection: nvme-fault-injection.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 20/53] docs: usb: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 21/53] docs: process: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 22/53] docs: block: data-integrity.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 23/53] docs: userspace-api: media: fdl-appendix.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 24/53] docs: userspace-api: media: v4l: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 25/53] docs: userspace-api: media: dvb: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 26/53] docs: vm: zswap.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 27/53] docs: filesystems: f2fs.rst: " Mauro Carvalho Chehab
2021-05-11  3:16   ` [f2fs-dev] " Chao Yu
2021-05-10 10:26 ` [PATCH 28/53] docs: filesystems: ext4: " Mauro Carvalho Chehab
2021-05-10 19:23   ` Theodore Ts'o
2021-05-10 10:26 ` [PATCH 29/53] docs: kernel-hacking: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 30/53] docs: hid: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 31/53] docs: security: tpm: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 32/53] docs: security: keys: trusted-encrypted.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 33/53] docs: riscv: vm-layout.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 34/53] docs: networking: scaling.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 35/53] docs: networking: devlink: devlink-dpipe.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 36/53] docs: networking: device_drivers: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 37/53] docs: x86: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 38/53] docs: scheduler: sched-deadline.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 39/53] docs: dev-tools: testing-overview.rst: " Mauro Carvalho Chehab
2021-05-10 10:48   ` Marco Elver
2021-05-12  8:52     ` Mauro Carvalho Chehab
2021-05-10 23:35   ` David Gow
2021-05-12  8:14     ` Mauro Carvalho Chehab
2021-05-12  8:29     ` Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 40/53] docs: power: powercap: powercap.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 41/53] docs: ABI: " Mauro Carvalho Chehab
2021-05-10 13:53   ` Guenter Roeck
2021-05-10 10:26 ` [PATCH 42/53] docs: doc-guide: contributing.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 43/53] docs: PCI: acpi-info.rst: " Mauro Carvalho Chehab
2021-05-10 10:37   ` Krzysztof Wilczyński
2021-05-10 10:26 ` [PATCH 44/53] docs: gpu: " Mauro Carvalho Chehab
2021-05-10 11:16   ` Jani Nikula
2021-05-10 12:36   ` Liviu Dudau
2021-05-10 10:26 ` [PATCH 45/53] docs: sound: kernel-api: writing-an-alsa-driver.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 46/53] docs: arm64: arm-acpi.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 47/53] docs: infiniband: tag_matching.rst: " Mauro Carvalho Chehab
2021-05-10 10:27 ` [PATCH 48/53] docs: timers: no_hz.rst: " Mauro Carvalho Chehab
2021-05-10 10:27 ` [PATCH 49/53] docs: misc-devices: ibmvmc.rst: " Mauro Carvalho Chehab
2021-05-10 10:27 ` [PATCH 50/53] docs: firmware-guide: acpi: lpit.rst: " Mauro Carvalho Chehab
2021-05-10 10:27 ` [PATCH 51/53] docs: firmware-guide: acpi: dsd: graph.rst: " Mauro Carvalho Chehab
2021-05-10 10:27 ` [PATCH 52/53] docs: virt: kvm: " Mauro Carvalho Chehab
2021-05-10 10:27 ` [PATCH 53/53] docs: RCU: " Mauro Carvalho Chehab
2021-05-11  0:05   ` Paul E. McKenney
2021-05-10 10:52 ` [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII Thorsten Leemhuis
2021-05-10 11:19   ` Mauro Carvalho Chehab
2021-05-10 12:27     ` Mauro Carvalho Chehab
2021-05-10 10:54 ` David Woodhouse
2021-05-10 11:55   ` Mauro Carvalho Chehab
2021-05-10 13:16     ` Edward Cree
2021-05-10 13:38       ` Mauro Carvalho Chehab
2021-05-10 13:58         ` Edward Cree
2021-05-10 13:59       ` Matthew Wilcox
2021-05-10 14:33         ` Edward Cree
2021-05-11  9:00           ` Mauro Carvalho Chehab [this message]
2021-05-11  9:19             ` David Woodhouse
2021-05-10 13:49     ` David Woodhouse
2021-05-10 19:22       ` Theodore Ts'o
2021-05-11  9:37         ` Mauro Carvalho Chehab
2021-05-11  9:25       ` Mauro Carvalho Chehab
2021-05-10 14:00     ` Ben Boeckel
2021-05-10 21:57 ` Adam Borowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210511110002.2f187f01@coco.lan \
    --to=mchehab+huawei@kernel.org \
    --cc=alsa-devel@alsa-project.org \
    --cc=corbet@lwn.net \
    --cc=coresight@lists.linaro.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=dwmw2@infradead.org \
    --cc=ecree.xilinx@gmail.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=keyrings@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fpga@vger.kernel.org \
    --cc=linux-hwmon@vger.kernel.org \
    --cc=linux-iio@vger.kernel.org \
    --cc=linux-input@vger.kernel.org \
    --cc=linux-integrity@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=mjpeg-users@lists.sourceforge.net \
    --cc=netdev@vger.kernel.org \
    --cc=rcu@vger.kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).