linux-media.vger.kernel.org archive mirror
* [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols
@ 2021-05-12 12:50 Mauro Carvalho Chehab
  2021-05-12 12:50 ` [PATCH v2 03/40] docs: admin-guide: media: ipu3.rst: " Mauro Carvalho Chehab
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Mauro Carvalho Chehab @ 2021-05-12 12:50 UTC (permalink / raw)
  To: Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, linux-kernel, Jonathan Corbet,
	Mali DP Maintainers, alsa-devel, coresight, dri-devel, intel-gfx,
	intel-wired-lan, keyrings, kvm, linux-acpi, linux-arm-kernel,
	linux-edac, linux-ext4, linux-f2fs-devel, linux-hwmon, linux-iio,
	linux-input, linux-integrity, linux-media, linux-pci, linux-pm,
	linux-rdma, linux-sgx, linux-usb, mjpeg-users, netdev, rcu

This series is basically a cleanup after all those years of converting
files to ReST.

During the conversion period, several tools like LaTeX, pandoc, DocBook
and some specially-written scripts were used in order to convert
existing documents.

Such conversion tools - plus text editors like LibreOffice or similar - have
a set of rules that turn some typed ASCII characters into UTF-8 alternatives,
for instance converting quotes into curly quotes and adding non-breaking
spaces. All of those are meant to produce better results when the text is
displayed in HTML or PDF formats.

While it is perfectly fine to use UTF-8 characters in Linux, and especially in
the documentation, it is better to stick to the ASCII subset in this
particular case, for a couple of reasons:

1. it makes life easier for tools like grep;
2. the files are easier to edit with some commonly used text/source
   code editors.
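
As an illustration of the first point only (this is not part of the series),
here is a minimal Python sketch. The sample string is borrowed from the
ipu3.rst hunk in patch 03; the helper name non_ascii() is hypothetical. It
shows that a search typed with ASCII quotes does not match text that uses
the curly variants, and how the offending characters could be listed:

```python
import unicodedata

# Line from ipu3.rst before the fix: the quotes are U+201C / U+201D.
before = "\u201cVIDEO\u201d mode by default"
# The same line after the fix, using the plain ASCII quote U+0022.
after = '"VIDEO" mode by default'

# What a user would naturally type into grep or an editor search box.
pattern = '"VIDEO"'


def non_ascii(text):
    """Return (offset, codepoint, name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04x}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]


found_before = pattern in before  # the curly quotes hide the match
found_after = pattern in after    # the ASCII version is found
offenders = non_ascii(before)
```

The same substring test is what a fixed-string grep performs, which is why
the ASCII form is friendlier to such tools.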
    
Also, Sphinx already does such conversion automatically outside
literal blocks, as described at:

       https://docutils.sourceforge.io/docs/user/smartquotes.html

In this series, the following UTF-8 symbols are replaced:

            - U+00a0 (' '): NO-BREAK SPACE
            - U+00ad ('­'): SOFT HYPHEN
            - U+00b4 ('´'): ACUTE ACCENT
            - U+00d7 ('×'): MULTIPLICATION SIGN
            - U+2010 ('‐'): HYPHEN
            - U+2018 ('‘'): LEFT SINGLE QUOTATION MARK
            - U+2019 ('’'): RIGHT SINGLE QUOTATION MARK
            - U+201c ('“'): LEFT DOUBLE QUOTATION MARK
            - U+201d ('”'): RIGHT DOUBLE QUOTATION MARK
            - U+2212 ('−'): MINUS SIGN
            - U+2217 ('∗'): ASTERISK OPERATOR
            - U+feff (''): ZERO WIDTH NO-BREAK SPACE (BOM)
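
The list above can be sketched as a translation table. This is only a
hypothetical illustration in Python, not the script used to produce the
series. The replacements for U+00ad, U+00b4, U+00d7, U+201c and U+201d
follow the hunks visible in patches 03, 09, 14 and 15; the remaining
entries (space for U+00a0, "-" for U+2010 and U+2212, "'" for the single
quotes, "*" for U+2217, dropping U+feff) are assumptions:

```python
# Hypothetical mapping from the listed code points to ASCII replacements.
REPLACEMENTS = {
    0x00A0: " ",   # NO-BREAK SPACE (assumed: plain space)
    0x00AD: "-",   # SOFT HYPHEN ("PAL<shy>M" becomes "PAL-M" in patch 09)
    0x00B4: "'",   # ACUTE ACCENT (as in the sh_mobile_ceu_camera.rst hunk)
    0x00D7: "x",   # MULTIPLICATION SIGN (as in the crop.rst hunk)
    0x2010: "-",   # HYPHEN (assumed)
    0x2018: "'",   # LEFT SINGLE QUOTATION MARK (assumed)
    0x2019: "'",   # RIGHT SINGLE QUOTATION MARK (assumed)
    0x201C: '"',   # LEFT DOUBLE QUOTATION MARK
    0x201D: '"',   # RIGHT DOUBLE QUOTATION MARK
    0x2212: "-",   # MINUS SIGN (assumed)
    0x2217: "*",   # ASTERISK OPERATOR (assumed)
    0xFEFF: None,  # ZERO WIDTH NO-BREAK SPACE (BOM): assumed to be dropped
}


def to_ascii_subset(text: str) -> str:
    """Replace the listed UTF-8 symbols; leave every other character alone."""
    return text.translate(REPLACEMENTS)
```

For example, to_ascii_subset("PAL\u00adM") gives "PAL-M". Characters that
are not in the table, such as EM/EN DASH, pass through unchanged, which is
consistent with the v2 note below.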

---

v2:
- removed EM/EN DASH conversion from this patchset;
- removed a few fixes, as those were addressed on a separate series.
 
PS.:
   The first version of this series was posted with a different name:

	https://lore.kernel.org/lkml/cover.1620641727.git.mchehab+huawei@kernel.org/

   I also changed the patch texts, in order to better describe the patches' goals.

Mauro Carvalho Chehab (40):
  docs: hwmon: Use ASCII subset instead of UTF-8 alternate symbols
  docs: admin-guide: Use ASCII subset instead of UTF-8 alternate symbols
  docs: admin-guide: media: ipu3.rst: Use ASCII subset instead of UTF-8
    alternate symbols
  docs: admin-guide: perf: imx-ddr.rst: Use ASCII subset instead of
    UTF-8 alternate symbols
  docs: admin-guide: pm: Use ASCII subset instead of UTF-8 alternate
    symbols
  docs: trace: coresight: coresight-etm4x-reference.rst: Use ASCII
    subset instead of UTF-8 alternate symbols
  docs: driver-api: ioctl.rst: Use ASCII subset instead of UTF-8
    alternate symbols
  docs: driver-api: thermal: Use ASCII subset instead of UTF-8 alternate
    symbols
  docs: driver-api: media: drivers: Use ASCII subset instead of UTF-8
    alternate symbols
  docs: driver-api: firmware: other_interfaces.rst: Use ASCII subset
    instead of UTF-8 alternate symbols
  docs: fault-injection: nvme-fault-injection.rst: Use ASCII subset
    instead of UTF-8 alternate symbols
  docs: usb: Use ASCII subset instead of UTF-8 alternate symbols
  docs: process: code-of-conduct.rst: Use ASCII subset instead of UTF-8
    alternate symbols
  docs: userspace-api: media: fdl-appendix.rst: Use ASCII subset instead
    of UTF-8 alternate symbols
  docs: userspace-api: media: v4l: Use ASCII subset instead of UTF-8
    alternate symbols
  docs: userspace-api: media: dvb: Use ASCII subset instead of UTF-8
    alternate symbols
  docs: vm: zswap.rst: Use ASCII subset instead of UTF-8 alternate
    symbols
  docs: filesystems: f2fs.rst: Use ASCII subset instead of UTF-8
    alternate symbols
  docs: filesystems: ext4: Use ASCII subset instead of UTF-8 alternate
    symbols
  docs: kernel-hacking: Use ASCII subset instead of UTF-8 alternate
    symbols
  docs: hid: Use ASCII subset instead of UTF-8 alternate symbols
  docs: security: tpm: tpm_event_log.rst: Use ASCII subset instead of
    UTF-8 alternate symbols
  docs: security: keys: trusted-encrypted.rst: Use ASCII subset instead
    of UTF-8 alternate symbols
  docs: networking: scaling.rst: Use ASCII subset instead of UTF-8
    alternate symbols
  docs: networking: devlink: devlink-dpipe.rst: Use ASCII subset instead
    of UTF-8 alternate symbols
  docs: networking: device_drivers: Use ASCII subset instead of UTF-8
    alternate symbols
  docs: x86: Use ASCII subset instead of UTF-8 alternate symbols
  docs: scheduler: sched-deadline.rst: Use ASCII subset instead of UTF-8
    alternate symbols
  docs: power: powercap: powercap.rst: Use ASCII subset instead of UTF-8
    alternate symbols
  docs: ABI: Use ASCII subset instead of UTF-8 alternate symbols
  docs: PCI: acpi-info.rst: Use ASCII subset instead of UTF-8 alternate
    symbols
  docs: gpu: Use ASCII subset instead of UTF-8 alternate symbols
  docs: sound: kernel-api: writing-an-alsa-driver.rst: Use ASCII subset
    instead of UTF-8 alternate symbols
  docs: arm64: arm-acpi.rst: Use ASCII subset instead of UTF-8 alternate
    symbols
  docs: infiniband: tag_matching.rst: Use ASCII subset instead of UTF-8
    alternate symbols
  docs: misc-devices: ibmvmc.rst: Use ASCII subset instead of UTF-8
    alternate symbols
  docs: firmware-guide: acpi: lpit.rst: Use ASCII subset instead of
    UTF-8 alternate symbols
  docs: firmware-guide: acpi: dsd: graph.rst: Use ASCII subset instead
    of UTF-8 alternate symbols
  docs: virt: kvm: api.rst: Use ASCII subset instead of UTF-8 alternate
    symbols
  docs: RCU: Use ASCII subset instead of UTF-8 alternate symbols

 ...sfs-class-chromeos-driver-cros-ec-lightbar |   2 +-
 .../ABI/testing/sysfs-devices-platform-ipmi   |   2 +-
 .../testing/sysfs-devices-platform-trackpoint |   2 +-
 Documentation/ABI/testing/sysfs-devices-soc   |   4 +-
 Documentation/PCI/acpi-info.rst               |  22 +-
 .../Data-Structures/Data-Structures.rst       |  52 ++--
 .../Expedited-Grace-Periods.rst               |  40 +--
 .../Tree-RCU-Memory-Ordering.rst              |  10 +-
 .../RCU/Design/Requirements/Requirements.rst  | 122 ++++-----
 Documentation/admin-guide/media/ipu3.rst      |   2 +-
 Documentation/admin-guide/perf/imx-ddr.rst    |   2 +-
 Documentation/admin-guide/pm/intel_idle.rst   |   4 +-
 Documentation/admin-guide/pm/intel_pstate.rst |   4 +-
 Documentation/admin-guide/ras.rst             |  86 +++---
 .../admin-guide/reporting-issues.rst          |   2 +-
 Documentation/arm64/arm-acpi.rst              |   8 +-
 .../driver-api/firmware/other_interfaces.rst  |   2 +-
 Documentation/driver-api/ioctl.rst            |   8 +-
 .../media/drivers/sh_mobile_ceu_camera.rst    |   8 +-
 .../driver-api/media/drivers/zoran.rst        |   2 +-
 .../driver-api/thermal/cpu-idle-cooling.rst   |  14 +-
 .../driver-api/thermal/intel_powerclamp.rst   |   6 +-
 .../thermal/x86_pkg_temperature_thermal.rst   |   2 +-
 .../fault-injection/nvme-fault-injection.rst  |   2 +-
 Documentation/filesystems/ext4/attributes.rst |  20 +-
 Documentation/filesystems/ext4/bigalloc.rst   |   6 +-
 Documentation/filesystems/ext4/blockgroup.rst |   8 +-
 Documentation/filesystems/ext4/blocks.rst     |   2 +-
 Documentation/filesystems/ext4/directory.rst  |  16 +-
 Documentation/filesystems/ext4/eainode.rst    |   2 +-
 Documentation/filesystems/ext4/inlinedata.rst |   6 +-
 Documentation/filesystems/ext4/inodes.rst     |   6 +-
 Documentation/filesystems/ext4/journal.rst    |   8 +-
 Documentation/filesystems/ext4/mmp.rst        |   2 +-
 .../filesystems/ext4/special_inodes.rst       |   4 +-
 Documentation/filesystems/ext4/super.rst      |  10 +-
 Documentation/filesystems/f2fs.rst            |   4 +-
 .../firmware-guide/acpi/dsd/graph.rst         |   2 +-
 Documentation/firmware-guide/acpi/lpit.rst    |   2 +-
 Documentation/gpu/i915.rst                    |   2 +-
 Documentation/gpu/komeda-kms.rst              |   2 +-
 Documentation/hid/hid-sensor.rst              |  70 ++---
 Documentation/hid/intel-ish-hid.rst           | 246 +++++++++---------
 Documentation/hwmon/ir36021.rst               |   2 +-
 Documentation/hwmon/ltc2992.rst               |   2 +-
 Documentation/hwmon/pm6764tr.rst              |   2 +-
 Documentation/infiniband/tag_matching.rst     |   4 +-
 Documentation/kernel-hacking/hacking.rst      |   2 +-
 Documentation/kernel-hacking/locking.rst      |   2 +-
 Documentation/misc-devices/ibmvmc.rst         |   8 +-
 .../device_drivers/ethernet/intel/i40e.rst    |   8 +-
 .../device_drivers/ethernet/intel/iavf.rst    |   4 +-
 .../device_drivers/ethernet/netronome/nfp.rst |  12 +-
 .../networking/devlink/devlink-dpipe.rst      |   2 +-
 Documentation/networking/scaling.rst          |  18 +-
 Documentation/power/powercap/powercap.rst     | 210 +++++++--------
 Documentation/process/code-of-conduct.rst     |   2 +-
 Documentation/scheduler/sched-deadline.rst    |   2 +-
 .../security/keys/trusted-encrypted.rst       |   4 +-
 Documentation/security/tpm/tpm_event_log.rst  |   2 +-
 .../kernel-api/writing-an-alsa-driver.rst     |  68 ++---
 .../coresight/coresight-etm4x-reference.rst   |  16 +-
 Documentation/usb/ehci.rst                    |   2 +-
 Documentation/usb/gadget_printer.rst          |   2 +-
 Documentation/usb/mass-storage.rst            |  36 +--
 .../media/dvb/audio-set-bypass-mode.rst       |   2 +-
 .../userspace-api/media/dvb/audio.rst         |   2 +-
 .../userspace-api/media/dvb/dmx-fopen.rst     |   2 +-
 .../userspace-api/media/dvb/dmx-fread.rst     |   2 +-
 .../media/dvb/dmx-set-filter.rst              |   2 +-
 .../userspace-api/media/dvb/intro.rst         |   6 +-
 .../userspace-api/media/dvb/video.rst         |   2 +-
 .../userspace-api/media/fdl-appendix.rst      |  64 ++---
 .../userspace-api/media/v4l/crop.rst          |  16 +-
 .../userspace-api/media/v4l/dev-decoder.rst   |   6 +-
 .../userspace-api/media/v4l/diff-v4l.rst      |   2 +-
 .../userspace-api/media/v4l/open.rst          |   2 +-
 .../media/v4l/vidioc-cropcap.rst              |   4 +-
 Documentation/virt/kvm/api.rst                |  28 +-
 Documentation/vm/zswap.rst                    |   4 +-
 Documentation/x86/resctrl.rst                 |   2 +-
 Documentation/x86/sgx.rst                     |   4 +-
 82 files changed, 693 insertions(+), 693 deletions(-)

-- 
2.30.2




* [PATCH v2 03/40] docs: admin-guide: media: ipu3.rst: Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-12 12:50 [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols Mauro Carvalho Chehab
@ 2021-05-12 12:50 ` Mauro Carvalho Chehab
  2021-05-12 12:50 ` [PATCH v2 09/40] docs: driver-api: media: drivers: " Mauro Carvalho Chehab
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Mauro Carvalho Chehab @ 2021-05-12 12:50 UTC (permalink / raw)
  To: Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, Bingbu Cao,
	Mauro Carvalho Chehab, Sakari Ailus, Tianshu Qiu, linux-kernel,
	linux-media

The conversion tools used during DocBook/LaTeX/Markdown->ReST conversion
and some automatic rules which exist in certain text editors like
LibreOffice turned ASCII characters into some UTF-8 alternatives that
are better displayed in HTML and PDF.

While it is OK to use UTF-8 characters in Linux, it is better to
use the ASCII subset instead of a UTF-8 equivalent character,
as it makes life easier for tools like grep, and the files are easier
to edit with some commonly used text/source code editors.

Also, Sphinx already does such conversion automatically outside literal blocks:
   https://docutils.sourceforge.io/docs/user/smartquotes.html

So, replace the occurrences of the following UTF-8 characters:

	- U+201c ('“'): LEFT DOUBLE QUOTATION MARK
	- U+201d ('”'): RIGHT DOUBLE QUOTATION MARK

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 Documentation/admin-guide/media/ipu3.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/media/ipu3.rst b/Documentation/admin-guide/media/ipu3.rst
index f59697c7b374..f77cb1384dc3 100644
--- a/Documentation/admin-guide/media/ipu3.rst
+++ b/Documentation/admin-guide/media/ipu3.rst
@@ -244,7 +244,7 @@ larger bayer frame for further YUV processing than "VIDEO" mode to get high
 quality images. Besides, "STILL" mode need XNR3 to do noise reduction, hence
 "STILL" mode will need more power and memory bandwidth than "VIDEO" mode. TNR
 will be enabled in "VIDEO" mode and bypassed by "STILL" mode. ImgU is running at
-“VIDEO” mode by default, the user can use v4l2 control V4L2_CID_INTEL_IPU3_MODE
+"VIDEO" mode by default, the user can use v4l2 control V4L2_CID_INTEL_IPU3_MODE
 (currently defined in drivers/staging/media/ipu3/include/intel-ipu3.h) to query
 and set the running mode. For user, there is no difference for buffer queueing
 between the "VIDEO" and "STILL" mode, mandatory input and main output node
-- 
2.30.2



* [PATCH v2 09/40] docs: driver-api: media: drivers: Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-12 12:50 [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols Mauro Carvalho Chehab
  2021-05-12 12:50 ` [PATCH v2 03/40] docs: admin-guide: media: ipu3.rst: " Mauro Carvalho Chehab
@ 2021-05-12 12:50 ` Mauro Carvalho Chehab
  2021-05-12 12:50 ` [PATCH v2 14/40] docs: userspace-api: media: fdl-appendix.rst: " Mauro Carvalho Chehab
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Mauro Carvalho Chehab @ 2021-05-12 12:50 UTC (permalink / raw)
  To: Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, Corentin Labbe,
	Mauro Carvalho Chehab, linux-kernel, linux-media, mjpeg-users

The conversion tools used during DocBook/LaTeX/Markdown->ReST conversion
and some automatic rules which exist in certain text editors like
LibreOffice turned ASCII characters into some UTF-8 alternatives that
are better displayed in HTML and PDF.

While it is OK to use UTF-8 characters in Linux, it is better to
use the ASCII subset instead of a UTF-8 equivalent character,
as it makes life easier for tools like grep, and the files are easier
to edit with some commonly used text/source code editors.

Also, Sphinx already does such conversion automatically outside literal blocks:
   https://docutils.sourceforge.io/docs/user/smartquotes.html

So, replace the occurrences of the following UTF-8 characters:

	- U+00ad ('­'): SOFT HYPHEN
	- U+00b4 ('´'): ACUTE ACCENT

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 .../driver-api/media/drivers/sh_mobile_ceu_camera.rst     | 8 ++++----
 Documentation/driver-api/media/drivers/zoran.rst          | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/driver-api/media/drivers/sh_mobile_ceu_camera.rst b/Documentation/driver-api/media/drivers/sh_mobile_ceu_camera.rst
index 822fcb8368ae..280a322c34c6 100644
--- a/Documentation/driver-api/media/drivers/sh_mobile_ceu_camera.rst
+++ b/Documentation/driver-api/media/drivers/sh_mobile_ceu_camera.rst
@@ -30,10 +30,10 @@ Generic scaling / cropping scheme
 	|                       `. .6--
 	|
 	|                        . .6'-
-	|                      .´
-	|           ... -4'- .´
-	|       ...´             - -7'.
-	+-5'- .´               -/
+	|                      .'
+	|           ... -4'- .'
+	|       ...'             - -7'.
+	+-5'- .'               -/
 	|            -- -3'- -/
 	|         --/
 	|      --/
diff --git a/Documentation/driver-api/media/drivers/zoran.rst b/Documentation/driver-api/media/drivers/zoran.rst
index 83cbae9cedef..b205e10c3154 100644
--- a/Documentation/driver-api/media/drivers/zoran.rst
+++ b/Documentation/driver-api/media/drivers/zoran.rst
@@ -319,7 +319,7 @@ Conexant bt866 TV encoder
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
 - is used in AVS6EYES, and
-- can generate: NTSC/PAL, PAL­M, PAL­N
+- can generate: NTSC/PAL, PAL-M, PAL-N
 
 The adv717x, should be able to produce PAL N. But you find nothing PAL N
 specific in the registers. Seem that you have to reuse a other standard
-- 
2.30.2



* [PATCH v2 14/40] docs: userspace-api: media: fdl-appendix.rst: Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-12 12:50 [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols Mauro Carvalho Chehab
  2021-05-12 12:50 ` [PATCH v2 03/40] docs: admin-guide: media: ipu3.rst: " Mauro Carvalho Chehab
  2021-05-12 12:50 ` [PATCH v2 09/40] docs: driver-api: media: drivers: " Mauro Carvalho Chehab
@ 2021-05-12 12:50 ` Mauro Carvalho Chehab
  2021-05-12 12:50 ` [PATCH v2 15/40] docs: userspace-api: media: v4l: " Mauro Carvalho Chehab
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Mauro Carvalho Chehab @ 2021-05-12 12:50 UTC (permalink / raw)
  To: Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, Mauro Carvalho Chehab,
	linux-kernel, linux-media

The conversion tools used during DocBook/LaTeX/Markdown->ReST conversion
and some automatic rules which exist in certain text editors like
LibreOffice turned ASCII characters into some UTF-8 alternatives that
are better displayed in HTML and PDF.

While it is OK to use UTF-8 characters in Linux, it is better to
use the ASCII subset instead of a UTF-8 equivalent character,
as it makes life easier for tools like grep, and the files are easier
to edit with some commonly used text/source code editors.

Also, Sphinx already does such conversion automatically outside literal blocks:
   https://docutils.sourceforge.io/docs/user/smartquotes.html

So, replace the occurrences of the following UTF-8 characters:

	- U+201c ('“'): LEFT DOUBLE QUOTATION MARK
	- U+201d ('”'): RIGHT DOUBLE QUOTATION MARK

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 .../userspace-api/media/fdl-appendix.rst      | 64 +++++++++----------
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/Documentation/userspace-api/media/fdl-appendix.rst b/Documentation/userspace-api/media/fdl-appendix.rst
index 683ebed87017..b1bc725b4ec7 100644
--- a/Documentation/userspace-api/media/fdl-appendix.rst
+++ b/Documentation/userspace-api/media/fdl-appendix.rst
@@ -13,14 +13,14 @@ GNU Free Documentation License
 ===========
 
 The purpose of this License is to make a manual, textbook, or other
-written document “free” in the sense of freedom: to assure everyone the
+written document "free" in the sense of freedom: to assure everyone the
 effective freedom to copy and redistribute it, with or without modifying
 it, either commercially or noncommercially. Secondarily, this License
 preserves for the author and publisher a way to get credit for their
 work, while not being considered responsible for modifications made by
 others.
 
-This License is a kind of “copyleft”, which means that derivative works
+This License is a kind of "copyleft", which means that derivative works
 of the document must themselves be free in the same sense. It
 complements the GNU General Public License, which is a copyleft license
 designed for free software.
@@ -44,21 +44,21 @@ works whose purpose is instruction or reference.
 
 This License applies to any manual or other work that contains a notice
 placed by the copyright holder saying it can be distributed under the
-terms of this License. The “Document”, below, refers to any such manual
+terms of this License. The "Document", below, refers to any such manual
 or work. Any member of the public is a licensee, and is addressed as
-“you”.
+"you".
 
 
 .. _fdl-modified:
 
-A “Modified Version” of the Document means any work containing the
+A "Modified Version" of the Document means any work containing the
 Document or a portion of it, either copied verbatim, or with
 modifications and/or translated into another language.
 
 
 .. _fdl-secondary:
 
-A “Secondary Section” is a named appendix or a front-matter section of
+A "Secondary Section" is a named appendix or a front-matter section of
 the :ref:`Document <fdl-document>` that deals exclusively with the
 relationship of the publishers or authors of the Document to the
 Document's overall subject (or to related matters) and contains nothing
@@ -72,7 +72,7 @@ regarding them.
 
 .. _fdl-invariant:
 
-The “Invariant Sections” are certain
+The "Invariant Sections" are certain
 :ref:`Secondary Sections <fdl-secondary>` whose titles are designated,
 as being those of Invariant Sections, in the notice that says that the
 :ref:`Document <fdl-document>` is released under this License.
@@ -80,14 +80,14 @@ as being those of Invariant Sections, in the notice that says that the
 
 .. _fdl-cover-texts:
 
-The “Cover Texts” are certain short passages of text that are listed, as
+The "Cover Texts" are certain short passages of text that are listed, as
 Front-Cover Texts or Back-Cover Texts, in the notice that says that the
 :ref:`Document <fdl-document>` is released under this License.
 
 
 .. _fdl-transparent:
 
-A “Transparent” copy of the :ref:`Document <fdl-document>` means a
+A "Transparent" copy of the :ref:`Document <fdl-document>` means a
 machine-readable copy, represented in a format whose specification is
 available to the general public, whose contents can be viewed and edited
 directly and straightforwardly with generic text editors or (for images
@@ -97,7 +97,7 @@ formatters or for automatic translation to a variety of formats suitable
 for input to text formatters. A copy made in an otherwise Transparent
 file format whose markup has been designed to thwart or discourage
 subsequent modification by readers is not Transparent. A copy that is
-not “Transparent” is called “Opaque”.
+not "Transparent" is called "Opaque".
 
 Examples of suitable formats for Transparent copies include plain ASCII
 without markup, Texinfo input format, LaTeX input format, SGML or XML
@@ -111,10 +111,10 @@ word processors for output purposes only.
 
 .. _fdl-title-page:
 
-The “Title Page” means, for a printed book, the title page itself, plus
+The "Title Page" means, for a printed book, the title page itself, plus
 such following pages as are needed to hold, legibly, the material this
 License requires to appear in the title page. For works in formats which
-do not have any title page as such, “Title Page” means the text near the
+do not have any title page as such, "Title Page" means the text near the
 most prominent appearance of the work's title, preceding the beginning
 of the body of the text.
 
@@ -242,11 +242,11 @@ Modified Version:
    Include an unaltered copy of this License.
 
 -  **I.**
-   Preserve the section entitled “History”, and its title, and add to it
+   Preserve the section entitled "History", and its title, and add to it
    an item stating at least the title, year, new authors, and publisher
    of the :ref:`Modified Version <fdl-modified>` as given on the
    :ref:`Title Page <fdl-title-page>`. If there is no section entitled
-   “History” in the :ref:`Document <fdl-document>`, create one stating
+   "History" in the :ref:`Document <fdl-document>`, create one stating
    the title, year, authors, and publisher of the Document as given on
    its Title Page, then add an item describing the Modified Version as
    stated in the previous sentence.
@@ -256,13 +256,13 @@ Modified Version:
    :ref:`Document <fdl-document>` for public access to a
    :ref:`Transparent <fdl-transparent>` copy of the Document, and
    likewise the network locations given in the Document for previous
-   versions it was based on. These may be placed in the “History”
+   versions it was based on. These may be placed in the "History"
    section. You may omit a network location for a work that was
    published at least four years before the Document itself, or if the
    original publisher of the version it refers to gives permission.
 
 -  **K.**
-   In any section entitled “Acknowledgements” or “Dedications”, preserve
+   In any section entitled "Acknowledgements" or "Dedications", preserve
    the section's title, and preserve in the section all the substance
    and tone of each of the contributor acknowledgements and/or
    dedications given therein.
@@ -274,11 +274,11 @@ Modified Version:
    part of the section titles.
 
 -  **M.**
-   Delete any section entitled “Endorsements”. Such a section may not be
+   Delete any section entitled "Endorsements". Such a section may not be
    included in the :ref:`Modified Version <fdl-modified>`.
 
 -  **N.**
-   Do not retitle any existing section as “Endorsements” or to conflict
+   Do not retitle any existing section as "Endorsements" or to conflict
    in title with any :ref:`Invariant Section <fdl-invariant>`.
 
 If the :ref:`Modified Version <fdl-modified>` includes new
@@ -290,7 +290,7 @@ of :ref:`Invariant Sections <fdl-invariant>` in the Modified Version's
 license notice. These titles must be distinct from any other section
 titles.
 
-You may add a section entitled “Endorsements”, provided it contains
+You may add a section entitled "Endorsements", provided it contains
 nothing but endorsements of your
 :ref:`Modified Version <fdl-modified>` by various parties--for
 example, statements of peer review or that the text has been approved by
@@ -337,11 +337,11 @@ the original author or publisher of that section if known, or else a
 unique number. Make the same adjustment to the section titles in the
 list of Invariant Sections in the license notice of the combined work.
 
-In the combination, you must combine any sections entitled “History” in
-the various original documents, forming one section entitled “History”;
-likewise combine any sections entitled “Acknowledgements”, and any
-sections entitled “Dedications”. You must delete all sections entitled
-“Endorsements.”
+In the combination, you must combine any sections entitled "History" in
+the various original documents, forming one section entitled "History";
+likewise combine any sections entitled "Acknowledgements", and any
+sections entitled "Dedications". You must delete all sections entitled
+"Endorsements."
 
 
 .. _fdl-section6:
@@ -372,7 +372,7 @@ with other separate and independent documents or works, in or on a
 volume of a storage or distribution medium, does not as a whole count as
 a :ref:`Modified Version <fdl-modified>` of the Document, provided no
 compilation copyright is claimed for the compilation. Such a compilation
-is called an “aggregate”, and this License does not apply to the other
+is called an "aggregate", and this License does not apply to the other
 self-contained works thus compiled with the Document , on account of
 their being thus compiled, if they are not themselves derivative works
 of the Document. If the :ref:`Cover Text <fdl-cover-texts>`
@@ -429,7 +429,7 @@ concerns. See
 
 Each version of the License is given a distinguishing version number. If
 the :ref:`Document <fdl-document>` specifies that a particular
-numbered version of this License “or any later version” applies to it,
+numbered version of this License "or any later version" applies to it,
 you have the option of following the terms and conditions either of that
 specified version or of any later version that has been published (not
 as a draft) by the Free Software Foundation. If the Document does not
@@ -455,13 +455,13 @@ notices just after the title page:
     being LIST THEIR TITLES, with the
     :ref:`Front-Cover Texts <fdl-cover-texts>` being LIST, and with
     the :ref:`Back-Cover Texts <fdl-cover-texts>` being LIST. A copy
-    of the license is included in the section entitled “GNU Free
-    Documentation License”.
+    of the license is included in the section entitled "GNU Free
+    Documentation License".
 
-If you have no :ref:`Invariant Sections <fdl-invariant>`, write “with
-no Invariant Sections” instead of saying which ones are invariant. If
-you have no :ref:`Front-Cover Texts <fdl-cover-texts>`, write “no
-Front-Cover Texts” instead of “Front-Cover Texts being LIST”; likewise
+If you have no :ref:`Invariant Sections <fdl-invariant>`, write "with
+no Invariant Sections" instead of saying which ones are invariant. If
+you have no :ref:`Front-Cover Texts <fdl-cover-texts>`, write "no
+Front-Cover Texts" instead of "Front-Cover Texts being LIST"; likewise
 for :ref:`Back-Cover Texts <fdl-cover-texts>`.
 
 If your document contains nontrivial examples of program code, we
-- 
2.30.2



* [PATCH v2 15/40] docs: userspace-api: media: v4l: Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-12 12:50 [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols Mauro Carvalho Chehab
                   ` (2 preceding siblings ...)
  2021-05-12 12:50 ` [PATCH v2 14/40] docs: userspace-api: media: fdl-appendix.rst: " Mauro Carvalho Chehab
@ 2021-05-12 12:50 ` Mauro Carvalho Chehab
  2021-05-12 12:50 ` [PATCH v2 16/40] docs: userspace-api: media: dvb: " Mauro Carvalho Chehab
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Mauro Carvalho Chehab @ 2021-05-12 12:50 UTC (permalink / raw)
  To: Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, Hans Verkuil,
	Mauro Carvalho Chehab, Michael Tretter, Sakari Ailus,
	Stanimir Varbanov, Tomasz Figa, linux-kernel, linux-media

The conversion tools used during DocBook/LaTeX/Markdown->ReST conversion
and some automatic rules which exist in certain text editors like
LibreOffice turned ASCII characters into some UTF-8 alternatives that
are better displayed in HTML and PDF.

While it is OK to use UTF-8 characters in Linux, it is better to
use the ASCII subset instead of a UTF-8 equivalent character,
as it makes life easier for tools like grep, and the files are easier
to edit with some commonly used text/source code editors.

Also, Sphinx already does such conversion automatically outside literal blocks:
   https://docutils.sourceforge.io/docs/user/smartquotes.html

So, replace the occurrences of the following UTF-8 characters:

	- U+00a0 (' '): NO-BREAK SPACE
	- U+00d7 ('×'): MULTIPLICATION SIGN
	- U+2019 ('’'): RIGHT SINGLE QUOTATION MARK

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 Documentation/userspace-api/media/v4l/crop.rst   | 16 ++++++++--------
 .../userspace-api/media/v4l/dev-decoder.rst      |  6 +++---
 .../userspace-api/media/v4l/diff-v4l.rst         |  2 +-
 Documentation/userspace-api/media/v4l/open.rst   |  2 +-
 .../userspace-api/media/v4l/vidioc-cropcap.rst   |  4 ++--
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/Documentation/userspace-api/media/v4l/crop.rst b/Documentation/userspace-api/media/v4l/crop.rst
index 3fe185e25ccf..23c2b71f449e 100644
--- a/Documentation/userspace-api/media/v4l/crop.rst
+++ b/Documentation/userspace-api/media/v4l/crop.rst
@@ -130,22 +130,22 @@ the driver state and therefore only adjust the requested rectangle.
 
 Suppose scaling on a video capture device is restricted to a factor 1:1
 or 2:1 in either direction and the target image size must be a multiple
-of 16 × 16 pixels. The source cropping rectangle is set to defaults,
-which are also the upper limit in this example, of 640 × 400 pixels at
-offset 0, 0. An application requests an image size of 300 × 225 pixels,
+of 16 x 16 pixels. The source cropping rectangle is set to defaults,
+which are also the upper limit in this example, of 640 x 400 pixels at
+offset 0, 0. An application requests an image size of 300 x 225 pixels,
 assuming video will be scaled down from the "full picture" accordingly.
-The driver sets the image size to the closest possible values 304 × 224,
+The driver sets the image size to the closest possible values 304 x 224,
 then chooses the cropping rectangle closest to the requested size, that
-is 608 × 224 (224 × 2:1 would exceed the limit 400). The offset 0, 0 is
+is 608 x 224 (224 x 2:1 would exceed the limit 400). The offset 0, 0 is
 still valid, thus unmodified. Given the default cropping rectangle
 reported by :ref:`VIDIOC_CROPCAP <VIDIOC_CROPCAP>` the application can
 easily propose another offset to center the cropping rectangle.
 
 Now the application may insist on covering an area using a picture
 aspect ratio closer to the original request, so it asks for a cropping
-rectangle of 608 × 456 pixels. The present scaling factors limit
-cropping to 640 × 384, so the driver returns the cropping size 608 × 384
-and adjusts the image size to closest possible 304 × 192.
+rectangle of 608 x 456 pixels. The present scaling factors limit
+cropping to 640 x 384, so the driver returns the cropping size 608 x 384
+and adjusts the image size to closest possible 304 x 192.
 
 
 Examples
diff --git a/Documentation/userspace-api/media/v4l/dev-decoder.rst b/Documentation/userspace-api/media/v4l/dev-decoder.rst
index 3d4138a4ba69..5b9b83feeceb 100644
--- a/Documentation/userspace-api/media/v4l/dev-decoder.rst
+++ b/Documentation/userspace-api/media/v4l/dev-decoder.rst
@@ -38,7 +38,7 @@ Conventions and Notations Used in This Document
 6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
    [0..2]: i = 0, 1, 2.
 
-7. Given an ``OUTPUT`` buffer A, then A’ represents a buffer on the ``CAPTURE``
+7. Given an ``OUTPUT`` buffer A, then A' represents a buffer on the ``CAPTURE``
    queue containing data that resulted from processing buffer A.
 
 .. _decoder-glossary:
@@ -288,7 +288,7 @@ Initialization
 
       Changing the ``OUTPUT`` format may change the currently set ``CAPTURE``
       format. How the new ``CAPTURE`` format is determined is up to the decoder
-      and the client must ensure it matches its needs afterwards.
+      and the client must ensure it matches its needs afterwards.
 
 2.  Allocate source (bytestream) buffers via :c:func:`VIDIOC_REQBUFS` on
     ``OUTPUT``.
@@ -874,7 +874,7 @@ it may be affected as per normal decoder operation.
 
    any of the following results on the ``CAPTURE`` queue is allowed:
 
-     {A’, B’, G’, H’}, {A’, G’, H’}, {G’, H’}.
+     {A', B', G', H'}, {A', G', H'}, {G', H'}.
 
    To determine the CAPTURE buffer containing the first decoded frame after the
    seek, the client may observe the timestamps to match the CAPTURE and OUTPUT
diff --git a/Documentation/userspace-api/media/v4l/diff-v4l.rst b/Documentation/userspace-api/media/v4l/diff-v4l.rst
index 33243ecb5033..9ce60e625974 100644
--- a/Documentation/userspace-api/media/v4l/diff-v4l.rst
+++ b/Documentation/userspace-api/media/v4l/diff-v4l.rst
@@ -447,7 +447,7 @@ name ``V4L2_FBUF_FLAG_CHROMAKEY``.
 
 In V4L, storing a bitmap pointer in ``clips`` and setting ``clipcount``
 to ``VIDEO_CLIP_BITMAP`` (-1) requests bitmap clipping, using a fixed
-size bitmap of 1024 × 625 bits. Struct :c:type:`v4l2_window`
+size bitmap of 1024 x 625 bits. Struct :c:type:`v4l2_window`
 has a separate ``bitmap`` pointer field for this purpose and the bitmap
 size is determined by ``w.width`` and ``w.height``.
 
diff --git a/Documentation/userspace-api/media/v4l/open.rst b/Documentation/userspace-api/media/v4l/open.rst
index 18bfb9b8137d..b015bbbdf8b5 100644
--- a/Documentation/userspace-api/media/v4l/open.rst
+++ b/Documentation/userspace-api/media/v4l/open.rst
@@ -100,7 +100,7 @@ Where ``X`` is a non-negative integer.
 	$ tree /dev/v4l
 	/dev/v4l
 	├── by-id
-	│   └── usb-OmniVision._USB_Camera-B4.04.27.1-video-index0 -> ../../video0
+	│   └── usb-OmniVision._USB_Camera-B4.04.27.1-video-index0 -> ../../video0
 	└── by-path
 	    └── pci-0000:00:14.0-usb-0:2:1.0-video-index0 -> ../../video0
 
diff --git a/Documentation/userspace-api/media/v4l/vidioc-cropcap.rst b/Documentation/userspace-api/media/v4l/vidioc-cropcap.rst
index 551ac9d3c6ef..4ea652e66401 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-cropcap.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-cropcap.rst
@@ -69,8 +69,8 @@ overlay devices.
     * - struct :ref:`v4l2_rect <v4l2-rect-crop>`
       - ``defrect``
       - Default cropping rectangle, it shall cover the "whole picture".
-	Assuming pixel aspect 1/1 this could be for example a 640 × 480
-	rectangle for NTSC, a 768 × 576 rectangle for PAL and SECAM
+	Assuming pixel aspect 1/1 this could be for example a 640 x 480
+	rectangle for NTSC, a 768 x 576 rectangle for PAL and SECAM
 	centered over the active picture area. The same co-ordinate system
 	as for ``bounds`` is used.
     * - struct :c:type:`v4l2_fract`
-- 
2.30.2



* [PATCH v2 16/40] docs: userspace-api: media: dvb: Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-12 12:50 [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols Mauro Carvalho Chehab
                   ` (3 preceding siblings ...)
  2021-05-12 12:50 ` [PATCH v2 15/40] docs: userspace-api: media: v4l: " Mauro Carvalho Chehab
@ 2021-05-12 12:50 ` Mauro Carvalho Chehab
  2021-05-12 14:14 ` [PATCH v2 00/40] " Theodore Ts'o
  2021-05-12 17:07 ` David Woodhouse
  6 siblings, 0 replies; 18+ messages in thread
From: Mauro Carvalho Chehab @ 2021-05-12 12:50 UTC (permalink / raw)
  To: Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, Mauro Carvalho Chehab,
	Randy Dunlap, linux-kernel, linux-media

The conversion tools used during the DocBook/LaTeX/Markdown->ReST conversion,
and some automatic rules that exist in certain text editors like
LibreOffice, turned ASCII characters into UTF-8 alternatives that
are better displayed in HTML and PDF.

While it is OK to use UTF-8 characters in Linux, it is better to
use the ASCII subset instead of a UTF-8 equivalent character,
as it makes life easier for tools like grep, and the files are easier
to edit with some commonly used text/source code editors.

Also, Sphinx already does such conversion automatically outside literal blocks:
   https://docutils.sourceforge.io/docs/user/smartquotes.html

So, replace the occurrences of the following UTF-8 characters:

	- U+00a0 (' '): NO-BREAK SPACE
	- U+2019 ('’'): RIGHT SINGLE QUOTATION MARK
	- U+201c ('“'): LEFT DOUBLE QUOTATION MARK
	- U+201d ('”'): RIGHT DOUBLE QUOTATION MARK

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 .../userspace-api/media/dvb/audio-set-bypass-mode.rst       | 2 +-
 Documentation/userspace-api/media/dvb/audio.rst             | 2 +-
 Documentation/userspace-api/media/dvb/dmx-fopen.rst         | 2 +-
 Documentation/userspace-api/media/dvb/dmx-fread.rst         | 2 +-
 Documentation/userspace-api/media/dvb/dmx-set-filter.rst    | 2 +-
 Documentation/userspace-api/media/dvb/intro.rst             | 6 +++---
 Documentation/userspace-api/media/dvb/video.rst             | 2 +-
 7 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/Documentation/userspace-api/media/dvb/audio-set-bypass-mode.rst b/Documentation/userspace-api/media/dvb/audio-set-bypass-mode.rst
index ecac02f1b2fc..80d551a2053a 100644
--- a/Documentation/userspace-api/media/dvb/audio-set-bypass-mode.rst
+++ b/Documentation/userspace-api/media/dvb/audio-set-bypass-mode.rst
@@ -50,7 +50,7 @@ Description
 
 This ioctl call asks the Audio Device to bypass the Audio decoder and
 forward the stream without decoding. This mode shall be used if streams
-that can’t be handled by the Digital TV system shall be decoded. Dolby
+that can't be handled by the Digital TV system shall be decoded. Dolby
 DigitalTM streams are automatically forwarded by the Digital TV subsystem if
 the hardware can handle it.
 
diff --git a/Documentation/userspace-api/media/dvb/audio.rst b/Documentation/userspace-api/media/dvb/audio.rst
index eaae5675a47d..aa753336b31f 100644
--- a/Documentation/userspace-api/media/dvb/audio.rst
+++ b/Documentation/userspace-api/media/dvb/audio.rst
@@ -11,7 +11,7 @@ TV hardware. It can be accessed through ``/dev/dvb/adapter?/audio?``. Data
 types and ioctl definitions can be accessed by including
 ``linux/dvb/audio.h`` in your application.
 
-Please note that some Digital TV cards don’t have their own MPEG decoder, which
+Please note that some Digital TV cards don't have their own MPEG decoder, which
 results in the omission of the audio and video device.
 
 These ioctls were also used by V4L2 to control MPEG decoders implemented
diff --git a/Documentation/userspace-api/media/dvb/dmx-fopen.rst b/Documentation/userspace-api/media/dvb/dmx-fopen.rst
index 8f0a2b831d4a..50b36eb4371e 100644
--- a/Documentation/userspace-api/media/dvb/dmx-fopen.rst
+++ b/Documentation/userspace-api/media/dvb/dmx-fopen.rst
@@ -82,7 +82,7 @@ appropriately.
     :widths: 1 16
 
     -  -  ``EMFILE``
-       -  “Too many open files”, i.e. no more filters available.
+       -  "Too many open files", i.e. no more filters available.
 
 The generic error codes are described at the
 :ref:`Generic Error Codes <gen-errors>` chapter.
diff --git a/Documentation/userspace-api/media/dvb/dmx-fread.rst b/Documentation/userspace-api/media/dvb/dmx-fread.rst
index 78e9daef595a..88c4cddf7c30 100644
--- a/Documentation/userspace-api/media/dvb/dmx-fread.rst
+++ b/Documentation/userspace-api/media/dvb/dmx-fread.rst
@@ -34,7 +34,7 @@ Description
 
 This system call returns filtered data, which might be section or Packetized
 Elementary Stream (PES) data. The filtered data is transferred from
-the driver’s internal circular buffer to ``buf``. The maximum amount of data
+the driver's internal circular buffer to ``buf``. The maximum amount of data
 to be transferred is implied by count.
 
 .. note::
diff --git a/Documentation/userspace-api/media/dvb/dmx-set-filter.rst b/Documentation/userspace-api/media/dvb/dmx-set-filter.rst
index f43455b7adae..1b8c8071b14f 100644
--- a/Documentation/userspace-api/media/dvb/dmx-set-filter.rst
+++ b/Documentation/userspace-api/media/dvb/dmx-set-filter.rst
@@ -37,7 +37,7 @@ parameters provided. A timeout may be defined stating number of seconds
 to wait for a section to be loaded. A value of 0 means that no timeout
 should be applied. Finally there is a flag field where it is possible to
 state whether a section should be CRC-checked, whether the filter should
-be a ”one-shot” filter, i.e. if the filtering operation should be
+be a "one-shot" filter, i.e. if the filtering operation should be
 stopped after the first section is received, and whether the filtering
 operation should be started immediately (without waiting for a
 :ref:`DMX_START` ioctl call). If a filter was previously set-up, this
diff --git a/Documentation/userspace-api/media/dvb/intro.rst b/Documentation/userspace-api/media/dvb/intro.rst
index a935f3914e56..6784ae79657c 100644
--- a/Documentation/userspace-api/media/dvb/intro.rst
+++ b/Documentation/userspace-api/media/dvb/intro.rst
@@ -107,7 +107,7 @@ Audio and video decoder
       a Systems on a Chip (SoC) integrated circuit.
 
       It may also not be needed for certain usages (e.g. for data-only
-      uses like “internet over satellite”).
+      uses like "internet over satellite").
 
 :ref:`stb_components` shows a crude schematic of the control and data
 flow between those components.
@@ -148,9 +148,9 @@ individual devices are called:
 
 -  ``/dev/dvb/adapterN/caM``,
 
-where ``N`` enumerates the Digital TV cards in a system starting from 0, and
+where ``N`` enumerates the Digital TV cards in a system starting from 0, and
 ``M`` enumerates the devices of each type within each adapter, starting
-from 0, too. We will omit the “``/dev/dvb/adapterN/``\ ” in the further
+from 0, too. We will omit the "``/dev/dvb/adapterN/``\ " in the further
 discussion of these devices.
 
 More details about the data structures and function calls of all the
diff --git a/Documentation/userspace-api/media/dvb/video.rst b/Documentation/userspace-api/media/dvb/video.rst
index 38a8d39a1d25..808705b769a1 100644
--- a/Documentation/userspace-api/media/dvb/video.rst
+++ b/Documentation/userspace-api/media/dvb/video.rst
@@ -16,7 +16,7 @@ stream, not its presentation on the TV or computer screen. On PCs this
 is typically handled by an associated video4linux device, e.g.
 **/dev/video**, which allows scaling and defining output windows.
 
-Some Digital TV cards don’t have their own MPEG decoder, which results in the
+Some Digital TV cards don't have their own MPEG decoder, which results in the
 omission of the audio and video device as well as the video4linux
 device.
 
-- 
2.30.2



* Re: [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-12 12:50 [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols Mauro Carvalho Chehab
                   ` (4 preceding siblings ...)
  2021-05-12 12:50 ` [PATCH v2 16/40] docs: userspace-api: media: dvb: " Mauro Carvalho Chehab
@ 2021-05-12 14:14 ` Theodore Ts'o
  2021-05-12 15:17   ` Mauro Carvalho Chehab
  2021-05-12 17:07 ` David Woodhouse
  6 siblings, 1 reply; 18+ messages in thread
From: Theodore Ts'o @ 2021-05-12 14:14 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Mali DP Maintainers, alsa-devel, coresight, dri-devel, intel-gfx,
	intel-wired-lan, keyrings, kvm, linux-acpi, linux-arm-kernel,
	linux-edac, linux-ext4, linux-f2fs-devel, linux-hwmon, linux-iio,
	linux-input, linux-integrity, linux-media, linux-pci, linux-pm,
	linux-rdma, linux-sgx, linux-usb, mjpeg-users, netdev, rcu

On Wed, May 12, 2021 at 02:50:04PM +0200, Mauro Carvalho Chehab wrote:
> v2:
> - removed EM/EN DASH conversion from this patchset;

Are you still thinking about doing the

EN DASH --> "--"
EM DASH --> "---"

conversion?  That's not going to change what the documentation will
look like in the HTML and PDF output forms, and I think it would make
life easier for people who are reading and editing the Documentation/*
files in text form.

				- Ted


* Re: [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-12 14:14 ` [PATCH v2 00/40] " Theodore Ts'o
@ 2021-05-12 15:17   ` Mauro Carvalho Chehab
  2021-05-12 17:12     ` David Woodhouse
  0 siblings, 1 reply; 18+ messages in thread
From: Mauro Carvalho Chehab @ 2021-05-12 15:17 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Mali DP Maintainers, alsa-devel, coresight, dri-devel, intel-gfx,
	intel-wired-lan, keyrings, kvm, linux-acpi, linux-arm-kernel,
	linux-edac, linux-ext4, linux-f2fs-devel, linux-hwmon, linux-iio,
	linux-input, linux-integrity, linux-media, linux-pci, linux-pm,
	linux-rdma, linux-sgx, linux-usb, mjpeg-users, netdev, rcu

Em Wed, 12 May 2021 10:14:44 -0400
"Theodore Ts'o" <tytso@mit.edu> escreveu:

> On Wed, May 12, 2021 at 02:50:04PM +0200, Mauro Carvalho Chehab wrote:
> > v2:
> > - removed EM/EN DASH conversion from this patchset;  
> 
> Are you still thinking about doing the
> 
> EN DASH --> "--"
> EM DASH --> "---"
> 
> conversion?  

Yes, but I intend to submit it in a separate patch series, probably after
this one is merged. Let's first clean up the bulk of the
conversion-generated UTF-8 char noise ;-)

> That's not going to change what the documentation will
> look like in the HTML and PDF output forms, and I think it would make
> life easier for people are reading and editing the Documentation/*
> files in text form.

Agreed. I'm also considering adding a couple of cases of this char:

	- U+2026 ('…'): HORIZONTAL ELLIPSIS

as Sphinx also converts "..." into HORIZONTAL ELLIPSIS.
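
To make the dash and ellipsis conversions concrete, here is a hand-rolled
Python approximation of what the smartquotes documentation describes for
"--", "---" and "...". It is only a sketch: the function name is made up,
and the real transform handles more cases (quotes, escapes, literal
blocks) than this does.

```python
def educate_dashes_and_ellipsis(text: str) -> str:
    """Rough stand-in for the smartquotes dash/ellipsis rules (assumption)."""
    # Handle the three-hyphen form first so "--" does not consume part of it.
    text = text.replace("---", "\u2014")  # EM DASH
    text = text.replace("--", "\u2013")   # EN DASH
    text = text.replace("...", "\u2026")  # HORIZONTAL ELLIPSIS
    return text


if __name__ == "__main__":
    print(educate_dashes_and_ellipsis("pages 3--5 --- and so on..."))
```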

-

Anyway, I'm opting to submit those separately because it seems
that at least some maintainers added EM/EN DASH intentionally.

So, it may generate case-by-case discussions.

Also, IMO, at least a couple of EN/EM DASH cases would be better served
with a single hyphen.

Thanks,
Mauro


* Re: [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-12 12:50 [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols Mauro Carvalho Chehab
                   ` (5 preceding siblings ...)
  2021-05-12 14:14 ` [PATCH v2 00/40] " Theodore Ts'o
@ 2021-05-12 17:07 ` David Woodhouse
  2021-05-14  8:21   ` Mauro Carvalho Chehab
  6 siblings, 1 reply; 18+ messages in thread
From: David Woodhouse @ 2021-05-12 17:07 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Linux Doc Mailing List
  Cc: linux-kernel, Jonathan Corbet, Mali DP Maintainers, alsa-devel,
	coresight, dri-devel, intel-gfx, intel-wired-lan, keyrings, kvm,
	linux-acpi, linux-arm-kernel, linux-edac, linux-ext4,
	linux-f2fs-devel, linux-hwmon, linux-iio, linux-input,
	linux-integrity, linux-media, linux-pci, linux-pm, linux-rdma,
	linux-sgx, linux-usb, mjpeg-users, netdev, rcu


Your title 'Use ASCII subset' is now at least a bit *closer* to
describing what the patches are actually doing, but it's still a bit
misleading because you're only doing it for *some* characters.

And the wording is still indicative of a fundamentally *misguided*
motivation for doing any of this. Your commit comments should be about
fixing a specific thing, nothing to do with "use ASCII subset", which
is pointless in itself.

On Wed, 2021-05-12 at 14:50 +0200, Mauro Carvalho Chehab wrote:
> Such conversion tools - plus some text editor like LibreOffice  or similar  - have
> a set of rules that turns some typed ASCII characters into UTF-8 alternatives,
> for instance converting commas into curly commas and adding non-breakable
> spaces. All of those are meant to produce better results when the text is
> displayed in HTML or PDF formats.

And don't we render our documentation into HTML or PDF formats? Are
some of those non-breaking spaces not actually *useful* for their
intended purpose?

> While it is perfectly fine to use UTF-8 characters in Linux, and specially at
> the documentation,  it is better to  stick to the ASCII subset  on such
> particular case,  due to a couple of reasons:
> 
> 1. it makes life easier for tools like grep;

Barely, as noted, because of things like line feeds.

> 2. they easier to edit with the some commonly used text/source
>    code editors.

That is nonsense. Any but the most broken and/or anachronistic
environments and editors will be just fine.



* Re: [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-12 15:17   ` Mauro Carvalho Chehab
@ 2021-05-12 17:12     ` David Woodhouse
  0 siblings, 0 replies; 18+ messages in thread
From: David Woodhouse @ 2021-05-12 17:12 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Theodore Ts'o
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Mali DP Maintainers, alsa-devel, coresight, dri-devel, intel-gfx,
	intel-wired-lan, keyrings, kvm, linux-acpi, linux-arm-kernel,
	linux-edac, linux-ext4, linux-f2fs-devel, linux-hwmon, linux-iio,
	linux-input, linux-integrity, linux-media, linux-pci, linux-pm,
	linux-rdma, linux-sgx, linux-usb, mjpeg-users, netdev, rcu


On Wed, 2021-05-12 at 17:17 +0200, Mauro Carvalho Chehab wrote:
> Em Wed, 12 May 2021 10:14:44 -0400
> "Theodore Ts'o" <tytso@mit.edu> escreveu:
> 
> > On Wed, May 12, 2021 at 02:50:04PM +0200, Mauro Carvalho Chehab wrote:
> > > v2:
> > > - removed EM/EN DASH conversion from this patchset;  
> > 
> > Are you still thinking about doing the
> > 
> > EN DASH --> "--"
> > EM DASH --> "---"
> > 
> > conversion?  
> 
> Yes, but I intend to submit it on a separate patch series, probably after
> having this one merged. Let's first cleanup the large part of the 
> conversion-generated UTF-8 char noise ;-)
> 
> > That's not going to change what the documentation will
> > look like in the HTML and PDF output forms, and I think it would make
> > life easier for people are reading and editing the Documentation/*
> > files in text form.
> 
> Agreed. I'm also considering to add a couple of cases of this char:
> 
> 	- U+2026 ('…'): HORIZONTAL ELLIPSIS
> 
> As Sphinx also replaces "..." into HORIZONTAL ELLIPSIS.

Er, what?

The *only* part of this whole enterprise that actually seemed to make
even a tiny bit of sense — rather than seeming like a thinly veiled
retrospective excuse for dragging us back in time by 30 years — was the
bit about making it easier to grep.

But if I understand you correctly, you're talking about using something
like C trigraphs to represent the perfectly reasonable text emdash
character ("—") as two hyphen-minuses ("--") in the source code of the
documentation? Isn't that going to achieve precisely the *opposite*? If
I select some text in the HTML output of the docs and then search for
it in the source code, that's going to *stop* it matching my search?




* Re: [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-12 17:07 ` David Woodhouse
@ 2021-05-14  8:21   ` Mauro Carvalho Chehab
  2021-05-14  9:06     ` David Woodhouse
  0 siblings, 1 reply; 18+ messages in thread
From: Mauro Carvalho Chehab @ 2021-05-14  8:21 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Mali DP Maintainers, alsa-devel, coresight, dri-devel, intel-gfx,
	intel-wired-lan, keyrings, kvm, linux-acpi, linux-arm-kernel,
	linux-edac, linux-ext4, linux-f2fs-devel, linux-hwmon, linux-iio,
	linux-input, linux-integrity, linux-media, linux-pci, linux-pm,
	linux-rdma, linux-sgx, linux-usb, mjpeg-users, netdev, rcu

Em Wed, 12 May 2021 18:07:04 +0100
David Woodhouse <dwmw2@infradead.org> escreveu:

> On Wed, 2021-05-12 at 14:50 +0200, Mauro Carvalho Chehab wrote:
> > Such conversion tools - plus some text editor like LibreOffice  or similar  - have
> > a set of rules that turns some typed ASCII characters into UTF-8 alternatives,
> > for instance converting commas into curly commas and adding non-breakable
> > spaces. All of those are meant to produce better results when the text is
> > displayed in HTML or PDF formats.  
> 
> And don't we render our documentation into HTML or PDF formats? 

Yes.

> Are
> some of those non-breaking spaces not actually *useful* for their
> intended purpose?

No.

The thing is: non-breaking space can cause a lot of problems.

We even had to disable Sphinx usage of non-breaking space for
PDF outputs, as this was causing bad LaTeX/PDF outputs.

See commit 3b4c963243b1 ("docs: conf.py: adjust the LaTeX document output").

The aforementioned patch disables Sphinx's default behavior of
using NON-BREAKABLE SPACE in literal blocks and strings, using this
special setting: "parsedliteralwraps=true".

When NON-BREAKABLE SPACE was used in PDF outputs, several parts of
the media uAPI docs were violating the document margins by far,
causing text to be truncated.

So, please **don't add NON-BREAKABLE SPACE**, unless you test
(and keep testing from time to time) whether the outputs in all
formats properly support it across different Sphinx versions.

-

Also, most of those came from conversion tools, together with other
eccentricities, like the usage of the U+FEFF (BOM) character at the
start of some documents. The remaining ones seem to come from
cut-and-paste.

For instance, bibliographic references (there are a couple of
those in media) sometimes have NON-BREAKABLE SPACE. I'm pretty
sure that those came from cut-and-pasting the document titles
from the original PDF documents or web pages that
are referenced.

> > While it is perfectly fine to use UTF-8 characters in Linux, and specially at
> > the documentation,  it is better to  stick to the ASCII subset  on such
> > particular case,  due to a couple of reasons:
> > 
> > 1. it makes life easier for tools like grep;  
> 
> Barely, as noted, because of things like line feeds.

You can use grep with "-z" to search for multi-line strings(*), like:

	$ grep -Pzl 'grace period started,\s*then' $(find Documentation/ -type f)
	Documentation/RCU/Design/Data-Structures/Data-Structures.rst

(*) Unfortunately, while "git grep" also has a "-z" flag, it
    seems that this is (currently?) broken with regard to handling multiple lines:

	$ git grep -Pzl 'grace period started,\s*then'
	$

> > 2. they easier to edit with the some commonly used text/source
> >    code editors.  
> 
> That is nonsense. Any but the most broken and/or anachronistic
> environments and editors will be just fine.

Not really.

I do use a lot of UTF-8 here, as I type texts in Portuguese, but I rely
on the US-intl keyboard settings, which allow me to type "'a" for á.
However, there's no shortcut for non-Latin UTF-8 codes, as far as I know.

So, if I needed to type a curly quote in the text editors I normally
use for development (vim, nano, kate), I would need to cut-and-paste
it from somewhere[1].

[1] If I have a table with UTF-8 codes handy, I could type the UTF-8 
    number manually... However, it seems that this is currently broken 
    at least on Fedora 33 (with Mate Desktop and US intl keyboard with 
    dead keys).

    Here, <CTRL><SHIFT>U is not working. No idea why. I haven't
    tested it for *years*, as I didn't see any reason why I would
    need to type UTF-8 characters by number until we started
    this thread.

In practice, on the very rare occasions where I need to write
non-Latin UTF-8 chars (maybe once a year or so, like when I
need to use a Greek letter or some weird symbol), the chances
are high that I won't remember its UTF-8 code.

So, if I need to spend time looking for a specific symbol, after
finding it, I just cut-and-paste it.

But even in the best case scenario where I know the UTF-8 code and
<CTRL><SHIFT>U works, if I wanted to use, for instance, curly
quotes, the keystroke sequence would be:

	<CTRL><SHIFT>U201csome string<CTRL><SHIFT>U201d

That's a lot harder to type, and has a higher chance of
mistakenly adding a wrong symbol, than just typing:

	"some string"

Knowing that both will produce *exactly* the same output, why
should I bother doing it the hard way?

-

Now, I'm not arguing that you can't use whatever UTF-8 symbol you
want in your docs. I'm just saying that, now that the conversion
is over and a lot of documents ended up getting some UTF-8 characters
by accident, it is time for a cleanup.

Thanks,
Mauro


* Re: [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-14  8:21   ` Mauro Carvalho Chehab
@ 2021-05-14  9:06     ` David Woodhouse
  2021-05-14 11:08       ` Edward Cree
  2021-05-15  8:22       ` Mauro Carvalho Chehab
  0 siblings, 2 replies; 18+ messages in thread
From: David Woodhouse @ 2021-05-14  9:06 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Mali DP Maintainers, alsa-devel, coresight, dri-devel, intel-gfx,
	intel-wired-lan, keyrings, kvm, linux-acpi, linux-arm-kernel,
	linux-edac, linux-ext4, linux-f2fs-devel, linux-hwmon, linux-iio,
	linux-input, linux-integrity, linux-media, linux-pci, linux-pm,
	linux-rdma, linux-sgx, linux-usb, mjpeg-users, netdev, rcu


On Fri, 2021-05-14 at 10:21 +0200, Mauro Carvalho Chehab wrote:
> Em Wed, 12 May 2021 18:07:04 +0100
> David Woodhouse <dwmw2@infradead.org> escreveu:
> 
> > On Wed, 2021-05-12 at 14:50 +0200, Mauro Carvalho Chehab wrote:
> > > Such conversion tools - plus some text editor like LibreOffice  or similar  - have
> > > a set of rules that turns some typed ASCII characters into UTF-8 alternatives,
> > > for instance converting commas into curly commas and adding non-breakable
> > > spaces. All of those are meant to produce better results when the text is
> > > displayed in HTML or PDF formats.  
> > 
> > And don't we render our documentation into HTML or PDF formats? 
> 
> Yes.
> 
> > Are
> > some of those non-breaking spaces not actually *useful* for their
> > intended purpose?
> 
> No.
> 
> The thing is: non-breaking space can cause a lot of problems.
> 
> We even had to disable Sphinx usage of non-breaking space for
> PDF outputs, as this was causing bad LaTeX/PDF outputs.
> 
> See, commit: 3b4c963243b1 ("docs: conf.py: adjust the LaTeX document output")
> 
> The afore mentioned patch disables Sphinx default behavior of
> using NON-BREAKABLE SPACE on literal blocks and strings, using this
> special setting: "parsedliteralwraps=true".
> 
> When NON-BREAKABLE SPACE were used on PDF outputs, several parts of 
> the media uAPI docs were violating the document margins by far,
> causing texts to be truncated.
> 
> So, please **don't add NON-BREAKABLE SPACE**, unless you test
> (and keep testing it from time to time) if outputs on all
> formats are properly supporting it on different Sphinx versions.

And there you have a specific change with a specific fix. Nothing to do
with whether NON-BREAKABLE SPACE is ∉ ASCII, and *certainly* nothing to
do with the fact that, like *every* character in every kernel file
except the *binary* files, it's representable in UTF-8.

By all means fix the specific characters which are typographically
wrong or which, like NON-BREAKABLE SPACE, cause problems for rendering
the documentation.


> Also, most of those came from conversion tools, together with other
> eccentricities, like the usage of U+FEFF (BOM) character at the
> start of some documents. The remaining ones seem to came from 
> cut-and-paste.

... or which are just entirely redundant and gratuitous, like a BOM in
an environment where all files are UTF-8 and never 16-bit encodings
anyway.
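
As an aside, a stray BOM is easy to detect mechanically. A minimal Python
sketch, assuming the file content is read as raw bytes (the helper name
strip_utf8_bom() is made up for this illustration; codecs.BOM_UTF8 is the
three-byte UTF-8 encoding of U+FEFF):

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    print(strip_utf8_bom(codecs.BOM_UTF8 + b"Title\n=====\n"))
```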

> > > While it is perfectly fine to use UTF-8 characters in Linux, and specially at
> > > the documentation,  it is better to  stick to the ASCII subset  on such
> > > particular case,  due to a couple of reasons:
> > > 
> > > 1. it makes life easier for tools like grep;  
> > 
> > Barely, as noted, because of things like line feeds.
> 
> You can use grep with "-z" to seek for multi-line strings(*), Like:
> 
> 	$ grep -Pzl 'grace period started,\s*then' $(find Documentation/ -type f)
> 	Documentation/RCU/Design/Data-Structures/Data-Structures.rst

Yeah, right. That works if you don't just use the text that you'll have
seen in the HTML/PDF "grace period started, then", and if you instead
craft a *regex* for it, replacing the spaces with '\s*'. Or is that
[[:space:]]* if you don't want to use the experimental Perl regex
feature?

 $ grep -zlr 'grace[[:space:]]\+period[[:space:]]\+started,[[:space:]]\+then' Documentation/RCU
Documentation/RCU/Design/Data-Structures/Data-Structures.rst

And without '-l' it'll obviously just give you the whole file. No '-A5
-B5' to see the surroundings... it's hardly a useful thing, is it?
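
To make the point concrete: getting both whitespace-tolerant matching
and some surrounding context seems to need a small script rather than a
grep one-liner. A minimal Python sketch, where the function name and the
`context` parameter are invented for illustration and are not part of
any existing kernel tooling:

```python
import re


def find_phrase(text, phrase, context=1):
    """Find `phrase` in `text`, letting any run of whitespace
    (including newlines) stand in for each space in the phrase.

    Returns a list of (line_number, snippet) tuples, where snippet
    holds the matching lines plus `context` lines on either side.
    """
    # Turn "grace period" into r"grace\s+period", escaping each word.
    pattern = r"\s+".join(re.escape(word) for word in phrase.split())
    lines = text.splitlines()
    results = []
    for match in re.finditer(pattern, text):
        # Zero-based indexes of the lines where the match starts and ends.
        first = text.count("\n", 0, match.start())
        last = text.count("\n", 0, match.end())
        lo = max(first - context, 0)
        hi = min(last + context, len(lines) - 1)
        results.append((first + 1, "\n".join(lines[lo:hi + 1])))
    return results


sample = "intro\nthe grace period started,\nthen it ended\noutro\nmore\n"
hits = find_phrase(sample, "grace period started, then")
```

Here the phrase is typed exactly as it would be read in the HTML/PDF,
and the match still spans the line break in the source.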

> (*) Unfortunately, while "git grep" also has a "-z" flag, it
>     seems that this is (currently?) broken with regards of handling multilines:
> 
> 	$ git grep -Pzl 'grace period started,\s*then'
> 	$

Even better. So no, multiline grep isn't really a commonly usable
feature at all.

This is why we prefer to put user-visible strings on one line in C
source code, even if it takes the lines over 80 characters — to allow
for grep to find them.

> > > 2. they easier to edit with the some commonly used text/source
> > >    code editors.  
> > 
> > That is nonsense. Any but the most broken and/or anachronistic
> > environments and editors will be just fine.
> 
> Not really.
> 
> I do use a lot of UTF-8 here, as I type texts in Portuguese, but I rely
> on the US-intl keyboard settings, that allow me to type as "'a" for á.
> However, there's no shortcut for non-Latin UTF-codes, as far as I know.
> 
> So, if would need to type a curly comma on the text editors I normally 
> use for development (vim, nano, kate), I would need to cut-and-paste
> it from somewhere[1].

That's entirely irrelevant. You don't need to be able to *type* every
character that you see in front of you, as long as your editor will
render it correctly and perhaps let you cut/paste it as you're editing
the document if you're moving things around.

> [1] If I have a table with UTF-8 codes handy, I could type the UTF-8 
>     number manually... However, it seems that this is currently broken 
>     at least on Fedora 33 (with Mate Desktop and US intl keyboard with 
>     dead keys).
> 
>     Here, <CTRL><SHIFT>U is not working. No idea why. I haven't 
>     test it for *years*, as I din't see any reason why I would
>     need to type UTF-8 characters by numbers until we started
>     this thread.

Please provide the bug number for this; I'd like to track it.

> But even in the best case scenario where I know the UTF-8 and
> <CTRL><SHIFT>U works, if I wanted to use, for instance, a curly
> comma, the keystroke sequence would be:
> 
> 	<CTRL><SHIFT>U201csome string<CTRL><SHIFT>U201d
> 
> That's a lot harder than typing and has a higher chances of
> mistakenly add a wrong symbol than just typing:
> 
> 	"some string"
> 
> Knowing that both will produce *exactly* the same output, why
> should I bother doing it the hard way?

Nobody's asked you to do it the "hard way". That's completely
irrelevant to the discussion we were having.

> Now, I'm not arguing that you can't use whatever UTF-8 symbol you
> want on your docs. I'm just saying that, now that the conversion 
> is over and a lot of documents ended getting some UTF-8 characters
> by accident, it is time for a cleanup.

All text documents are *full* of UTF-8 characters. If there is a file
in the source code which has *any* non-UTF8, we call that a 'binary
file'.

Again, if you want to make specific fixes like removing non-breaking
spaces and byte order marks, with specific reasons, then those make
sense. But it's got very little to do with UTF-8 and how easy it is to
type them. And the excuse you've put in the commit comment for your
patches is utterly bogus.




* Re: [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-14  9:06     ` David Woodhouse
@ 2021-05-14 11:08       ` Edward Cree
  2021-05-14 14:18         ` Mauro Carvalho Chehab
  2021-05-15  8:22       ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 18+ messages in thread
From: Edward Cree @ 2021-05-14 11:08 UTC (permalink / raw)
  To: David Woodhouse, Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Mali DP Maintainers, alsa-devel, coresight, dri-devel, intel-gfx,
	intel-wired-lan, keyrings, kvm, linux-acpi, linux-arm-kernel,
	linux-edac, linux-ext4, linux-f2fs-devel, linux-hwmon, linux-iio,
	linux-input, linux-integrity, linux-media, linux-pci, linux-pm,
	linux-rdma, linux-sgx, linux-usb, mjpeg-users, netdev, rcu

> On Fri, 2021-05-14 at 10:21 +0200, Mauro Carvalho Chehab wrote:
>> I do use a lot of UTF-8 here, as I type texts in Portuguese, but I rely
>> on the US-intl keyboard settings, that allow me to type as "'a" for á.
>> However, there's no shortcut for non-Latin UTF-codes, as far as I know.
>>
>> So, if would need to type a curly comma on the text editors I normally 
>> use for development (vim, nano, kate), I would need to cut-and-paste
>> it from somewhere

For anyone who doesn't know about it: X has this wonderful thing called
 the Compose key[1].  For instance, type ⎄--- to get —, or ⎄<" for “.
Much more mnemonic than Unicode codepoints; and you can extend it with
 user-defined sequences in your ~/.XCompose file.
(I assume Wayland supports all this too, but don't know the details.)

On 14/05/2021 10:06, David Woodhouse wrote:
> Again, if you want to make specific fixes like removing non-breaking
> spaces and byte order marks, with specific reasons, then those make
> sense. But it's got very little to do with UTF-8 and how easy it is to
> type them. And the excuse you've put in the commit comment for your
> patches is utterly bogus.

+1

-ed

[1] https://en.wikipedia.org/wiki/Compose_key


* Re: [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-14 11:08       ` Edward Cree
@ 2021-05-14 14:18         ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 18+ messages in thread
From: Mauro Carvalho Chehab @ 2021-05-14 14:18 UTC (permalink / raw)
  To: Edward Cree
  Cc: David Woodhouse, Linux Doc Mailing List, linux-kernel,
	Jonathan Corbet, Mali DP Maintainers, alsa-devel, coresight,
	dri-devel, intel-gfx, intel-wired-lan, keyrings, kvm, linux-acpi,
	linux-arm-kernel, linux-edac, linux-ext4, linux-f2fs-devel,
	linux-hwmon, linux-iio, linux-input, linux-integrity,
	linux-media, linux-pci, linux-pm, linux-rdma, linux-sgx,
	linux-usb, mjpeg-users, netdev, rcu

Em Fri, 14 May 2021 12:08:36 +0100
Edward Cree <ecree.xilinx@gmail.com> escreveu:

> For anyone who doesn't know about it: X has this wonderful thing called
>  the Compose key[1].  For instance, type ⎄--- to get —, or ⎄<" for “.
> Much more mnemonic than Unicode codepoints; and you can extend it with
>  user-defined sequences in your ~/.XCompose file.

Good tip. I haven't used composite for years, as US-intl with dead keys is
enough for 99.999% of my needs.

Btw, at least on Fedora with Mate, Composite is disabled by default. It has
to be enabled first using the same tool that allows changing the Keyboard
layout[1].

Yet, typing an EN DASH, for example, would be "<composite>--.", which is 4
keystrokes instead of just two ('--'). It means twice the effort ;-)

[1] KDE, GNOME, Mate, ... have different ways to enable it and to
    select which key is considered <composite>:

	https://dry.sailingissues.com/us-international-keyboard-layout.html
	https://help.ubuntu.com/community/ComposeKey

Thanks,
Mauro


* Re: [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-14  9:06     ` David Woodhouse
  2021-05-14 11:08       ` Edward Cree
@ 2021-05-15  8:22       ` Mauro Carvalho Chehab
  2021-05-15  9:24         ` David Woodhouse
  1 sibling, 1 reply; 18+ messages in thread
From: Mauro Carvalho Chehab @ 2021-05-15  8:22 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Mali DP Maintainers, alsa-devel, coresight, intel-gfx,
	intel-wired-lan, keyrings, kvm, linux-acpi, linux-arm-kernel,
	linux-edac, linux-ext4, linux-f2fs-devel, linux-hwmon, linux-iio,
	linux-input, linux-integrity, linux-media, linux-pci, linux-pm,
	linux-rdma, linux-sgx, linux-usb, mjpeg-users, netdev, rcu

Em Fri, 14 May 2021 10:06:01 +0100
David Woodhouse <dwmw2@infradead.org> escreveu:

> On Fri, 2021-05-14 at 10:21 +0200, Mauro Carvalho Chehab wrote:
> > Em Wed, 12 May 2021 18:07:04 +0100
> > David Woodhouse <dwmw2@infradead.org> escreveu:
> >   
> > > On Wed, 2021-05-12 at 14:50 +0200, Mauro Carvalho Chehab wrote:  
> > > > Such conversion tools - plus some text editor like LibreOffice  or similar  - have
> > > > a set of rules that turns some typed ASCII characters into UTF-8 alternatives,
> > > > for instance converting commas into curly commas and adding non-breakable
> > > > spaces. All of those are meant to produce better results when the text is
> > > > displayed in HTML or PDF formats.    
> > > 
> > > And don't we render our documentation into HTML or PDF formats?   
> > 
> > Yes.
> >   
> > > Are
> > > some of those non-breaking spaces not actually *useful* for their
> > > intended purpose?  
> > 
> > No.
> > 
> > The thing is: non-breaking space can cause a lot of problems.
> > 
> > We even had to disable Sphinx usage of non-breaking space for
> > PDF outputs, as this was causing bad LaTeX/PDF outputs.
> > 
> > See, commit: 3b4c963243b1 ("docs: conf.py: adjust the LaTeX document output")
> > 
> > The afore mentioned patch disables Sphinx default behavior of
> > using NON-BREAKABLE SPACE on literal blocks and strings, using this
> > special setting: "parsedliteralwraps=true".
> > 
> > When NON-BREAKABLE SPACE were used on PDF outputs, several parts of 
> > the media uAPI docs were violating the document margins by far,
> > causing texts to be truncated.
> > 
> > So, please **don't add NON-BREAKABLE SPACE**, unless you test
> > (and keep testing it from time to time) if outputs on all
> > formats are properly supporting it on different Sphinx versions.  
> 
> And there you have a specific change with a specific fix. Nothing to do
> with whether NON-BREAKABLE SPACE is ∉ ASCII, and *certainly* nothing to
> do with the fact that, like *every* character in every kernel file
> except the *binary* files, it's representable in UTF-8.
> 
> By all means fix the specific characters which are typographically
> wrong or which, like NON-BREAKABLE SPACE, cause problems for rendering
> the documentation.
> 
> 
> > Also, most of those came from conversion tools, together with other
> > eccentricities, like the usage of U+FEFF (BOM) character at the
> > start of some documents. The remaining ones seem to came from 
> > cut-and-paste.  
> 
> ... or which are just entirely redundant and gratuitous, like a BOM in
> an environment where all files are UTF-8 and never 16-bit encodings
> anyway.

Agreed.

> 
> > > > While it is perfectly fine to use UTF-8 characters in Linux, and specially at
> > > > the documentation,  it is better to  stick to the ASCII subset  on such
> > > > particular case,  due to a couple of reasons:
> > > > 
> > > > 1. it makes life easier for tools like grep;    
> > > 
> > > Barely, as noted, because of things like line feeds.  
> > 
> > You can use grep with "-z" to seek for multi-line strings(*), Like:
> > 
> > 	$ grep -Pzl 'grace period started,\s*then' $(find Documentation/ -type f)
> > 	Documentation/RCU/Design/Data-Structures/Data-Structures.rst  
> 
> Yeah, right. That works if you don't just use the text that you'll have
> seen in the HTML/PDF "grace period started, then", and if you instead
> craft a *regex* for it, replacing the spaces with '\s*'. Or is that
> [[:space:]]* if you don't want to use the experimental Perl regex
> feature?
> 
>  $ grep -zlr 'grace[[:space:]]\+period[[:space:]]\+started,[[:space:]]\+then' Documentation/RCU
> Documentation/RCU/Design/Data-Structures/Data-Structures.rst
> 
> And without '-l' it'll obviously just give you the whole file. No '-A5
> -B5' to see the surroundings... it's hardly a useful thing, is it?
> 
> > (*) Unfortunately, while "git grep" also has a "-z" flag, it
> >     seems that this is (currently?) broken with regards of handling multilines:
> > 
> > 	$ git grep -Pzl 'grace period started,\s*then'
> > 	$  
> 
> Even better. So no, multiline grep isn't really a commonly usable
> feature at all.
> 
> This is why we prefer to put user-visible strings on one line in C
> source code, even if it takes the lines over 80 characters — to allow
> for grep to find them.

Makes sense, but in case of documentation, this is a little more
complex than that. 

Btw, the theme used by default when building html[1] has a search
box (written in Javascript) that should be able to find multi-line
patterns, working somewhat similar to "git grep foo -a bar".

[1] https://github.com/readthedocs/sphinx_rtd_theme

> > [1] If I have a table with UTF-8 codes handy, I could type the UTF-8 
> >     number manually... However, it seems that this is currently broken 
> >     at least on Fedora 33 (with Mate Desktop and US intl keyboard with 
> >     dead keys).
> > 
> >     Here, <CTRL><SHIFT>U is not working. No idea why. I haven't 
> >     test it for *years*, as I din't see any reason why I would
> >     need to type UTF-8 characters by numbers until we started
> >     this thread.  
> 
> Please provide the bug number for this; I'd like to track it.

Just opened a BZ and added you as c/c.

> > Now, I'm not arguing that you can't use whatever UTF-8 symbol you
> > want on your docs. I'm just saying that, now that the conversion 
> > is over and a lot of documents ended getting some UTF-8 characters
> > by accident, it is time for a cleanup.  
> 
> All text documents are *full* of UTF-8 characters. If there is a file
> in the source code which has *any* non-UTF8, we call that a 'binary
> file'.
> 
> Again, if you want to make specific fixes like removing non-breaking
> spaces and byte order marks, with specific reasons, then those make
> sense. But it's got very little to do with UTF-8 and how easy it is to
> type them. And the excuse you've put in the commit comment for your
> patches is utterly bogus.

Let's take one step back, in order to return to the intent of this
UTF-8 cleanup, as the discussions here are not centered on the patches,
but instead on what to do and why.

-

This discussion started originally at linux-doc ML.

While discussing an issue that shows up when the machine's locale is
not set to UTF-8 on a build VM, we discovered that some converted docs
ended up with BOM characters. Those were introduced by some of my
conversion patches, probably via pandoc.

So, I went ahead in order to check what other possible weird things
were introduced by the conversion, where several scripts and tools
were used on files that had already a different markup.

I actually checked the current UTF-8 issues, and asked people at
linux-doc to comment on which of those are valid use cases, and which
should be replaced by plain ASCII.

Basically, this is the current situation (at docs/docs-next) for the
ReST files under Documentation/, excluding translations:

1. Spaces and BOM

	- U+00a0 (' '): NO-BREAK SPACE
	- U+feff (''): ZERO WIDTH NO-BREAK SPACE (BOM)

Based on the discussions there and on this thread, those should be
dropped, as BOM is useless and NO-BREAK SPACE can cause problems
at the html/pdf output;

2. Symbols

	- U+00a9 ('©'): COPYRIGHT SIGN
	- U+00ac ('¬'): NOT SIGN
	- U+00ae ('®'): REGISTERED SIGN
	- U+00b0 ('°'): DEGREE SIGN
	- U+00b1 ('±'): PLUS-MINUS SIGN
	- U+00b2 ('²'): SUPERSCRIPT TWO
	- U+00b5 ('µ'): MICRO SIGN
	- U+03bc ('μ'): GREEK SMALL LETTER MU
	- U+00b7 ('·'): MIDDLE DOT
	- U+00bd ('½'): VULGAR FRACTION ONE HALF
	- U+2122 ('™'): TRADE MARK SIGN
	- U+2264 ('≤'): LESS-THAN OR EQUAL TO
	- U+2265 ('≥'): GREATER-THAN OR EQUAL TO
	- U+2b0d ('⬍'): UP DOWN BLACK ARROW

Those seem OK to my eyes.

On a side note, both MICRO SIGN and GREEK SMALL LETTER MU are
used in several docs to represent microseconds, micro-volts and
micro-ampères. If we write a guidelines document, it probably
makes sense to recommend using MICRO SIGN in such cases.

3. Latin

	- U+00c7 ('Ç'): LATIN CAPITAL LETTER C WITH CEDILLA
	- U+00df ('ß'): LATIN SMALL LETTER SHARP S
	- U+00e1 ('á'): LATIN SMALL LETTER A WITH ACUTE
	- U+00e4 ('ä'): LATIN SMALL LETTER A WITH DIAERESIS
	- U+00e6 ('æ'): LATIN SMALL LETTER AE
	- U+00e7 ('ç'): LATIN SMALL LETTER C WITH CEDILLA
	- U+00e9 ('é'): LATIN SMALL LETTER E WITH ACUTE
	- U+00ea ('ê'): LATIN SMALL LETTER E WITH CIRCUMFLEX
	- U+00eb ('ë'): LATIN SMALL LETTER E WITH DIAERESIS
	- U+00f3 ('ó'): LATIN SMALL LETTER O WITH ACUTE
	- U+00f4 ('ô'): LATIN SMALL LETTER O WITH CIRCUMFLEX
	- U+00f6 ('ö'): LATIN SMALL LETTER O WITH DIAERESIS
	- U+00f8 ('ø'): LATIN SMALL LETTER O WITH STROKE
	- U+00fa ('ú'): LATIN SMALL LETTER U WITH ACUTE
	- U+00fc ('ü'): LATIN SMALL LETTER U WITH DIAERESIS
	- U+00fd ('ý'): LATIN SMALL LETTER Y WITH ACUTE
	- U+011f ('ğ'): LATIN SMALL LETTER G WITH BREVE
	- U+0142 ('ł'): LATIN SMALL LETTER L WITH STROKE

Those should be kept as well, as they're used for non-English names.

4. arrows and box drawing symbols:
	- U+2191 ('↑'): UPWARDS ARROW
	- U+2192 ('→'): RIGHTWARDS ARROW
	- U+2193 ('↓'): DOWNWARDS ARROW

	- U+2500 ('─'): BOX DRAWINGS LIGHT HORIZONTAL
	- U+2502 ('│'): BOX DRAWINGS LIGHT VERTICAL
	- U+2514 ('└'): BOX DRAWINGS LIGHT UP AND RIGHT
	- U+251c ('├'): BOX DRAWINGS LIGHT VERTICAL AND RIGHT

Also should be kept.

In summary, based on the discussions we have so far, I suspect that
there's not much to be discussed for the above cases.

So, I'll post a v3 of this series, changing only:

	- U+00a0 (' '): NO-BREAK SPACE
	- U+feff (''): ZERO WIDTH NO-BREAK SPACE (BOM)
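
For reference, locating those two characters does not need anything
elaborate. A minimal Python sketch follows; the helper names are made up
for illustration, and replacing NO-BREAK SPACE with a plain space is an
assumption about what the cleanup should do, not a description of the
actual patches:

```python
import unicodedata

# The two characters targeted by the cleanup described above.
SUSPECTS = {"\u00a0", "\ufeff"}  # NO-BREAK SPACE, BOM


def find_suspects(text):
    """Return (line, column, codepoint, name) for each suspect character."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECTS:
                found.append(
                    (lineno, col, f"U+{ord(ch):04x}", unicodedata.name(ch))
                )
    return found


def clean(text):
    """Drop BOMs and turn NO-BREAK SPACE into a plain ASCII space."""
    return text.replace("\ufeff", "").replace("\u00a0", " ")


sample = "\ufeffTitle\nfoo\u00a0bar\n"
report = find_suspects(sample)
cleaned = clean(sample)
```

Running `find_suspects()` over the text of each file under
Documentation/ (decoded as UTF-8) would give a per-file list to review
by hand before changing anything.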

---

Now, this specific patch series address also this extra case:

5. curly quotes:

	- U+2018 ('‘'): LEFT SINGLE QUOTATION MARK
	- U+2019 ('’'): RIGHT SINGLE QUOTATION MARK
	- U+201c ('“'): LEFT DOUBLE QUOTATION MARK
	- U+201d ('”'): RIGHT DOUBLE QUOTATION MARK

IMO, those should be replaced by ASCII quotes: ' and ".

The rationale is simple: 

- most were introduced during the conversion from DocBook,
  markdown and LaTeX;
- they don't add any extra value, as using "foo" or “foo” means
  the same thing;
- Sphinx already uses "fancy" quotes in the output.

I guess I will put this on a separate series, as this is not a bug
fix, but just a cleanup from the conversion work.

I'll re-post those cleanups on a separate series, for patch per patch
review.

---

The remaining cases are future work, outside the scope of this v2:

6. Hyphen/Dashes and ellipsis

	- U+2212 ('−'): MINUS SIGN
	- U+00ad ('­'): SOFT HYPHEN
	- U+2010 ('‐'): HYPHEN

	    Those three are used in places where a normal ASCII hyphen/minus
	    should be used instead. There are even a couple of C files which
	    use them instead of '-' in comments.

	    IMO, these are fixes/cleanups for conversions and bad cut-and-paste.

	- U+2013 ('–'): EN DASH
	- U+2014 ('—'): EM DASH
	- U+2026 ('…'): HORIZONTAL ELLIPSIS

	    Those are auto-replaced by Sphinx from "--", "---" and "...",
	    respectively.

	    I guess those are a matter of personal preference about
	    whether to use ASCII or UTF-8.

	    My personal preference (and Ted seems to have a similar
	    opinion) is to let Sphinx do the conversion.

	    For those, I intend to post a separate series, to be
	    reviewed patch per patch, as this is really a matter
	    of personal taste. We'll hardly reach a consensus here.

7. math symbols:

	- U+00d7 ('×'): MULTIPLICATION SIGN

	   This one is used mostly to describe video resolutions, but it
	   appears in fewer places than the plain letter "x".

	- U+2217 ('∗'): ASTERISK OPERATOR

	   This is used only here:
		Documentation/filesystems/ext4/blockgroup.rst:filesystem size to 2^21 ∗ 2^27 = 2^48bytes or 256TiB.

	   Probably added by some conversion tool. IMO, this one should
	   also be replaced by an ASCII asterisk.

I guess I'll post a patch for the ASTERISK OPERATOR.
Thanks,
Mauro


* Re: [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-15  8:22       ` Mauro Carvalho Chehab
@ 2021-05-15  9:24         ` David Woodhouse
  2021-05-15 11:23           ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 18+ messages in thread
From: David Woodhouse @ 2021-05-15  9:24 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Mali DP Maintainers, alsa-devel, coresight, intel-gfx,
	intel-wired-lan, keyrings, kvm, linux-acpi, linux-arm-kernel,
	linux-edac, linux-ext4, linux-f2fs-devel, linux-hwmon, linux-iio,
	linux-input, linux-integrity, linux-media, linux-pci, linux-pm,
	linux-rdma, linux-sgx, linux-usb, mjpeg-users, netdev, rcu


On Sat, 2021-05-15 at 10:22 +0200, Mauro Carvalho Chehab wrote:
> > >      Here, <CTRL><SHIFT>U is not working. No idea why. I haven't 
> > >      test it for *years*, as I din't see any reason why I would
> > >      need to type UTF-8 characters by numbers until we started
> > >      this thread.  
> > 
> > Please provide the bug number for this; I'd like to track it.
> 
> Just opened a BZ and added you as c/c.

Thanks.

> Let's take one step back, in order to return to the intents of this
> UTF-8, as the discussions here are not centered into the patches, but
> instead, on what to do and why.
> 
> -
> 
> This discussion started originally at linux-doc ML.
> 
> While discussing about an issue when machine's locale was not set
> to UTF-8 on a build VM, 

Stop. Stop *right* there before you go any further.

The machine's locale should have *nothing* to do with anything.

When you view this email, it comes with a Content-Type: header which
explicitly tells you the character set that the message is encoded in, 
which I think I've set to UTF-7.

When showing you the mail, your system has to interpret the bytes of
the content using *that* character set encoding. Anything else is just
fundamentally broken. Your system locale has *nothing* to do with it.

If your local system is running EBCDIC that doesn't *matter*.

Now, the character set encoding of the kernel source and documentation
text files is UTF-8. It isn't EBCDIC, it isn't ISO8859-15 or any of the
legacy crap. It isn't system locale either, unless your system locale
*happens* to be UTF-8.

UTF-8 *happens* to be compatible with ASCII for the limited subset of
characters which ASCII contains, sure — just as *many*, but not all, of
the legacy 8-bit character sets are also a superset of ASCII's 7 bits.

But if the docs contain *any* characters which aren't ASCII, and you
build them with a broken build system which assumes ASCII, you are
going to produce wrong output. There is *no* substitute for fixing the
*actual* bug which started all this, and ensuring your build system (or
whatever) uses the *actual* encoding of the text files it's processing,
instead of making stupid and bogus assumptions based on a system
default.

You concede keeping U+00a9 © COPYRIGHT SIGN. And that's encoded in UTF-
8 as two bytes 0xC2 0xA9. If some broken build system *assumes* those
bytes are ISO8859-15 it'll take them to mean two separate characters

    U+00C2 Â LATIN CAPITAL LETTER A WITH CIRCUMFLEX
    U+00A9 © COPYRIGHT SIGN

Your broken build system that started all this is never going to be
*anything* other than broken. You can only paper over the cracks and
make it slightly less likely that people will notice in the common
case, perhaps? That's all you do by *reducing* the use of non-ASCII,
unless you're going to drag us all the way back to the 1980s and
strictly limit us to pure ASCII, using the equivalent of trigraphs for
*anything* outside the 0-127 character ranges.

And even if you did that, systems which use EBCDIC as their local
encoding would *still* be broken, if they have the same bug you started
from. Because EBCDIC isn't compatible with ASCII *even* for the first 7
bits.
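
The byte-level claims above are easy to check. A small Python sketch,
assuming Python's standard codec names ("iso8859-15", and "cp037" as one
EBCDIC code page picked purely as an example):

```python
# U+00A9 COPYRIGHT SIGN as it is stored in a UTF-8 text file.
copyright_sign = "\u00a9"
utf8_bytes = copyright_sign.encode("utf-8")

# What a tool sees if it wrongly assumes the file is ISO8859-15:
# each byte becomes its own character.
misread = utf8_bytes.decode("iso8859-15")

# EBCDIC disagrees with ASCII even for plain letters.
a_in_ascii = "A".encode("ascii")
a_in_ebcdic = "A".encode("cp037")

print(utf8_bytes.hex(), misread, a_in_ascii.hex(), a_in_ebcdic.hex())
```

The only step that goes wrong is the decode with the wrong charset; the
bytes in the file are the same throughout.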


> we discovered that some converted docs ended
> with BOM characters. Those specific changes were introduced by some
> of my convert patches, probably converted via pandoc.
> 
> So, I went ahead in order to check what other possible weird things
> were introduced by the conversion, where several scripts and tools
> were used on files that had already a different markup.
> 
> I actually checked the current UTF-8 issues, and asked people at
> linux-doc to comment what of those are valid usecases, and what
> should be replaced by plain ASCII.

No, these aren't "UTF-8 issues". Those are *conversion* issues, and
would still be there if the output of the conversion had been UTF-7,
UTF-16, etc. Or *even* if the output of the conversion had been
trigraph-like stuff like '--' for emdash. It's *nothing* to do with the
encoding that we happen to be using.

Fixing the conversion issues makes a lot of sense. Try to do it without
making *any* mention of UTF-8 at all.

> In summary, based on the discussions we have so far, I suspect that
> there's not much to be discussed for the above cases.
> 
> So, I'll post a v3 of this series, changing only:
> 
>         - U+00a0 (' '): NO-BREAK SPACE
>         - U+feff (''): ZERO WIDTH NO-BREAK SPACE (BOM)

Ack, as long as those make *no* mention of UTF-8. Except perhaps to
note that BOM is redundant because UTF-8 doesn't have a byteorder.

> ---
> 
> Now, this specific patch series address also this extra case:
> 
> 5. curly commas:
> 
>         - U+2018 ('‘'): LEFT SINGLE QUOTATION MARK
>         - U+2019 ('’'): RIGHT SINGLE QUOTATION MARK
>         - U+201c ('“'): LEFT DOUBLE QUOTATION MARK
>         - U+201d ('”'): RIGHT DOUBLE QUOTATION MARK
> 
> IMO, those should be replaced by ASCII commas: ' and ".
> 
> The rationale is simple: 
> 
> - most were introduced during the conversion from Docbook,
>   markdown and LaTex;
> - they don't add any extra value, as using "foo" of “foo” means
>   the same thing;
> - Sphinx already use "fancy" commas at the output. 
> 
> I guess I will put this on a separate series, as this is not a bug
> fix, but just a cleanup from the conversion work.
> 
> I'll re-post those cleanups on a separate series, for patch per patch
> review.

Makes sense. 

The left/right quotation marks exist to make human-readable text much
easier to read, but the key point here is that they are redundant
because the tooling already emits them in the *output* so they don't
need to be in the source, yes?

As long as the tooling gets it *right* and uses them where it should,
that seems sane enough.

However, it *does* break 'grep', because if I cut/paste a snippet from
the documentation and try to grep for it, it'll no longer match.

Consistency is good, but perhaps we should actually be consistent the
other way round and always use the left/right versions in the source
*instead* of relying on the tooling, to make searches work better?
You claimed to care about that, right?

> The remaining cases are future work, outside the scope of this v2:
> 
> 6. Hyphen/Dashes and ellipsis
> 
>         - U+2212 ('−'): MINUS SIGN
>         - U+00ad ('­'): SOFT HYPHEN
>         - U+2010 ('‐'): HYPHEN
> 
>             Those three are used on places where a normal ASCII hyphen/minus
>             should be used instead. There are even a couple of C files which
>             use them instead of '-' on comments.
> 
>             IMO are fixes/cleanups from conversions and bad cut-and-paste.

That seems to make sense.

>         - U+2013 ('–'): EN DASH
>         - U+2014 ('—'): EM DASH
>         - U+2026 ('…'): HORIZONTAL ELLIPSIS
> 
>             Those are auto-replaced by Sphinx from "--", "---" and "...",
>             respectively.
> 
>             I guess those are a matter of personal preference about
>             weather using ASCII or UTF-8.
> 
>             My personal preference (and Ted seems to have a similar
>             opinion) is to let Sphinx do the conversion.
> 
>             For those, I intend to post a separate series, to be
>             reviewed patch per patch, as this is really a matter
>             of personal taste. Hardly we'll reach a consensus here.
> 

Again using the trigraph-like '--' and '...' instead of just using the
plain text '—' and '…' breaks searching, because what's in the output
doesn't match the input. Again consistency is good, but perhaps we
should standardise on just putting these in their plain text form
instead of the trigraphs?
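
If the ASCII forms do stay in the source, anyone searching from rendered
output has to undo the substitution first. A rough Python sketch of that
step; the mapping is an assumption based only on the substitutions
mentioned in this thread ("--", "---", "..." and the quotation marks),
and the function name is invented:

```python
# Rendered character -> the ASCII sequence assumed to be in the source.
OUTPUT_TO_SOURCE = {
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "--",   # EN DASH
    "\u2014": "---",  # EM DASH
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
}


def to_source_form(snippet):
    """Rewrite text pasted from HTML/PDF into what the .rst probably holds."""
    return "".join(OUTPUT_TO_SOURCE.get(ch, ch) for ch in snippet)


pasted = "\u201cfoo\u201d \u2014 bar\u2026"
query = to_source_form(pasted)
```

This is lossy by construction: it cannot tell whether a given file has
the literal character or the ASCII sequence in its source, so a search
may still need to try both.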

> 7. math symbols:
> 
>         - U+00d7 ('×'): MULTIPLICATION SIGN
> 
>            This one is used mostly do describe video resolutions, but this is
>            on a smaller changeset than the ones that use "x" letter.

I think standardising on × for video resolutions in documentation would
make it look better and be easier to read.

> 
>         - U+2217 ('∗'): ASTERISK OPERATOR
> 
>            This is used only here:
>                 Documentation/filesystems/ext4/blockgroup.rst:filesystem size to 2^21 ∗ 2^27 = 2^48bytes or 256TiB.
> 
>            Probably added by some conversion tool. IMO, this one should
>            also be replaced by an ASCII asterisk.
> 
> I guess I'll post a patch for the ASTERISK OPERATOR.

That makes sense.



* Re: [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-15  9:24         ` David Woodhouse
@ 2021-05-15 11:23           ` Mauro Carvalho Chehab
  2021-05-15 12:02             ` David Woodhouse
  0 siblings, 1 reply; 18+ messages in thread
From: Mauro Carvalho Chehab @ 2021-05-15 11:23 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Mali DP Maintainers, alsa-devel, coresight, intel-gfx,
	intel-wired-lan, keyrings, kvm, linux-acpi, linux-arm-kernel,
	linux-edac, linux-ext4, linux-f2fs-devel, linux-hwmon, linux-iio,
	linux-input, linux-integrity, linux-media, linux-pci, linux-pm,
	linux-rdma, linux-sgx, linux-usb, mjpeg-users, netdev, rcu

Em Sat, 15 May 2021 10:24:28 +0100
David Woodhouse <dwmw2@infradead.org> escreveu:

> On Sat, 2021-05-15 at 10:22 +0200, Mauro Carvalho Chehab wrote:
> > > >      Here, <CTRL><SHIFT>U is not working. No idea why. I haven't 
> > > >      test it for *years*, as I din't see any reason why I would
> > > >      need to type UTF-8 characters by numbers until we started
> > > >      this thread.    
> > > 
> > > Please provide the bug number for this; I'd like to track it.  
> > 
> > Just opened a BZ and added you as c/c.  
> 
> Thanks.
> 
> > Let's take one step back, in order to return to the intents of this
> > UTF-8, as the discussions here are not centered into the patches, but
> > instead, on what to do and why.
> > 
> > -
> > 
> > This discussion started originally at linux-doc ML.
> > 
> > While discussing about an issue when machine's locale was not set
> > to UTF-8 on a build VM,   
> 
> Stop. Stop *right* there before you go any further.
> 
> The machine's locale should have *nothing* to do with anything.
> 
> When you view this email, it comes with a Content-Type: header which
> explicitly tells you the character set that the message is encoded in, 
> which I think I've set to UTF-7.
> 
> When showing you the mail, your system has to interpret the bytes of
> the content using *that* character set encoding. Anything else is just
> fundamentally broken. Your system locale has *nothing* to do with it.
> 
> If your local system is running EBCDIC that doesn't *matter*.
> 
> Now, the character set encoding of the kernel source and documentation
> text files is UTF-8. It isn't EBCDIC, it isn't ISO8859-15 or any of the
> legacy crap. It isn't system locale either, unless your system locale
> *happens* to be UTF-8.
> 
> UTF-8 *happens* to be compatible with ASCII for the limited subset of
> characters which ASCII contains, sure — just as *many*, but not all, of
> the legacy 8-bit character sets are also a superset of ASCII's 7 bits.
> 
> But if the docs contain *any* characters which aren't ASCII, and you
> build them with a broken build system which assumes ASCII, you are
> going to produce wrong output. There is *no* substitute for fixing the
> *actual* bug which started all this, and ensuring your build system (or
> whatever) uses the *actual* encoding of the text files it's processing,
> instead of making stupid and bogus assumptions based on a system
> default.
> 
> You concede keeping U+00a9 © COPYRIGHT SIGN. And that's encoded in UTF-
> 8 as two bytes 0xC2 0xA9. If some broken build system *assumes* those
> bytes are ISO8859-15 it'll take them to mean two separate characters
> 
>     U+00C2 Â LATIN CAPITAL LETTER A WITH CIRCUMFLEX
>     U+00A9 © COPYRIGHT SIGN
> 
> Your broken build system that started all this is never going to be
> *anything* other than broken. You can only paper over the cracks and
> make it slightly less likely that people will notice in the common
> case, perhaps? That's all you do by *reducing* the use of non-ASCII,
> unless you're going to drag us all the way back to the 1980s and
> strictly limit us to pure ASCII, using the equivalent of trigraphs for
> *anything* outside the 0-127 character ranges.
> 
> And even if you did that, systems which use EBCDIC as their local
> encoding would *still* be broken, if they have the same bug you started
> from. Because EBCDIC isn't compatible with ASCII *even* for the first 7
> bits.
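
The byte-level example quoted above can be checked with a few lines of
Python. This is only a sketch of the misinterpretation being described,
not a reproduction of the reported build failure; the variable names are
made up for illustration.

```python
# U+00A9 COPYRIGHT SIGN, encoded the way the kernel docs are: UTF-8.
raw = "\u00a9".encode("utf-8")

# Decoding with the encoding the text was actually written in round-trips.
correct = raw.decode("utf-8")

# Decoding the same two bytes as ISO 8859-15 yields two characters.
wrong = raw.decode("iso-8859-15")

print(raw.hex())      # the two UTF-8 bytes, in hex
print(repr(correct))  # one character
print(repr(wrong))    # two characters
```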

Now, you're making a lot of wrong assumptions here ;-)

1. I didn't report the bug. Another person reported it at linux-doc;
2. I fully agree with you that the build system should work fine
   whatever locale the machine has;
3. The charset Sphinx supports for ReST input and for its output is UTF-8.

Despite that, it seems that there are some issues with the build
tool set, at least under certain circumstances. One of the hypotheses
mentioned there is that the Sphinx logger crashes when it tries to
print a UTF-8 message when the machine's locale is not UTF-8.

That said, I tried forcing a non-UTF-8 locale on some tests I did to
try to reproduce it, but the build went fine.

So, I was not able to reproduce the issue.
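
For reference, the general failure mode behind that hypothesis looks
like the sketch below. It is plain Python and does not use Sphinx or its
logger; the helper name and the message are made up, and whether this is
what really happened on the reporter's machine is still unknown.

```python
import io


def write_with_encoding(message: str, encoding: str) -> bytes:
    """Write message through a text stream that uses the given encoding."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(message)
    stream.flush()
    return buffer.getvalue()


message = "Copyright \u00a9 example"  # hypothetical log message

# A stream configured for UTF-8 accepts the message.
utf8_bytes = write_with_encoding(message, "utf-8")

# A stream configured for ASCII cannot encode U+00A9 and raises.
try:
    write_with_encoding(message, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(utf8_bytes, ascii_failed)
```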

This series doesn't address the issue. It is just a side effect of the
discussions, where, while trying to understand the bug, we noticed
several UTF-8 characters introduced during the conversion that weren't
the original author's intent.

So, with regards to the original bug report, if I find a way to
reproduce it and to address it, I'll post a separate series.

If you want to discuss this issue further, let's not discuss here, but
instead, at the linux-doc thread:

	https://lore.kernel.org/linux-doc/20210506103913.GE6564@kitsune.suse.cz/

> 
> 
> > we discovered that some converted docs ended
> > with BOM characters. Those specific changes were introduced by some
> > of my convert patches, probably converted via pandoc.
> > 
> > So, I went ahead in order to check what other possible weird things
> > were introduced by the conversion, where several scripts and tools
> > were used on files that had already a different markup.
> > 
> > I actually checked the current UTF-8 issues, and asked people at
> > linux-doc to comment what of those are valid usecases, and what
> > should be replaced by plain ASCII.  
> 
> No, these aren't "UTF-8 issues". Those are *conversion* issues, and
> would still be there if the output of the conversion had been UTF-7,
> UCS-16, etc. Or *even* if the output of the conversion had been
> trigraph-like stuff like '--' for emdash. It's *nothing* to do with the
> encoding that we happen to be using.

Yes. That's what I said.

> 
> Fixing the conversion issues makes a lot of sense. Try to do it without
> making *any* mention of UTF-8 at all.
> 
> > In summary, based on the discussions we have so far, I suspect that
> > there's not much to be discussed for the above cases.
> > 
> > So, I'll post a v3 of this series, changing only:
> > 
> >         - U+00a0 (' '): NO-BREAK SPACE
> >         - U+feff (''): ZERO WIDTH NO-BREAK SPACE (BOM)  
> 
> Ack, as long as those make *no* mention of UTF-8. Except perhaps to
> note that BOM is redundant because UTF-8 doesn't have a byteorder.

I need to tell what UTF-8 codes are replaced, as otherwise the patch
wouldn't make much sense to reviewers, as both U+00a0 and whitespaces
are displayed the same way, and BOM is invisible.
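
Since neither character can be spotted by eye, something like the
following could be used to locate them. This is only a sketch: the
function name and the sample text are made up, and it is not the script
used to produce this series.

```python
import unicodedata

# The two characters a v3 would replace.
TARGETS = {"\u00a0", "\ufeff"}


def find_targets(text: str):
    """Yield (line, column, code point, Unicode name) for each hit."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in TARGETS:
                yield lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch)


sample = "\ufeffTitle\nfoo\u00a0bar\n"  # hypothetical file contents
hits = list(find_targets(sample))
for hit in hits:
    print(hit)
```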

> 
> > ---
> > 
> > Now, this specific patch series address also this extra case:
> > 
> > 5. curly quotes:
> > 
> >         - U+2018 ('‘'): LEFT SINGLE QUOTATION MARK
> >         - U+2019 ('’'): RIGHT SINGLE QUOTATION MARK
> >         - U+201c ('“'): LEFT DOUBLE QUOTATION MARK
> >         - U+201d ('”'): RIGHT DOUBLE QUOTATION MARK
> > 
> > IMO, those should be replaced by ASCII quotes: ' and ".
> > 
> > The rationale is simple: 
> > 
> > - most were introduced during the conversion from Docbook,
> >   markdown and LaTex;
> > - they don't add any extra value, as using "foo" or “foo” means
> >   the same thing;
> > - Sphinx already uses "fancy" quotes in the output.
> > 
> > I guess I will put this on a separate series, as this is not a bug
> > fix, but just a cleanup from the conversion work.
> > 
> > I'll re-post those cleanups on a separate series, for patch per patch
> > review.  
> 
> Makes sense. 
> 
> The left/right quotation marks exists to make human-readable text much
> easier to read, but the key point here is that they are redundant
> because the tooling already emits them in the *output* so they don't
> need to be in the source, yes?

Yes.

> As long as the tooling gets it *right* and uses them where it should,
> that seems sane enough.
> 
> However, it *does* break 'grep', because if I cut/paste a snippet from
> the documentation and try to grep for it, it'll no longer match.

> 
> Consistency is good, but perhaps we should actually be consistent the
> other way round and always use the left/right versions in the source
> *instead* of relying on the tooling, to make searches work better?
> You claimed to care about that, right?

That's indeed a good point. It would be interesting to have more
opinions on that matter.

There are a couple of things to consider:

1. It is (usually) trivial to discover what document produced a
   certain page at the documentation.

   For instance, if you want to know where the text under this
   file came from, or to grep a text from it:

	https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

   You can click at the "View page source" button at the first line.
   It will show the .rst file used to produce it:

	https://www.kernel.org/doc/html/latest/_sources/admin-guide/cgroup-v2.rst.txt

2. If all you want is to search for a text inside the docs,
   you can click at the "Search docs" box, which is part of the
   Read the Docs theme.

3. Kernel has several extensions for Sphinx, in order to make life 
   easier for Kernel developers:

	Documentation/sphinx/automarkup.py
	Documentation/sphinx/cdomain.py
	Documentation/sphinx/kernel_abi.py
	Documentation/sphinx/kernel_feat.py
	Documentation/sphinx/kernel_include.py
	Documentation/sphinx/kerneldoc.py
	Documentation/sphinx/kernellog.py
	Documentation/sphinx/kfigure.py
	Documentation/sphinx/load_config.py
	Documentation/sphinx/maintainers_include.py
	Documentation/sphinx/rstFlatTable.py

   Those (in particular automarkup and kerneldoc) will also dynamically
   change things during ReST conversion, which may cause grep to not work.

4. Some PDF tools like evince will match curly quotes if you
   type an ASCII quote in their search boxes.

5. Some developers prefer to only deal with the files inside the
   Kernel tree. Those are very unlikely to grep for curly quotes.

My opinion on that matter is that we should make life easier for
developers to grep on text files, as the ones using the web interface
are already served by the search box in html format or by tools like
evince.

So, my vote here is to keep quotes as plain ASCII.
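
To illustrate the cut-and-paste concern: a snippet copied from the
rendered output will not match the source until the typographic
characters are mapped back. A minimal sketch, where the mapping table,
the function name and the sample strings are assumptions made for
illustration and not part of any kernel tooling:

```python
# Typographic characters named in this thread, mapped to what would be
# typed in the .rst source. The table is an assumption for illustration.
TYPOGRAPHIC_TO_SOURCE = str.maketrans({
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "--",   # EN DASH
    "\u2014": "---",  # EM DASH
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
})


def to_source_form(snippet: str) -> str:
    """Map a snippet copied from rendered output back to source form."""
    return snippet.translate(TYPOGRAPHIC_TO_SOURCE)


source_line = 'the "memory" controller'         # hypothetical .rst text
rendered = "the \u201cmemory\u201d controller"  # same text as rendered

print(rendered in source_line)                  # direct match fails
print(to_source_form(rendered) in source_line)  # match after mapping
```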

> 
> > The remaining cases are future work, outside the scope of this v2:
> > 
> > 6. Hyphen/Dashes and ellipsis
> > 
> >         - U+2212 ('−'): MINUS SIGN
> >         - U+00ad ('­'): SOFT HYPHEN
> >         - U+2010 ('‐'): HYPHEN
> > 
> >             Those three are used on places where a normal ASCII hyphen/minus
> >             should be used instead. There are even a couple of C files which
> >             use them instead of '-' on comments.
> > 
> >             IMO are fixes/cleanups from conversions and bad cut-and-paste.  
> 
> That seems to make sense.
> 
> >         - U+2013 ('–'): EN DASH
> >         - U+2014 ('—'): EM DASH
> >         - U+2026 ('…'): HORIZONTAL ELLIPSIS
> > 
> >             Those are auto-replaced by Sphinx from "--", "---" and "...",
> >             respectively.
> > 
> >             I guess those are a matter of personal preference about
> >             whether to use ASCII or UTF-8.
> > 
> >             My personal preference (and Ted seems to have a similar
> >             opinion) is to let Sphinx do the conversion.
> > 
> >             For those, I intend to post a separate series, to be
> >             reviewed patch per patch, as this is really a matter
> >             of personal taste. Hardly we'll reach a consensus here.
> >   
> 
> Again using the trigraph-like '--' and '...' instead of just using the
> plain text '—' and '…' breaks searching, because what's in the output
> doesn't match the input. Again consistency is good, but perhaps we
> should standardise on just putting these in their plain text form
> instead of the trigraphs?

Good point. 

While I don't have any strong preferences here, there's something that
annoys me with regards to EM/EN DASH:

With the monospaced fonts I'm using here - both in my e-mailer and
on my terminals - EM DASH and EN DASH look *exactly* the same.
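
When the font doesn't help, the characters can still be told apart by
code point. A small sketch (the helper name is made up):

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# ASCII hyphen-minus followed by the look-alikes listed in this thread.
for ch in ("-", "\u2010", "\u2013", "\u2014", "\u2212"):
    print(describe(ch))
```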

> 
> > 7. math symbols:
> > 
> >         - U+00d7 ('×'): MULTIPLICATION SIGN
> > 
> >            This one is used mostly to describe video resolutions, but this is
> >            on a smaller changeset than the ones that use "x" letter.
> 
> I think standardising on × for video resolutions in documentation would
> make it look better and be easier to read.
> 
> > 
> >         - U+2217 ('∗'): ASTERISK OPERATOR
> > 
> >            This is used only here:
> >                 Documentation/filesystems/ext4/blockgroup.rst:filesystem size to 2^21 ∗ 2^27 = 2^48bytes or 256TiB.
> > 
> >            Probably added by some conversion tool. IMO, this one should
> >            also be replaced by an ASCII asterisk.
> > 
> > I guess I'll post a patch for the ASTERISK OPERATOR.  
> 
> That makes sense.



Thanks,
Mauro


* Re: [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols
  2021-05-15 11:23           ` Mauro Carvalho Chehab
@ 2021-05-15 12:02             ` David Woodhouse
  0 siblings, 0 replies; 18+ messages in thread
From: David Woodhouse @ 2021-05-15 12:02 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Mali DP Maintainers, alsa-devel, coresight, intel-gfx,
	intel-wired-lan, keyrings, kvm, linux-acpi, linux-arm-kernel,
	linux-edac, linux-ext4, linux-f2fs-devel, linux-hwmon, linux-iio,
	linux-input, linux-integrity, linux-media, linux-pci, linux-pm,
	linux-rdma, linux-sgx, linux-usb, mjpeg-users, netdev, rcu


On Sat, 2021-05-15 at 13:23 +0200, Mauro Carvalho Chehab wrote:
> Em Sat, 15 May 2021 10:24:28 +0100
> David Woodhouse <dwmw2@infradead.org> escreveu:
> > > Let's take one step back, in order to return to the intents of this
> > > UTF-8, as the discussions here are not centered into the patches, but
> > > instead, on what to do and why.
> > > 
> > > This discussion started originally at linux-doc ML.
> > > 
> > > While discussing about an issue when machine's locale was not set
> > > to UTF-8 on a build VM,   
> > 
> > Stop. Stop *right* there before you go any further.
> > 
> > The machine's locale should have *nothing* to do with anything.
>
> Now, you're making a lot of wrong assumptions here ;-)
> 
> 1. I didn't report the bug. Another person reported it at linux-doc;
> 2. I fully agree with you that the build system should work fine
>    whatever locale the machine has;
> 3. The charset Sphinx supports for ReST input and for its output is UTF-8.

OK, fine. So that's an unrelated issue really, and just happened to be
what historically triggered the discussion. Let's set it aside.

> > > I actually checked the current UTF-8 issues … 
> > 
> > No, these aren't "UTF-8 issues". Those are *conversion* issues, and 
> > … *nothing* to do with the encoding that we happen to be using.
> 
> Yes. That's what I said.

Er… I'm fairly sure you *did* call them "UTF-8 issues". Whatever.




> > 
> > Fixing the conversion issues makes a lot of sense. Try to do it without
> > making *any* mention of UTF-8 at all.
> > 
> > > In summary, based on the discussions we have so far, I suspect that
> > > there's not much to be discussed for the above cases.
> > > 
> > > So, I'll post a v3 of this series, changing only:
> > > 
> > >         - U+00a0 (' '): NO-BREAK SPACE
> > >         - U+feff (''): ZERO WIDTH NO-BREAK SPACE (BOM)  
> > 
> > Ack, as long as those make *no* mention of UTF-8. Except perhaps to
> > note that BOM is redundant because UTF-8 doesn't have a byteorder.
> 
> I need to tell what UTF-8 codes are replaced, as otherwise the patch
> wouldn't make much sense to reviewers, as both U+00a0 and whitespaces
> are displayed the same way, and BOM is invisible.
> 

No. Again, this is *nothing* to do with UTF-8. The encoding we choose
to map between bytes in the file and characters is *utterly* irrelevant
here. If we were using UTF-7, UTF-16, or even (in the case of
non-breaking space) one of the legacy 8-bit charsets that includes it
like ISO8859-1, the issue would be precisely the same.

It's about the *character* U+00A0 NO-BREAK SPACE; nothing to do with
UTF-8 at all. Don't mention UTF-8. It's *irrelevant* and just shows
that you can't actually be bothered to stop and do any critical
thinking about the matter at all.
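
To make the distinction concrete, here is a small Python sketch (the
list of encodings is chosen only for illustration): the same character
U+00A0 becomes different bytes under each encoding, and decodes back to
the same character every time.

```python
nbsp = "\u00a0"  # NO-BREAK SPACE

# One character, several encodings: the bytes differ, the character doesn't.
encodings = ("utf-8", "utf-16-le", "utf-7", "iso-8859-1")
encoded = {name: nbsp.encode(name) for name in encodings}

for name, data in encoded.items():
    # Each byte sequence decodes back to the very same character.
    print(name, data, data.decode(name) == nbsp)
```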

As I said, the only time that it makes sense to mention UTF-8 in this
context is when talking about *why* the BOM is not needed. And even
then, you could say "because we *aren't* using an encoding where
endianness matters, such as UTF-16", instead of actually mentioning
UTF-8. Try it ☺
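
A sketch of why the BOM carries no information here, using only the
Python standard library (the sample text is made up): with a 16-bit
encoding the byte order changes the bytes, while with the 8-bit one
there is only one possible byte sequence, so the leading U+FEFF is just
an extra invisible character.

```python
import codecs

text = "Title"  # hypothetical first line of a document

# With a 16-bit encoding the byte order matters: the two forms differ.
little = text.encode("utf-16-le")
big = text.encode("utf-16-be")

# With the 8-bit encoding there is a single form, with or without a BOM.
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")

kept = with_bom.decode("utf-8")         # BOM survives as U+FEFF
dropped = with_bom.decode("utf-8-sig")  # this codec strips a leading BOM

print(little != big, repr(kept), repr(dropped))
```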

> > 
> > > ---
> > > 
> > > Now, this specific patch series address also this extra case:
> > > 
> > > 5. curly quotes:
> > > 
> > >         - U+2018 ('‘'): LEFT SINGLE QUOTATION MARK
> > >         - U+2019 ('’'): RIGHT SINGLE QUOTATION MARK
> > >         - U+201c ('“'): LEFT DOUBLE QUOTATION MARK
> > >         - U+201d ('”'): RIGHT DOUBLE QUOTATION MARK
> > > 
> > > IMO, those should be replaced by ASCII quotes: ' and ".
> > > 
> > > The rationale is simple: 
> > > 
> > > - most were introduced during the conversion from Docbook,
> > >   markdown and LaTex;
> > > - they don't add any extra value, as using "foo" or “foo” means
> > >   the same thing;
> > > - Sphinx already uses "fancy" quotes in the output.
> > > 
> > > I guess I will put this on a separate series, as this is not a bug
> > > fix, but just a cleanup from the conversion work.
> > > 
> > > I'll re-post those cleanups on a separate series, for patch per patch
> > > review.  
> > 
> > Makes sense. 
> > 
> > The left/right quotation marks exists to make human-readable text much
> > easier to read, but the key point here is that they are redundant
> > because the tooling already emits them in the *output* so they don't
> > need to be in the source, yes?
> 
> Yes.
> 
> > As long as the tooling gets it *right* and uses them where it should,
> > that seems sane enough.
> > 
> > However, it *does* break 'grep', because if I cut/paste a snippet from
> > the documentation and try to grep for it, it'll no longer match.
> > 
> > Consistency is good, but perhaps we should actually be consistent the
> > other way round and always use the left/right versions in the source
> > *instead* of relying on the tooling, to make searches work better?
> > You claimed to care about that, right?
> 
> That's indeed a good point. It would be interesting to have more
> opinions on that matter.
> 
> There are a couple of things to consider:
> 
> 1. It is (usually) trivial to discover what document produced a
>    certain page at the documentation.
> 
>    For instance, if you want to know where the text under this
>    file came from, or to grep a text from it:
> 
> 	https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
> 
>    You can click at the "View page source" button at the first line.
>    It will show the .rst file used to produce it:
> 
> 	https://www.kernel.org/doc/html/latest/_sources/admin-guide/cgroup-v2.rst.txt
> 
> 2. If all you want is to search for a text inside the docs,
>    you can click at the "Search docs" box, which is part of the
>    Read the Docs theme.
> 
> 3. Kernel has several extensions for Sphinx, in order to make life 
>    easier for Kernel developers:
> 
> 	Documentation/sphinx/automarkup.py
> 	Documentation/sphinx/cdomain.py
> 	Documentation/sphinx/kernel_abi.py
> 	Documentation/sphinx/kernel_feat.py
> 	Documentation/sphinx/kernel_include.py
> 	Documentation/sphinx/kerneldoc.py
> 	Documentation/sphinx/kernellog.py
> 	Documentation/sphinx/kfigure.py
> 	Documentation/sphinx/load_config.py
> 	Documentation/sphinx/maintainers_include.py
> 	Documentation/sphinx/rstFlatTable.py
> 
>    Those (in particular automarkup and kerneldoc) will also dynamically
>    change things during ReST conversion, which may cause grep to not work.
>
> 4. Some PDF tools like evince will match curly quotes if you
>    type an ASCII quote in their search boxes.
>
> 5. Some developers prefer to only deal with the files inside the
>    Kernel tree. Those are very unlikely to grep for curly quotes.
> 
> My opinion on that matter is that we should make life easier for
> developers to grep on text files, as the ones using the web interface
> are already served by the search box in html format or by tools like
> evince.
> 
> So, my vote here is to keep quotes as plain ASCII.

OK, but all your reasoning is about the *character* used, not the
encoding. So try to do it without mentioning ASCII, and especially
without mentioning UTF-8.

Your point is that the *character* is the one easily reachable on
standard keyboard layouts, and the one which people are most likely to
enter manually. It has *nothing* to do with charset encodings, so don't
conflate it with talking about charset encodings.

> 
> > 
> > > The remaining cases are future work, outside the scope of this v2:
> > > 
> > > 6. Hyphen/Dashes and ellipsis
> > > 
> > >         - U+2212 ('−'): MINUS SIGN
> > >         - U+00ad ('­'): SOFT HYPHEN
> > >         - U+2010 ('‐'): HYPHEN
> > > 
> > >             Those three are used on places where a normal ASCII hyphen/minus
> > >             should be used instead. There are even a couple of C files which
> > >             use them instead of '-' on comments.
> > > 
> > >             IMO are fixes/cleanups from conversions and bad cut-and-paste.  
> > 
> > That seems to make sense.
> > 
> > >         - U+2013 ('–'): EN DASH
> > >         - U+2014 ('—'): EM DASH
> > >         - U+2026 ('…'): HORIZONTAL ELLIPSIS
> > > 
> > >             Those are auto-replaced by Sphinx from "--", "---" and "...",
> > >             respectively.
> > > 
> > >             I guess those are a matter of personal preference about
> > >             whether to use ASCII or UTF-8.
> > > 
> > >             My personal preference (and Ted seems to have a similar
> > >             opinion) is to let Sphinx do the conversion.
> > > 
> > >             For those, I intend to post a separate series, to be
> > >             reviewed patch per patch, as this is really a matter
> > >             of personal taste. Hardly we'll reach a consensus here.
> > >   
> > 
> > Again using the trigraph-like '--' and '...' instead of just using the
> > plain text '—' and '…' breaks searching, because what's in the output
> > doesn't match the input. Again consistency is good, but perhaps we
> > should standardise on just putting these in their plain text form
> > instead of the trigraphs?
> 
> Good point. 
> 
> While I don't have any strong preferences here, there's something that
> annoys me with regards to EM/EN DASH:
> 
> With the monospaced fonts I'm using here - both in my e-mailer and
> on my terminals - EM DASH and EN DASH look *exactly* the same.

Interesting. They definitely show differently in my terminal, and in
the monospaced font in email.



end of thread, other threads:[~2021-05-15 12:02 UTC | newest]

Thread overview: 18+ messages
2021-05-12 12:50 [PATCH v2 00/40] Use ASCII subset instead of UTF-8 alternate symbols Mauro Carvalho Chehab
2021-05-12 12:50 ` [PATCH v2 03/40] docs: admin-guide: media: ipu3.rst: " Mauro Carvalho Chehab
2021-05-12 12:50 ` [PATCH v2 09/40] docs: driver-api: media: drivers: " Mauro Carvalho Chehab
2021-05-12 12:50 ` [PATCH v2 14/40] docs: userspace-api: media: fdl-appendix.rst: " Mauro Carvalho Chehab
2021-05-12 12:50 ` [PATCH v2 15/40] docs: userspace-api: media: v4l: " Mauro Carvalho Chehab
2021-05-12 12:50 ` [PATCH v2 16/40] docs: userspace-api: media: dvb: " Mauro Carvalho Chehab
2021-05-12 14:14 ` [PATCH v2 00/40] " Theodore Ts'o
2021-05-12 15:17   ` Mauro Carvalho Chehab
2021-05-12 17:12     ` David Woodhouse
2021-05-12 17:07 ` David Woodhouse
2021-05-14  8:21   ` Mauro Carvalho Chehab
2021-05-14  9:06     ` David Woodhouse
2021-05-14 11:08       ` Edward Cree
2021-05-14 14:18         ` Mauro Carvalho Chehab
2021-05-15  8:22       ` Mauro Carvalho Chehab
2021-05-15  9:24         ` David Woodhouse
2021-05-15 11:23           ` Mauro Carvalho Chehab
2021-05-15 12:02             ` David Woodhouse
