From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: David Woodhouse <dwmw2@infradead.org>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>,
	linux-kernel@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	alsa-devel@alsa-project.org, coresight@lists.linaro.org,
	dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	intel-wired-lan@lists.osuosl.org, keyrings@vger.kernel.org,
	kvm@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org,
	linux-ext4@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-fpga@vger.kernel.org, linux-hwmon@vger.kernel.org,
	linux-iio@vger.kernel.org, linux-input@vger.kernel.org,
	linux-integrity@vger.kernel.org, linux-media@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-riscv@lists.infradead.org,
	linux-sgx@vger.kernel.org, linux-usb@vger.kernel.org,
	mjpeg-users@lists.sourceforge.net, netdev@vger.kernel.org,
	rcu@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII
Date: Tue, 11 May 2021 11:37:17 +0200
Message-ID: <20210511113717.5c8b68f7@coco.lan>
In-Reply-To: <YJmH2irxoRsyNudb@mit.edu>

On Mon, 10 May 2021 15:22:02 -0400
"Theodore Ts'o" <tytso@mit.edu> wrote:

> On Mon, May 10, 2021 at 02:49:44PM +0100, David Woodhouse wrote:
> > On Mon, 2021-05-10 at 13:55 +0200, Mauro Carvalho Chehab wrote:  
> > > This patch series is doing conversion only when using ASCII makes
> > > more sense than using UTF-8. 
> > > 
> > > See, a number of converted documents ended with weird characters
> > > like ZERO WIDTH NO-BREAK SPACE (U+FEFF) character. This specific
> > > character doesn't do any good.
> > > 
> > > Others use NO-BREAK SPACE (U+A0) instead of 0x20. Harmless, until
> > > someone tries to use grep[1].  
> > 
> > Replacing those makes sense. But replacing emdashes — which are a
> > distinct character that has no direct replacement in ASCII and which
> > people do *deliberately* use instead of hyphen-minus — does not.  
> 
> I regularly use --- for em-dashes and -- for en-dashes.  Markdown will
> automatically translate 3 ASCII hyphens to em-dashes, and 2 ASCII
> hyphens to en-dashes.  It's much, much easier for me to type 2 or 3
> hyphens into my text editor of choice than trying to enter the UTF-8
> characters.

Yeah, typing those UTF-8 chars is a lot harder than typing -- and ---
in several text editors ;-)

Here, I only type UTF-8 chars for accents (my US-layout keyboards are
all set to US international, so typing those is easy).

> If we can make sphinx do this translation, maybe that's
> the best way of dealing with these two characters?

Sphinx already does that by default[1], using smartquotes:

	https://docutils.sourceforge.io/docs/user/smartquotes.html

These are the conversions it performs:

      - Straight quotes (" and ') turned into "curly" quote characters;
      - dashes (-- and ---) turned into en- and em-dash entities;
      - three consecutive dots (... or . . .) turned into an ellipsis char.

So, we can simply use straight single/double quotes, hyphens and dots
for curly quotes, dashes and ellipses.
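
As a rough illustration of the dash and ellipsis conversions listed
above, here is a minimal sketch. It is a simplification and not the
docutils implementation: the function name ascii_to_typographic is
made up for this example, and quote handling is left out because it
depends on context.

```python
import re

# Written as escapes so this example itself stays pure ASCII.
EN_DASH = "\u2013"
EM_DASH = "\u2014"
ELLIPSIS = "\u2026"


def ascii_to_typographic(text: str) -> str:
    """Hypothetical helper: map ASCII dash/dot runs to typographic chars."""
    # Handle the longer run first, so "---" is not consumed as "--" plus "-".
    text = text.replace("---", EM_DASH)
    text = text.replace("--", EN_DASH)
    # Three dots, either adjacent ("...") or space-separated (". . .").
    text = re.sub(r"\.\.\.|\. \. \.", ELLIPSIS, text)
    return text


if __name__ == "__main__":
    print(ascii_to_typographic("pages 10--20 --- see below..."))
```

The point is only that the ASCII source stays easy to type and grep,
while the rendered output still gets the typographic characters.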

[1] There's a way to disable it in conf.py, but in the kernel this is
    kept at its default, which is to do such conversions automatically.
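
For reference, a sketch of the Sphinx conf.py settings that control
this. The excerpt is illustrative and not copied from the kernel tree;
the values shown are the Sphinx defaults as I understand them, so
spelling them out should not change the behaviour.

```python
# Illustrative conf.py excerpt (an assumption, not the kernel's actual file).

# Master switch for the smartquotes transform; setting it to False
# would turn off all of the conversions described above.
smartquotes = True

# Which conversions to apply: "q" = quotes, "D" = "--" and "---" to
# en- and em-dashes, "e" = "..." to an ellipsis.
smartquotes_action = "qDe"
```

Dropping the "D" from smartquotes_action would keep the quote and
ellipsis conversions while leaving -- and --- untouched.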

Thanks,
Mauro

2021-05-10 14:33           ` Edward Cree
2021-05-10 14:33           ` [f2fs-dev] " Edward Cree
2021-05-10 14:33           ` Edward Cree
2021-05-11  9:00           ` Mauro Carvalho Chehab
2021-05-11  9:00             ` [Intel-wired-lan] " Mauro Carvalho Chehab
2021-05-11  9:00             ` [Intel-gfx] " Mauro Carvalho Chehab
2021-05-11  9:00             ` Mauro Carvalho Chehab
2021-05-11  9:00             ` Mauro Carvalho Chehab
2021-05-11  9:00             ` [f2fs-dev] " Mauro Carvalho Chehab
2021-05-11  9:00             ` Mauro Carvalho Chehab
2021-05-11  9:19             ` David Woodhouse
2021-05-11  9:19               ` [Intel-wired-lan] " David Woodhouse
2021-05-11  9:19               ` [Intel-gfx] " David Woodhouse
2021-05-11  9:19               ` David Woodhouse
2021-05-11  9:19               ` David Woodhouse
2021-05-11  9:19               ` David Woodhouse
2021-05-10 13:49     ` David Woodhouse
2021-05-10 13:49       ` [Intel-wired-lan] " David Woodhouse
2021-05-10 13:49       ` [Intel-gfx] " David Woodhouse
2021-05-10 13:49       ` David Woodhouse
2021-05-10 13:49       ` David Woodhouse
2021-05-10 13:49       ` David Woodhouse
2021-05-10 19:22       ` Theodore Ts'o
2021-05-10 19:22         ` [Intel-wired-lan] " Theodore Ts'o
2021-05-10 19:22         ` [Intel-gfx] " Theodore Ts'o
2021-05-10 19:22         ` Theodore Ts'o
2021-05-10 19:22         ` Theodore Ts'o
2021-05-10 19:22         ` [f2fs-dev] " Theodore Ts'o
2021-05-10 19:22         ` Theodore Ts'o
2021-05-11  9:37         ` Mauro Carvalho Chehab [this message]
2021-05-11  9:37           ` [Intel-wired-lan] " Mauro Carvalho Chehab
2021-05-11  9:37           ` [Intel-gfx] " Mauro Carvalho Chehab
2021-05-11  9:37           ` Mauro Carvalho Chehab
2021-05-11  9:37           ` Mauro Carvalho Chehab
2021-05-11  9:37           ` [f2fs-dev] " Mauro Carvalho Chehab
2021-05-11  9:37           ` Mauro Carvalho Chehab
2021-05-11  9:25       ` Mauro Carvalho Chehab
2021-05-11  9:25         ` [Intel-wired-lan] " Mauro Carvalho Chehab
2021-05-11  9:25         ` [Intel-gfx] " Mauro Carvalho Chehab
2021-05-11  9:25         ` Mauro Carvalho Chehab
2021-05-11  9:25         ` Mauro Carvalho Chehab
2021-05-11  9:25         ` [f2fs-dev] " Mauro Carvalho Chehab
2021-05-11  9:25         ` Mauro Carvalho Chehab
2021-05-10 14:00     ` Ben Boeckel
2021-05-10 14:00       ` [Intel-wired-lan] " Ben Boeckel
2021-05-10 14:00       ` [Intel-gfx] " Ben Boeckel
2021-05-10 14:00       ` Ben Boeckel
2021-05-10 14:00       ` Ben Boeckel
2021-05-10 14:00       ` [f2fs-dev] " Ben Boeckel
2021-05-10 14:00       ` Ben Boeckel
2021-05-10 21:57 ` Adam Borowski
2021-05-10 21:57   ` [Intel-wired-lan] " Adam Borowski
2021-05-10 21:57   ` [Intel-gfx] " Adam Borowski
2021-05-10 21:57   ` Adam Borowski
2021-05-10 21:57   ` Adam Borowski
2021-05-10 21:57   ` [f2fs-dev] " Adam Borowski
2021-05-10 21:57   ` Adam Borowski
