Re: [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Edward Cree <ecree.xilinx@gmail.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	David Woodhouse <dwmw2@infradead.org>
Cc: Linux Doc Mailing List <linux-doc@vger.kernel.org>,
	linux-kernel@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	alsa-devel@alsa-project.org, coresight@lists.linaro.org,
	dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	intel-wired-lan@lists.osuosl.org, keyrings@vger.kernel.org,
	kvm@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org,
	linux-ext4@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-fpga@vger.kernel.org, linux-hwmon@vger.kernel.org,
	linux-iio@vger.kernel.org, linux-input@vger.kernel.org,
	linux-integrity@vger.kernel.org, linux-media@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-riscv@lists.infradead.org,
	linux-sgx@vger.kernel.org, linux-usb@vger.kernel.org,
	mjpeg-users@lists.sourceforge.net, netdev@vger.kernel.org,
	rcu@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII
Date: Mon, 10 May 2021 14:16:16 +0100	[thread overview]
Message-ID: <df6b4567-030c-a480-c5a6-fe579830e8c0@gmail.com> (raw)
In-Reply-To: <20210510135518.305cc03d@coco.lan>

On 10/05/2021 12:55, Mauro Carvalho Chehab wrote:
> The main point on this series is to replace just the occurrences
> where ASCII represents the symbol equally well

> 	- U+2014 ('—'): EM DASH
Em dash is not the same thing as hyphen-minus, and the latter does not
 serve 'equally well'.  People use em dashes because — even in
 monospace fonts — they make text easier to read and comprehend, when
 used correctly.
I accept that some of the other distinctions — like en dashes — are
 needlessly pedantic (though I don't doubt there is someone out there
 who will gladly defend them with the same fervour with which I argue
 for the em dash) and I wouldn't take the trouble to use them myself;
 but I think there is a reasonable assumption that when someone goes
 to the effort of using a Unicode punctuation mark that is semantic
 (rather than merely typographical), they probably had a reason for
 doing so.

> 	- U+2018 ('‘'): LEFT SINGLE QUOTATION MARK
> 	- U+2019 ('’'): RIGHT SINGLE QUOTATION MARK
> 	- U+201c ('“'): LEFT DOUBLE QUOTATION MARK
> 	- U+201d ('”'): RIGHT DOUBLE QUOTATION MARK
(These are purely typographic, I have no problem with dumping them.)

> 	- U+00d7 ('×'): MULTIPLICATION SIGN
Presumably this is appearing in mathematical formulae, in which case
 changing it to 'x' loses semantic information.

> Using the above symbols will just trick tools like grep for no good
> reason.
NBSP, sure.  That one's probably an artefact of some document format
 conversion somewhere along the line, anyway.
But what kinds of things with × or — in are going to be grept for?

If there are em dashes lying around that semantically _should_ be
 hyphen-minus (one of your patches I've seen, for instance, fixes an
 *en* dash moonlighting as the option character in an `ethtool`
 command line), then sure, convert them.
But any time someone is using a Unicode character to *express
 semantics*, even if you happen to think the semantic distinction
 involved is a pedantic or unimportant one, I think you need an
 explicit grep case to justify ASCIIfying it.

-ed

WARNING: multiple messages have this Message-ID (diff)

From: Edward Cree <ecree.xilinx@gmail.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	David Woodhouse <dwmw2@infradead.org>
Cc: Linux Doc Mailing List <linux-doc@vger.kernel.org>,
	linux-kernel@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	alsa-devel@alsa-project.org, coresight@lists.linaro.org,
	dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	intel-wired-lan@lists.osuosl.org, keyrings@vger.kernel.org,
	kvm@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org,
	linux-ext4@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-fpga@vger.kernel.org, linux-hwmon@vger.kernel.org,
	linux-iio@vger.kernel.org, linux-input@vger.kernel.org,
	linux-integrity@vger.kernel.org, linux-media@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-riscv@lists.infradead.org,
	linux-sgx@vger.kernel.org, linux-usb@vger.kernel.org,
	mjpeg-users@lists.sourceforge.net, netdev@vger.kernel.org,
	rcu@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII
Date: Mon, 10 May 2021 14:16:16 +0100	[thread overview]
Message-ID: <df6b4567-030c-a480-c5a6-fe579830e8c0@gmail.com> (raw)
In-Reply-To: <20210510135518.305cc03d@coco.lan>

On 10/05/2021 12:55, Mauro Carvalho Chehab wrote:
> The main point on this series is to replace just the occurrences
> where ASCII represents the symbol equally well

> 	- U+2014 ('—'): EM DASH
Em dash is not the same thing as hyphen-minus, and the latter does not
 serve 'equally well'.  People use em dashes because — even in
 monospace fonts — they make text easier to read and comprehend, when
 used correctly.
I accept that some of the other distinctions — like en dashes — are
 needlessly pedantic (though I don't doubt there is someone out there
 who will gladly defend them with the same fervour with which I argue
 for the em dash) and I wouldn't take the trouble to use them myself;
 but I think there is a reasonable assumption that when someone goes
 to the effort of using a Unicode punctuation mark that is semantic
 (rather than merely typographical), they probably had a reason for
 doing so.

> 	- U+2018 ('‘'): LEFT SINGLE QUOTATION MARK
> 	- U+2019 ('’'): RIGHT SINGLE QUOTATION MARK
> 	- U+201c ('“'): LEFT DOUBLE QUOTATION MARK
> 	- U+201d ('”'): RIGHT DOUBLE QUOTATION MARK
(These are purely typographic, I have no problem with dumping them.)

> 	- U+00d7 ('×'): MULTIPLICATION SIGN
Presumably this is appearing in mathematical formulae, in which case
 changing it to 'x' loses semantic information.

> Using the above symbols will just trick tools like grep for no good
> reason.
NBSP, sure.  That one's probably an artefact of some document format
 conversion somewhere along the line, anyway.
But what kinds of things with × or — in are going to be grept for?

If there are em dashes lying around that semantically _should_ be
 hyphen-minus (one of your patches I've seen, for instance, fixes an
 *en* dash moonlighting as the option character in an `ethtool`
 command line), then sure, convert them.
But any time someone is using a Unicode character to *express
 semantics*, even if you happen to think the semantic distinction
 involved is a pedantic or unimportant one, I think you need an
 explicit grep case to justify ASCIIfying it.

-ed

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

WARNING: multiple messages have this Message-ID (diff)

From: Edward Cree <ecree.xilinx@gmail.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	David Woodhouse <dwmw2@infradead.org>
Cc: alsa-devel@alsa-project.org, kvm@vger.kernel.org,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>,
	linux-iio@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-fpga@vger.kernel.org, dri-devel@lists.freedesktop.org,
	keyrings@vger.kernel.org, linux-riscv@lists.infradead.org,
	Jonathan Corbet <corbet@lwn.net>,
	linux-rdma@vger.kernel.org, x86@kernel.org,
	linux-acpi@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	linux-input@vger.kernel.org, linux-ext4@vger.kernel.org,
	intel-gfx@lists.freedesktop.org, linux-media@vger.kernel.org,
	linux-pm@vger.kernel.org, linux-sgx@vger.kernel.org,
	coresight@lists.linaro.org, rcu@vger.kernel.org,
	mjpeg-users@lists.sourceforge.net,
	linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org,
	linux-hwmon@vger.kernel.org, netdev@vger.kernel.org,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-integrity@vger.kernel.org
Subject: Re: [f2fs-dev] [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII
Date: Mon, 10 May 2021 14:16:16 +0100	[thread overview]
Message-ID: <df6b4567-030c-a480-c5a6-fe579830e8c0@gmail.com> (raw)
In-Reply-To: <20210510135518.305cc03d@coco.lan>

On 10/05/2021 12:55, Mauro Carvalho Chehab wrote:
> The main point on this series is to replace just the occurrences
> where ASCII represents the symbol equally well

> 	- U+2014 ('—'): EM DASH
Em dash is not the same thing as hyphen-minus, and the latter does not
 serve 'equally well'.  People use em dashes because — even in
 monospace fonts — they make text easier to read and comprehend, when
 used correctly.
I accept that some of the other distinctions — like en dashes — are
 needlessly pedantic (though I don't doubt there is someone out there
 who will gladly defend them with the same fervour with which I argue
 for the em dash) and I wouldn't take the trouble to use them myself;
 but I think there is a reasonable assumption that when someone goes
 to the effort of using a Unicode punctuation mark that is semantic
 (rather than merely typographical), they probably had a reason for
 doing so.

> 	- U+2018 ('‘'): LEFT SINGLE QUOTATION MARK
> 	- U+2019 ('’'): RIGHT SINGLE QUOTATION MARK
> 	- U+201c ('“'): LEFT DOUBLE QUOTATION MARK
> 	- U+201d ('”'): RIGHT DOUBLE QUOTATION MARK
(These are purely typographic, I have no problem with dumping them.)

> 	- U+00d7 ('×'): MULTIPLICATION SIGN
Presumably this is appearing in mathematical formulae, in which case
 changing it to 'x' loses semantic information.

> Using the above symbols will just trick tools like grep for no good
> reason.
NBSP, sure.  That one's probably an artefact of some document format
 conversion somewhere along the line, anyway.
But what kinds of things with × or — in are going to be grept for?

If there are em dashes lying around that semantically _should_ be
 hyphen-minus (one of your patches I've seen, for instance, fixes an
 *en* dash moonlighting as the option character in an `ethtool`
 command line), then sure, convert them.
But any time someone is using a Unicode character to *express
 semantics*, even if you happen to think the semantic distinction
 involved is a pedantic or unimportant one, I think you need an
 explicit grep case to justify ASCIIfying it.

-ed


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

WARNING: multiple messages have this Message-ID (diff)

From: Edward Cree <ecree.xilinx@gmail.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	David Woodhouse <dwmw2@infradead.org>
Cc: alsa-devel@alsa-project.org, kvm@vger.kernel.org,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>,
	linux-iio@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-fpga@vger.kernel.org, dri-devel@lists.freedesktop.org,
	keyrings@vger.kernel.org, linux-riscv@lists.infradead.org,
	Jonathan Corbet <corbet@lwn.net>,
	linux-rdma@vger.kernel.org, x86@kernel.org,
	linux-acpi@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	linux-input@vger.kernel.org, linux-ext4@vger.kernel.org,
	intel-gfx@lists.freedesktop.org, linux-media@vger.kernel.org,
	linux-pm@vger.kernel.org, linux-sgx@vger.kernel.org,
	coresight@lists.linaro.org, rcu@vger.kernel.org,
	mjpeg-users@lists.sourceforge.net,
	linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org,
	linux-hwmon@vger.kernel.org, netdev@vger.kernel.org,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-integrity@vger.kernel.org
Subject: Re: [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII
Date: Mon, 10 May 2021 14:16:16 +0100	[thread overview]
Message-ID: <df6b4567-030c-a480-c5a6-fe579830e8c0@gmail.com> (raw)
In-Reply-To: <20210510135518.305cc03d@coco.lan>

On 10/05/2021 12:55, Mauro Carvalho Chehab wrote:
> The main point on this series is to replace just the occurrences
> where ASCII represents the symbol equally well

> 	- U+2014 ('—'): EM DASH
Em dash is not the same thing as hyphen-minus, and the latter does not
 serve 'equally well'.  People use em dashes because — even in
 monospace fonts — they make text easier to read and comprehend, when
 used correctly.
I accept that some of the other distinctions — like en dashes — are
 needlessly pedantic (though I don't doubt there is someone out there
 who will gladly defend them with the same fervour with which I argue
 for the em dash) and I wouldn't take the trouble to use them myself;
 but I think there is a reasonable assumption that when someone goes
 to the effort of using a Unicode punctuation mark that is semantic
 (rather than merely typographical), they probably had a reason for
 doing so.

> 	- U+2018 ('‘'): LEFT SINGLE QUOTATION MARK
> 	- U+2019 ('’'): RIGHT SINGLE QUOTATION MARK
> 	- U+201c ('“'): LEFT DOUBLE QUOTATION MARK
> 	- U+201d ('”'): RIGHT DOUBLE QUOTATION MARK
(These are purely typographic, I have no problem with dumping them.)

> 	- U+00d7 ('×'): MULTIPLICATION SIGN
Presumably this is appearing in mathematical formulae, in which case
 changing it to 'x' loses semantic information.

> Using the above symbols will just trick tools like grep for no good
> reason.
NBSP, sure.  That one's probably an artefact of some document format
 conversion somewhere along the line, anyway.
But what kinds of things with × or — in are going to be grept for?

If there are em dashes lying around that semantically _should_ be
 hyphen-minus (one of your patches I've seen, for instance, fixes an
 *en* dash moonlighting as the option character in an `ethtool`
 command line), then sure, convert them.
But any time someone is using a Unicode character to *express
 semantics*, even if you happen to think the semantic distinction
 involved is a pedantic or unimportant one, I think you need an
 explicit grep case to justify ASCIIfying it.

-ed

WARNING: multiple messages have this Message-ID (diff)

From: Edward Cree <ecree.xilinx@gmail.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	David Woodhouse <dwmw2@infradead.org>
Cc: Linux Doc Mailing List <linux-doc@vger.kernel.org>,
	linux-kernel@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	alsa-devel@alsa-project.org, coresight@lists.linaro.org,
	dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	intel-wired-lan@lists.osuosl.org, keyrings@vger.kernel.org,
	kvm@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org,
	linux-ext4@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-fpga@vger.kernel.org, linux-hwmon@vger.kernel.org,
	linux-iio@vger.kernel.org, linux-input@vger.kernel.org,
	linux-integrity@vger.kernel.org, linux-media@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-riscv@lists.infradead.org,
	linux-sgx@vger.kernel.org, linux-usb@vger.kernel.org,
	mjpeg-users@lists.sourceforge.net, netdev@vger.kernel.org,
	rcu@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII
Date: Mon, 10 May 2021 14:16:16 +0100	[thread overview]
Message-ID: <df6b4567-030c-a480-c5a6-fe579830e8c0@gmail.com> (raw)
In-Reply-To: <20210510135518.305cc03d@coco.lan>

On 10/05/2021 12:55, Mauro Carvalho Chehab wrote:
> The main point on this series is to replace just the occurrences
> where ASCII represents the symbol equally well

> 	- U+2014 ('—'): EM DASH
Em dash is not the same thing as hyphen-minus, and the latter does not
 serve 'equally well'.  People use em dashes because — even in
 monospace fonts — they make text easier to read and comprehend, when
 used correctly.
I accept that some of the other distinctions — like en dashes — are
 needlessly pedantic (though I don't doubt there is someone out there
 who will gladly defend them with the same fervour with which I argue
 for the em dash) and I wouldn't take the trouble to use them myself;
 but I think there is a reasonable assumption that when someone goes
 to the effort of using a Unicode punctuation mark that is semantic
 (rather than merely typographical), they probably had a reason for
 doing so.

> 	- U+2018 ('‘'): LEFT SINGLE QUOTATION MARK
> 	- U+2019 ('’'): RIGHT SINGLE QUOTATION MARK
> 	- U+201c ('“'): LEFT DOUBLE QUOTATION MARK
> 	- U+201d ('”'): RIGHT DOUBLE QUOTATION MARK
(These are purely typographic, I have no problem with dumping them.)

> 	- U+00d7 ('×'): MULTIPLICATION SIGN
Presumably this is appearing in mathematical formulae, in which case
 changing it to 'x' loses semantic information.

> Using the above symbols will just trick tools like grep for no good
> reason.
NBSP, sure.  That one's probably an artefact of some document format
 conversion somewhere along the line, anyway.
But what kinds of things with × or — in are going to be grept for?

If there are em dashes lying around that semantically _should_ be
 hyphen-minus (one of your patches I've seen, for instance, fixes an
 *en* dash moonlighting as the option character in an `ethtool`
 command line), then sure, convert them.
But any time someone is using a Unicode character to *express
 semantics*, even if you happen to think the semantic distinction
 involved is a pedantic or unimportant one, I think you need an
 explicit grep case to justify ASCIIfying it.

-ed

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)

From: Edward Cree <ecree.xilinx@gmail.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	David Woodhouse <dwmw2@infradead.org>
Cc: alsa-devel@alsa-project.org, kvm@vger.kernel.org,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>,
	linux-iio@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-fpga@vger.kernel.org, dri-devel@lists.freedesktop.org,
	keyrings@vger.kernel.org, linux-riscv@lists.infradead.org,
	Jonathan Corbet <corbet@lwn.net>,
	linux-rdma@vger.kernel.org, x86@kernel.org,
	linux-acpi@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	linux-input@vger.kernel.org, linux-ext4@vger.kernel.org,
	intel-gfx@lists.freedesktop.org, linux-media@vger.kernel.org,
	linux-pm@vger.kernel.org, linux-sgx@vger.kernel.org,
	coresight@lists.linaro.org, rcu@vger.kernel.org,
	mjpeg-users@lists.sourceforge.net,
	linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org,
	linux-hwmon@vger.kernel.org, netdev@vger.kernel.org,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-integrity@vger.kernel.org
Subject: Re: [Intel-gfx] [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII
Date: Mon, 10 May 2021 14:16:16 +0100	[thread overview]
Message-ID: <df6b4567-030c-a480-c5a6-fe579830e8c0@gmail.com> (raw)
In-Reply-To: <20210510135518.305cc03d@coco.lan>

On 10/05/2021 12:55, Mauro Carvalho Chehab wrote:
> The main point on this series is to replace just the occurrences
> where ASCII represents the symbol equally well

> 	- U+2014 ('—'): EM DASH
Em dash is not the same thing as hyphen-minus, and the latter does not
 serve 'equally well'.  People use em dashes because — even in
 monospace fonts — they make text easier to read and comprehend, when
 used correctly.
I accept that some of the other distinctions — like en dashes — are
 needlessly pedantic (though I don't doubt there is someone out there
 who will gladly defend them with the same fervour with which I argue
 for the em dash) and I wouldn't take the trouble to use them myself;
 but I think there is a reasonable assumption that when someone goes
 to the effort of using a Unicode punctuation mark that is semantic
 (rather than merely typographical), they probably had a reason for
 doing so.

> 	- U+2018 ('‘'): LEFT SINGLE QUOTATION MARK
> 	- U+2019 ('’'): RIGHT SINGLE QUOTATION MARK
> 	- U+201c ('“'): LEFT DOUBLE QUOTATION MARK
> 	- U+201d ('”'): RIGHT DOUBLE QUOTATION MARK
(These are purely typographic, I have no problem with dumping them.)

> 	- U+00d7 ('×'): MULTIPLICATION SIGN
Presumably this is appearing in mathematical formulae, in which case
 changing it to 'x' loses semantic information.

> Using the above symbols will just trick tools like grep for no good
> reason.
NBSP, sure.  That one's probably an artefact of some document format
 conversion somewhere along the line, anyway.
But what kinds of things with × or — in are going to be grept for?

If there are em dashes lying around that semantically _should_ be
 hyphen-minus (one of your patches I've seen, for instance, fixes an
 *en* dash moonlighting as the option character in an `ethtool`
 command line), then sure, convert them.
But any time someone is using a Unicode character to *express
 semantics*, even if you happen to think the semantic distinction
 involved is a pedantic or unimportant one, I think you need an
 explicit grep case to justify ASCIIfying it.

-ed
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

WARNING: multiple messages have this Message-ID (diff)

From: Edward Cree <ecree.xilinx@gmail.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII
Date: Mon, 10 May 2021 14:16:16 +0100	[thread overview]
Message-ID: <df6b4567-030c-a480-c5a6-fe579830e8c0@gmail.com> (raw)
In-Reply-To: <20210510135518.305cc03d@coco.lan>

On 10/05/2021 12:55, Mauro Carvalho Chehab wrote:
> The main point on this series is to replace just the occurrences
> where ASCII represents the symbol equally well

> 	- U+2014 ('?'): EM DASH
Em dash is not the same thing as hyphen-minus, and the latter does not
 serve 'equally well'.  People use em dashes because ? even in
 monospace fonts ? they make text easier to read and comprehend, when
 used correctly.
I accept that some of the other distinctions ? like en dashes ? are
 needlessly pedantic (though I don't doubt there is someone out there
 who will gladly defend them with the same fervour with which I argue
 for the em dash) and I wouldn't take the trouble to use them myself;
 but I think there is a reasonable assumption that when someone goes
 to the effort of using a Unicode punctuation mark that is semantic
 (rather than merely typographical), they probably had a reason for
 doing so.

> 	- U+2018 ('?'): LEFT SINGLE QUOTATION MARK
> 	- U+2019 ('?'): RIGHT SINGLE QUOTATION MARK
> 	- U+201c ('?'): LEFT DOUBLE QUOTATION MARK
> 	- U+201d ('?'): RIGHT DOUBLE QUOTATION MARK
(These are purely typographic, I have no problem with dumping them.)

> 	- U+00d7 ('?'): MULTIPLICATION SIGN
Presumably this is appearing in mathematical formulae, in which case
 changing it to 'x' loses semantic information.

> Using the above symbols will just trick tools like grep for no good
> reason.
NBSP, sure.  That one's probably an artefact of some document format
 conversion somewhere along the line, anyway.
But what kinds of things with ? or ? in are going to be grept for?

If there are em dashes lying around that semantically _should_ be
 hyphen-minus (one of your patches I've seen, for instance, fixes an
 *en* dash moonlighting as the option character in an `ethtool`
 command line), then sure, convert them.
But any time someone is using a Unicode character to *express
 semantics*, even if you happen to think the semantic distinction
 involved is a pedantic or unimportant one, I think you need an
 explicit grep case to justify ASCIIfying it.

-ed

next prev parent reply	other threads:[~2021-05-10 13:42 UTC|newest]

Thread overview: 219+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-10 10:26 [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII Mauro Carvalho Chehab
2021-05-10 10:26 ` [Intel-wired-lan] " Mauro Carvalho Chehab
2021-05-10 10:26 ` [Intel-gfx] " Mauro Carvalho Chehab
2021-05-10 10:26 ` Mauro Carvalho Chehab
2021-05-10 10:26 ` Mauro Carvalho Chehab
2021-05-10 10:26 ` [f2fs-dev] " Mauro Carvalho Chehab
2021-05-10 10:26 ` Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 01/53] docs: cdrom-standard.rst: get rid of uneeded UTF-8 chars Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 02/53] docs: ABI: remove a meaningless UTF-8 character Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 03/53] docs: ABI: remove some spurious characters Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 04/53] docs: index.rst: avoid using UTF-8 chars Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 05/53] docs: hwmon: " Mauro Carvalho Chehab
2021-05-10 13:30   ` Guenter Roeck
2021-05-10 10:26 ` [PATCH 06/53] docs: admin-guide: " Mauro Carvalho Chehab
2021-05-10 18:40   ` Gabriel Krisman Bertazi
2021-05-12  8:44     ` Mauro Carvalho Chehab
2021-05-12  9:25       ` David Woodhouse
2021-05-12 10:22         ` Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 07/53] docs: admin-guide: media: ipu3.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 08/53] docs: admin-guide: sysctl: kernel.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 09/53] docs: admin-guide: perf: imx-ddr.rst: " Mauro Carvalho Chehab
2021-05-10 10:26   ` Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 10/53] docs: admin-guide: pm: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 11/53] docs: trace: coresight: coresight-etm4x-reference.rst: " Mauro Carvalho Chehab
2021-05-10 10:26   ` Mauro Carvalho Chehab
2021-05-10 19:28   ` Mathieu Poirier
2021-05-10 19:28     ` Mathieu Poirier
2021-05-10 10:26 ` [PATCH 12/53] docs: driver-api: " Mauro Carvalho Chehab
     [not found]   ` <CAHp75Vegsb-+fVppv3C7Jp0a=mEGAh2pchX=Cr5ZvOMFt+G73Q@mail.gmail.com>
2021-05-12  8:49     ` Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 13/53] docs: driver-api: fpga: " Mauro Carvalho Chehab
2021-05-10 17:48   ` Moritz Fischer
2021-05-10 10:26 ` [PATCH 14/53] docs: driver-api: iio: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 15/53] docs: driver-api: thermal: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 16/53] docs: driver-api: media: drivers: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 17/53] docs: driver-api: firmware: other_interfaces.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 18/53] docs: driver-api: nvdimm: btt.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 19/53] docs: fault-injection: nvme-fault-injection.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 20/53] docs: usb: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 21/53] docs: process: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 22/53] docs: block: data-integrity.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 23/53] docs: userspace-api: media: fdl-appendix.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 24/53] docs: userspace-api: media: v4l: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 25/53] docs: userspace-api: media: dvb: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 26/53] docs: vm: zswap.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 27/53] docs: filesystems: f2fs.rst: " Mauro Carvalho Chehab
2021-05-10 10:26   ` [f2fs-dev] " Mauro Carvalho Chehab
2021-05-11  3:16   ` Chao Yu
2021-05-11  3:16     ` Chao Yu
2021-05-10 10:26 ` [PATCH 28/53] docs: filesystems: ext4: " Mauro Carvalho Chehab
2021-05-10 19:23   ` Theodore Ts'o
2021-05-10 10:26 ` [PATCH 29/53] docs: kernel-hacking: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 30/53] docs: hid: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 31/53] docs: security: tpm: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 32/53] docs: security: keys: trusted-encrypted.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 33/53] docs: riscv: vm-layout.rst: " Mauro Carvalho Chehab
2021-05-10 10:26   ` Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 34/53] docs: networking: scaling.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 35/53] docs: networking: devlink: devlink-dpipe.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 36/53] docs: networking: device_drivers: " Mauro Carvalho Chehab
2021-05-10 10:26   ` [Intel-wired-lan] " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 37/53] docs: x86: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 38/53] docs: scheduler: sched-deadline.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 39/53] docs: dev-tools: testing-overview.rst: " Mauro Carvalho Chehab
2021-05-10 10:48   ` Marco Elver
2021-05-12  8:52     ` Mauro Carvalho Chehab
2021-05-10 23:35   ` David Gow
2021-05-12  8:14     ` Mauro Carvalho Chehab
2021-05-12  8:29     ` Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 40/53] docs: power: powercap: powercap.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 41/53] docs: ABI: " Mauro Carvalho Chehab
2021-05-10 13:53   ` Guenter Roeck
2021-05-10 10:26 ` [PATCH 42/53] docs: doc-guide: contributing.rst: " Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 43/53] docs: PCI: acpi-info.rst: " Mauro Carvalho Chehab
2021-05-10 10:37   ` Krzysztof Wilczyński
2021-05-10 10:26 ` [PATCH 44/53] docs: gpu: " Mauro Carvalho Chehab
2021-05-10 10:26   ` [Intel-gfx] " Mauro Carvalho Chehab
2021-05-10 10:26   ` Mauro Carvalho Chehab
2021-05-10 11:16   ` Jani Nikula
2021-05-10 11:16     ` [Intel-gfx] " Jani Nikula
2021-05-10 11:16     ` Jani Nikula
2021-05-10 12:36   ` Liviu Dudau
2021-05-10 12:36     ` [Intel-gfx] " Liviu Dudau
2021-05-10 12:36     ` Liviu Dudau
2021-05-10 10:26 ` [PATCH 45/53] docs: sound: kernel-api: writing-an-alsa-driver.rst: " Mauro Carvalho Chehab
2021-05-10 10:26   ` Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 46/53] docs: arm64: arm-acpi.rst: " Mauro Carvalho Chehab
2021-05-10 10:26   ` Mauro Carvalho Chehab
2021-05-10 10:26 ` [PATCH 47/53] docs: infiniband: tag_matching.rst: " Mauro Carvalho Chehab
2021-05-10 10:27 ` [PATCH 48/53] docs: timers: no_hz.rst: " Mauro Carvalho Chehab
2021-05-10 10:27 ` [PATCH 49/53] docs: misc-devices: ibmvmc.rst: " Mauro Carvalho Chehab
2021-05-10 10:27 ` [PATCH 50/53] docs: firmware-guide: acpi: lpit.rst: " Mauro Carvalho Chehab
2021-05-10 10:27 ` [PATCH 51/53] docs: firmware-guide: acpi: dsd: graph.rst: " Mauro Carvalho Chehab
2021-05-10 10:27 ` [PATCH 52/53] docs: virt: kvm: " Mauro Carvalho Chehab
2021-05-10 10:27 ` [PATCH 53/53] docs: RCU: " Mauro Carvalho Chehab
2021-05-11  0:05   ` Paul E. McKenney
2021-05-10 10:52 ` [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII Thorsten Leemhuis
2021-05-10 10:52   ` [Intel-wired-lan] " Thorsten Leemhuis
2021-05-10 10:52   ` [Intel-gfx] " Thorsten Leemhuis
2021-05-10 10:52   ` Thorsten Leemhuis
2021-05-10 10:52   ` Thorsten Leemhuis
2021-05-10 10:52   ` [f2fs-dev] " Thorsten Leemhuis
2021-05-10 10:52   ` Thorsten Leemhuis
2021-05-10 11:19   ` Mauro Carvalho Chehab
2021-05-10 11:19     ` [Intel-wired-lan] " Mauro Carvalho Chehab
2021-05-10 11:19     ` [Intel-gfx] " Mauro Carvalho Chehab
2021-05-10 11:19     ` Mauro Carvalho Chehab
2021-05-10 11:19     ` Mauro Carvalho Chehab
2021-05-10 11:19     ` [f2fs-dev] " Mauro Carvalho Chehab
2021-05-10 11:19     ` Mauro Carvalho Chehab
2021-05-10 12:27     ` Mauro Carvalho Chehab
2021-05-10 12:27       ` [Intel-wired-lan] " Mauro Carvalho Chehab
2021-05-10 12:27       ` [Intel-gfx] " Mauro Carvalho Chehab
2021-05-10 12:27       ` Mauro Carvalho Chehab
2021-05-10 12:27       ` Mauro Carvalho Chehab
2021-05-10 12:27       ` [f2fs-dev] " Mauro Carvalho Chehab
2021-05-10 12:27       ` Mauro Carvalho Chehab
2021-05-10 10:54 ` David Woodhouse
2021-05-10 10:54   ` [Intel-wired-lan] " David Woodhouse
2021-05-10 10:54   ` [Intel-gfx] " David Woodhouse
2021-05-10 10:54   ` David Woodhouse
2021-05-10 10:54   ` David Woodhouse
2021-05-10 10:54   ` David Woodhouse
2021-05-10 11:55   ` Mauro Carvalho Chehab
2021-05-10 11:55     ` [Intel-wired-lan] " Mauro Carvalho Chehab
2021-05-10 11:55     ` [Intel-gfx] " Mauro Carvalho Chehab
2021-05-10 11:55     ` Mauro Carvalho Chehab
2021-05-10 11:55     ` Mauro Carvalho Chehab
2021-05-10 11:55     ` [f2fs-dev] " Mauro Carvalho Chehab
2021-05-10 11:55     ` Mauro Carvalho Chehab
2021-05-10 12:29     ` [f2fs-dev] " beroal
2021-05-10 13:16     ` Edward Cree [this message]
2021-05-10 13:16       ` [Intel-wired-lan] " Edward Cree
2021-05-10 13:16       ` [Intel-gfx] " Edward Cree
2021-05-10 13:16       ` Edward Cree
2021-05-10 13:16       ` Edward Cree
2021-05-10 13:16       ` [f2fs-dev] " Edward Cree
2021-05-10 13:16       ` Edward Cree
2021-05-10 13:38       ` Mauro Carvalho Chehab
2021-05-10 13:38         ` [Intel-wired-lan] " Mauro Carvalho Chehab
2021-05-10 13:38         ` [Intel-gfx] " Mauro Carvalho Chehab
2021-05-10 13:38         ` Mauro Carvalho Chehab
2021-05-10 13:38         ` Mauro Carvalho Chehab
2021-05-10 13:38         ` [f2fs-dev] " Mauro Carvalho Chehab
2021-05-10 13:38         ` Mauro Carvalho Chehab
2021-05-10 13:58         ` Edward Cree
2021-05-10 13:58           ` [Intel-wired-lan] " Edward Cree
2021-05-10 13:58           ` [Intel-gfx] " Edward Cree
2021-05-10 13:58           ` Edward Cree
2021-05-10 13:58           ` Edward Cree
2021-05-10 13:58           ` [f2fs-dev] " Edward Cree
2021-05-10 13:58           ` Edward Cree
2021-05-10 13:59       ` Matthew Wilcox
2021-05-10 13:59         ` [Intel-wired-lan] " Matthew Wilcox
2021-05-10 13:59         ` [Intel-gfx] " Matthew Wilcox
2021-05-10 13:59         ` Matthew Wilcox
2021-05-10 13:59         ` Matthew Wilcox
2021-05-10 13:59         ` [f2fs-dev] " Matthew Wilcox
2021-05-10 13:59         ` Matthew Wilcox
2021-05-10 14:33         ` Edward Cree
2021-05-10 14:33           ` [Intel-wired-lan] " Edward Cree
2021-05-10 14:33           ` [Intel-gfx] " Edward Cree
2021-05-10 14:33           ` Edward Cree
2021-05-10 14:33           ` Edward Cree
2021-05-10 14:33           ` [f2fs-dev] " Edward Cree
2021-05-10 14:33           ` Edward Cree
2021-05-11  9:00           ` Mauro Carvalho Chehab
2021-05-11  9:00             ` [Intel-wired-lan] " Mauro Carvalho Chehab
2021-05-11  9:00             ` [Intel-gfx] " Mauro Carvalho Chehab
2021-05-11  9:00             ` Mauro Carvalho Chehab
2021-05-11  9:00             ` Mauro Carvalho Chehab
2021-05-11  9:00             ` [f2fs-dev] " Mauro Carvalho Chehab
2021-05-11  9:00             ` Mauro Carvalho Chehab
2021-05-11  9:19             ` David Woodhouse
2021-05-11  9:19               ` [Intel-wired-lan] " David Woodhouse
2021-05-11  9:19               ` [Intel-gfx] " David Woodhouse
2021-05-11  9:19               ` David Woodhouse
2021-05-11  9:19               ` David Woodhouse
2021-05-11  9:19               ` David Woodhouse
2021-05-10 13:49     ` David Woodhouse
2021-05-10 13:49       ` [Intel-wired-lan] " David Woodhouse
2021-05-10 13:49       ` [Intel-gfx] " David Woodhouse
2021-05-10 13:49       ` David Woodhouse
2021-05-10 13:49       ` David Woodhouse
2021-05-10 13:49       ` David Woodhouse
2021-05-10 19:22       ` Theodore Ts'o
2021-05-10 19:22         ` [Intel-wired-lan] " Theodore Ts'o
2021-05-10 19:22         ` [Intel-gfx] " Theodore Ts'o
2021-05-10 19:22         ` Theodore Ts'o
2021-05-10 19:22         ` Theodore Ts'o
2021-05-10 19:22         ` [f2fs-dev] " Theodore Ts'o
2021-05-10 19:22         ` Theodore Ts'o
2021-05-11  9:37         ` Mauro Carvalho Chehab
2021-05-11  9:37           ` [Intel-wired-lan] " Mauro Carvalho Chehab
2021-05-11  9:37           ` [Intel-gfx] " Mauro Carvalho Chehab
2021-05-11  9:37           ` Mauro Carvalho Chehab
2021-05-11  9:37           ` Mauro Carvalho Chehab
2021-05-11  9:37           ` [f2fs-dev] " Mauro Carvalho Chehab
2021-05-11  9:37           ` Mauro Carvalho Chehab
2021-05-11  9:25       ` Mauro Carvalho Chehab
2021-05-11  9:25         ` [Intel-wired-lan] " Mauro Carvalho Chehab
2021-05-11  9:25         ` [Intel-gfx] " Mauro Carvalho Chehab
2021-05-11  9:25         ` Mauro Carvalho Chehab
2021-05-11  9:25         ` Mauro Carvalho Chehab
2021-05-11  9:25         ` [f2fs-dev] " Mauro Carvalho Chehab
2021-05-11  9:25         ` Mauro Carvalho Chehab
2021-05-10 14:00     ` Ben Boeckel
2021-05-10 14:00       ` [Intel-wired-lan] " Ben Boeckel
2021-05-10 14:00       ` [Intel-gfx] " Ben Boeckel
2021-05-10 14:00       ` Ben Boeckel
2021-05-10 14:00       ` Ben Boeckel
2021-05-10 14:00       ` [f2fs-dev] " Ben Boeckel
2021-05-10 14:00       ` Ben Boeckel
2021-05-10 21:57 ` Adam Borowski
2021-05-10 21:57   ` [Intel-wired-lan] " Adam Borowski
2021-05-10 21:57   ` [Intel-gfx] " Adam Borowski
2021-05-10 21:57   ` Adam Borowski
2021-05-10 21:57   ` Adam Borowski
2021-05-10 21:57   ` [f2fs-dev] " Adam Borowski
2021-05-10 21:57   ` Adam Borowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=df6b4567-030c-a480-c5a6-fe579830e8c0@gmail.com \
    --to=ecree.xilinx@gmail.com \
    --cc=alsa-devel@alsa-project.org \
    --cc=corbet@lwn.net \
    --cc=coresight@lists.linaro.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=dwmw2@infradead.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=keyrings@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fpga@vger.kernel.org \
    --cc=linux-hwmon@vger.kernel.org \
    --cc=linux-iio@vger.kernel.org \
    --cc=linux-input@vger.kernel.org \
    --cc=linux-integrity@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=mchehab+huawei@kernel.org \
    --cc=mjpeg-users@lists.sourceforge.net \
    --cc=netdev@vger.kernel.org \
    --cc=rcu@vger.kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.