* Review incorrect man-pages commit
@ 2022-03-20  0:04 Alejandro Colomar (man-pages)
  2022-03-20 10:52 ` G. Branden Robinson
  0 siblings, 1 reply; 5+ messages in thread
From: Alejandro Colomar (man-pages) @ 2022-03-20  0:04 UTC
  To: G. Branden Robinson; +Cc: groff, linux-man, Michael Kerrisk (man-pages)

Hi Branden,

Michael introduced the following commit, which is incorrect (triggers a
groff(1) error; see below).  Do you know what is intended here?
Could you please propose a fix?

Thanks,

Alex



LINT (groff)	tmp/lint/man7/glob.7.lint.groff.touch
troff	man7/glob.7	195	 error	 '\`' is not allowed in an escape name
troff	man7/glob.7	195	 warning	 can't find special character ''



commit 7b97eb9ff04e69eacbe34a32c1089fcf6613b5f6
Author: Michael Kerrisk <mtk.manpages@gmail.com>
Date:   Thu Aug 6 22:02:25 2020 +0200

    glob.7, zic.8: Use \` rather than `

    \` produces better rendering in PDF.

    Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>

diff --git a/man7/glob.7 b/man7/glob.7
index 89569b8d9..b04ce821c 100644
--- a/man7/glob.7
+++ b/man7/glob.7
@@ -203,7 +203,7 @@ where the string between "\fI[=\fP" and "\fI=]\fP" is any collating
 element from its equivalence class, as defined for the
 current locale.
 For example, "\fI[[=a=]]\fP" might be equivalent
-to "\fI[a\('a\(`a\(:a\(^a]\fP", that is,
+to "\fI[a\('a\(\`a\(:a\(^a]\fP", that is,
 to "\fI[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]\fP".
 .SH SEE ALSO
 .BR sh (1),
diff --git a/man8/zic.8 b/man8/zic.8
index 940d6e814..aeca0e726 100644
--- a/man8/zic.8
+++ b/man8/zic.8
@@ -293,7 +293,7 @@ nor
 .q + .
 To allow for future extensions,
 an unquoted name should not contain characters from the set
-.q !$%&'()*,/:;<=>?@[\e]^`{|}\(ti .
+.q !$%&'()*,/:;<=>?@[\e]^\`{|}\(ti .
 .TP
 .B FROM
 Gives the first year in which the rule applies.


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/


* Re: Review incorrect man-pages commit
  2022-03-20  0:04 Review incorrect man-pages commit Alejandro Colomar (man-pages)
@ 2022-03-20 10:52 ` G. Branden Robinson
  2022-03-20 11:41   ` Ingo Schwarze
  2022-03-20 17:07   ` Alejandro Colomar (man-pages)
  0 siblings, 2 replies; 5+ messages in thread
From: G. Branden Robinson @ 2022-03-20 10:52 UTC
  To: Alejandro Colomar (man-pages)
  Cc: groff, linux-man, Michael Kerrisk (man-pages)


Hi, Alex!

At 2022-03-20T01:04:17+0100, Alejandro Colomar (man-pages) wrote:
> Michael introduced the following commit, which is incorrect (triggers
> a groff(1) error; see below).  Do you know what is intended here?
> Could you please propose a fix?

Sure!  The punctuation does get a bit bewildering.

The first topic is equivalence classes in globs.

> LINT (groff)	tmp/lint/man7/glob.7.lint.groff.touch
> troff	man7/glob.7	195	 error	 '\`' is not allowed in an escape name
> troff	man7/glob.7	195	 warning	 can't find special character ''

>  For example, "\fI[[=a=]]\fP" might be equivalent
> -to "\fI[a\('a\(`a\(:a\(^a]\fP", that is,
> +to "\fI[a\('a\(\`a\(:a\(^a]\fP", that is,

UTF-8 continuation bytes follow in this message.

So what we're trying to say is
  "[=a=]" might be equivalent to "[aáàäâ]"

The man page is using groff special character escape sequences that are
compatible with AT&T troff (1973) in _form_, but the special character
_identifiers_ themselves are not portable that far back.  The form is:
  \(xx
...where "xx" is _exactly_ two characters forming an identifier for a
specific special character.  As is somewhat well known, groff supports
identifiers of arbitrary length in escape sequences; anywhere AT&T troff
has an escape sequence syntax form ending in "(xx", groff supports
an additional form "[xxxxxxx]".

Nota bene that word "identifier".

The ones we see above are aliases for commonly used ISO Latin-1 (1985)
characters.  groff supports a more systematic notation for composite
glyphs, that being
  \[base-glyph composite-1 composite-2 ...  composite-n]
and in the instant case, only one composite glyph is used.

Glyph identifiers in groff must consist of valid identifier characters.
The escape character \ is _not_ interpreted as an identifier character,
but has its usual meaning of introducing an escape sequence.  Thus, when
encountering
  \(\`a
the parser hits the expansion of \` and has problems.  \` is itself an
alias for another special character escape sequence: "\(ga".  (This
alias _is_ portable all the way back to AT&T troff, and is documented in
Ossanna 1976, "Nroff/Troff User's Manual"--but that still doesn't make
it a valid part of a special character identifier.  Heirloom Doctools
troff silently ignores it, and I thus suspect Unix V7 troff did too.)

Thus, the special character you're naming has another special character
as part of its identifier.  That is not allowed.

That is why an error is produced.
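
As a rough illustration of the rule above, the following Python sketch
scans a line of roff source for \(xx escapes whose two-character
identifier contains a backslash.  It is not part of groff or of the
man-pages lint tooling; the name find_bad_identifiers is hypothetical,
and the regular expression deliberately ignores the groff \[...] form
and doubled backslashes.

```python
import re

# Matches the two-character special character form \(xx and captures "xx".
# Assumption: only this AT&T-style form is checked; \[...] is ignored.
SPECIAL_CHAR = re.compile(r"\\\((..)")


def find_bad_identifiers(line):
    """Return the two-character identifiers in `line` that contain a backslash."""
    return [m.group(1) for m in SPECIAL_CHAR.finditer(line) if "\\" in m.group(1)]


# The line from the faulty commit and the corrected line from this thread.
broken = r"""to "\fI[a\('a\(\`a\(:a\(^a]\fP", that is,"""
fixed = r"""to "\fI[a\('a\(`a\(:a\(^a]\fP", that is,"""

print(find_bad_identifiers(broken))  # one offending identifier: backslash + backtick
print(find_bad_identifiers(fixed))   # empty list
```

The check only flags the case discussed here, an escape character inside
the identifier; it does not validate that the identifier names a glyph
groff actually knows.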

Now, for the part people actually care about, which is how to fix it:
take the escape character off of that `.

You thus want

+to "\fI[a\('a\(`a\(:a\(^a]\fP", that is,

If you wanted to write this without using any aliases, you could adopt
groff syntax.

+to "\fI[a\[a aa]\[a ga]\[a ad]\[a a^]\fP", that is,

I don't know if people regard that as more or less impenetrable.  It is
more _flexible_, and admits usage of diacritics/combining characters not
envisioned by AT&T troff or ISO Latin-1.  groff supports a baker's
dozen.  They are in a table titled "Accents" in groff_char(7) (1.22.4).

> diff --git a/man8/zic.8 b/man8/zic.8
> index 940d6e814..aeca0e726 100644
> --- a/man8/zic.8
> +++ b/man8/zic.8
> @@ -293,7 +293,7 @@ nor
>  .q + .
>  To allow for future extensions,
>  an unquoted name should not contain characters from the set
> -.q !$%&'()*,/:;<=>?@[\e]^`{|}\(ti .
> +.q !$%&'()*,/:;<=>?@[\e]^\`{|}\(ti .

You didn't proffer any complaints about the foregoing, so I assume it
was just for context (to include the whole commit, maybe).  Nevertheless
I think it can be further improved.

That neutral apostrophe and caret/circumflex should be changed as well,
to ensure that they don't render as a directional closing (right) single
quote, ’ U+2019 and modifier letter circumflex ˆ U+02C6.  This advice is
also in groff 1.22.4's groff_man(7) page.

+.q !$%&\(aq()*,/:;<=>?@[\e]\(ha\`{|}\(ti .

Moreover, as partly noted in our discussion about double quotes in macro
arguments, there were no special characters for the double quote or
neutral apostrophe in Unix troff.  Since we're not getting 50 years of
backward compatibility anyway, for the Linux man-pages project I
recommend going ahead and using groff-style escape sequences for these.

+.q !$%&\[aq]()*,/:;<=>?@[\[rs]]\[ha]\`{|}\[ti] .

Are you willing to settle for 30 years of backward compatibility?  ;-)

In my opinion it is more helpful in dense contexts like this to have the
paired delimiters [ ] to demarcate the glyph identifier than to achieve
portability to systems that don't support identifiers you need anyway.
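
The substitutions suggested above for the zic.8 line can be sketched as
a small character mapping.  This is only an illustration in Python, not
an existing tool: the table ESCAPES and the function roff_escape are
hypothetical names, and the mapping covers just the characters discussed
in this message, using the \(xx forms.

```python
# Hypothetical mapping, limited to the characters discussed in this thread.
ESCAPES = {
    "'": r"\(aq",   # neutral apostrophe
    "^": r"\(ha",   # caret/circumflex
    "~": r"\(ti",   # tilde
    "\\": r"\e",    # backslash, as already written in zic.8
    "`": r"\`",     # grave accent, as written in the commit under review
}


def roff_escape(text):
    """Replace each mapped character with its escape; leave the rest alone."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


print(roff_escape("!$%&'()*,/:;<=>?@[\\]^`{|}~"))
```

Swapping the values for \[aq], \[ha], \[ti], and \[rs] would give the
groff-style variant instead.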

(I note that `q` is a page-local macro and therefore bad style for
portability reasons.  That said, I have been _sorely_ tempted to add a
`Q` macro for this precise purpose to groff man(7).  I have hopes that
it would give people something to reach for besides bold and italics for
every damn thing.)

Most--I hope all--of the above is discussed comprehensively in the
current version of groff_char(7)[2], which I have rewritten completely
since groff 1.22.4 and substantially modified even since the last Linux
man-pages snapshot at
<https://man7.org/linux/man-pages/man7/groff_char.7.html>.  I now know
the answers to many questions of the form "why the **** is {groff,troff}
this way?", and have endeavored to share them.  The "History" section is
completely new.

Regards,
Branden

[1] groff's own man pages are not without sin in this regard.  I have
    cleaned them up a lot since 1.22.4, but a few adventurous stragglers
    remain that define and use page-local macros pervasively.  All are
    on the long side.

[2] https://git.savannah.gnu.org/cgit/groff.git/tree/man/groff_char.7.man

    I recommend that for source perusal only; do not try to render it
    with man-db man(1) or groff 1.22.4, because groff 1.23.0 will be
    adding a new macro, `MR`, for man page cross references[3] and its
    own pages have already been ported to use it.  (This is where I
    flagellate myself for not having a groff 1.23.0-rc2 out yet. :( )

[3] https://git.savannah.gnu.org/cgit/groff.git/tree/NEWS#n165



* Re: Review incorrect man-pages commit
  2022-03-20 10:52 ` G. Branden Robinson
@ 2022-03-20 11:41   ` Ingo Schwarze
  2022-03-20 17:17     ` G. Branden Robinson
  2022-03-20 17:07   ` Alejandro Colomar (man-pages)
  1 sibling, 1 reply; 5+ messages in thread
From: Ingo Schwarze @ 2022-03-20 11:41 UTC
  To: g.branden.robinson; +Cc: alx.manpages, linux-man, mtk.manpages, groff

Hi Branden, hi Alex,

G. Branden Robinson wrote on Sun, Mar 20, 2022 at 09:52:37PM +1100:

> If you wanted to write this without using any aliases,
> you could adopt groff syntax.
> 
> +to "\fI[a\[a aa]\[a ga]\[a ad]\[a a^]\fP", that is,

While that is arguably neat, please be aware that it is significantly
less portable even when considering modern formatting software only.
For example, consider this:

   $ mandoc -Wall 
  ==\fI[a\[a aa]\[a ga]\[a ad]\[a a^]]\fP==  <enter> <Ctrl-D>
  mandoc: <stdin>:1:29: WARNING: invalid escape sequence: \[a a^]
  mandoc: <stdin>:1:22: WARNING: invalid escape sequence: \[a ad]
  mandoc: <stdin>:1:15: WARNING: invalid escape sequence: \[a ga]
  mandoc: <stdin>:1:8: WARNING: invalid escape sequence: \[a aa]
  [...]
  ==[a]==
  [...]

Arguably, not supporting the groff multi-argument form of \[]
character escape sequences might be a defect in mandoc, but for
now, that's how things are, so if you go that way, you have to
accept that some (even modern) formatters will drop the accented
characters from the output.

> I don't know if people regard that as more or less impenetrable.
> It is more _flexible_, and admits usage of diacritics/combining
> characters not envisioned by AT&T troff or ISO Latin-1.

That flexibility is precisely what makes the feature somewhat hard
to implement (though not impossible).  Admittedly, for typeset output,
any accent can be placed on any character, and for UTF-8 and HTML
output, zero-width combining Unicode codepoints can be used, but for
arbitrary output modes, the formatter would still have to contain
a complete table of all character-accent combinations to map them
to combined glyphs available in the output mode - and users would
probably have to accept that some combinations can't be rendered
in some output modes.  All that is less than ideal in manual pages,
where portability generally trumps typographic elegance.

> +.q !$%&\[aq]()*,/:;<=>?@[\[rs]]\[ha]\`{|}\[ti] .

I agree that nothing much is wrong with using the \[] variable
length character escape syntax in manual pages nowadays from
the point of view of portability.  Then again, i'm not convinced
that \[aq] is more readable than \(aq.  Why would it be?

Quite to the contrary, in the other example above, you wrote:

  ... \[a a^]\fP

forgetting the trailing square bracket; it should have been:

  ... \[a a^]]\fP

So my impression is the \[] syntax introduces additional opportunities
for markup bugs, if there is any difference to \( at all.

The rest of your message beautifully explains what is going on.

Yours,
  Ingo


* Re: Review incorrect man-pages commit
  2022-03-20 10:52 ` G. Branden Robinson
  2022-03-20 11:41   ` Ingo Schwarze
@ 2022-03-20 17:07   ` Alejandro Colomar (man-pages)
  1 sibling, 0 replies; 5+ messages in thread
From: Alejandro Colomar (man-pages) @ 2022-03-20 17:07 UTC
  To: G. Branden Robinson; +Cc: groff, linux-man, Michael Kerrisk (man-pages), tz

Hi, Branden!

On 3/20/22 11:52, G. Branden Robinson wrote:
> 
> Sure!  The punctuation does get a bit bewildering.
> 
[...]

Thanks for the great explanation!

> 
> Now, for the part people actually care about, which is how to fix it:
> take the escape character off of that `.
> 
> You thus want
> 
> +to "\fI[a\('a\(`a\(:a\(^a]\fP", that is,

I applied a patch with this.

> 
> If you wanted to write this without using any aliases, you could adopt
> groff syntax.
> 
> +to "\fI[a\[a aa]\[a ga]\[a ad]\[a a^]\fP", that is,

I'm going to hold this for later, so that we decide globally how to
format all pages, and not just this one.  I prefer global consistency here.

> 
> I don't know if people regard that as more or less impenetrable.  It is
> more _flexible_, and admits usage of diacritics/combining characters not
> envisioned by AT&T troff or ISO Latin-1.  groff supports a baker's
> dozen.  They are in a table titled "Accents" in groff_char(7) (1.22.4).
> 
>> diff --git a/man8/zic.8 b/man8/zic.8
>> index 940d6e814..aeca0e726 100644
>> --- a/man8/zic.8
>> +++ b/man8/zic.8
>> @@ -293,7 +293,7 @@ nor
>>  .q + .
>>  To allow for future extensions,
>>  an unquoted name should not contain characters from the set
>> -.q !$%&'()*,/:;<=>?@[\e]^`{|}\(ti .
>> +.q !$%&'()*,/:;<=>?@[\e]^\`{|}\(ti .
> 
> You didn't proffer any complaints about the foregoing, so I assume it
> was just for context (to include the whole commit, maybe). 

Yep

> Nevertheless I think it can be further improved.
> 
> That neutral apostrophe and caret/circumflex should be changed as well,
> to ensure that they don't render as a directional closing (right) single
> quote, ’ U+2019 and modifier letter circumflex ˆ U+02C6.  This advice is
> also in groff 1.22.4's groff_man(7) page.
> 
> +.q !$%&\(aq()*,/:;<=>?@[\e]\(ha\`{|}\(ti .
> 
> Moreover, as partly noted in our discussion about double quotes in macro
> arguments, there were no special characters for the double quote or
> neutral apostrophe in Unix troff.  Since we're not getting 50 years of
> backward compatibility anyway, for the Linux man-pages project I
> recommend going ahead and using groff-style escape sequences for these.
> 
> +.q !$%&\[aq]()*,/:;<=>?@[\[rs]]\[ha]\`{|}\[ti] .
> 
> Are you willing to settle for 30 years of backward compatibility?  ;-)

I am :)

However, I'm not going to fix this page, according to MAINTAINER_NOTES:

$ cat MAINTAINER_NOTES
Externally generated pages
==========================

A few pages come from external sources. Fixes to the pages should really
go to the upstream source.

tzfile(5), zdump(8), and zic(8) come from the tz project
(https://www.iana.org/time-zones).

bpf-helpers(7) is autogenerated from the kernel sources using scripts.
See man-pages commit 53666f6c30451cde022f65d35a8d448f5a7132ba for
details.



So now I wonder why this commit was written in the first place, since it
breaks one page, and fixes another that shouldn't be fixed.
I CCd the tz mailing list in case they want to fix the upstream page
(which I couldn't find, BTW).


> 
> In my opinion it is more helpful in dense contexts like this to have the
> paired delimiters [ ] to demarcate the glyph identifier than to achieve
> portability to systems that don't support identifiers you need anyway.

Yes, I agree with that.

> 
> (I note that `q` is a page-local macro and therefore bad style for
> portability reasons.  That said, I have been _sorely_ tempted to add a
> `Q` macro for this precise purpose to groff man(7).  I have hopes that
> it would give people something to reach for besides bold and italics for
> every damn thing.)
> 
> Most--I hope all--of the above is discussed comprehensively in the
> current version of groff_char(7)[2], which I have rewritten completely
> since groff 1.22.4 and substantially modified even since the last Linux
> man-pages snapshot at
> <https://man7.org/linux/man-pages/man7/groff_char.7.html>.  I now know
> the answers to many questions of the form "why the **** is {groff,troff}
> this way?", and have endeavored to share them.  The "History" section is
> completely new.
> 
> Regards,
> Branden
> 
> [1] groff's own man pages are not without sin in this regard.  I have
>     cleaned them up a lot since 1.22.4, but a few adventurous stragglers
>     remain that define and use page-local macros pervasively.  All are
>     on the long side.
> 
> [2] https://git.savannah.gnu.org/cgit/groff.git/tree/man/groff_char.7.man
> 
>     I recommend that for source perusal only; do not try to render it
>     with man-db man(1) or groff 1.22.4, because groff 1.23.0 will be
>     adding a new macro, `MR`, for man page cross references[3] and its
>     own pages have already been ported to use it.  (This is where I
>     flagellate myself for not having a groff 1.23.0-rc2 out yet. :( )
> 
> [3] https://git.savannah.gnu.org/cgit/groff.git/tree/NEWS#n165

I like that your references usually contain other references themselves.
It's funny :-)

Cheers,

Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/


* Re: Review incorrect man-pages commit
  2022-03-20 11:41   ` Ingo Schwarze
@ 2022-03-20 17:17     ` G. Branden Robinson
  0 siblings, 0 replies; 5+ messages in thread
From: G. Branden Robinson @ 2022-03-20 17:17 UTC
  To: Ingo Schwarze; +Cc: alx.manpages, linux-man, mtk.manpages, groff


Hi Ingo,

At 2022-03-20T12:41:28+0100, Ingo Schwarze wrote:
> G. Branden Robinson wrote on Sun, Mar 20, 2022 at 09:52:37PM +1100:
> > If you wanted to write this without using any aliases,
> > you could adopt groff syntax.
> > 
> > +to "\fI[a\[a aa]\[a ga]\[a ad]\[a a^]\fP", that is,
> 
> While that is arguably neat, please be aware that it is significantly
> less portable even when considering modern formatting software only.
> For example, consider this:
> 
>    $ mandoc -Wall 
>   ==\fI[a\[a aa]\[a ga]\[a ad]\[a a^]]\fP==  <enter> <Ctrl-D>
>   mandoc: <stdin>:1:29: WARNING: invalid escape sequence: \[a a^]
>   mandoc: <stdin>:1:22: WARNING: invalid escape sequence: \[a ad]
>   mandoc: <stdin>:1:15: WARNING: invalid escape sequence: \[a ga]
>   mandoc: <stdin>:1:8: WARNING: invalid escape sequence: \[a aa]
>   [...]
>   ==[a]==
>   [...]
> 
> Arguably, not supporting the groff multi-argument form of \[]
> character escape sequences might be a defect in mandoc, but for
> now, that's how things are, so if you go that way, you have to
> accept that some (even modern) formatters will drop the accented
> characters from the output.

You have to be prepared for the characters to be dropped in any case,
since they might get rendered on an output device that is limited to
ASCII, or (I suppose less likely?) using KOI8-R...or ISO Latin-2, which
lacks code points for any letters combined with a grave accent.
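
The point about output encodings can be checked with a short Python
sketch.  It assumes Python's standard codecs as stand-ins for the output
devices (an assumption; groff's own device tables are not consulted),
and the names COMBINING, compose, and renderable are made up for
illustration.

```python
import unicodedata

# Unicode combining marks for the four accents used in the glob.7 example.
COMBINING = {
    "acute": "\u0301",
    "grave": "\u0300",
    "diaeresis": "\u0308",
    "circumflex": "\u0302",
}


def compose(base, accent):
    """Combine a base letter with an accent, precomposed where Unicode allows."""
    return unicodedata.normalize("NFC", base + COMBINING[accent])


def renderable(char, encoding):
    """Report whether `char` can be encoded in the given output encoding."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for accent in COMBINING:
    glyph = compose("a", accent)
    support = {enc: renderable(glyph, enc)
               for enc in ("ascii", "iso8859_1", "iso8859_2", "koi8_r")}
    print(accent, support)
```

A formatter targeting one of these encodings needs either a fallback
(for example, the bare base letter) or a warning for each combination
that comes back False.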

> That flexibility is precisely what makes the feature somewhat hard
> to implement (though not impossible).  Admittedly, for typeset output,
> any accent can be placed on any character, and for UTF-8 and HTML
> output, zero-width combining Unicode codepoints can be used, but for
> arbitrary output modes, the formatter would still have to contain
> a complete table of all character-accent combinations to map them
> to combined glyphs available in the output mode - and users would
> probably have to accept that some combinations can't be rendered
> in some output modes.  All that is less than ideal in manual pages,
> where portability generally trumps typographic elegance.

It might be wise for the page to include a disclaimer that some of its
glyphs might not render.

> > +.q !$%&\[aq]()*,/:;<=>?@[\[rs]]\[ha]\`{|}\[ti] .
> 
> I agree that nothing much is wrong with using the \[] variable
> length character escape syntax in manual pages nowadays from
> the point of view of portability.  Then again, i'm not convinced
> that \[aq] is more readable than \(aq.  Why would it be?

We get used to delimiters being paired.  :)

I regret Ossanna's choice of a parenthesis here.

> Quite to the contrary, in the other example above, you wrote:
> 
>   ... \[a a^]\fP
> 
> forgetting the trailing square bracket; it should have been:
> 
>   ... \[a a^]]\fP
> 
> So my impression is the \[] syntax introduces additional opportunities
> for markup bugs, if there is any difference to \( at all.

I would attribute that more to my haste in trying to get the email done
to watch a movie, as well as my reliably and severely attenuated
proofreading powers _before_ something I've written becomes irrevocably
public.  Nothing humbles me more than my first draft.  Or first six...

> The rest of your message beautifully explains what is going on.

Thanks!

Regards,
Branden


