* [PATCH 4/6] xattr.7: wfix
@ 2022-07-29 11:45 Štěpán Němec
  2022-07-29 20:58 ` G. Branden Robinson
  0 siblings, 1 reply; 25+ messages in thread
From: Štěpán Němec @ 2022-07-29 11:45 UTC (permalink / raw)
  To: linux-man, Alejandro Colomar, Michael Kerrisk

(My original intention was to just fix the grammar ("an attribute names
is"), but, on second thought, the whole sentence didn't read very well.)

Signed-off-by: Štěpán Němec <stepnem@gmail.com>
---
 man7/xattr.7 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/man7/xattr.7 b/man7/xattr.7
index 4a69e2eb53e8..45a103fad4cc 100644
--- a/man7/xattr.7
+++ b/man7/xattr.7
@@ -119,8 +119,8 @@ manual page for an explanation of the sticky bit).
 .SS Filesystem differences
 The kernel and the filesystem may place limits on the maximum number
 and size of extended attributes that can be associated with a file.
-The VFS imposes limitations that an attribute names is limited to 255 bytes
-and an attribute value is limited to 64\ kB.
+The VFS-imposed limits on attribute names and values are 255 bytes
+and 64\ kB, respectively.
 The list of attribute names that
 can be returned is also limited to 64\ kB
 (see BUGS in
-- 
2.37.1



* Re: [PATCH 4/6] xattr.7: wfix
  2022-07-29 11:45 [PATCH 4/6] xattr.7: wfix Štěpán Němec
@ 2022-07-29 20:58 ` G. Branden Robinson
  2022-07-30 14:15   ` Štěpán Němec
  0 siblings, 1 reply; 25+ messages in thread
From: G. Branden Robinson @ 2022-07-29 20:58 UTC (permalink / raw)
  To: Štěpán Němec
  Cc: linux-man, Alejandro Colomar, Michael Kerrisk

[-- Attachment #1: Type: text/plain, Size: 1041 bytes --]

Hi Štěpán,

At 2022-07-29T13:45:04+0200, Štěpán Němec wrote:
> (My original intention was to just fix the grammar ("an attribute
> names is"), but, on second thought, the whole sentence didn't read
> very well.)
[...]
> -The VFS imposes limitations that an attribute names is limited to 255 bytes
> -and an attribute value is limited to 64\ kB.
> +The VFS-imposed limits on attribute names and values are 255 bytes
> +and 64\ kB, respectively.

While you're tidying this up, I would convert the `\ ` escape sequence
to `\~`.  Both are non-breaking spaces, but the latter is adjustable.

groff_man(7) from groff 1.22.4 says:

 \~     Adjustable, non-breaking space character.  Use  this  escape  to
        prevent  a  break  inside  a short phrase or between a numerical
        quantity and its corresponding unit(s).

               Before starting the motor, set the output speed to\~1.
               There are 1,024\~bytes in 1\~kiB.
               CSTR\~#8 documents the B language.

Regards,
Branden

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 4/6] xattr.7: wfix
  2022-07-29 20:58 ` G. Branden Robinson
@ 2022-07-30 14:15   ` Štěpán Němec
  2022-07-30 17:53     ` Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 25+ messages in thread
From: Štěpán Němec @ 2022-07-30 14:15 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: linux-man, Alejandro Colomar, Michael Kerrisk


Hello Branden,

On Fri, 29 Jul 2022 15:58:23 -0500
G. Branden Robinson wrote:

>> -The VFS imposes limitations that an attribute names is limited to 255 bytes
>> -and an attribute value is limited to 64\ kB.
>> +The VFS-imposed limits on attribute names and values are 255 bytes
>> +and 64\ kB, respectively.
>
> While you're tidying this up, I would convert the `\ ` escape sequence
> to `\~`.  Both are non-breaking spaces, but the latter is adjustable.
>
> groff_man(7) from groff 1.22.4 says:
>
>  \~     Adjustable, non-breaking space character.  Use  this  escape  to
>         prevent  a  break  inside  a short phrase or between a numerical
>         quantity and its corresponding unit(s).
>
>                Before starting the motor, set the output speed to\~1.
>                There are 1,024\~bytes in 1\~kiB.
>                CSTR\~#8 documents the B language.

Thank you for the review!

I think I disagree: IMO a number+unit should be treated as a single
entity both semantically/logically and typographically (at least as far
as space stretching goes), i.e., say (if I understand the effect of '\ '
and '\~' right),

  255 bytes               and                64 kB,          respectively.

would make a bit more sense to me than

  255        bytes        and         64         kB,         respectively.
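To make the difference between the two layouts concrete, here is a minimal sketch of line adjustment. It is not groff's actual algorithm: it assumes a simple model in which a plain space and '\~' can stretch while '\ ' keeps its width, and it hands out the extra space from left to right. The function name justify() is made up for this illustration.

```python
import re


def justify(source: str, width: int) -> str:
    """Pad one output line to `width`, stretching only adjustable spaces.

    Simplified model (an assumption, not groff's real algorithm):
      " "  and "\\~" are adjustable, "\\ " is fixed-width.
    """
    # Split on the three kinds of space, keeping the separators.
    parts = re.split(r"(\\ |\\~| )", source)
    words = parts[0::2]
    seps = parts[1::2]

    # Every separator is at least one column wide.
    gaps = [1] * len(seps)
    adjustable = [i for i, sep in enumerate(seps) if sep != "\\ "]

    natural = sum(len(word) for word in words) + len(seps)
    extra = max(0, width - natural)

    if adjustable:
        share, rest = divmod(extra, len(adjustable))
        for n, i in enumerate(adjustable):
            # Leftmost gaps absorb the remainder.
            gaps[i] += share + (1 if n < rest else 0)

    out = words[0]
    for gap, word in zip(gaps, words[1:]):
        out += " " * gap + word
    return out


print(justify(r"255\ bytes and 64\ kB, respectively.", 50))
print(justify(r"255\~bytes and 64\~kB, respectively.", 50))
```

With '\ ' the number and its unit stay one column apart and only the other gaps grow; with '\~' every gap on the line grows.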

Current Linux man-pages usage doesn't appear quite consistent, but '\ '
prevails over '\~' (about 6:1), and my cursory grep found only one
instance of '\~' used between a number and its unit (vs. many instances
of '\ ' in that context).
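A count of this kind can be reproduced roughly as follows. This is only a sketch: the two regular expressions are an assumed stand-in for "number followed by its unit" (a digit, the escape, then a letter), and the sample lines are illustrative input, not the real man-pages tree.

```python
import re

# Digit, escape, letter: a rough approximation of "number + unit".
FIXED = re.compile(r"\d\\ [A-Za-z]")
ADJUSTABLE = re.compile(r"\d\\~[A-Za-z]")


def count_unit_spaces(text: str) -> dict[str, int]:
    """Count '\\ ' and '\\~' occurrences between a digit and a letter."""
    return {
        "fixed": len(FIXED.findall(text)),
        "adjustable": len(ADJUSTABLE.findall(text)),
    }


# Illustrative sample only; a real survey would read the page sources.
sample = "\n".join([
    r"and an attribute value is limited to 64\ kB.",
    r"can be returned is also limited to 64\ kB",
    r"There are 1,024\~bytes in 1\~kiB.",
    r"Enables RFC\~7413 Fast Open support.",
])

print(count_unit_spaces(sample))
```

Note that 'RFC\~7413' is not counted, since the escape there follows a letter rather than a digit.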

In view of the above, failing any instruction from a man-pages
maintainer to the contrary, I'd prefer leaving this as is.

  With best wishes,

  Štěpán


* Re: [PATCH 4/6] xattr.7: wfix
  2022-07-30 14:15   ` Štěpán Němec
@ 2022-07-30 17:53     ` Alejandro Colomar (man-pages)
  2022-07-30 17:59       ` Alejandro Colomar (man-pages)
  2022-08-01 13:28       ` Alejandro Colomar
  0 siblings, 2 replies; 25+ messages in thread
From: Alejandro Colomar (man-pages) @ 2022-07-30 17:53 UTC (permalink / raw)
  To: Štěpán Němec, G. Branden Robinson
  Cc: linux-man, Michael Kerrisk

Hi Štěpán and Branden!

On 7/30/22 16:15, Štěpán Němec wrote:
> 
> Hello Branden,
> 
> On Fri, 29 Jul 2022 15:58:23 -0500
> G. Branden Robinson wrote:
> 
>>> -The VFS imposes limitations that an attribute names is limited to 255 bytes
>>> -and an attribute value is limited to 64\ kB.
>>> +The VFS-imposed limits on attribute names and values are 255 bytes
>>> +and 64\ kB, respectively.
>>
>> While you're tidying this up, I would convert the `\ ` escape sequence
>> to `\~`.  Both are non-breaking spaces, but the latter is adjustable.
>>
>> groff_man(7) from groff 1.22.4 says:
>>
>>   \~     Adjustable, non-breaking space character.  Use  this  escape  to
>>          prevent  a  break  inside  a short phrase or between a numerical
>>          quantity and its corresponding unit(s).
>>
>>                 Before starting the motor, set the output speed to\~1.
>>                 There are 1,024\~bytes in 1\~kiB.
>>                 CSTR\~#8 documents the B language.
> 
> Thank you for the review!
> 
> I think I disagree: IMO a number+unit should be treated as a single
> entity both semantically/logically and typographically (at least as far
> as space stretching goes), i.e., say (if I understand the effect of '\ '
> and '\~' right),
> 
>    255 bytes               and                64 kB,          respectively.
> 
> would make a bit more sense to me than
> 
>    255        bytes        and         64         kB,         respectively.
> 
> Current Linux man-pages usage doesn't appear quite consistent, but '\ '
> prevails over '\~' (about 6:1), and my cursory grep found only one
> instance of '\~' used between a number and its unit

Would you mind sending a patch for that one between the number and its unit?

> (vs. many instances
> of '\ ' in that context).

That is just a matter of writers not knowing the existence of \~ ('\ ' 
was documented in man-pages(7), but \~ wasn't).  I wouldn't give much 
more importance to existing practice in this regard.

When I read this email I had no strong opinion; both variants made sense 
to me.  So I did some investigation, to see if the SI already specifies 
something about it; and it does:

<https://www.bipm.org/en/publications/si-brochure/>:

[
5.2 Unit symbols

Unit symbols are printed in upright type regardless of the type used in 
the surrounding text.  They are printed in lower-case letters unless 
they are derived from a proper name, in which case the first letter is a 
capital letter.

An exception, adopted by the 16th CGPM (1979, Resolution 6), is that 
either capital L or lower-case l is allowed for the litre, in order to 
avoid possible confusion between the numeral 1 (one) and the lower-case 
letter l (el).

A multiple or sub-multiple prefix, if used, is part of the unit and 
precedes the unit symbol without a separator.  A prefix is never used in 
isolation and compound prefixes are never used.

Unit symbols are mathematical entities and not abbreviations. Therefore, 
they are not followed by a period except at the end of a sentence, and 
one must neither use the plural nor mix unit symbols and unit names 
within one expression, since names are not mathematical entities.

In forming products and quotients of unit symbols the normal rules of 
algebraic multiplication or division apply.  Multiplication must be 
indicated by a space or a half-high (centred) dot (⋅), since otherwise 
some prefixes could be misinterpreted as a unit symbol.  Division is 
indicated by a horizontal line, by a solidus (oblique stroke, /) or by 
negative exponents.  When several unit symbols are combined, care should 
be taken to avoid ambiguities, for example by using brackets or negative 
exponents.  A solidus must not be used more than once in a given 
expression without brackets to remove ambiguities.

It is not permissible to use abbreviations for unit symbols or unit 
names, such as sec (for either s or second), sq. mm (for either mm2 or 
square millimetre), cc (for either cm3 or cubic centimetre), or mps (for 
either m/s or metre per second).  The use of the correct symbols for
SI units, and for units in general, as listed in earlier chapters of 
this brochure, is mandatory.  In this way ambiguities and 
misunderstandings in the values of quantities are avoided.
]

[
5.4.3 Formatting the value of a quantity

The numerical value always precedes the unit and a space is always used 
to separate the unit from the number.  Thus the value of the quantity is 
the product of the number and the unit.  The space between the number 
and the unit is regarded as a multiplication sign (just as a space 
between units implies multiplication).  The only exceptions to this rule 
are for the unit symbols for degree, minute and second for plane 
angle, °, ′ and ′′, respectively, for which no space is left between the 
numerical value and the unit symbol.

This rule means that the symbol °C for the degree Celsius is preceded by 
a space when one expresses values of Celsius temperature t.

Even when the value of a quantity is used as an adjective, a space is 
left between the numerical value and the unit symbol.  Only when the 
name of the unit is spelled out would the ordinary rules of grammar 
apply, so that in English a hyphen would be used to separate the number 
from the unit.

In any expression, only one unit is used. An exception to this rule is 
in expressing the values of time and of plane angles using non-SI units. 
However, for plane angles it is generally preferable to divide the 
degree decimally.  It is therefore preferable to write 22.20° rather 
than 22° 12′, except in fields such as navigation, cartography, 
astronomy, and in the measurement of very small angles.
]

Sorry for copying the full text, but I preferred to give enough context.

So, from the SI text quoted above, the space is not a word separator in 
that context (it is for example not allowed to hyphenate between the 
value and the unit even if it acts as an adjective; the SI disables 
normal language rules).  It is instead a mathematical symbol denoting 
multiplication, and the whole value+unit is a single mathematical 
expression; to me, that is better denoted with a single space, rather 
than an adjustable one.

Therefore, I'd say that it makes more sense in this case to use '\ '.

> 
> In view of the above, failing any instruction from a man-pages
> maintainer to the contrary, I'd prefer leaving this as is.

In the general case, I prefer \~, but for value+unit I prefer '\ '.
Thank you both!

> 
>    With best wishes,
> 
>    Štěpán

Cheers,

Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/


* Re: [PATCH 4/6] xattr.7: wfix
  2022-07-30 17:53     ` Alejandro Colomar (man-pages)
@ 2022-07-30 17:59       ` Alejandro Colomar (man-pages)
  2022-08-01 13:28       ` Alejandro Colomar
  1 sibling, 0 replies; 25+ messages in thread
From: Alejandro Colomar (man-pages) @ 2022-07-30 17:59 UTC (permalink / raw)
  To: Štěpán Němec, G. Branden Robinson
  Cc: linux-man, Michael Kerrisk

On 7/30/22 19:53, Alejandro Colomar (man-pages) wrote:
> 
> Even when the value of a quantity is used as an adjective, a space is 
> left between the numerical value and the unit symbol.  Only when the 
> name of the unit is spelled out would the  ordinary rules of grammar 
> apply, so that in English a hyphen would be used to separate the number 
> from the unit.

Although, I missed this small paragraph.  According to that, it would be 
255\~bytes but 64\ kB.


-- 
Alejandro Colomar
Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/


* Re: [PATCH 4/6] xattr.7: wfix
  2022-07-30 17:53     ` Alejandro Colomar (man-pages)
  2022-07-30 17:59       ` Alejandro Colomar (man-pages)
@ 2022-08-01 13:28       ` Alejandro Colomar
  2022-08-11 12:48         ` Ingo Schwarze
  1 sibling, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-08-01 13:28 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: linux-man, groff


[-- Attachment #1.1: Type: text/plain, Size: 9377 bytes --]

[
CC -= Štěpán:
I don't think he's interested in a deep discussion about use of \~ and 
'\ ' in man pages
CC -= mtk:
He's already subscribed to the list, and quite silent these days
CC += groff@:
probably people there are interested in this discussion
]

Hi Branden,

On 7/30/22 19:53, Alejandro Colomar (man-pages) wrote:
> Hi Štěpán and Branden!
> 
> On 7/30/22 16:15, Štěpán Němec wrote:
>>
>> Hello Branden,
>>
>> On Fri, 29 Jul 2022 15:58:23 -0500
>> G. Branden Robinson wrote:
>>
>>>> -The VFS imposes limitations that an attribute names is limited to 
>>>> 255 bytes
>>>> -and an attribute value is limited to 64\ kB.
>>>> +The VFS-imposed limits on attribute names and values are 255 bytes
>>>> +and 64\ kB, respectively.
>>>
>>> While you're tidying this up, I would convert the `\ ` escape sequence
>>> to `\~`.  Both are non-breaking spaces, but the latter is adjustable.
>>>
>>> groff_man(7) from groff 1.22.4 says:
>>>
>>>   \~     Adjustable, non-breaking space character.  Use  this  
>>> escape  to
>>>          prevent  a  break  inside  a short phrase or between a 
>>> numerical
>>>          quantity and its corresponding unit(s).
>>>
>>>                 Before starting the motor, set the output speed to\~1.
>>>                 There are 1,024\~bytes in 1\~kiB.
>>>                 CSTR\~#8 documents the B language.
>>
>> Thank you for the review!
>>
>> I think I disagree: IMO a number+unit should be treated as a single
>> entity both semantically/logically and typographically (at least as far
>> as space stretching goes), i.e., say (if I understand the effect of '\ '
>> and '\~' right),
>>
>>    255 bytes               and                64 kB,          
>> respectively.
>>
>> would make a bit more sense to me than
>>
>>    255        bytes        and         64         kB,         
>> respectively.
>>
>> Current Linux man-pages usage doesn't appear quite consistent, but '\ '
>> prevails over '\~' (about 6:1), and my cursory grep found only one
>> instance of '\~' used between a number and its unit
> 
> Would you mind sending a patch for that one between the number and its 
> unit?
> 
>> (vs. many instances
>> of '\ ' in that context).
> 
> That is just a matter of writers not knowing the existence of \~ ('\ ' 
> was documented in man-pages(7), but \~ wasn't).  I wouldn't give much 
> more importance to existing practice in this regard.
> 
> When I read this email I had no strong opinion; both variants made sense 
> to me.  So I did some investigation, to see if the SI already specifies 
> something about it; and it does:
> 
> <https://www.bipm.org/en/publications/si-brochure/>:
> 
> [
> 5.2 Unit symbols
> 
> Unit symbols are printed in upright type regardless of the type used in 
> the surrounding text.  They are printed in lower-case letters unless 
> they are derived from a proper name, in which case the first letter is a 
> capital letter.
> 
> An exception, adopted by the 16th CGPM (1979, Resolution 6), is that 
> either capital L or lower-case l is allowed for the litre, in order to 
> avoid possible confusion between the numeral 1 (one) and the lower-case 
> letter l (el).
> 
> A multiple or sub-multiple prefix, if used, is part of the unit and 
> precedes the unit symbol without a separator.  A prefix is never used in 
> isolation and compound prefixes are never used.
> 
> Unit symbols are mathematical entities and not abbreviations. Therefore, 
> they are not followed by a period except at the end of a sentence, and 
> one must neither use the plural nor mix unit symbols and unit names 
> within one expression, since names are not mathematical entities.
> 
> In forming products and quotients of unit symbols the normal rules of 
> algebraic multiplication or division apply.  Multiplication must be 
> indicated by a space or a half-high (centred) dot (⋅), since otherwise 
> some prefixes could be misinterpreted as a unit symbol.  Division is 
> indicated by a horizontal line, by a solidus (oblique stroke, /) or by 
> negative exponents.  When several unit symbols are combined, care should 
> be taken to avoid ambiguities, for example by using brackets or negative 
> exponents.  A solidus must not be used more than once in a given 
> expression without brackets to remove ambiguities.
> 
> It is not permissible to use abbreviations for unit symbols or unit 
> names, such as sec (for either s or second), sq. mm (for either mm2 or 
> square millimetre), cc (for either cm3 or cubic centimetre), or mps (for 
> either m/s or metre per second).  The use of the correct symbols for
> SI units, and for units in general, as listed in earlier chapters of 
> this brochure, is mandatory.  In this way ambiguities and 
> misunderstandings in the values of quantities are avoided.
> ]
> 
> [
> 5.4.3 Formatting the value of a quantity
> 
> The numerical value always precedes the unit and a space is always used 
> to separate the unit from the number.  Thus the value of the quantity is 
> the product of the number and the unit.  The space between the number 
> and the unit is regarded as a multiplication sign (just as a space 
> between units implies multiplication).  The only exceptions to this rule 
>   are for the unit symbols for degree, minute and second for plane 
> angle, °, ′ and ′′, respectively, for which no space is left between the 
>   numerical value and the unit symbol.
> 
> This rule means that the symbol °C for the degree Celsius is preceded by 
> a space when one expresses values of Celsius temperature t.
> 
> Even when the value of a quantity is used as an adjective, a space is 
> left between the numerical value and the unit symbol.  Only when the 
> name of the unit is spelled out would the  ordinary rules of grammar 
> apply, so that in English a hyphen would be used to separate the number 
> from the unit.
> 
> In any expression, only one unit is used. An exception to this rule is 
> in expressing the values of time and of plane angles using non-SI units. 
>   However, for plane angles it is generally preferable to divide the 
> degree decimally.  It is  therefore preferable to write 22.20° rather 
> than 22° 12′, except in  fields such as navigation, cartography, 
> astronomy, and in the measurement of very small angles.
> ]
> 
> Sorry for copying the full text, but I preferred to give enough context.
> 
> So, from the SI text quoted above, the space is not a word separator in 
> that context (it is for example not allowed to hyphenate between the 
> value and the unit even if it acts as an adjective; the SI disables 
> normal language rules).  It is instead a mathematical symbol denoting 
> multiplication, and the whole value+unit is a single mathematical 
> expression; to me, that is better denoted with a single space, rather 
> than an adjustable one.
> 
> Therefore, I'd say that it makes more sense in this case to use '\ '.
> 
>>
>> In view of the above, failing any instruction from a man-pages
>> maintainer to the contrary, I'd prefer leaving this as is.
> 
> In the general case, I prefer \~, but for value+unit I prefer '\ '.
> Thank you both!
> 
>>
>>    With best wishes,
>>
>>    Štěpán
> 
> Cheers,
> 
> Alex
> 
> 

On 7/30/22 19:59, Alejandro Colomar (man-pages) wrote:
 > On 7/30/22 19:53, Alejandro Colomar (man-pages) wrote:
 >>
 >> Even when the value of a quantity is used as an adjective, a space is
 >> left between the numerical value and the unit symbol.  Only when the
 >> name of the unit is spelled out would the  ordinary rules of grammar
 >> apply, so that in English a hyphen would be used to separate the
 >> number from the unit.
 >
 > Although, I missed this small paragraph.  According to that, it would be
 > 255\~bytes but 64\ kB.

I left the whole original conversation for groff@ users to read it 
without needing to go to linux-man@ archives.


I'd like to arrive at some consensus on the usage of \~ and '\ '.

For things related to the SI, we should follow SI conventions (they 
developed them for a reason, and I don't see a strong reason to deviate).

For things unrelated to the SI, we need to come up with some convention. 
  I think mirroring what the SI does could be good.

For example, for commands, I'd use non-adjustable spaces.  For pointer 
types, I'd also use the non-adjustable space.  For compound names such 
as 'RFC 1234', I'd say normal language rules apply, and the space should 
be adjustable.

To be clear, I'll add some examples taken from the Linux man-pages (and 
some of them modified by me):

.I "struct termios2\ *"
.I (1\ <<\ oparg)
.I unice\ =\ 20\ \-\ knice
is filesystem dependent and is typically 16\ MiB.
.I (uid_t)\ \-1
Enables RFC\~7413 Fast Open support.
.I Power ISA, Book\~II - Section\~3.1 (Program Priority Registers)
Before starting the motor, set the output speed to\~1.
There are 1,024\~bytes in 1\ kiB.
CSTR\~#8 documents the B language.

What do you think?


Cheers,

Alex

-- 
Alejandro Colomar
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 4/6] xattr.7: wfix
  2022-08-01 13:28       ` Alejandro Colomar
@ 2022-08-11 12:48         ` Ingo Schwarze
  2022-08-11 20:17           ` G. Branden Robinson
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Schwarze @ 2022-08-11 12:48 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: g.branden.robinson, linux-man, groff

Hi Alejandro,

Alejandro Colomar wrote on Mon, Aug 01, 2022 at 03:28:03PM +0200:

> I'd like to arrive at some consensus on the usage of \~ and '\ '.

In manual pages, always use "\ " and never use "\~", period.
The former is portable and the latter is a GNU extension.

> What do you think?

I think you are massively overthinking this and the whole SI
argument is irrelevant for manual pages.  While the above concern
about robustness is minor, too (both groff and mandoc support \~),
portability is still significantly more important than such minute
typographical details.

Yours,
  Ingo


* Re: [PATCH 4/6] xattr.7: wfix
  2022-08-11 12:48         ` Ingo Schwarze
@ 2022-08-11 20:17           ` G. Branden Robinson
  2022-08-12 14:30             ` Ingo Schwarze
  0 siblings, 1 reply; 25+ messages in thread
From: G. Branden Robinson @ 2022-08-11 20:17 UTC (permalink / raw)
  To: Ingo Schwarze; +Cc: Alejandro Colomar, linux-man, groff

[-- Attachment #1: Type: text/plain, Size: 4482 bytes --]

At 2022-08-11T14:48:51+0200, Ingo Schwarze wrote:
> Alejandro Colomar wrote on Mon, Aug 01, 2022 at 03:28:03PM +0200:
> > I'd like to arrive at some consensus on the usage of \~ and '\ '.
> 
> In manual pages, always use "\ " and never use "\~", period.

This is hugely overstated.

> The former is portable and the latter is a GNU extension.

...that is over 30 years old and supported by Heirloom Doctools troff
for 17 years now, neatroff for about six, and your mandoc for three.

For full disclosure, I'll acknowledge that Documenter's Workbench [DWB]
troff doesn't support it, but it doesn't seem to have been maintained
for 30 years (Heirloom Doctools troff appears to be its
descendant/successor).  plan9port troff doesn't either, and its laudable
introduction of a man(7) MR macro notwithstanding, its activity level is
not high.

I would pessimistically assume that most or all proprietary Unix
troffs branched off from V7 Unix troff or early device-independent troff
(maybe DWB 1.0 troff, ca. 1984 [?, 1]) lack support for `\~`.[2]

I further note that groff has a long tradition of inclusion in BSD
Unix,[3] and despite the efforts of the mdocml/mandoc project to
supplant or dispose of groff in BSD's descendant communities, the
underlying fact remains.  Giving up support for `\~` was therefore, in
this sense, a regression, and one that took quite some time to address.

> > What do you think?
> 
> I think you are massively overthinking this and the whole SI
> argument is irrelevant for manual pages.

Man pages are technical writing and BIPM's recommendations in this area
that Alejandro uncovered have prompted me to reconsider the style advice
in groff_man_style(7) [from groff Git].

But you should welcome that.  It would mean that a handful of uses of
`\~` in the groff man pages would move to `\ `, which is motion in the
direction you want anyway.

In any event, the selection of `\ ` versus `\~`, assuming support for
both and an understanding of their distinct meanings and effect on
adjusted output, is a matter for a software project's documentation
style guide.

As I recall, mandoc does not even support "full justification"
(alignment of text to both left and right margins, with inter-word
spaces expanded ["adjusted"] to achieve this) in the first place and
there are no plans to.  mandoc can thus treat the two sequences as
synonymous--but that doesn't mean the `\~` escape sequence is a
gratuitous alias or deviation from the norm.  It is a replacement for an
arcane troff hack.

  .\" no trailing space or character translation target on the next line
  .tr ~
  G.~W.~Pabst directed several films in the 1920s.

> While the above concern about robustness is minor, too (both groff and
> mandoc support \~),

...as do others, listed above...

> portability is still significantly more important

You are not quantifying anything.  Come on, can we at least get a Fermi
estimation of the installed bases of the respective troff
implementations and mandoc?

There are, I presume, still C compilers out there that don't accept ANSI
C (1989) input.  That doesn't, and shouldn't, stop the rest of the world
from moving forward.

> than such minute typographical details.

For someone arguing from a standpoint of such slavish fidelity to 40
year-old practices, you seem to be selective in the way you do it.  The
Unix manual was always meant to be typeset.

"The manual was intended to be typeset; some detail is sacrificed on
terminals." (man(1), _Unix Time-Sharing System Programmer's Manual_,
Eighth Edition, Volume 1, February 1985)

At the time that statement was written, the sentiment was some 12 years
old; the Bell Labs CSRC typeset man pages as soon as it was possible for
them to do so.[4]

I understand if some man page contributors don't want to mess with
aspects of typography that will appear only when formatting for output
devices more sophisticated than terminal emulators--widow and orphan
management can be tedious, for instance--but we shouldn't promulgate
advice that makes the task of those who do--people like Alejandro and
me--_harder_.

Regards,
Branden

[1] https://archive.org/details/dwb-preprocessor-ref
[2] https://github.com/n-t-roff/Solaris10-ditroff/blob/master/troff/n1.c#L797
[3] https://minnie.tuhs.org/cgi-bin/utree.pl?file=Net2/usr/src/usr.bin/groff/VERSION
[4] https://dspinellis.github.io/unix-v4man/v4man.pdf

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 4/6] xattr.7: wfix
  2022-08-11 20:17           ` G. Branden Robinson
@ 2022-08-12 14:30             ` Ingo Schwarze
  2022-08-12 22:10               ` *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix) G. Branden Robinson
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Schwarze @ 2022-08-12 14:30 UTC (permalink / raw)
  To: g.branden.robinson; +Cc: Alejandro Colomar, linux-man, groff

Hi Branden,

G. Branden Robinson wrote on Thu, Aug 11, 2022 at 03:17:14PM -0500:
> At 2022-08-11T14:48:51+0200, Ingo Schwarze wrote:
>> Alejandro Colomar wrote on Mon, Aug 01, 2022 at 03:28:03PM +0200:

>>> I'd like to arrive at some consensus on the usage of \~ and '\ '.

>> In manual pages, always use "\ " and never use "\~", period.

> This is hugely overstated.

>> The former is portable and the latter is a GNU extension.

> ...that is over 30 years old and supported by Heirloom Doctools troff
> for 17 years now, neatroff for about six, and your mandoc for three.

Actually, mandoc supports \~ at least since Sep 17 2009:
https://cvsweb.bsd.lv/mandoc/Attic/chars.in?rev=1.1&content-type=text/x-cvsweb-markup

> For full disclosure, I'll acknowledge that Documenter's Workbench [DWB]
> https://archive.org/details/dwb-preprocessor-ref
> troff doesn't support it, but it doesn't seem to have been maintained
> for 30 years (Heirloom Doctools troff appears to be its
> descendant/successor).

I agree that missing support in DWB is a weak argument.  It is
unlikely that many people use it for practical work.  They would
likely suffer from more serious problems than \~, too.

> plan9port troff doesn't either, and its laudable introduction
> of a man(7) MR macro notwithstanding, its activity level is
> not high.

There are people using Plan 9 for practical work though, they have
even occasionally posted on the groff and mandoc lists, so that is a
bit more of a problem.

> I would pessimistically assume that most or all proprietary Unix
> troffs branched off from V7 Unix troff or early device-independent troff
> (maybe DWB 1.0 troff, ca. 1984 [?, 1]) lack support for `\~`.
> https://github.com/n-t-roff/Solaris10-ditroff/blob/master/troff/n1.c#L797

That does sound likely.  As an example, look at Oracle Solaris 11:

   > uname -a
  SunOS unstable11s 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
   > printf "a\\\\~b\n" | nroff | head -n 1
  a~b
   > printf "a\\\\~b\n" | groff -T ascii | head -n 1
  a b
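The same check can be scripted for whatever formatters happen to be installed. The sketch below is a hedged illustration, not a tested portability tool: classify() and probe() are hypothetical helper names, and it assumes, as in the transcript above, that the first output line holds the formatted text.

```python
import shutil
import subprocess


def classify(first_line: str) -> str:
    """Interpret the first output line produced for the input 'a\\~b'."""
    line = first_line.rstrip()
    if line == "a~b":
        # The escape was not recognized; the tilde came through literally.
        return "unsupported"
    if line in ("a b", "a\u00a0b"):
        return "supported"
    return "unknown"


def probe(command: list[str]) -> str:
    """Run a formatter on 'a\\~b' and classify its first output line."""
    if shutil.which(command[0]) is None:
        return "missing"
    result = subprocess.run(
        command,
        input="a\\~b\n",
        capture_output=True,
        text=True,
        check=False,
    )
    lines = result.stdout.splitlines()
    return classify(lines[0]) if lines else "unknown"


for cmd in (["nroff"], ["groff", "-T", "ascii"]):
    print(" ".join(cmd), "->", probe(cmd))
```

A formatter that prints leading blank lines or a page header first would come out as "unknown" here, so the result is only a first hint.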

> I further note that groff has a long tradition of inclusion in BSD
> Unix, https://minnie.tuhs.org/cgi-bin/utree.pl?file=Net2/usr/src/usr.bin/groff/VERSION

Yes.  Cynthia already considered dropping support for Kernighan's
troff, but the CSRG vetoed that.  Inclusion of groff wasn't
controversial even at a time when groff didn't have its own version
control yet.  Consequently, you are right that \~ is unlikely to cause
trouble on any BSD system.

> and despite the efforts of the mdocml/mandoc project to
> supplant or dispose of groff in BSD's descendant communities, the
> underlying fact remains.  Giving up support for `\~` was therefore, in
> this sense, a regression, and one that took quite some time to address.

I don't think that anyone gave up support for \~.
But we have evidence that some never implemented support for it.

[...]
> As I recall, mandoc does not even support "full justification"
> (alignment of text to both left and right margins, with inter-word
> spaces expanded ["adjusted"] to achieve this) in the first place and
> there are no plans to.

Correct.

> mandoc can thus treat the two sequences as synonymous--

It does.  Mandoc maps all of \  \~ \0 to U+00A0.

> but that doesn't mean the `\~` escape sequence is a gratuitous alias
> or deviation from the norm.

No.  It is useful for general-purpose typesetting,
like many GNU extensions are.

>> portability is still significantly more important

> You are not quantifying anything.  Come on, can we at least get a
> Fermi estimation of the installed bases of the respective troff
> implementations and mandoc?

Frankly, i have no idea how to estimate the number of actively used
installations of Plan 9, Solaris (any version), and possibly
additional commercial systems like AIX and HP-UX, or how to check
what the latter support.

There might be more systems out there parsing manual pages (not
necessarily full-featured roff(7) implementations like those
you listed), but providing specific evidence of such systems
would likely be my job to back up my advice.  I'm not searching
for them right now because we already have a few relevant examples.

>> than such minute typographical details.

> For someone arguing from a standpoint of such slavish fidelity to 40
> year-old practices, you seem to be selective in the way you do it.

Admitted.  Sometimes, i do see the value of new features, even
when they are backward-incompatible.

> The Unix manual was always meant to be typeset.
> 
> "The manual was intended to be typeset; some detail is sacrificed on
> terminals." (man(1), _Unix Time-Sharing System Programmer's Manual_,
> Eighth Edition, Volume 1, February 1985)
> 
> At the time that statement was written, the sentiment was some 12 years
> old; the Bell Labs CSRC typeset man pages as soon as it was possible for
> them to do so.[4]
> [4] https://dspinellis.github.io/unix-v4man/v4man.pdf
>
> I understand if some man page contributors don't want to mess with
> aspects of typography that will appear only when formatting for output
> devices more sophisticated than terminal emulators--widow and orphan
> management can be tedious, for instance--but we shouldn't promulgate
> advice that makes the task of those who do--people like Alejandro and
> me--_harder_.

Even authors might disagree which is more important:

 (1) The typographical difference between "\~" and "\ "
     in PDF and PostScript output of manual pages.

 (2) Correctly rendering whitespace on Plan 9, Solaris,
     and likely some other systems *at all*, for any output mode.

I suspect that many would prefer (2) - of course, that claim is hard
to quantify.

It would probably be good to arrive at a consensus recommendation
for such cases because many manual page authors probably have little
interest in judging such questions themselves.  Consensus seems
hard to reach though.  So maybe the best we can do is to simply
state the fact that \~ is still not supported by a few not very widely
used, but still somewhat significant roff implementations like Plan 9
and Solaris, even though that forces authors to draw their own
conclusion.

Yours,
  Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix)
  2022-08-12 14:30             ` Ingo Schwarze
@ 2022-08-12 22:10               ` G. Branden Robinson
  2022-08-13  4:23                 ` G. Branden Robinson
  2022-08-13 17:27                 ` DJ Chase
  0 siblings, 2 replies; 25+ messages in thread
From: G. Branden Robinson @ 2022-08-12 22:10 UTC (permalink / raw)
  To: Ingo Schwarze; +Cc: Alejandro Colomar, linux-man, groff

[-- Attachment #1: Type: text/plain, Size: 7827 bytes --]

Hi Ingo,

At 2022-08-12T16:30:01+0200, Ingo Schwarze wrote:
> G. Branden Robinson wrote on Thu, Aug 11, 2022 at 03:17:14PM -0500:
> > At 2022-08-11T14:48:51+0200, Ingo Schwarze wrote:
> >> Alejandro Colomar wrote on Mon, Aug 01, 2022 at 03:28:03PM +0200:
> 
> >>> I'd like to arrive at some consensus on usage of \~ and '\ '.
> 
> >> In manual pages, always use "\ " and never use "\~", period.
> 
> > This is hugely overstated.
> 
> >> The former is portable and the latter is a GNU extension.
> 
> > ...that is over 30 years old and supported by Heirloom Doctools
> > troff for 17 years now, neatroff for about six, and your mandoc for
> > three.
> 
> Actually, mandoc supports \~ at least since Sep 17 2009:
> https://cvsweb.bsd.lv/mandoc/Attic/chars.in?rev=1.1&content-type=text/x-cvsweb-markup

Whoops!  I regret the error, and will update groff's Texinfo manual to
correct this.

> > plan9port troff doesn't either, and its laudable introduction
> > of a man(7) MR macro notwithstanding, its activity level is
> > not high.
> 
> There are people using Plan 9 for practical work though, they have
> even occasionally posted on the groff and mandoc lists, so that is a
> bit more of a problem.

I have no moral objection to submitting a patch; I don't know my way
around the AT&T troff code base (which Plan 9 troff mostly is) nearly as
well as groff, though, and, as ever, available time is scarce.  But, if
that's what it takes to get this escape sequence de facto standardized,
and no one else will do it, that will move it up the priority queue.

I don't expect full support to be trivial.  I don't think AT&T troff has
a concept of a space that is adjustable but not breakable.  If that
blows out the effort/reward estimate, treating `\~` as a synonym of `\ `
as mandoc does _should_ be trivial.

Yup, it looks like it is.

https://github.com/9fans/plan9port/blob/master/src/cmd/troff/n1.c#L515
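
To make the difference between full support and the synonym approach
concrete, here is a minimal Python sketch of line adjustment.  It is not
taken from any troff source: the name `adjust` is made up for
illustration, every space is assumed to be one column wide before
adjustment, and line breaking is ignored entirely.  An ordinary space
and `\~` absorb the padding; `\ ` keeps its width.

```python
import re


def adjust(line, width):
    """Pad `line` to `width` columns by widening only adjustable spaces."""
    # Split into words and the separators between them; the capturing
    # group keeps the separators in the result list.
    parts = re.split(r"(\\~|\\ | )", line)
    words, seps = parts[0::2], parts[1::2]

    # Before adjustment, every separator occupies one column.
    natural = sum(len(w) for w in words) + len(seps)
    slack = width - natural

    # An ordinary space and \~ may stretch; "\ " is fixed-width.
    stretchy = [i for i, s in enumerate(seps) if s != "\\ "]

    widths = [1] * len(seps)
    if slack > 0 and stretchy:
        share, extra = divmod(slack, len(stretchy))
        for n, i in enumerate(stretchy):
            widths[i] += share + (1 if n < extra else 0)

    out = [words[0]]
    for word, w in zip(words[1:], widths):
        out.append(" " * w + word)
    return "".join(out)


print(adjust(r"see\~foo(1) now", 16))  # see  foo(1)  now
print(adjust(r"see\ foo(1) now", 16))  # see foo(1)   now
```

A formatter that treats `\~` as a synonym of `\ ` would produce the
second layout for both inputs.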

> > I would pessimistically assume that most or all proprietary Unix
> > troffs branched off from V7 Unix troff or early device-independent troff
> > (maybe DWB 1.0 troff, ca. 1984 [?, 1]) lack support for `\~`.
> > https://github.com/n-t-roff/Solaris10-ditroff/blob/master/troff/n1.c#L797
> 
> That does sound likely.  As an example, look at Oracle Solaris 11:
> 
>    > uname -a
>   SunOS unstable11s 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
>    > printf "a\\\\~b\n" | nroff | head -n 1
>   a~b
>    > printf "a\\\\~b\n" | groff -T ascii | head -n 1
>   a b

Yes.  The rule is, if no semantics are defined for the function selector
(the character after the escape character), then the character is
treated as if it were not escaped.
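
That rule is easy to sketch.  The following Python toy is an
illustration only: `render` and the two escape tables are hypothetical,
cut down to the space escapes under discussion, and real formatters
handle far more than single-character selectors.

```python
def render(text, known):
    """Expand backslash escapes; unknown selectors fall through unescaped."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            selector = text[i + 1]
            # Known selector: emit its replacement.  Unknown selector:
            # emit the character itself, as if it had not been escaped.
            out.append(known.get(selector, selector))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Hypothetical escape tables, reduced to the two space escapes.
ATT_LIKE = {" ": " "}
GROFF_LIKE = {" ": " ", "~": " "}

print(render(r"a\~b", ATT_LIKE))    # a~b
print(render(r"a\~b", GROFF_LIKE))  # a b
```

The first call reproduces the stray tilde from the Solaris nroff
transcript quoted above; the second matches the groff output.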

> > I further note that groff has a long tradition of inclusion in BSD
> > Unix,
> > https://minnie.tuhs.org/cgi-bin/utree.pl?file=Net2/usr/src/usr.bin/groff/VERSION
> 
> Yes.  Cynthia already considered dropping support for Kernighan's
> troff, but the CSRG vetoed that.  Inclusion of groff wasn't
> controversial even at a time when groff didn't have its own version
> control yet.

It seems strange now how revision control ever seemed like a luxury.
For a few years I maintained Debian's XFree86 packages, which had
_megabytes_ of patches on top of upstream, without using SCCS or RCS or
CVS and even without a tool as nice as quilt.

I was completely insane.  On the other hand, it trained me to be pretty
careful.

Eventually, I acquired sanity and started using Subversion.

> Frankly, i have no idea how to estimate the number of actively used
> installations of Plan 9, Solaris (any version), and possibly
> additional commercial systems like AIX and HP-UX, or how to check
> what the latter support.

Users/maintainers of these systems have to get involved and speak up.
There is an unbounded quantity of Russell's Teapots labeled with names
of Unix variants that have gone defunct.

Without evidence, we must assume their numbers are too small to serve as
a gate on development.

That said, it remains polite to document changes that would affect them.

> There might be more systems out there parsing manual pages (not
> necessarily full-featured roff(7) implementations like those
> you listed), but providing specific evidence of such systems
> would likely be my job to back up my advice.  I'm not searching
> for them right now because we already have a few relevant examples.

plan9port's troff seems like the only case for which we have concrete
evidence, and Russ Cox has already been a pleasure to work with.

I don't know that any user of OpenSolaris/Illumos troff has ever spoken
up on the groff mailing list, which in spite of its
implementation-specific name seems to be the water cooler for what
remains of the global *roff community.

The good news is that, both being descended from AT&T troff and, from
what I've seen, neither having been re-architected, if someone comes up
with `\~` support for plan9port troff, I predict that it will be
mergeable into OpenSolaris/Illumos troff without much difficulty.

...especially the trivial `\ ` synonym version discussed above.

> Even authors might disagree which is more important:
> 
>  (1) The typographical difference between "\~" and "\ "
>      in PDF and PostScript output of manual pages.
> 
>  (2) Correctly rendering whitespace on Plan 9, Solaris,
>      and likely some other systems *at all*, for any output mode.
> 
> I suspect that many would prefer (2) - of course, that claim is hard
> to quantify.

Another thing to consider is how bad the damage to comprehension is if a
tilde shows up in place of a space.

In a prose phrase, it is likely to be distracting and annoying but will
not be a barrier to comprehension.

[from groff_diff(7):]
  For example, if the current font is\~1 and font position\~1 is

In synopses of commands and language features (like *roff requests or
macros), I think anyone already familiar with Unix command lines or
*roff languages, respectively, can still push their way past it, but it
is worse.

[from gdiffmk(1):]
  .RB [ \-a\~\c
  .RB [ \-c\~\c
  .RB [ \-d\~\c
  .RB [ \-x\~\c
  .BI \-a\~ add-mark
  .BI \-c\~ change-mark
  .BI \-d\~ delete-mark
  .BI \-M\~ "mark1 mark2"
  .BI \-x\~ diff-command
  .BI \-x\~ diff-command

[from groff_diff(7):]
  .BI .chop\~ object
  .BI .class\~ "name c1 c2\~"\c
  .BI .close\~ stream
  .BI .composite\~ glyph1\~glyph2
  .BI .color\~ n
  .BI .cp\~ n

The tilde showing up in boldface would be especially disappointing.

On the gripping hand, such aggressive use of `\~` is much more often
seen in groff man pages than in (any?) others, and groff man pages can
be expected to be formatted with groff or another `\~`-recognizing
formatter much of the time.

> It would probably be good to arrive at a consensus recommendation
> for such cases because many manual page authors probably have little
> interest in judging such questions themselves.  Consensus seems
> hard to reach though.  So maybe the best we can do is to simply
> state the fact that \~ is still not supported by a few not very widely
> used, but still somewhat significant roff implementations like Plan 9
> and Solaris, even though that forces authors to draw their own
> conclusion.

I could easily copy the (now-corrected with respect to the age of
mandoc's `\~` support) material about this escape sequence from our
groff Texinfo manual to groff_man_style(1), where the "Portability"
section quoted earlier in the thread is housed.

As with the uptake of groff man(7) extension macros (be they 15 years
old or more recent), a software project's documentors may be better
placed than we are to assess the formatting capabilities of their users.

Regards,
Branden


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix)
  2022-08-12 22:10               ` *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix) G. Branden Robinson
@ 2022-08-13  4:23                 ` G. Branden Robinson
  2022-08-14 14:15                   ` Ingo Schwarze
  2022-08-13 17:27                 ` DJ Chase
  1 sibling, 1 reply; 25+ messages in thread
From: G. Branden Robinson @ 2022-08-13  4:23 UTC (permalink / raw)
  To: Ingo Schwarze; +Cc: Alejandro Colomar, linux-man, groff

[-- Attachment #1: Type: text/plain, Size: 600 bytes --]

[self-follow-up]

At 2022-08-12T17:10:35-0500, G. Branden Robinson wrote:
> At 2022-08-12T16:30:01+0200, Ingo Schwarze wrote:
> > There are people using Plan 9 for practical work though, they have
> > even occasionally posted on the groff and mandoc lists, so that is a
> > bit more of a problem.

plan9port's troff is no longer a problem, thanks to Dan Cross acting on
my pull request at relativistic speed.

https://github.com/9fans/plan9port/commit/93f814360076ccf28d33c9cb909fca7200ba4a7d

I also have a PR pending with Illumos.

https://github.com/illumos/illumos-gate/pull/83

Regards,
Branden


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix)
  2022-08-12 22:10               ` *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix) G. Branden Robinson
  2022-08-13  4:23                 ` G. Branden Robinson
@ 2022-08-13 17:27                 ` DJ Chase
  2022-08-14 13:56                   ` Standardize roff (was: *roff `\~` support) Ingo Schwarze
  1 sibling, 1 reply; 25+ messages in thread
From: DJ Chase @ 2022-08-13 17:27 UTC (permalink / raw)
  To: G. Branden Robinson, Ingo Schwarze; +Cc: Alejandro Colomar, linux-man, groff

On Fri Aug 12, 2022 at 6:10 PM EDT, G. Branden Robinson wrote:
> Hi Ingo,
>
> At 2022-08-12T16:30:01+0200, Ingo Schwarze wrote:
> > G. Branden Robinson wrote on Thu, Aug 11, 2022 at 03:17:14PM -0500:
> > > At 2022-08-11T14:48:51+0200, Ingo Schwarze wrote:
> > >> The former is portable and the latter is a GNU extension.
> > 
> > > ...that is over 30 years old and supported by Heirloom Doctools
> > > troff for 17 years now, neatroff for about six, and your mandoc for
> > > three.
> > 
> > Actually, mandoc supports \~ at least since Sep 17 2009:
> > https://cvsweb.bsd.lv/mandoc/Attic/chars.in?rev=1.1&content-type=text/x-cvsweb-markup
>
> Whoops!  I regret the error, and will update groff's Texinfo manual to
> correct this.
>
> > > plan9port troff doesn't either, and its laudable introduction
> > > of a man(7) MR macro notwithstanding, its activity level is
> > > not high.
> > 
> > There are people using Plan 9 for practical work though, they have
> > even occasionally posted on the groff and mandoc lists, so that is a
> > bit more of a problem.
>
> […] But, if
> that's what it takes to get this escape sequence de facto standardized,
> and no one else will do it, that will move it up the priority queue.

Have we ever considered a de jure *roff standard? If not, here are just
some reasons:

	•  [the obvious benefits of standardizing anything]
	•  A standard could lead to more implementations because
	   developers would not have to be intimately familiar with the
	   {groff,heirloom,neatroff} toolchain before implementing a
	   *roff toolchain themselves.
	•  It could also lead to more users & use cases because existing
	   users could count on systems supporting certain features, so
	   they could use *roff in more situations, which would lead to
	   more exposure.

Cheers,
-- 
DJ Chase
They, Them, Theirs

PS: It’s ridiculous that *roff isn’t part of POSIX when it was Unix’s
    killer feature.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Standardize roff (was: *roff `\~` support)
  2022-08-13 17:27                 ` DJ Chase
@ 2022-08-14 13:56                   ` Ingo Schwarze
  2022-08-14 14:49                     ` DJ Chase
  2022-08-15  0:20                     ` Sam Varshavchik
  0 siblings, 2 replies; 25+ messages in thread
From: Ingo Schwarze @ 2022-08-14 13:56 UTC (permalink / raw)
  To: DJ Chase; +Cc: g.branden.robinson, Alejandro Colomar, linux-man, groff

Hi,

DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000:

> Have we ever considered a de jure *roff standard?

No, i think that would be pure madness given the amount of working
time available in any of the roff projects.

I expect the amount of effort required to be significantly larger
than the amount of effort that would be required for rewriting
the entire groff documentation from scratch because:

 1. You would have to study all features of all the major roff
    implementations (groff, Heirloom, neatroff, Plan 9, and possibly
    some others, maybe even historical ones) and compare the features.
    For every difference (i.e. typically multiple times for almost every
    feature), you would have to decide which behaviour to standardize
    and what to leave unspecified.

 2. Discussions of the kind mentioned in item 1 are typically
    lengthy and often heated.  If you don't believe me, just buy
    several pounds of popcorn and watch the Austin list, where
    maintenance of the POSIX standard is being discussed.
    Even discussions of the most minute details tend to be
    complicated and extended.

 3. Even after deciding what you want to specify, looking at the
    manuals typically provides very little help because a
    standard document requires a completely different style.
    User and even reference documentation is optimized for clarity,
    comprehensibility, and usefulness in practice; a standard document
    needs to be optimized for formal precision, whereas
    comprehensibility and conciseness matter much less.

 4. Even when you have the text - almost certainly after many years
    of work by many people - be prepared for huge amounts of red
    tape, like dealing with elected decision-making bodies of
    professional associations, for example the IEEE.  Be prepared
    for having to know things like what technical societies,
    technical councils, and technical committees are, and how to
    deal with each of them.  You are certainly in for a lot of
    committee work, and i would count you lucky if you got away
    without having to deal with lawyers, paying membership fees,
    buying expensive standard documents you need for your work,
    and so on and so forth.  Even when you submit a technically
    perfect proposal, it will typically be rejected without even
    being considered until you secure the official sponsorship
    of at least one of the following: the IEEE, the Open Group,
    or ISO/IEC JTC 1/SC 22.  Of course, your mileage may vary
    depending on what exactly you want to standardize and how,
    but since roff(1) is arguably the most famous UNIX program,
    i wouldn't be surprised if you were in for an unabridged
    POSIX-style Odyssey.

 5. The above is not helped by standards committee work being
    typically conducted in ways that are technically ridiculously
    outdated, and i'm saying that as an avid user of cvs(1) who
    somewhat dislikes git(1) as overengineered and very strongly
    detests GitHub.  Take the Austin Group as an example.  Most of
    its work is changing the content of technical documents,
    but the group *never* uses diff(1), never uses patch(1), and
    never makes diffs available even after they have been approved.
    They are very firmly stuck in the 1980s regarding the technologies
    they are using and missed even most of the 1990s innovations.
    They do have some kind of version control system internally, but
    no web interface to that version control is publicly available,
    nor any other public read-only access to that version control.
    Even the source code of the finished version of the standard
    is typically not made available to the public (at least not
    without forcing people to jump through hoops).

> A standard could lead to more implementations because
> developers would not have to be intimately familiar with the
> {groff,heirloom,neatroff} toolchain before implementing a
> *roff toolchain themselves.

That's not even wishful thinking.  Better maintenance of the
existing implementations would be so much more useful than yet
another implementation.

> It could also lead to more users & use cases because existing
> users could count on systems supporting certain features, so
> they could use *roff in more situations, which would lead to
> more exposure.

You appear to massively overrate the importance end-users
typically attribute to standardization.  Even people *implementing*
a system rarely put such an emphasis on standardization.

> It’s ridiculous that *roff isn’t part of POSIX when it was Unix’s
> killer feature.

You are welcome to spend the many years required to change that.
But be aware that some standardization efforts that are part of
POSIX resulted in parts of the standard that are barely useable
for practical work.  One famous example is make(1).

Don't get me wrong: i think standardization is very nice to have,
should be taken very seriously when available, and provides some
value even when the standardization effort mostly failed, like in
the case of make(1).  But standardization is absolutely not cheap.
To the contrary, it is usually significantly more expensive than
implementation and documentation.

Yours,
  Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix)
  2022-08-13  4:23                 ` G. Branden Robinson
@ 2022-08-14 14:15                   ` Ingo Schwarze
  2022-08-14 22:21                     ` G. Branden Robinson
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Schwarze @ 2022-08-14 14:15 UTC (permalink / raw)
  To: g.branden.robinson; +Cc: Alejandro Colomar, linux-man, groff

Hi Branden,

G. Branden Robinson wrote on Fri, Aug 12, 2022 at 11:23:11PM -0500:
> At 2022-08-12T16:30:01+0200, Ingo Schwarze wrote:

>> There are people using Plan 9 for practical work though, they have
>> even occasionally posted on the groff and mandoc lists, so that is a
>> bit more of a problem.

> plan9port's troff is no longer a problem, thanks to Dan Cross acting on
> my pull request at relativistic speed.
> https://github.com/9fans/plan9port/commit/93f814360076ccf28d33c9cb909fca7200ba4a7d

Nice.  :-)

> I also have a PR pending with Illumos.
> https://github.com/illumos/illumos-gate/pull/83

Illumos isn't doing development on GitHub.

Besides, Illumos is less of a problem because they have been using
mandoc as the default manual page formatter since July 2014.

All the same, getting \~ supported in their general-purpose
roff implementation is no doubt nice to have, too.

That reduces my concerns mostly to commercial UNIXes and potentially
to a few ad-hoc conversion tools we are not even aware of.
Consequently, the concerns aren't 100% resolved yet but getting
closer to becoming theoretical concerns.  If it's only commercial
UNIXes and unknown tools that may break, the improved typesetting
quality may be worth the risk.

Yours,
  Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Standardize roff (was: *roff `\~` support)
  2022-08-14 13:56                   ` Standardize roff (was: *roff `\~` support) Ingo Schwarze
@ 2022-08-14 14:49                     ` DJ Chase
  2022-08-14 16:32                       ` Alejandro Colomar
  2022-08-14 22:35                       ` G. Branden Robinson
  2022-08-15  0:20                     ` Sam Varshavchik
  1 sibling, 2 replies; 25+ messages in thread
From: DJ Chase @ 2022-08-14 14:49 UTC (permalink / raw)
  To: Ingo Schwarze; +Cc: g.branden.robinson, Alejandro Colomar, linux-man, groff

On Sun Aug 14, 2022 at 9:56 AM EDT, Ingo Schwarze wrote:
> Hi,
>
> DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000:
>
> > Have we ever considered a de jure *roff standard?
>
> No, i think that would be pure madness given the amount of working
> time available in any of the roff projects.
>
> […]

This is very sad to hear.

> > It could also lead to more users & use cases because existing
> > users could count on systems supporting certain features, so
> > they could use *roff in more situations, which would lead to
> > more exposure.
>
> You appear to massively overrate the importance end-users
> typically attribute to standardization.

That’s probably because *I* massively overrate the importance of
standardization (I mean I literally carry a standards binder with me).
Still, though, it’s rather annoying that end users — especially
programmers — don’t value standards as much.

> > It’s ridiculous that *roff isn’t part of POSIX when it was Unix’s
> > killer feature.
>
> You are welcome to spend the many years required to change that.
> But be aware that some standardization efforts that are part of
> POSIX resulted in parts of the standard that are barely useable
> for practical work.  One famous example is make(1).
>
> Don't get me wrong: i think standardization is very nice to have,
> should be taken very seriously when available, and provides some
> value even when the standardization effort mostly failed, like in
> the case of make(1).  But standardization is absolutely not cheap.
> To the contrary, it is usually significantly more expensive than
> implementation and documentation.

Would an informal de jure standard be of any use? Like how TOML just has
a specification, but it’s somewhat usable as a standard because it’s
been pretty stable and because it’s written clearly enough.

Cheers,
-- 
DJ Chase
They, Them, Theirs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Standardize roff (was: *roff `\~` support)
  2022-08-14 14:49                     ` DJ Chase
@ 2022-08-14 16:32                       ` Alejandro Colomar
  2022-08-14 19:43                         ` DJ Chase
  2022-08-14 22:35                       ` G. Branden Robinson
  1 sibling, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-08-14 16:32 UTC (permalink / raw)
  To: DJ Chase, Ingo Schwarze; +Cc: g.branden.robinson, linux-man, groff


[-- Attachment #1.1: Type: text/plain, Size: 7383 bytes --]

Hi,

On 8/14/22 16:49, DJ Chase wrote:
> On Sun Aug 14, 2022 at 9:56 AM EDT, Ingo Schwarze wrote:
>> Hi,
>>
>> DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000:
>>
>>> Have we ever considered a de jure *roff standard?
>>
>> No, i think that would be pure madness given the amount of working
>> time available in any of the roff projects.
>>
>> […]
> 
> This is very sad to hear.
> 
>>> It could also lead to more users & use cases because existing
>>> users could count on systems supporting certain features, so
>>> they could use *roff in more situations, which would lead to
>>> more exposure.
>>
>> You appear to massively overrate the importance end-users
>> typically attribute to standardization.
> 
> That’s probably because *I* massively overrate the importance of
> standardization (I mean I literally carry a standards binder with me).
> Still, though, it’s rather annoying that end users — especially
> programmers — don’t value standards as much.

(Official) standardization isn't necessarily a good thing.  With C, it 
was originally good, in the times of ISO C89.  Now, it's doing more 
damage to the language and current implementations than any good (it's 
still doing some good, but a lot of bad).

The best that a standardization process can do is limit itself to 
describe _only_ features already existing in the language, being a kind 
of arbiter that decides on which behavior is best for a given feature, 
so that all implementations follow the best existing one.  Where 
different implementations might have good reasons to do it differently, 
the standard should describe the behavior as implementation-specific. 
And of course, a standard should only standardize features that are 
expected to be good for every implementation, with optional features 
either not being standardized, or being marked optional by the standard 
(like Annex K was; although that one was broken, so it was later removed 
for good).

But that shouldn't be necessary if implementors had some decency and 
didn't implement features so that they are completely incompatible with 
those of other systems.  I.e., if an existing system has 'foo(int a);', 
you don't provide 'foo(int *b);'; you go for 'foo2(int *b);' or 'bar(int 
*b);'.  There's plenty of cases where this has happened, and in some 
cases it might be due to an accident, but in some other cases, it's just 
due to incompetence.  See an example that bit me a month ago: 
<https://github.com/nginx/unit/issues/737>.

And the bad things that standardization can do are several:

By reserving the power to centrally decide the future of a language, 
they take power from implementations, which now can't add some features, 
for fear that they might contradict a future standard.  This is very sad,
because while the implementations are guided by usefulness and 
worthiness, and try to come up with the best feature for them (and by 
natural selection, implementations are then used or not used, depending 
on their quality), standards have a large part of bureaucracy, and that 
doesn't provide the best features.

A few examples of that are: a %b printf specifier for binary was 
rejected by glibc on the terms of something like "the feature is good, 
and the implementation seems correct, but %b is reserved by the 
standard, so we don't want to possibly conflict with a future standard"; 
luckily, the standard defined that, and the feature was added a few 
years later.  One example that is much more necessary is a way to get 
the size of an array, which currently is impossible in portable C (at 
least not in a way that safely refuses to compile on non-arrays).  I
also proposed an addition to glibc, and the reasons to reject it were of
the same kind, arguing that the committee was discussing adding
such a feature; guess what? the standard hasn't added such a feature for
C23, and we still have no portable way to do it (and the unportable ways 
are more cumbersome than what one would expect).  I hope C3x adds 
_Lengthof(arr), but who knows.

<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2529.pdf>
<https://stackoverflow.com/questions/37538/how-do-i-determine-the-size-of-my-array-in-c/57537491#57537491>


And then we have another problem of standardization committees: their 
priorities are so broken, that they prefer inventing a completely new 
feature for C, with nothing even remotely resembling it within the
existing language (I'm talking about nullptr and nullptr_t), rather than 
standardizing an existing good feature such as POSIX's NULL  ((void *) 
0).  So now we have 0, NULL, and nullptr for referring to a null pointer 
constant in C.  And none of them is perfect.  0 needs to be cast when
passed to variadic functions, and has readability issues.  NULL is 
perfect within the POSIX world, but if you go out of POSIX, it's as bad 
as 0.  nullptr, apart from being incomprehensible, is unsafe; okay
it's not unsafe by itself, and if it were the only way to refer to a 
null pointer constant, it would be great, but it's not, and even the 
committee recognizes that it will never be.

<https://discourse.llvm.org/t/iso-c3x-proposal-nonnull-qualifier/59269/48>

Many existing projects that use NULL (especially POSIX projects), are 
not going to change their whole codebase to use nullptr.  nullptr_t adds 
some features that add safety against null pointer constants based on 
the type of the constant (by means of _Generic); but that means that one 
can easily bypass those features by using NULL or 0, which means that 
it's not really safe, and it might give a false sense of safety.
So, without extending my rant about nullptr much more, it's just a
feature broken from day 0, invented by the ISO C committee.

Maybe one of the worst problems of the committee (WG14) is that many of 
its members are also members of the WG21, and as such, they may have 
incompatible priorities.

I don't see standardization as good as it may seem at first glance.

And of course following the standard should come with a pinch of salt: 
one should follow the standard, when the standard isn't broken.

But then, the standard isn't better than any other implementation.  So, 
as a programmer, I think programs should target their expected systems, 
and not more (unless it's easy).  If a program is to be run on Linux, 
then target GNU C.  If you can add some partial support for ISO C 
without it getting in your way significantly, then okay, go for it; but 
complete ISO C support is unthinkable; a program conforming to ISO C is 
useless, or unnecessarily complex, or even unsafe.  I implement things 
thinking of my system first; then I can support other FOSS Unix 
systems, but only if it's easy.  Commercial systems are 
automatically out of support.  I'm not spending a single minute of my 
time to be nice to those systems when they're not nice to me.

I think it's better to let natural selection work its way out.  If a 
feature is good, other implementations will pick it, and maybe even 
improve it.  If a feature is not good (or it's not needed by other 
systems), it will not be portable.

Cheers,

Alex


-- 
Alejandro Colomar
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Standardize roff (was: *roff `\~` support)
  2022-08-14 16:32                       ` Alejandro Colomar
@ 2022-08-14 19:43                         ` DJ Chase
  2022-08-15 11:59                           ` Alejandro Colomar
  0 siblings, 1 reply; 25+ messages in thread
From: DJ Chase @ 2022-08-14 19:43 UTC (permalink / raw)
  To: Alejandro Colomar, Ingo Schwarze; +Cc: g.branden.robinson, linux-man, groff

On Sun Aug 14, 2022 at 12:32 PM EDT, Alejandro Colomar wrote:
> On 8/14/22 16:49, DJ Chase wrote:
> > On Sun Aug 14, 2022 at 9:56 AM EDT, Ingo Schwarze wrote:
> >> You appear to massively overrate the importance end-users
> >> typically attribute to standardization.
> > 
> > That’s probably because *I* massively overrate the importance of
> > standardization (I mean I literally carry a standards binder with me).
> > Still, though, it’s rather annoying that end users — especially
> > programmers — don’t value standards as much.
>
> (Official) standardization isn't necessarily a good thing.  With C, it 
> was originally good, in the times of ISO C89.  Now, it's doing more 
> damage to the language and current implementations than any good (it's 
> still doing some good, but a lot of bad).
>
> [Snipped because I’m not going to quote the whole email — see previous
> message for argument]
>
> I think it's better to let natural selection work its way out.  If a 
> feature is good, other implementations will pick it, and maybe even 
> improve it.  If a feature is not good (or it's not needed by other 
> systems), it will not be portable.

True; prescriptive standards can certainly make some things worse. As a
further example, ISO 8601 sucks. I mean, its core specification is
great, but there are so many different ways that are allowed that the
full standard is almost completely unparseable. It also uses a slash
between the start and end times of a period instead of something
sensible, like, I don’t know, an en-dash! Which means that periods can
be written with a slash (because that’s the standard) but also with an
en-dash (because that’s how ranges work in English), but also that one
can’t properly write a period in a file name or URI.

Still, though, I think descriptive standards can be net-positive. The
POSIX shell utilities come to mind. Sure, they certainly have some
issues, but because it’s a trailing standard, implementers are free to
fix them.

Do you think that a descriptive/trailing standard could be beneficial
or would you still say that it could mostly hinder *roff
implementations?

Cheers,
-- 
DJ Chase
They, Them, Theirs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix)
  2022-08-14 14:15                   ` Ingo Schwarze
@ 2022-08-14 22:21                     ` G. Branden Robinson
  0 siblings, 0 replies; 25+ messages in thread
From: G. Branden Robinson @ 2022-08-14 22:21 UTC (permalink / raw)
  To: Ingo Schwarze; +Cc: Alejandro Colomar, linux-man, groff

[-- Attachment #1: Type: text/plain, Size: 1957 bytes --]

At 2022-08-14T16:15:54+0200, Ingo Schwarze wrote:
> > I also have a PR pending with Illumos.
> > https://github.com/illumos/illumos-gate/pull/83
> 
> Illumos isn't doing development on GitHub.

Yeah, I promptly got a lengthy follow-up from a member of the core team
pointing me to even more lengthy contribution procedures.

(I guess this explains the "-gate" suffix in the GH project name.)

> Besides, Illumos is less of a problem because they have been using
> mandoc as the default manual page formatter since July 2014.

Ahh, so the general Illumos user won't suffer mishandling of `\~`
anyway--not in man pages, at least.

> All the same, getting \~ supported in their general-purpose
> roff implementation is no doubt nice to have, too.

Yes.  But I don't have the spoons to go through their formal
contribution procedure.  I think my PR will have to sit there as a form
of incompatibility notice, and someone else will need to pick up the
patch and advocate for its incorporation.  I also have a serious
handicap in that I can't test my patch; I don't run Illumos.  (Plan 9
from User Space makes it easy to test _in situ_.)

I don't blame them for having a lot of process; their concerns are
surely more with sexy but delicate, high-stakes stuff like ZFS and
DTrace.  Not post-1989 developments in troff.

> That reduces my concerns mostly to commercial UNIXes and potentially
> to a few ad-hoc conversion tools we are not even aware of.
> Consequently, the concerns aren't 100% resolved yet but getting
> closer to becoming theoretical concerns.  If it's only commercial
> UNIXes and unknown tools that may break, the improved typesetting
> quality may be worth the risk.

And we don't know how many, if any, of those are even _maintained_, so
even if the knowledge of what to patch were available, the will may be
lacking.

I'll take my easy win and move on to the next problem.  :D

Regards,
Branden

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Standardize roff (was: *roff `\~` support)
  2022-08-14 14:49                     ` DJ Chase
  2022-08-14 16:32                       ` Alejandro Colomar
@ 2022-08-14 22:35                       ` G. Branden Robinson
  2022-08-14 22:58                         ` DJ Chase
  1 sibling, 1 reply; 25+ messages in thread
From: G. Branden Robinson @ 2022-08-14 22:35 UTC (permalink / raw)
  To: DJ Chase; +Cc: Ingo Schwarze, Alejandro Colomar, linux-man, groff

[-- Attachment #1: Type: text/plain, Size: 2183 bytes --]

At 2022-08-14T14:49:10+0000, DJ Chase wrote:
> On Sun Aug 14, 2022 at 9:56 AM EDT, Ingo Schwarze wrote:
> > DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000:
> >
> > > Have we ever considered a de jure *roff standard?
> >
> > No, i think that would be pure madness given the amount of working
> > time available in any of the roff projects.

Mark your calendars--Ingo and I are in substantial agreement.  ;-)

> This is very sad to hear.

I think the take-away here is that the decision to formally standardize
a technology, like many things, is an economic one.  There are costs and
benefits.  Being seduced by the benefits without a full understanding of
the costs often leads to remorse.  (And, in many domains, fat
commissions for sales personnel.)

> That’s probably because *I* massively overrate the importance of
> standardization (I mean I literally carry a standards binder with me).
> Still, though, it’s rather annoying that end users — especially
> programmers — don’t value standards as much.

I think it is less that programmers value standards in the wrong amount,
than that they disregard them for the wrong reasons--like "moving fast"
and building fragile solutions that will cost more on the back end after
higher-paid decision makers have moved on to greener pastures.

Nothing succeeds like handing your successor a trash fire.

> Would an informal de jure standard

You just defined "de facto standard".  ;-)

"De jure" is Latin for "of the law".  If something is not codified in
"law", or a normative document like a formal standard, then what is
"standard" is simply the intersection of prevailing practices.

> be of any use? Like how TOML just has a specification, but it’s
> somewhat usable as a standard because it’s been pretty stable and
> because it’s written clearly enough.

A purely descriptive document, mainly comprising a matrix of features
with escape sequence, request, and predefined register names on one axis
and the names of implementations on the other, with version numbers and
commentary populating the elements, could be a useful thing to have.

Regards,
Branden

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Standardize roff (was: *roff `\~` support)
  2022-08-14 22:35                       ` G. Branden Robinson
@ 2022-08-14 22:58                         ` DJ Chase
  0 siblings, 0 replies; 25+ messages in thread
From: DJ Chase @ 2022-08-14 22:58 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: Ingo Schwarze, Alejandro Colomar, linux-man, groff

On Sun Aug 14, 2022 at 6:35 PM EDT, G. Branden Robinson wrote:
> At 2022-08-14T14:49:10+0000, DJ Chase wrote:
> > On Sun Aug 14, 2022 at 9:56 AM EDT, Ingo Schwarze wrote:
> > > DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000:
> > >
> > > > Have we ever considered a de jure *roff standard?
> > >
> > > No, i think that would be pure madness given the amount of working
> > > time available in any of the roff projects.
>
> Mark your calendars--Ingo and I are in substantial agreement.  ;-)
>
> > This is very sad to hear.
>
> I think the take-away here is that the decision to formally standardize
> a technology, like many things, is an economic one.  There are costs and
> benefits.  Being seduced by the benefits without a full understanding of
> the costs often leads to remorse.  (And, in many domains, fat
> commissions for sales personnel.)
>
> > That’s probably because *I* massively overrate the importance of
> > standardization (I mean I literally carry a standards binder with me).
> > Still, though, it’s rather annoying that end users — especially
> > programmers — don’t value standards as much.
>
> I think it is less that programmers value standards in the wrong amount,
> than that they disregard them for the wrong reasons--like "moving fast"
> and building fragile solutions that will cost more on the back end after
> higher-paid decision makers have moved on to greener pastures.
>
> Nothing succeeds like handing your successor a trash fire.
>
> > Would an informal de jure standard
>
> You just defined "de facto standard".  ;-)
>
> "De jure" is Latin for "of the law".  If something is not codified in
> "law", or a normative document like a formal standard, then what is
> "standard" is simply the intersection of prevailing practices.

By “informal de jure”, I meant ‘de jure, but written in an informal
manner’.

> > be of any use? Like how TOML just has a specification, but it’s
> > somewhat usable as a standard because it’s been pretty stable and
> > because it’s written clearly enough.
>
> A purely descriptive document, mainly comprising a matrix of features
> with escape sequence, request, and predefined register names on one axis
> and the names of implementations on the other, with version numbers and
> commentary populating the elements, could be a useful thing to have.

I’m on it (except not really, because we’re in the middle of a move,
school resumes shortly, and etc. But eventually™, I’m on it).

Cheers,
-- 
DJ Chase
They, Them, Theirs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Standardize roff (was: *roff `\~` support)
  2022-08-14 13:56                   ` Standardize roff (was: *roff `\~` support) Ingo Schwarze
  2022-08-14 14:49                     ` DJ Chase
@ 2022-08-15  0:20                     ` Sam Varshavchik
  2022-08-16 12:52                       ` Standardize roff Ingo Schwarze
  1 sibling, 1 reply; 25+ messages in thread
From: Sam Varshavchik @ 2022-08-15  0:20 UTC (permalink / raw)
  To: Ingo Schwarze
  Cc: DJ Chase, g.branden.robinson, Alejandro Colomar, linux-man, groff

[-- Attachment #1: Type: text/plain, Size: 1211 bytes --]

Ingo Schwarze writes:

> Hi,
>
> DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000:
>
> > Have we ever considered a de jure *roff standard?
>
> No, i think that would be pure madness given the amount of working
> time available in any of the roff projects.
>
> I expect the amount of effort required to be significantly larger
> than the amount of effort that would be required for rewriting
> the entire groff documentation from scratch because:

I tinkered with something like this some years ago, but I took a slightly  
different approach.

I converted man pages from 'roff source to Docbook XML using a … pretty  
large Perl script.

Once a year, or so, when I have nothing better to do I pull the current man  
page tarball and reconvert it. I usually need to tinker the Perl script,  
here and there, each time.

The Docbook folks provide a stylesheet that converts Docbook XML back to  
'roff. The end result you get is standardized 'roff, whatever that means.

But, yes, the effort required to clean up and standardize the formatting
of man pages would be mammoth. There's more inconsistency across the
various man pages, from various sources, than consistency.


[-- Attachment #2: Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Standardize roff (was: *roff `\~` support)
  2022-08-14 19:43                         ` DJ Chase
@ 2022-08-15 11:59                           ` Alejandro Colomar
  2022-08-16 11:48                             ` Ingo Schwarze
  0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-08-15 11:59 UTC (permalink / raw)
  To: DJ Chase; +Cc: g.branden.robinson, linux-man, groff, Ingo Schwarze


[-- Attachment #1.1: Type: text/plain, Size: 2506 bytes --]

Hi,

On 8/14/22 21:43, DJ Chase wrote:
> True; prescriptive standards can certainly make some things worse. As a
> further example, ISO 8601 sucks. I mean, its core specification is
> great, but there are so many different ways that are allowed that the
> full standard is almost completely unparseable. It also uses a slash
> between the start and end times of a period instead of something
> sensible, like, I don’t know, an en-dash! Which means that periods can
> be written with a slash (because that’s the standard) but also with an
> en-dash (because that’s how ranges work in English), but also that one
> can’t properly write a period in a file name or URI.
> 
> Still, though, I think descriptive standards can be net-positive. The
> POSIX shell utilities come to mind. Sure, they certainly have some
> issues, but because it’s a trailing standard, implementers are free to
> fix them.
> 
> Do you think that a descriptive/trailing standard could be beneficial
> or would you still say that it could mostly hinder *roff
> implementations?

Well, a standard that truly recognizes the authority of implementations 
to drive the language and doesn't do anything else but describe the best 
already-implemented ways to achieve things is a good thing.  It can't 
hinder future implementations, because it doesn't have the power to 
drive the future of the language, only describes the past.

POSIX C has been doing good in that; much better than ISO C.

I don't understand how POSIX works internally, though.  If some entity 
can fund (and is interested in) such a standardization process, it could 
bring some good.  But yeah, it will likely be very costly in time and 
money.  Worth it?  I don't know.

But we can achieve something very similar by documenting the differences 
between known roff alternatives somewhere.  And that's likely to be much 
easier.

In the Linux man-pages we document when a function is in ISO C or in 
POSIX, but also when it's not standardized but present in other Unix 
systems (so that it has some degree of portability), or when it is 
Linux-only.  Maybe having something similar in groff's manual pages 
would be effective.

For example, for .MR, we were discussing that probably it would be good 
to add a note like "(since groff 1.23.0)" and maybe it could also state 
which other roff (or mandoc) implementations support it.

Cheers,

Alex

-- 
Alejandro Colomar
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Standardize roff (was: *roff `\~` support)
  2022-08-15 11:59                           ` Alejandro Colomar
@ 2022-08-16 11:48                             ` Ingo Schwarze
  0 siblings, 0 replies; 25+ messages in thread
From: Ingo Schwarze @ 2022-08-16 11:48 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: DJ Chase, g.branden.robinson, linux-man, groff

Hi Alejandro,

Alejandro Colomar wrote on Mon, Aug 15, 2022 at 01:59:24PM +0200:
> On 8/14/22 21:43, DJ Chase wrote:

>> Do you think that a descriptive/trailing standard could be beneficial
>> or would you still say that it could mostly hinder *roff
>> implementations?

When prepared with diligence and without falling for featurism,
it might be useful because the common subset of the major roff
implementations is large enough that it would likely be possible to
prepare portable roff documents following such a standard.

However, such a standard could likely *not* include *any* of the
best features of any of the implementations: yes, implementations
have diverged that much - not quite as bad as make(1), but still more
than many other classical Unix programs.  Consequently, only authors
with modest needs could possibly consider adhering to the standard.
To provide some striking examples, the standard could include neither
the mom(7) macro set - which is a killer feature of groff - nor the
mdoc(7) macro set - which has been an important feature of groff for
more than 30 years and of mandoc for more than 10 years.

This is all theoretical though - as i explained, the effort required
for developing such a (necessarily seriously stunted) standard is
prohibitive.

[...]
> But we can achieve something very similar by documenting the differences 
> between known roff alternatives somewhere.  And that's likely to be much 
> easier.

That's a much lower bar than a standard, but don't underestimate
the effort involved even in that.

A few very small parts of that already exist.

For example,

  https://mandoc.bsd.lv/man/man.options.1.html

documents command line options of some roff(1) and man(1)
implementations, mostly intended for people who see themselves
forced to invent a new command line option - which should of course
be avoided if at all possible because the tangle of existing options
is already terrifying.

For example,

  https://man.openbsd.org/roff.7

documents roff requests and roff escape sequences; search for
"extension" in that page.  Even though this page focusses on groff,
Heirloom, and mandoc and does not mention Plan 9, neatroff, or other
implementations, the amount of compatibility information scattered
around that page is already larger than what would seem healthy for
most user-facing documentation.  It's OK here because this page is
geared more towards developers than towards users.
Also, note that this page is already very long even though it is
extremely terse - so terse that it is insufficient for learning
how to use most of the features mentioned.

> In the Linux man-pages we document when a function is in ISO C or in 
> POSIX, but also when it's not standardized but present in other Unix 
> systems (so that it has some degree of portability), or when it is 
> Linux-only.  Maybe having something similar in groff's manual pages 
> would be effective.

Except that the bulk, and in particular the core, of groff functionality
is *not* described in manual pages in the first place.  Would you
want to litter groff.texi with compatibility information throughout?
That would likely cause a significant increase in size, almost certainly
a very significant decrease in maintainability, and possibly it might also
somewhat decrease readability.

> For example, for .MR, we were discussing that probably it would be good 
> to add a note like "(since groff 1.23.0)" and maybe it could also state 
> which other roff (or mandoc) implementations support it.

But that feels like an exception rather than the rule.  It seems
warranted for this particular case because we are introducing a
new feature without consideration for compatibility that will cause
information loss for end-users unless something unusual is done
about it.  Hopefully, we are not going to turn that vice into a habit.

The particular case of .MR is somewhat specific to manual pages, too.
If people prepare a typeset document using many advanced features with
groff or Heirloom, they are used to the fact that it won't work with
the other, nor with Plan 9.  That's not a major problem because most of
the time, the author is the only person who really needs to typeset a
document.  Nowadays, the average reader will only read the PDF version,
which is totally different from the situation with manual pages.

Yours,
  Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Standardize roff
  2022-08-15  0:20                     ` Sam Varshavchik
@ 2022-08-16 12:52                       ` Ingo Schwarze
  2022-08-16 23:46                         ` Sam Varshavchik
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Schwarze @ 2022-08-16 12:52 UTC (permalink / raw)
  To: Sam Varshavchik
  Cc: DJ Chase, g.branden.robinson, Alejandro Colomar, linux-man, groff

Hi Sam,

Sam Varshavchik wrote on Sun, Aug 14, 2022 at 08:20:34PM -0400:
> Ingo Schwarze writes:
>> DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000:

>>> Have we ever considered a de jure *roff standard?

>> No, i think that would be pure madness given the amount of working
>> time available in any of the roff projects.

> I tinkered with something like this some years ago, but I took a slightly  
> different approach.
> 
> I converted man pages

What kind of manual pages?

> from 'roff source to Docbook XML using a … pretty large Perl script.

That sounds very foolish on several levels.

First, and most obviously, you seem to be duplicating esr@'s work
on doclifter:

  http://www.catb.org/~esr/doclifter/
  https://gitlab.com/esr/doclifter/-/blob/master/doclifter

Second, quick and dirty Perl-style parsing is usually not good
enough to parse roff code, and a huge script is not particularly
good for readability and maintainability.

Yes, i know the same reservations would apply to esr@'s work,
which is a giant Python 3 script.  But at least there is some
evidence that his work was able to find significant numbers of
real issues in real manual pages.

> Once a year, or so, when I have nothing better to do I pull the current
> man  page tarball and reconvert it. I usually need to tinker the Perl
> script, here and there, each time.
> 
> The Docbook folks provide a stylesheet that converts Docbook XML
> back to 'roff.

Yikes.  That thing is by far the worst man(7) code generator existing
on this planet.  If at all possible, you should avoid that toolchain
like the plague.

It is so bad that for years, bogus reports caused by that totally
broken toolchain have caused the majority of invalid mandoc bug
reports.

> The end result you get is standardized 'roff, whatever that means.

Absolutely not.  The result is utter crap.  It is rarely even
syntactically valid, let alone reasonable style.

> But, yes, the effort required to clean up and standardize the formatting
> of man pages would be mammoth. There's more inconsistency across the
> various man pages, from various sources, than consistency.

That isn't completely untrue, but all the same, mandoc copes well
enough with more than 95% of valid real-world manual pages, and groff
with 100%.  In a nutshell, the only stuff that breaks with groff
is manual pages that are completely invalid, usually coming from
the official DocBook XML toolchain, and in rarer cases coming from
other broken man(7) generators.

All this is barely related to the question of standardizing roff(7),
though.  Roff is much more than manual pages.

Yours,
  Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Standardize roff
  2022-08-16 12:52                       ` Standardize roff Ingo Schwarze
@ 2022-08-16 23:46                         ` Sam Varshavchik
  0 siblings, 0 replies; 25+ messages in thread
From: Sam Varshavchik @ 2022-08-16 23:46 UTC (permalink / raw)
  To: Ingo Schwarze
  Cc: DJ Chase, g.branden.robinson, Alejandro Colomar, linux-man, groff

[-- Attachment #1: Type: text/plain, Size: 3545 bytes --]

Ingo Schwarze writes:

> Hi Sam,
>
> Sam Varshavchik wrote on Sun, Aug 14, 2022 at 08:20:34PM -0400:
> > Ingo Schwarze writes:
> >> DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000:
>
> >>> Have we ever considered a de jure *roff standard?
>
> >> No, i think that would be pure madness given the amount of working
> >> time available in any of the roff projects.
>
> > I tinkered with something like this some years ago, but I took a slightly
> > different approach.
> >
> > I converted man pages
>
> What kind of manual pages?

The ones that are the subject of discussions on linux-man@vger.kernel.org.

> > from 'roff source to Docbook XML using a … pretty large Perl script.
>
> That sounds very foolish on several levels.

Well, I had some free time the other day, and had nothing better to do.

> First, and most obviously, you seem to be duplicating esr@'s work
> on doclifter:
>
>   http://www.catb.org/~esr/doclifter/
>   https://gitlab.com/esr/doclifter/-/blob/master/doclifter

Seems so, except that I tailored my logic to man pages, and specifically to  
the linux-man@vger.kernel.org manpages.

> Second, quick and dirty Perl-style parsing is usually not good
> enough to parse roff code, and a huge script is not particularly
> good for readability and maintainability.

Yes, arbitrary roff code will not fly very far. But something that's  
tailored can produce productive results.

> Yes, i know the same reservations would apply to esr@'s work,
> which is a giant Python 3 script.  But at least there is some
> evidence that his work was able to find significant numbers of
> real issues in real manual pages.

Yes, there are plenty of issues there. I fed quite a few patches to Mr.  
Kerrisk when he maintained them, based on my scripts chewing through them.  
There were plenty of mismatched .nf/.fi, and other things of that sort.


> > Once a year, or so, when I have nothing better to do I pull the current
> > man  page tarball and reconvert it. I usually need to tinker the Perl
> > script, here and there, each time.
> >
> > The Docbook folks provide a stylesheet that converts Docbook XML
> > back to 'roff.
>
> Yikes.  That thing is by far the worst man(7) code generator existing
> on this planet.  If at all possible, you should avoid that toolchain
> like the plague.

I do not view it as an authoritative source of man sources, but more as a  
backwards-compatibility path. I believe that for man pages, roff should've  
been replaced by Docbook XML a long time ago.

That was really the original impetus for my Perl hacking: to see how  
feasible it would be to convert the existing man pages to Docbook XML. My  
end result showed at least that it was doable; and I think that the  
Docbook XML stylesheet for man pages would've been an acceptable way to  
generate, from Docbook XML, the roff source that the man command displays.

> > The end result you get is standardized 'roff, whatever that means.
>
> Absolutely not.  The result is utter crap.  It is rarely even
> syntactically valid, let alone reasonable style.

I should've used "consistent" instead of "standardized". Different man pages  
from different sources use different ways of rendering the same content,  
i.e. function names. Sometimes it's in bold. Sometimes it's in italic.  
Sometimes it's something else. With consistent semantic markup a <function>  
in every man page would've produced the same markup in the generated roff  
source.



[-- Attachment #2: Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-08-16 23:46 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-29 11:45 [PATCH 4/6] xattr.7: wfix Štěpán Němec
2022-07-29 20:58 ` G. Branden Robinson
2022-07-30 14:15   ` Štěpán Němec
2022-07-30 17:53     ` Alejandro Colomar (man-pages)
2022-07-30 17:59       ` Alejandro Colomar (man-pages)
2022-08-01 13:28       ` Alejandro Colomar
2022-08-11 12:48         ` Ingo Schwarze
2022-08-11 20:17           ` G. Branden Robinson
2022-08-12 14:30             ` Ingo Schwarze
2022-08-12 22:10               ` *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix) G. Branden Robinson
2022-08-13  4:23                 ` G. Branden Robinson
2022-08-14 14:15                   ` Ingo Schwarze
2022-08-14 22:21                     ` G. Branden Robinson
2022-08-13 17:27                 ` DJ Chase
2022-08-14 13:56                   ` Standardize roff (was: *roff `\~` support) Ingo Schwarze
2022-08-14 14:49                     ` DJ Chase
2022-08-14 16:32                       ` Alejandro Colomar
2022-08-14 19:43                         ` DJ Chase
2022-08-15 11:59                           ` Alejandro Colomar
2022-08-16 11:48                             ` Ingo Schwarze
2022-08-14 22:35                       ` G. Branden Robinson
2022-08-14 22:58                         ` DJ Chase
2022-08-15  0:20                     ` Sam Varshavchik
2022-08-16 12:52                       ` Standardize roff Ingo Schwarze
2022-08-16 23:46                         ` Sam Varshavchik
