* [PATCH 4/6] xattr.7: wfix
@ 2022-07-29 11:45 Štěpán Němec
  2022-07-29 20:58 ` G. Branden Robinson
  0 siblings, 1 reply; 25+ messages in thread
From: Štěpán Němec @ 2022-07-29 11:45 UTC (permalink / raw)
  To: linux-man, Alejandro Colomar, Michael Kerrisk

(My original intention was to just fix the grammar ("an attribute
names is"), but, on second thought, the whole sentence didn't read
very well.)

Signed-off-by: Štěpán Němec <stepnem@gmail.com>
---
 man7/xattr.7 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/man7/xattr.7 b/man7/xattr.7
index 4a69e2eb53e8..45a103fad4cc 100644
--- a/man7/xattr.7
+++ b/man7/xattr.7
@@ -119,8 +119,8 @@ manual page for an explanation of the sticky bit).
 .SS Filesystem differences
 The kernel and the filesystem may place limits on the maximum number
 and size of extended attributes that can be associated with a file.
-The VFS imposes limitations that an attribute names is limited to 255 bytes
-and an attribute value is limited to 64\ kB.
+The VFS-imposed limits on attribute names and values are 255 bytes
+and 64\ kB, respectively.
 The list of attribute names that can be returned is also limited to 64\ kB
 (see BUGS in
-- 
2.37.1

^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 4/6] xattr.7: wfix 2022-07-29 11:45 [PATCH 4/6] xattr.7: wfix Štěpán Němec @ 2022-07-29 20:58 ` G. Branden Robinson 2022-07-30 14:15 ` Štěpán Němec 0 siblings, 1 reply; 25+ messages in thread From: G. Branden Robinson @ 2022-07-29 20:58 UTC (permalink / raw) To: Štěpán Němec Cc: linux-man, Alejandro Colomar, Michael Kerrisk [-- Attachment #1: Type: text/plain, Size: 1041 bytes --] Hi Štěpán, At 2022-07-29T13:45:04+0200, Štěpán Němec wrote: > (My original intention was to just fix the grammar ("an attribute > names is"), but, on second thought, the whole sentence didn't read > very well.) [...] > -The VFS imposes limitations that an attribute names is limited to 255 bytes > -and an attribute value is limited to 64\ kB. > +The VFS-imposed limits on attribute names and values are 255 bytes > +and 64\ kB, respectively. While you're tidying this up, I would convert the `\ ` escape sequence to `\~`. Both are non-breaking spaces, but the latter is adjustable. groff_man(7) from groff 1.22.4 says: \~ Adjustable, non-breaking space character. Use this escape to prevent a break inside a short phrase or between a numerical quantity and its corresponding unit(s). Before starting the motor, set the output speed to\~1. There are 1,024\~bytes in 1\~kiB. CSTR\~#8 documents the B language. Regards, Branden [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/6] xattr.7: wfix 2022-07-29 20:58 ` G. Branden Robinson @ 2022-07-30 14:15 ` Štěpán Němec 2022-07-30 17:53 ` Alejandro Colomar (man-pages) 0 siblings, 1 reply; 25+ messages in thread From: Štěpán Němec @ 2022-07-30 14:15 UTC (permalink / raw) To: G. Branden Robinson; +Cc: linux-man, Alejandro Colomar, Michael Kerrisk Hello Branden, On Fri, 29 Jul 2022 15:58:23 -0500 G. Branden Robinson wrote: >> -The VFS imposes limitations that an attribute names is limited to 255 bytes >> -and an attribute value is limited to 64\ kB. >> +The VFS-imposed limits on attribute names and values are 255 bytes >> +and 64\ kB, respectively. > > While you're tidying this up, I would convert the `\ ` escape sequence > to `\~`. Both are non-breaking spaces, but the latter is adjustable. > > groff_man(7) from groff 1.22.4 says: > > \~ Adjustable, non-breaking space character. Use this escape to > prevent a break inside a short phrase or between a numerical > quantity and its corresponding unit(s). > > Before starting the motor, set the output speed to\~1. > There are 1,024\~bytes in 1\~kiB. > CSTR\~#8 documents the B language. Thank you for the review! I think I disagree: IMO a number+unit should be treated as a single entity both semantically/logically and typographically (at least as far as space stretching goes), i.e., say (if I understand the effect of '\ ' and '\~' right), 255 bytes and 64 kB, respectively. would make a bit more sense to me than 255 bytes and 64 kB, respectively. Current Linux man-pages usage doesn't appear quite consistent, but '\ ' prevails over '\~' (about 6:1), and my cursory grep found only one instance of '\~' used between a number and its unit (vs. many instances of '\ ' in that context). In view of the above, failing any instruction from a man-pages maintainer to the contrary, I'd prefer leaving this as is. With best wishes, Štěpán ^ permalink raw reply [flat|nested] 25+ messages in thread
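The "cursory grep" mentioned above is not shown in the thread. A minimal sketch of how such a count might be made follows, assuming a number+unit pair shows up as digit, escape, letter. The file name sample.7 and its three lines are made up for illustration (the lines are borrowed from examples in this thread), so the counts say nothing about the real man-pages tree.

```shell
#!/bin/sh
# Sketch only: sample.7 and its contents are hypothetical; point the two
# grep calls at a real man-pages checkout to get meaningful numbers.
tmp=$(mktemp -d)
printf '%s\n' \
    'and an attribute value is limited to 64\ kB.' \
    'is filesystem dependent and is typically 16\ MiB.' \
    'There are 1,024\~bytes in 1\~kiB.' > "$tmp/sample.7"

# Match digit, escape sequence, letter: a number followed by its unit.
fixed=$(grep -o '[0-9]\\ [A-Za-z]' "$tmp/sample.7" | wc -l | tr -d ' ')
adjustable=$(grep -o '[0-9]\\~[A-Za-z]' "$tmp/sample.7" | wc -l | tr -d ' ')

echo "fixed-width escapes between number and unit: $fixed"
echo "adjustable escapes between number and unit:  $adjustable"
rm -rf "$tmp"
```

The pattern is deliberately crude: it would miss units that start with a non-letter and would also count digit+escape+word pairs that are not units at all.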
* Re: [PATCH 4/6] xattr.7: wfix 2022-07-30 14:15 ` Štěpán Němec @ 2022-07-30 17:53 ` Alejandro Colomar (man-pages) 2022-07-30 17:59 ` Alejandro Colomar (man-pages) 2022-08-01 13:28 ` Alejandro Colomar 0 siblings, 2 replies; 25+ messages in thread From: Alejandro Colomar (man-pages) @ 2022-07-30 17:53 UTC (permalink / raw) To: Štěpán Němec, G. Branden Robinson Cc: linux-man, Michael Kerrisk Hi Štěpán and Branden! On 7/30/22 16:15, Štěpán Němec wrote: > > Hello Branden, > > On Fri, 29 Jul 2022 15:58:23 -0500 > G. Branden Robinson wrote: > >>> -The VFS imposes limitations that an attribute names is limited to 255 bytes >>> -and an attribute value is limited to 64\ kB. >>> +The VFS-imposed limits on attribute names and values are 255 bytes >>> +and 64\ kB, respectively. >> >> While you're tidying this up, I would convert the `\ ` escape sequence >> to `\~`. Both are non-breaking spaces, but the latter is adjustable. >> >> groff_man(7) from groff 1.22.4 says: >> >> \~ Adjustable, non-breaking space character. Use this escape to >> prevent a break inside a short phrase or between a numerical >> quantity and its corresponding unit(s). >> >> Before starting the motor, set the output speed to\~1. >> There are 1,024\~bytes in 1\~kiB. >> CSTR\~#8 documents the B language. > > Thank you for the review! > > I think I disagree: IMO a number+unit should be treated as a single > entity both semantically/logically and typographically (at least as far > as space stretching goes), i.e., say (if I understand the effect of '\ ' > and '\~' right), > > 255 bytes and 64 kB, respectively. > > would make a bit more sense to me than > > 255 bytes and 64 kB, respectively. > > Current Linux man-pages usage doesn't appear quite consistent, but '\ ' > prevails over '\~' (about 6:1), and my cursory grep found only one > instance of '\~' used between a number and its unit Would you mind sending a patch for that one between the number and its unit? > (vs. many instances > of '\ ' in that context).
That is just a matter of writers not knowing the existence of \~ ('\ ' was documented in man-pages(7), but \~ wasn't). I wouldn't give much more importance to existing practice in this regard. When I read this email I had no strong opinion; both variants made sense to me. So I did some investigation, to see if the SI already specifies something about it; and it does: <https://www.bipm.org/en/publications/si-brochure/>: [ 5.2 Unit symbols Unit symbols are printed in upright type regardless of the type used in the surrounding text. They are printed in lower-case letters unless they are derived from a proper name, in which case the first letter is a capital letter. An exception, adopted by the 16th CGPM (1979, Resolution 6), is that either capital L or lower-case l is allowed for the litre, in order to avoid possible confusion between the numeral 1 (one) and the lower-case letter l (el). A multiple or sub-multiple prefix, if used, is part of the unit and precedes the unit symbol without a separator. A prefix is never used in isolation and compound prefixes are never used. Unit symbols are mathematical entities and not abbreviations. Therefore, they are not followed by a period except at the end of a sentence, and one must neither use the plural nor mix unit symbols and unit names within one expression, since names are not mathematical entities. In forming products and quotients of unit symbols the normal rules of algebraic multiplication or division apply. Multiplication must be indicated by a space or a half-high (centred) dot (⋅), since otherwise some prefixes could be misinterpreted as a unit symbol. Division is indicated by a horizontal line, by a solidus (oblique stroke, /) or by negative exponents. When several unit symbols are combined, care should be taken to avoid ambiguities, for example by using brackets or negative exponents. A solidus must not be used more than once in a given expression without brackets to remove ambiguities. 
It is not permissible to use abbreviations for unit symbols or unit names, such as sec (for either s or second), sq. mm (for either mm² or square millimetre), cc (for either cm³ or cubic centimetre), or mps (for either m/s or metre per second). The use of the correct symbols for SI units, and for units in general, as listed in earlier chapters of this brochure, is mandatory. In this way ambiguities and misunderstandings in the values of quantities are avoided. ] [ 5.4.3 Formatting the value of a quantity The numerical value always precedes the unit and a space is always used to separate the unit from the number. Thus the value of the quantity is the product of the number and the unit. The space between the number and the unit is regarded as a multiplication sign (just as a space between units implies multiplication). The only exceptions to this rule are for the unit symbols for degree, minute and second for plane angle, °, ′ and ′′, respectively, for which no space is left between the numerical value and the unit symbol. This rule means that the symbol °C for the degree Celsius is preceded by a space when one expresses values of Celsius temperature t. Even when the value of a quantity is used as an adjective, a space is left between the numerical value and the unit symbol. Only when the name of the unit is spelled out would the ordinary rules of grammar apply, so that in English a hyphen would be used to separate the number from the unit. In any expression, only one unit is used. An exception to this rule is in expressing the values of time and of plane angles using non-SI units. However, for plane angles it is generally preferable to divide the degree decimally. It is therefore preferable to write 22.20° rather than 22° 12′, except in fields such as navigation, cartography, astronomy, and in the measurement of very small angles. ] Sorry for copying the full text, but I preferred to give enough context.
So, from the SI text quoted above, the space is not a word separator in that context (it is for example not allowed to hyphenate between the value and the unit even if it acts as an adjective; the SI disables normal language rules). It is instead a mathematical symbol denoting multiplication, and the whole value+unit is a single mathematical expression; to me, that is better denoted with a single space, rather than an adjustable one. Therefore, I'd say that it makes more sense in this case to use '\ '. > > In view of the above, failing any instruction from a man-pages > maintainer to the contrary, I'd prefer leaving this as is. In the general case, I prefer \~, but for value+unit I prefer '\ '. Thank you both! > > With best wishes, > > Štěpán Cheers, Alex -- Alejandro Colomar Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 25+ messages in thread
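To make the practical difference concrete, here is a toy model of line adjustment, written as a sketch under loud assumptions: the function name justify is made up, the padding rule (spread the extra columns evenly over the adjustable gaps, leftmost gaps first) is a simplification, and none of this is groff's actual adjustment algorithm. It only shows which gaps are allowed to grow under each escape.

```shell
#!/bin/sh
# Toy model only, NOT the algorithm used by groff or any other formatter.
# justify LINE WIDTH: pad LINE to WIDTH columns by widening adjustable gaps.
justify() {
    printf '%s\n' "$1" | awk -v width="$2" '
    {
        line = $0
        # \~ is non-breaking but adjustable: it may stretch like a normal space
        gsub(/\\~/, " ", line)
        # "\ " has a fixed width: hide it so it is not counted as a gap
        gsub(/\\ /, "\001", line)
        n = split(line, word, " ")
        gaps = n - 1
        extra = width - length(line)
        out = word[1]
        for (i = 2; i <= n; i++) {
            pad = 1
            if (gaps > 0 && extra > 0) {
                pad += int(extra / gaps)
                if (i - 1 <= extra % gaps) pad++
            }
            out = out sprintf("%" pad "s", "") word[i]
        }
        # Show each fixed space as exactly one column
        gsub("\001", " ", out)
        print out
    }'
}

justify '255 bytes and 64\ kB, respectively.' 40
justify '255 bytes and 64\~kB, respectively.' 40
```

Both calls pad the line to 40 columns. In the first, "64 kB," keeps its single space while the other gaps widen; in the second, the gap between 64 and kB is widened along with the rest, which is the effect the value+unit argument above wants to avoid.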
* Re: [PATCH 4/6] xattr.7: wfix 2022-07-30 17:53 ` Alejandro Colomar (man-pages) @ 2022-07-30 17:59 ` Alejandro Colomar (man-pages) 2022-08-01 13:28 ` Alejandro Colomar 1 sibling, 0 replies; 25+ messages in thread From: Alejandro Colomar (man-pages) @ 2022-07-30 17:59 UTC (permalink / raw) To: Štěpán Němec, G. Branden Robinson Cc: linux-man, Michael Kerrisk On 7/30/22 19:53, Alejandro Colomar (man-pages) wrote: > > Even when the value of a quantity is used as an adjective, a space is > left between the numerical value and the unit symbol. Only when the > name of the unit is spelled out would the ordinary rules of grammar > apply, so that in English a hyphen would be used to separate the number > from the unit. Although, I missed this small paragraph. According to that, it would be 255\~bytes but 64\ kB. -- Alejandro Colomar Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/6] xattr.7: wfix 2022-07-30 17:53 ` Alejandro Colomar (man-pages) 2022-07-30 17:59 ` Alejandro Colomar (man-pages) @ 2022-08-01 13:28 ` Alejandro Colomar 2022-08-11 12:48 ` Ingo Schwarze 1 sibling, 1 reply; 25+ messages in thread From: Alejandro Colomar @ 2022-08-01 13:28 UTC (permalink / raw) To: G. Branden Robinson; +Cc: linux-man, groff [-- Attachment #1.1: Type: text/plain, Size: 9377 bytes --] [ CC -= Štěpán: I don't think he's interested in a deep discussion about use of \~ and '\ ' in man pages CC -= mtk: He's already subscribed to the list, and quite silent these days] CC += groff@: probably people there are interested in this discussion ] Hi Branden, On 7/30/22 19:53, Alejandro Colomar (man-pages) wrote: > Hi Štěpán and Branden! > > On 7/30/22 16:15, Štěpán Němec wrote: >> >> Hello Branden, >> >> On Fri, 29 Jul 2022 15:58:23 -0500 >> G. Branden Robinson wrote: >> >>>> -The VFS imposes limitations that an attribute names is limited to >>>> 255 bytes >>>> -and an attribute value is limited to 64\ kB. >>>> +The VFS-imposed limits on attribute names and values are 255 bytes >>>> +and 64\ kB, respectively. >>> >>> While you're tidying this up, I would convert the `\ ` escape sequence >>> to `\~`. Both are non-breaking spaces, but the latter is adjustable. >>> >>> groff_man(7) from groff 1.22.4 says: >>> >>> \~ Adjustable, non-breaking space character. Use this >>> escape to >>> prevent a break inside a short phrase or between a >>> numerical >>> quantity and its corresponding unit(s). >>> >>> Before starting the motor, set the output speed to\~1. >>> There are 1,024\~bytes in 1\~kiB. >>> CSTR\~#8 documents the B language. >> >> Thank you for the review! >> >> I think I disagree: IMO a number+unit should be treated as a single >> entity both semantically/logically and typographically (at least as far >> as space stretching goes), i.e., say (if I understand the effect of '\ ' >> and '\~' right), >> >> 255 bytes and 64 kB, >> respectively. 
>> >> would make a bit more sense to me than >> >> 255 bytes and 64 kB, >> respectively. >> >> Current Linux man-pages usage doesn't appear quite consistent, but '\ ' >> prevails over '\~' (about 6:1), and my cursory grep found only one >> instance of '\~' used between a number and its unit > > Would you mind sensing a patch for that one between the number and its > unit? > >> (vs. many instances >> of '\ ' in that context). > > That is just a matter of writers not knowing the existence of \~ ('\ ' > was documented in man-pages(7), but \~ wasn't). I wouldn't give much > more importance to existing practice in this regard. > > When I read this email I had no strong opinion; both variants made sense > to me. So I did some investigation, to see if the SI already specifies > something about it; and it does: > > <https://www.bipm.org/en/publications/si-brochure/>: > > [ > 5.2 Unit symbols > > Unit symbols are printed in upright type regardless of the type used in > the surrounding text. They are printed in lower-case letters unless > they are derived from a proper name, in which case the first letter is a > capital letter. > > An exception, adopted by the 16th CGPM (1979, Resolution 6), is that > either capital L or lower-case l is allowed for the litre, in order to > avoid possible confusion between the numeral 1 (one) and the lower-case > letter l (el). > > A multiple or sub-multiple prefix, if used, is part of the unit and > precedes the unit symbol without a separator. A prefix is never used in > isolation and compound prefixes are never used. > > Unit symbols are mathematical entities and not abbreviations. Therefore, > they are not followed by a period except at the end of a sentence, and > one must neither use the plural nor mix unit symbols and unit names > within one expression, since names are not mathematical entities. > > In forming products and quotients of unit symbols the normal rules of > algebraic multiplication or division apply. 
Multiplication must be > indicated by a space or a half-high (centred) dot (⋅), since otherwise > some prefixes could be misinterpreted as a unit symbol. Division is > indicated by a horizontal line, by a solidus (oblique stroke, /) or by > negative exponents. When several unit symbols are combined, care should > be taken to avoid ambiguities, for example by using brackets or negative > exponents. A solidus must not be used more than once in a given > expression without brackets to remove ambiguities. > > It is not permissible to use abbreviations for unit symbols or unit > names, such as sec (for either s or second), sq. mm (for either mm² or > square millimetre), cc (for either cm³ or cubic centimetre), or mps (for > either m/s or metre per second). The use of the correct symbols for > SI units, and for units in general, as listed in earlier chapters of > this brochure, is mandatory. In this way ambiguities and > misunderstandings in the values of quantities are avoided. > ] > > [ > 5.4.3 Formatting the value of a quantity > > The numerical value always precedes the unit and a space is always used > to separate the unit from the number. Thus the value of the quantity is > the product of the number and the unit. The space between the number > and the unit is regarded as a multiplication sign (just as a space > between units implies multiplication). The only exceptions to this rule > are for the unit symbols for degree, minute and second for plane > angle, °, ′ and ′′, respectively, for which no space is left between the > numerical value and the unit symbol. > > This rule means that the symbol °C for the degree Celsius is preceded by > a space when one expresses values of Celsius temperature t. > > Even when the value of a quantity is used as an adjective, a space is > left between the numerical value and the unit symbol.
Only when the > name of the unit is spelled out would the ordinary rules of grammar > apply, so that in English a hyphen would be used to separate the number > from the unit. > > In any expression, only one unit is used. An exception to this rule is > in expressing the values of time and of plane angles using non-SI units. > However, for plane angles it is generally preferable to divide the > degree decimally. It is therefore preferable to write 22.20° rather > than 22° 12′, except in fields such as navigation, cartography, > astronomy, and in the measurement of very small angles. > ] > > Sorry for copying the full text, but I preferred to give enough context. > > So, from the SI text quoted above, the space is not a word separator in > that context (it is for example not allowed to hyphenate between the > value and the unit even if it acts as an adjective; the SI disables > normal language rules). It is instead a mathematical symbol denoting > multiplication, and the whole value+unit is a single mathematical > expression; to me, that is better denoted with a single space, rather > than an adjustable one. > > Therefore, I'd say that it makes more sense in this case to use '\ '. > >> >> In view of the above, failing any instruction from a man-pages >> maintainer to the contrary, I'd prefer leaving this as is. > > In the general case, I prefer \~, but for value+unit I prefer '\ '. > Thank you both! > >> >> With best wishes, >> >> Štěpán > > Cheers, > > Alex > > On 7/30/22 19:59, Alejandro Colomar (man-pages) wrote: > On 7/30/22 19:53, Alejandro Colomar (man-pages) wrote: >> >> Even when the value of a quantity is used as an adjective, a space is >> left between the numerical value and the unit symbol. Only when the >> name of the unit is spelled out would the ordinary rules of grammar >> apply, so that in English a hyphen would be used to separate the >> number from the unit. > > Although, I missed this small paragraph. 
According to that, it would be > 255\~bytes but 64\ kB. I left the whole original conversation for groff@ users to read it without needing to go to linux-man@ archives. I'd like to arrive to some consensus on usage of \~ and '\ '. For things related to the SI, we should follow SI conventions (they developed them for a reason, and I don't see a strong reason to deviate). For things unrelated to the SI, we need to come up with some convention. I think mirroring what the SI does could be good. For example, for commands, I'd use non-adjustable spaces. For pointer types, I'd also use the non-adjustable space. For compound names such as 'RFC 1234', I'd say normal language rules apply, and the space should be adjustable. To be clear, I'll add some examples taken from the Linux man-pages (and some of them modified by me): .I "struct termios2\ *" .I (1\ <<\ oparg) .I unice\ =\ 20\ \-\ knice is filesystem dependent and is typically 16\ MiB. .I (uid_t)\ \-1 Enables RFC\~7413 Fast Open support. .I Power ISA, Book\~II - Section\~3.1 (Program Priority Registers) Before starting the motor, set the output speed to\~1. There are 1,024\~bytes in 1\ kiB. CSTR\~#8 documents the B language. What do you think? Cheers, Alex -- Alejandro Colomar <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/6] xattr.7: wfix 2022-08-01 13:28 ` Alejandro Colomar @ 2022-08-11 12:48 ` Ingo Schwarze 2022-08-11 20:17 ` G. Branden Robinson 0 siblings, 1 reply; 25+ messages in thread From: Ingo Schwarze @ 2022-08-11 12:48 UTC (permalink / raw) To: Alejandro Colomar; +Cc: g.branden.robinson, linux-man, groff Hi Alejandro, Alejandro Colomar wrote on Mon, Aug 01, 2022 at 03:28:03PM +0200: > I'd like to arrive to some consensus on usage of \~ and '\ '. In manual pages, always use "\ " and never use "\~", period. The former is portable and the latter is a GNU extension. > What do you think? I think you are massively overthinking this and the whole SI argument is irrelevant for manual pages. While the above concern about robustness is minor, too (both groff and mandoc support \~), portability is still significantly more important than such minute typographical details. Yours, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/6] xattr.7: wfix 2022-08-11 12:48 ` Ingo Schwarze @ 2022-08-11 20:17 ` G. Branden Robinson 2022-08-12 14:30 ` Ingo Schwarze 0 siblings, 1 reply; 25+ messages in thread From: G. Branden Robinson @ 2022-08-11 20:17 UTC (permalink / raw) To: Ingo Schwarze; +Cc: Alejandro Colomar, linux-man, groff [-- Attachment #1: Type: text/plain, Size: 4482 bytes --] At 2022-08-11T14:48:51+0200, Ingo Schwarze wrote: > Alejandro Colomar wrote on Mon, Aug 01, 2022 at 03:28:03PM +0200: > > I'd like to arrive to some consensus on usage of \~ and '\ '. > > In manual pages, always use "\ " and never use "\~", period. This is hugely overstated. > The former is portable and the latter is a GNU extension. ...that is over 30 years old and supported by Heirloom Doctools troff for 17 years now, neatroff for about six, and your mandoc for three. For full disclosure, I'll acknowledge that Documenter's Workbench [DWB] troff doesn't support it, but it doesn't seem to have been maintained for 30 years (Heirloom Doctools troff appears to be its descendant/successor). plan9port troff doesn't either, and its laudable introduction of a man(7) MR macro notwithstanding, its activity level is not high. I would pessimistically assume that most or all proprietary Unix troffs branched off from V7 Unix troff or early device-independent troff (maybe DWB 1.0 troff, ca. 1984 [?, 1]) lack support for `\~`.[2] I further note that groff has a long tradition of inclusion in BSD Unix,[3] and despite the efforts of the mdocml/mandoc project to supplant or dispose of groff in BSD's descendant communities, the underlying fact remains. Giving up support for `\~` was therefore, in this sense, a regression, and one that took quite some time to address. > > What do you think? > > I think you are massively overthinking this and the whole SI > argument is irrelevant for manual pages.
Man pages are technical writing and BIPM's recommendations in this area that Alejandro uncovered have prompted me to reconsider the style advice in groff_man_style(7) [from groff Git]. But you should welcome that. It would mean that a handful of uses of `\~` in the groff man pages would move to `\ `, which is motion in the direction you want anyway. In any event, the selection of `\ ` versus `\~`, assuming support for both and an understanding of their distinct meanings and effect on adjusted output, is a matter for a software project's documentation style guide. As I recall, mandoc does not even support "full justification" (alignment of text to both left and right margins, with inter-word spaces expanded ["adjusted"] to achieve this) in the first place and there are no plans to. mandoc can thus treat the two sequences as synonymous--but that doesn't mean the `\~` escape sequence is a gratuitous alias or deviation from the norm. It is a replacement for an arcane troff hack. .\" no trailing space or character translation target on the next line .tr ~ G.~W.~Pabst directed several films in the 1920s. > While the above concern about robustness is minor, too (both groff and > mandoc support \~), ...as do others, listed above... > portability is still significantly more important You are not quantifying anything. Come on, can we at least get a Fermi estimation of the installed bases of the respective troff implementations and mandoc? There are, I presume, still C compilers out there that don't accept ANSI C (1989) input. That doesn't, and shouldn't, stop the rest of the world from moving forward. > than such minute typographical details. For someone arguing from a standpoint of such slavish fidelity to 40 year-old practices, you seem to be selective in the way you do it. The Unix manual was always meant to be typeset. "The manual was intended to be typeset; some detail is sacrificed on terminals." 
(man(1), _Unix Time-Sharing System Programmer's Manual_, Eighth Edition, Volume 1, February 1985) At the time that statement was written, the sentiment was some 12 years old; the Bell Labs CSRC typeset man pages as soon as it was possible for them to do so.[4] I understand if some man page contributors don't want to mess with aspects of typography that will appear only when formatting for output devices more sophisticated than terminal emulators--widow and orphan management can be tedious, for instance--but we shouldn't promulgate advice that makes the task of those who do--people like Alejandro and me--_harder_. Regards, Branden [1] https://archive.org/details/dwb-preprocessor-ref [2] https://github.com/n-t-roff/Solaris10-ditroff/blob/master/troff/n1.c#L797 [3] https://minnie.tuhs.org/cgi-bin/utree.pl?file=Net2/usr/src/usr.bin/groff/VERSION [4] https://dspinellis.github.io/unix-v4man/v4man.pdf [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/6] xattr.7: wfix 2022-08-11 20:17 ` G. Branden Robinson @ 2022-08-12 14:30 ` Ingo Schwarze 2022-08-12 22:10 ` *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix) G. Branden Robinson 0 siblings, 1 reply; 25+ messages in thread From: Ingo Schwarze @ 2022-08-12 14:30 UTC (permalink / raw) To: g.branden.robinson; +Cc: Alejandro Colomar, linux-man, groff Hi Branden, G. Branden Robinson wrote on Thu, Aug 11, 2022 at 03:17:14PM -0500: > At 2022-08-11T14:48:51+0200, Ingo Schwarze wrote: >> Alejandro Colomar wrote on Mon, Aug 01, 2022 at 03:28:03PM +0200: >>> I'd like to arrive to some consensus on usage of \~ and '\ '. >> In manual pages, always use "\ " and never use "\~", period. > This is hugely overstated. >> The former is portable and the latter is a GNU extension. > ...that is over 30 years old and supported by Heirloom Doctools troff > for 17 years now, neatroff for about six, and your mandoc for three. Actually, mandoc supports \~ at least since Sep 17 2009: https://cvsweb.bsd.lv/mandoc/Attic/chars.in?rev=1.1&content-type=text/x-cvsweb-markup > For full disclosure, I'll acknowledge that Documenter's Workbench [DWB] > https://archive.org/details/dwb-preprocessor-ref > troff doesn't support it, but it doesn't seem to have been maintained > for 30 years (Heirloom Doctools troff appears to be its > descendant/successor). I agree that missing support in DWB is a weak argument. It is unlikely that many people use it for practical work. They would likely suffer from more serious problems than \~, too. > plan9port troff doesn't either, and its laudable introduction > of a man(7) MR macro notwithstanding, its activity level is > not high. There are people using Plan 9 for practical work though, they have even occasionally posted on the groff and mandoc lists, so that is a bit more of a problem. 
> I would pessimistically assume that most or all proprietary Unix > troffs branched off from V7 Unix troff or early device-independent troff > (maybe DWB 1.0 troff, ca. 1984 [?, 1]) lack support for `\~`. > https://github.com/n-t-roff/Solaris10-ditroff/blob/master/troff/n1.c#L797 That does sound likely. As an example, look at Oracle Solaris 11: > uname -a SunOS unstable11s 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise > printf "a\\\\~b\n" | nroff | head -n 1 a~b > printf "a\\\\~b\n" | groff -T ascii | head -n 1 a b > I further note that groff has a long tradition of inclusion in BSD > Unix, https://minnie.tuhs.org/cgi-bin/utree.pl > ?file=Net2/usr/src/usr.bin/groff/VERSION Yes. Cynthia already considered dropping support for Kernighan's troff, but the CSRG vetoed that. Inclusion of groff wasn't controversial even at a time when groff didn't have its own version conrol yet. Consequently, you are right that \~ is unlikely to cause trouble on any BSD system. > and despite the efforts of the mdocml/mandoc project to > supplant or dispose of it groff in BSD's descendant communities, the > underlying fact remains. Giving up support for `\~` was therefore, in > this sense, a regression, and one that took quite some time to address. I don't think that anyone gave up support for \~. But we have evidence that some never implemented support for it. [...] > As I recall, mandoc does not even support "full justification" > (alignment of text to both left and right margins, with inter-word > spaces expanded ["adjusted"] to achieve this) in the first place and > there are no plans to. Correct. > mandoc can thus treat the two sequences as synonymous-- It does. Mandoc maps all of \ \~ \0 to U+00A0. > but that doesn't mean the `\~` escape sequence is a gratuitous alias > or deviation from the norm. No. It is useful for general-purpose typesetting, like many GNU extensions are. >> portability is still significantly more important > You are not quantifying anything. 
Come on, can we at least get a > Fermi estimation of the installed bases of the respective troff > implementations and mandoc? Frankly, i have no idea how to estimate the number of actively used installations of Plan 9, Solaris (any version), and possibly additional commercial systems like AIX and HP-UX, or how to check what the latter support. There might be more systems out there parsing manual pages (not necessarily full-featured roff(7) implementations like those you listed), but providing specific evidence of such systems would likely be my job to back up my advice. I'm not searching for them right now because we already have a few relevant examples. >> than such minute typographical details. > For someone arguing from a standpoint of such slavish fidelity to 40 > year-old practices, you seem to be selective in the way you do it. Admitted. Sometimes, i do see the value of new features, even when they are backward-incompatible. > The Unix manual was always meant to be typeset. > > "The manual was intended to be typeset; some detail is sacrificed on > terminals." (man(1), _Unix Time-Sharing System Programmer's Manual_, > Eighth Edition, Volume 1, February 1985) > > At the time that statement was written, the sentiment was some 12 years > old; the Bell Labs CSRC typeset man pages as soon as it was possible for > them to do so.[4] > [4] https://dspinellis.github.io/unix-v4man/v4man.pdf > > I understand if some man page contributors don't want to mess with > aspects of typography that will appear only when formatting for output > devices more sophisticated than terminal emulators--widow and orphan > management can be tedious, for instance--but we shouldn't promulgate > advice that makes the task of those who do--people like Alejandro and > me--_harder_. Even authors might disagree which is more important: (1) The typographical difference between "\~" and "\ " in PDF and PostScript output of manual pages.
(2) Correctly rendering whitespace on Plan 9, Solaris, and likely some other systems *at all*, for any output mode. I suspect that many would prefer (2) - of course, that claim is hard to quantify. It would probably be good to arrive at a consensus recommendation for such cases because many manual page authors probably have little interest in judging such questions themselves. Consensus seems hard to reach though. So maybe the best we can do is to simply state the fact that \~ is still not supported by a few not very widely used, but still somewhat significant roff implementations like Plan 9 and Solaris, even though that forces authors to draw their own conclusion. Yours, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
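The Solaris check quoted in the message above can be wrapped in a small shell function that reports whether a given formatter understands `\~`. This is only a sketch under stated assumptions: the name `supports_tilde` is hypothetical and does not come from the thread, the formatter is assumed to read roff source on standard input, and the comparison relies on the behaviour shown in the transcript (a formatter without support prints `a~b`, one with support prints `a b`).

```shell
# Hypothetical helper (not from the thread): succeed if the formatter
# command given as arguments renders \~ as a space rather than as a
# literal tilde.
supports_tilde() {
    # Feed the same one-line input used in the Solaris transcript
    # and keep only the first line of the formatted output.
    first_line=$(printf 'a\\~b\n' | "$@" 2>/dev/null | head -n 1)
    [ "$first_line" = "a b" ]
}
```

It could be called as `supports_tilde nroff` or `supports_tilde groff -T ascii`; the exit status is zero only when the first output line is exactly `a b`.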
* *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix) 2022-08-12 14:30 ` Ingo Schwarze @ 2022-08-12 22:10 ` G. Branden Robinson 2022-08-13 4:23 ` G. Branden Robinson 2022-08-13 17:27 ` DJ Chase 0 siblings, 2 replies; 25+ messages in thread From: G. Branden Robinson @ 2022-08-12 22:10 UTC (permalink / raw) To: Ingo Schwarze; +Cc: Alejandro Colomar, linux-man, groff [-- Attachment #1: Type: text/plain, Size: 7827 bytes --] Hi Ingo, At 2022-08-12T16:30:01+0200, Ingo Schwarze wrote: > G. Branden Robinson wrote on Thu, Aug 11, 2022 at 03:17:14PM -0500: > > At 2022-08-11T14:48:51+0200, Ingo Schwarze wrote: > >> Alejandro Colomar wrote on Mon, Aug 01, 2022 at 03:28:03PM +0200: > > >>> I'd like to arrive to some consensus on usage of \~ and '\ '. > > >> In manual pages, always use "\ " and never use "\~", period. > > > This is hugely overstated. > > >> The former is portable and the latter is a GNU extension. > > > ...that is over 30 years old and supported by Heirloom Doctools > > troff for 17 years now, neatroff for about six, and your mandoc for > > three. > > Actually, mandoc supports \~ at least since Sep 17 2009: > https://cvsweb.bsd.lv/mandoc/Attic/chars.in?rev=1.1&content-type=text/x-cvsweb-markup Whoops! I regret the error, and will update groff's Texinfo manual to correct this. > > plan9port troff doesn't either, and its laudable introduction > > of a man(7) MR macro notwithstanding, its activity level is > > not high. > > There are people using Plan 9 for practical work though, they have > even occasionally posted on the groff and mandoc lists, so that is a > bit more of a problem. I have no moral objection to submitting a patch; I don't know my way around the AT&T troff code base (which Plan 9 troff mostly is) nearly as well as groff, though, and, as ever, available time is scarce. But, if that's what it takes to get this escape sequence de facto standardized, and no one else will do it, that will move it up the priority queue. 
I don't expect full support to be trivial. I don't think AT&T troff has a concept of a space that is adjustable but not breakable. If that blows out the effort/reward estimate, treating `\~` as a synonym of `\ ` as mandoc does _should_ be trivial. Yup, it looks like it is. https://github.com/9fans/plan9port/blob/master/src/cmd/troff/n1.c#L515 > > I would pessimistically assume that most or all proprietary Unix > > troffs branched off from V7 Unix troff or early device-independent troff > > (maybe DWB 1.0 troff, ca. 1984 [?, 1]) lack support for `\~`. > > https://github.com/n-t-roff/Solaris10-ditroff/blob/master/troff/n1.c#L797 > > That does sound likely. As an example, look at Oracle Solaris 11: > > > uname -a > SunOS unstable11s 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise > > printf "a\\\\~b\n" | nroff | head -n 1 > a~b > > printf "a\\\\~b\n" | groff -T ascii | head -n 1 > a b Yes. The rule is, if no semantics are defined for the function selector (the character after the escape character), then the character is treated as if it were not escaped. > > I further note that groff has a long tradition of inclusion in BSD > > Unix, https://minnie.tuhs.org/cgi-bin/utree.pl > > ?file=Net2/usr/src/usr.bin/groff/VERSION > > Yes. Cynthia already considered dropping support for Kernighan's > troff, but the CSRG vetoed that. Inclusion of groff wasn't > controversial even at a time when groff didn't have its own version > control yet. It seems strange now how revision control ever seemed like a luxury. For a few years I maintained Debian's XFree86 packages, which had _megabytes_ of patches on top of upstream, without using SCCS or RCS or CVS and even without a tool as nice as quilt. I was completely insane. On the other hand, it trained me to be pretty careful. Eventually, I acquired sanity and started using Subversion. 
> Frankly, i have no idea how to estimate the number of actively used > installations of Plan 9, Solaris (any version), and possibly > additional commercial systems like AIX and HP-UX, or how to check > what the latter support. Users/maintainers of these systems have to get involved and speak up. There is an unbounded quantity of Russell's Teapots labeled with names of Unix variants that have gone defunct. Without evidence, we must assume their numbers are too small to serve as a gate on development. That said, it remains polite to document changes that would affect them. > There might be more systems out there parsing manual pages (not > necessarily full-featured roff(7) implementations like those > you listed), but providing specific evidence of such systems > would likely be my job to back up my advice. I'm not searching > for them right now because we already have a few relevant examples. plan9port's troff seems like the only case for which we have concrete evidence, and Russ Cox has already been a pleasure to work with. I don't know that any user of OpenSolaris/Illumos troff has ever spoken up on the groff mailing list, which in spite of its implementation-specific name seems to be the water cooler for what remains of the global *roff community. The good news is that, both being descended from AT&T troff and, from what I've seen, neither having been re-architected, if someone comes up with `\~` support for plan9port troff, I predict that it will be mergeable into OpenSolaris/Illumos troff without much difficulty. ...especially the trivial `\ ` synonym version discussed above. > Even authors might disagree which is more important: > > (1) The typographical difference between "\~" and "\ " > in PDF and PostScript output of manual pages. > > (2) Correctly rendering whitespace on Plan 9, Solaris, > and likely some other systems *at all*, for any output mode. > > I suspect that many would prefer (2) - of course, that claim is hard > to quantify. 
Another thing to consider is how bad the damage to comprehension is if a tilde shows up in place of a space. In a prose phrase, it is likely to be distracting and annoying but will not be a barrier to comprehension. [from groff_diff(7):] For example, if the current font is\~1 and font position\~1 is In synopses of commands and language features (like *roff requests or macros), I think anyone already familiar with Unix command lines or *roff languages, respectively, can still push their way past it, but it is worse. [from gdiffmk(1):] .RB [ \-a\~\c .RB [ \-c\~\c .RB [ \-d\~\c .RB [ \-x\~\c .BI \-a\~ add-mark .BI \-c\~ change-mark .BI \-d\~ delete-mark .BI \-M\~ "mark1 mark2" .BI \-x\~ diff-command .BI \-x\~ diff-command [from groff_diff(7):] .BI .chop\~ object .BI .class\~ "name c1 c2\~"\c .BI .close\~ stream .BI .composite\~ glyph1\~glyph2 .BI .color\~ n .BI .cp\~ n The tilde showing up in boldface would be especially disappointing. On the gripping hand, such aggressive use of `\~` is much more often seen in groff man pages than in (any?) others, and groff man pages can be expected to be formatted with groff or another `\~`-recognizing formatter much of the time. > It would probably be good to arrive at a consensus recommendation > for such cases because many manual page authors probably have little > interest in judging such questions themselves. Consensus seems > hard to reach though. So maybe the best we can do is to simply > state the fact that \~ is still not supported by a few not very widely > used, but still somewhat significant roff implementations like Plan 9 > and Solaris, even though that forces authors to draw their own > conclusion. I could easily copy the (now-corrected with respect to the age of mandoc's `\~` support) material about this escape sequence from our groff Texinfo manual to groff_man_style(1), where the "Portability" section quoted earlier in the thread is housed. 
As with the uptake of groff man(7) extension macros (be they 15 years old or more recent), a software project's documentors may be better placed than we are to assess the formatting capabilities of their users. Regards, Branden [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
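The trivial fallback discussed in the message above, treating `\~` as a synonym of `\ `, can also be approximated outside the formatter by rewriting the page source before it reaches a troff that lacks `\~`. What follows is a minimal sketch, not something proposed in the thread: the name `downgrade_tilde` is hypothetical, and the substitution is deliberately naive. It does not parse roff, so it may mishandle a `~` that follows an escaped backslash (`\\~`), and it gives up the adjustable spacing that `\~` provides.

```shell
# Hypothetical filter (not from the thread): replace every \~ in roff
# source read from standard input with "\ " (backslash, space).
downgrade_tilde() {
    sed 's/\\~/\\ /g'
}

# Possible use, assuming a page file named page.7 exists:
#   downgrade_tilde < page.7 | nroff -man
```

Doing this as a preprocessing step leaves the page source itself untouched, so formatters that do recognize `\~` still get the adjustable space.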
* Re: *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix) 2022-08-12 22:10 ` *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix) G. Branden Robinson @ 2022-08-13 4:23 ` G. Branden Robinson 2022-08-14 14:15 ` Ingo Schwarze 2022-08-13 17:27 ` DJ Chase 1 sibling, 1 reply; 25+ messages in thread From: G. Branden Robinson @ 2022-08-13 4:23 UTC (permalink / raw) To: Ingo Schwarze; +Cc: Alejandro Colomar, linux-man, groff [-- Attachment #1: Type: text/plain, Size: 600 bytes --] [self-follow-up] At 2022-08-12T17:10:35-0500, G. Branden Robinson wrote: > At 2022-08-12T16:30:01+0200, Ingo Schwarze wrote: > > There are people using Plan 9 for practical work though, they have > > even occasionally posted on the groff and mandoc lists, so that is a > > bit more of a problem. plan9port's troff is no longer a problem, thanks to Dan Cross acting on my pull request at relativistic speed. https://github.com/9fans/plan9port/commit/93f814360076ccf28d33c9cb909fca7200ba4a7d I also have a PR pending with Illumos. https://github.com/illumos/illumos-gate/pull/83 Regards, Branden [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix) 2022-08-13 4:23 ` G. Branden Robinson @ 2022-08-14 14:15 ` Ingo Schwarze 2022-08-14 22:21 ` G. Branden Robinson 0 siblings, 1 reply; 25+ messages in thread From: Ingo Schwarze @ 2022-08-14 14:15 UTC (permalink / raw) To: g.branden.robinson; +Cc: Alejandro Colomar, linux-man, groff Hi Branden, G. Branden Robinson wrote on Fri, Aug 12, 2022 at 11:23:11PM -0500: > At 2022-08-12T16:30:01+0200, Ingo Schwarze wrote: >> There are people using Plan 9 for practical work though, they have >> even occasionally posted on the groff and mandoc lists, so that is a >> bit more of a problem. > plan9port's troff is no longer a problem, thanks to Dan Cross acting on > my pull request at relativistic speed. > https://github.com/9fans/plan9port/commit/93f814360076ccf28d33c9cb909fca7200ba4a7d Nice. :-) > I also have a PR pending with Illumos. > https://github.com/illumos/illumos-gate/pull/83 Illumos isn't doing development on GitHub. Besides, Illumos is less of a problem because they have been using mandoc as the default manual page formatter since July 2014. All the same, getting \~ supported in their general-purpose roff implementation is no doubt nice to have, too. That reduces my concerns mostly to commercial UNIXes and potentially to a few ad-hoc conversion tools we are not even aware of. Consequently, the concerns aren't 100% resolved yet but getting closer to becoming theoretical concerns. If it's only commercial UNIXes and unknown tools that may break, the improved typesetting quality may be worth the risk. Yours, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix) 2022-08-14 14:15 ` Ingo Schwarze @ 2022-08-14 22:21 ` G. Branden Robinson 0 siblings, 0 replies; 25+ messages in thread From: G. Branden Robinson @ 2022-08-14 22:21 UTC (permalink / raw) To: Ingo Schwarze; +Cc: Alejandro Colomar, linux-man, groff [-- Attachment #1: Type: text/plain, Size: 1957 bytes --] At 2022-08-14T16:15:54+0200, Ingo Schwarze wrote: > > I also have a PR pending with Illumos. > > https://github.com/illumos/illumos-gate/pull/83 > > Illumos isn't doing development on GitHub. Yeah, I promptly got a lengthy follow-up from a member of the core team pointing me to even more lengthy contribution procedures. (I guess this explains the "-gate" suffix in the GH project name.) > Besides, Illumos is less of a problem because they have been using > mandoc as the default manual page formatter since July 2014. Ahh, so the general Illumos user won't suffer mishandling of `\~` anyway--not in man pages, at least. > All the same, getting \~ supported in their general-purpose > roff implementation is no doubt nice to have, too. Yes. But I don't have the spoons to go through their formal contribution procedure. I think my PR will have to sit there as a form of incompatibility notice, and someone else will need to pick up the patch and advocate for its incorporation. I also have a serious handicap in that I can't test my patch; I don't run Illumos. (Plan 9 from User Space makes it easy to test _in situ_.) I don't blame them for having a lot of process; their concerns are surely more with sexy but delicate, high-stakes stuff like ZFS and DTrace. Not post-1989 developments in troff. > That reduces my concerns mostly to commercial UNIXes and potentially > to a few ad-hoc conversion tools we are not even aware of. > Consequently, the concerns aren't 100% resolved yet but getting > closer to becoming theoretical concerns. 
If it's only commercial > UNIXes and unknown tools that may break, the improved typesetting > quality may be worth the risk. And we don't know how many, if any, of those are even _maintained_, so even if the knowledge of what to patch were available, the will may be lacking. I'll take my easy win and move on to the next problem. :D Regards, Branden [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix) 2022-08-12 22:10 ` *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix) G. Branden Robinson 2022-08-13 4:23 ` G. Branden Robinson @ 2022-08-13 17:27 ` DJ Chase 2022-08-14 13:56 ` Standardize roff (was: *roff `\~` support) Ingo Schwarze 1 sibling, 1 reply; 25+ messages in thread From: DJ Chase @ 2022-08-13 17:27 UTC (permalink / raw) To: G. Branden Robinson, Ingo Schwarze; +Cc: Alejandro Colomar, linux-man, groff On Fri Aug 12, 2022 at 6:10 PM EDT, G. Branden Robinson wrote: > Hi Ingo, > > At 2022-08-12T16:30:01+0200, Ingo Schwarze wrote: > > G. Branden Robinson wrote on Thu, Aug 11, 2022 at 03:17:14PM -0500: > > > At 2022-08-11T14:48:51+0200, Ingo Schwarze wrote: > > >> The former is portable and the latter is a GNU extension. > > > > > ...that is over 30 years old and supported by Heirloom Doctools > > > troff for 17 years now, neatroff for about six, and your mandoc for > > > three. > > > > Actually, mandoc supports \~ at least since Sep 17 2009: > > https://cvsweb.bsd.lv/mandoc/Attic/chars.in?rev=1.1&content-type=text/x-cvsweb-markup > > Whoops! I regret the error, and will update groff's Texinfo manual to > correct this. > > > > plan9port troff doesn't either, and its laudable introduction > > > of a man(7) MR macro notwithstanding, its activity level is > > > not high. > > > > There are people using Plan 9 for practical work though, they have > > even occasionally posted on the groff and mandoc lists, so that is a > > bit more of a problem. > > […] But, if > that's what it takes to get this escape sequence de facto standardized, > and no one else will do it, that will move it up the priority queue. Have we ever considered a de jure *roff standard? 
If not, here are just some reasons: • [the obvious benefits of standardizing anything] • A standard could lead to more implementations because developers would not have to be intimately familiar with the {groff,heirloom,neatroff} toolchain before implementing a *roff toolchain themselves. • It could also lead to more users & use cases because existing users could count on systems supporting certain features, so they could use *roff in more situations, which would lead to more exposure. Cheers, -- DJ Chase They, Them, Theirs PS: It’s ridiculous that *roff isn’t part of POSIX when it was Unix’s killer feature. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Standardize roff (was: *roff `\~` support) 2022-08-13 17:27 ` DJ Chase @ 2022-08-14 13:56 ` Ingo Schwarze 2022-08-14 14:49 ` DJ Chase 2022-08-15 0:20 ` Sam Varshavchik 0 siblings, 2 replies; 25+ messages in thread From: Ingo Schwarze @ 2022-08-14 13:56 UTC (permalink / raw) To: DJ Chase; +Cc: g.branden.robinson, Alejandro Colomar, linux-man, groff Hi, DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000: > Have we ever considered a de jure *roff standard? No, i think that would be pure madness given the amount of working time available in any of the roff projects. I expect the amount of effort required to be significantly larger than the amount of effort that would be required for rewriting the entire groff documentation from scratch because: 1. You would have to study all features of all the major roff implementations (groff, Heirloom, neatroff, Plan 9, and possibly some others, maybe even historical ones) and compare the features. For every difference (i.e. typically multiple times for almost every feature), you would have to decide which behaviour to standardize and what to leave unspecified. 2. Discussions of the kind mentioned in item 1 are typically lengthy and often heated. If you don't believe me, just buy several pounds of popcorn and watch the Austin list, where maintenance of the POSIX standard is being discussed. Even discussions of the most minute details tend to be complicated and extended. 3. Even after deciding what you want to specify, looking at the manuals typically provides very little help because a standard document requires a completely different style. User and even reference documentation is optimized for clarity, comprehensibility, and usefulness in practice; a standard document needs to be optimized for formal precision, whereas comprehensibility and conciseness matter much less. 4. 
Even when you have the text - almost certainly after many years of work by many people - be prepared for huge amounts of red tape, like dealing with elected decision-making bodies of professional associations, for example the IEEE. Be prepared for having to know things like what technical societies, technical councils, and technical committees are, and how to deal with each of them. You are certainly in for a lot of committee work, and i would count you lucky if you got away without having to deal with lawyers, paying membership fees, buying expensive standard documents you need for your work, and so on and so forth. Even when you submit a technically perfect proposal, it will typically be rejected without even being considered until you secure the official sponsorship of at least one of the following: the IEEE, the Open Group, or ISO/IEC JTC 1/SC 22. Of course, your mileage may vary depending on what exactly you want to standardize and how, but since roff(1) is arguably the most famous UNIX program, i wouldn't be surprised if you were in for an unabridged POSIX-style Odyssey. 5. The above is not helped by standards committee work being typically conducted in ways that are technically ridiculously outdated, and i'm saying that as an avid user of cvs(1) who somewhat dislikes git(1) as overengineered and very strongly detests GitHub. Take the Austin Group as an example. Most of its work is changing the content of technical documents, but the group *never* uses diff(1), never uses patch(1), and never makes diffs available even after they have been approved. They are very firmly stuck in the 1980s regarding the technologies they are using and missed even most of the 1990s innovations. They do have some kind of version control system internally, but no web interface of such version control is publicly available, nor any other public read-only access to that version control. 
Even the source code of the finished version of the standard is typically not made available to the public (at least not without forcing people to jump through hoops). > A standard could lead to more implementations because > developers would not have to be intimately familiar with the > {groff,heirloom,neatroff} toolchain before implementing a > *roff toolchain themselves. That's not even wishful thinking. Better maintenance of the existing implementations would be so much more useful than yet another implementation. > It could also lead to more users & use cases because existing > users could count on systems supporting certain features, so > they could use *roff in more situations, which would lead to > more exposure. You appear to massively overrate the importance end-users typically attribute to standardization. Even people *implementing* a system rarely put such an emphasis on standardization. > It’s ridiculous that *roff isn’t part of POSIX when it was Unix’s > killer feature. You are welcome to spend the many years required to change that. But be aware that some standardization efforts that are part of POSIX resulted in parts of the standard that are barely useable for practical work. One famous example is make(1). Don't get me wrong: i think standardization is very nice to have, should be taken very seriously when available, and provides some value even when the standardization effort mostly failed, like in the case of make(1). But standardization is absolutely not cheap. To the contrary, it is usually significantly more expensive than implementation and documentation. Yours, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Standardize roff (was: *roff `\~` support) 2022-08-14 13:56 ` Standardize roff (was: *roff `\~` support) Ingo Schwarze @ 2022-08-14 14:49 ` DJ Chase 2022-08-14 16:32 ` Alejandro Colomar 2022-08-14 22:35 ` G. Branden Robinson 2022-08-15 0:20 ` Sam Varshavchik 1 sibling, 2 replies; 25+ messages in thread From: DJ Chase @ 2022-08-14 14:49 UTC (permalink / raw) To: Ingo Schwarze; +Cc: g.branden.robinson, Alejandro Colomar, linux-man, groff On Sun Aug 14, 2022 at 9:56 AM EDT, Ingo Schwarze wrote: > Hi, > > DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000: > > > Have we ever considered a de jure *roff standard? > > No, i think that would be pure madness given the amount of working > time available in any of the roff projects. > > […] This is very sad to hear. > > It could also lead to more users & use cases because existing > > users could count on systems supporting certain features, so > > they could use *roff in more situations, which would lead to > > more exposure. > > You appear to massively overrate the importance end-users > typically attribute to standardization. That’s probably because *I* massively overrate the importance of standardization (I mean I literally carry a standards binder with me). Still, though, it’s rather annoying that end users — especially programmers — don’t value standards as much. > > It’s ridiculous that *roff isn’t part of POSIX when it was Unix’s > > killer feature. > > You are welcome to spend the many years required to change that. > But be aware that some standardization efforts that are part of > POSIX resulted in parts of the standard that are barely useable > for practical work. One famous example is make(1). > > Don't get me wrong: i think standardization is very nice to have, > should be taken very seriously when available, and provides some > value even when the standardization effort mostly failed, like in > the case of make(1). But standardization is absolutely not cheap. 
> To the contrary, it is usually significantly more expensive than > implementation and documentation. Would an informal de jure standard be of any use? Like how TOML just has a specification, but it’s somewhat usable as a standard because it’s been pretty stable and because it’s written clearly enough. Cheers, -- DJ Chase They, Them, Theirs ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Standardize roff (was: *roff `\~` support) 2022-08-14 14:49 ` DJ Chase @ 2022-08-14 16:32 ` Alejandro Colomar 2022-08-14 19:43 ` DJ Chase 2022-08-14 22:35 ` G. Branden Robinson 1 sibling, 1 reply; 25+ messages in thread From: Alejandro Colomar @ 2022-08-14 16:32 UTC (permalink / raw) To: DJ Chase, Ingo Schwarze; +Cc: g.branden.robinson, linux-man, groff [-- Attachment #1.1: Type: text/plain, Size: 7383 bytes --] Hi, On 8/14/22 16:49, DJ Chase wrote: > On Sun Aug 14, 2022 at 9:56 AM EDT, Ingo Schwarze wrote: >> Hi, >> >> DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000: >> >>> Have we ever considered a de jure *roff standard? >> >> No, i think that would be pure madness given the amount of working >> time available in any of the roff projects. >> >> […] > > This is very sad to hear. > >>> It could also lead to more users & use cases because existing >>> users could count on systems supporting certain features, so >>> they could use *roff in more situations, which would lead to >>> more exposure. >> >> You appear to massively overrate the importance end-users >> typically attribute to standardization. > > That’s probably because *I* massively overrate the importance of > standardization (I mean I literally carry a standards binder with me). > Still, though, it’s rather annoying that end users — especially > programmers — don’t value standards as much. (Official) standardization isn't necessarily a good thing. With C, it was originally good, in the times of ISO C89. Now, it's doing more damage to the language and current implementations than any good (it's still doing some good, but a lot of bad). The best that a standardization process can do is limit itself to describe _only_ features already existing in the language, being a kind of arbiter that decides on which behavior is best for a given feature, so that all implementations follow the best existing one. 
Where different implementations might have good reasons to do it differently, the standard should describe the behavior as implementation-specific. And of course, a standard should only standardize features that are expected to be good for every implementation, with optional features either not being standardized, or being marked optional by the standard (like Annex K was; although that one was broken, so it was later removed for good). But that shouldn't be necessary if implementors had some decency and didn't implement features so that they are completely incompatible with those of other systems. I.e., if an existing system has 'foo(int a);', you don't provide 'foo(int *b);'; you go for 'foo2(int *b);' or 'bar(int *b);'. There are plenty of cases where this has happened, and in some cases it might be due to an accident, but in some other cases, it's just due to incompetence. See an example that bit me a month ago: <https://github.com/nginx/unit/issues/737>. And the bad things that standardization can do are several: By reserving the power to centrally decide the future of a language, they take power from implementations, which now can't add some features, for fear that they might contradict a future standard. This is very sad, because while the implementations are guided by usefulness and worthiness, and try to come up with the best feature for them (and by natural selection, implementations are then used or not used, depending on their quality), standards have a large part of bureaucracy, and that doesn't provide the best features. A few examples of that are: a %b printf specifier for binary was rejected by glibc on the terms of something like "the feature is good, and the implementation seems correct, but %b is reserved by the standard, so we don't want to possibly conflict with a future standard"; luckily, the standard defined that, and the feature was added a few years later. 
One example that is much more necessary is a way to get the size of an array, which currently is impossible in portable C (at least not in a way that safely refuses to compile on non-arrays). I also proposed an addition to glibc, and the reasons to reject it were of the same kind, arguing that the standard was discussing adding such a feature; guess what? the standard hasn't added such a feature for C23, and we still have no portable way to do it (and the unportable ways are more cumbersome than what one would expect). I hope C3x adds _Lengthof(arr), but who knows. <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2529.pdf> <https://stackoverflow.com/questions/37538/how-do-i-determine-the-size-of-my-array-in-c/57537491#57537491> And then we have another problem of standardization committees: their priorities are so broken, that they prefer inventing a completely new feature for C, with nothing even remotely resembling it within the existing language (I'm talking about nullptr and nullptr_t), rather than standardizing an existing good feature such as POSIX's NULL ((void *) 0). So now we have 0, NULL, and nullptr for referring to a null pointer constant in C. And none of them is perfect. 0 needs to be cast when passed to variadic functions, and has readability issues. NULL is perfect within the POSIX world, but if you go out of POSIX, it's as bad as 0. nullptr, apart from being incomprehensible, is unsafe; okay it's not unsafe by itself, and if it were the only way to refer to a null pointer constant, it would be great, but it's not, and even the committee recognizes that it will never be. <https://discourse.llvm.org/t/iso-c3x-proposal-nonnull-qualifier/59269/48> Many existing projects that use NULL (especially POSIX projects), are not going to change their whole codebase to use nullptr. 
nullptr_t adds some features that add safety against null pointer constants based on the type of the constant (by means of _Generic); but that means that one can easily bypass those features by using NULL or 0, which means that it's not really safe, and it might give a sense of safety that it does not have. So, without extending my rant about nullptr much more, it's just a feature broken from day 0, invented by the ISO C committee. Maybe one of the worst problems of the committee (WG14) is that many of its members are also members of the WG21, and as such, they may have incompatible priorities. I don't see standardization as good as it may seem at first glance. And of course following the standard should come with a pinch of salt: one should follow the standard, when the standard isn't broken. But then, the standard isn't better than any other implementation. So, as a programmer, I think programs should target their expected systems, and not more (unless it's easy). If a program is to be run on Linux, then target GNU C. If you can add some partial support for ISO C without interfering in your way significantly, then okay, go for it; but complete ISO C support is unthinkable; a program conforming to ISO C is useless, or unnecessarily complex, or even unsafe. I implement things thinking of my system first; then, if it's easy, I can support other FOSS Unix systems, but only if it is. Commercial systems are automatically out of support. I'm not spending a single minute of my time to be nice to those systems when they're not nice to me. I think it's better to let natural selection work its way out. If a feature is good, other implementations will pick it, and maybe even improve it. If a feature is not good (or it's not needed by other systems), it will not be portable. 
Cheers, Alex -- Alejandro Colomar <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Standardize roff (was: *roff `\~` support) 2022-08-14 16:32 ` Alejandro Colomar @ 2022-08-14 19:43 ` DJ Chase 2022-08-15 11:59 ` Alejandro Colomar 0 siblings, 1 reply; 25+ messages in thread From: DJ Chase @ 2022-08-14 19:43 UTC (permalink / raw) To: Alejandro Colomar, Ingo Schwarze; +Cc: g.branden.robinson, linux-man, groff On Sun Aug 14, 2022 at 12:32 PM EDT, Alejandro Colomar wrote: > On 8/14/22 16:49, DJ Chase wrote: > > On Sun Aug 14, 2022 at 9:56 AM EDT, Ingo Schwarze wrote: > >> You appear to massively overrate the importance end-users > >> typically attribute to standardization. > > > > That’s probably because *I* massively overrate the importance of > > standardization (I mean I literally carry a standards binder with me). > > Still, though, it’s rather annoying that end users — especially > > programmers — don’t value standards as much. > > (Official) standardization isn't necessarily a good thing. With C, it > was originally good, in the times of ISO C89. Now, it's doing more > damage to the language and current implementations than any good (it's > still doing some good, but a lot of bad). > > [Snipped because I’m not going to quote the whole email — see previous > message for argument] > > I think it's better to let natural selection to work out its way. If a > feature is good, other implementations will pick it, and maybe even > improve it. If a feature is not good (or it's not needed by other > systems), it will not be portable. True; prescriptive standards can certainly make some things worse. As a further example, ISO 8601 sucks. I mean, its core specification is great, but there are so many different ways that are allowed that the full standard is almost completely unparseable. It also uses a slash between the start and end times of a period instead of something sensible, like, I don’t know, an en-dash! 
Which means that periods can be written with a slash (because that’s the standard) but also with an en-dash (because that’s how ranges work in English), but also that one can’t properly write a period in a file name or URI. Still, though, I think descriptive standards can be net-positive. The POSIX shell utilities come to mind. Sure, they certainly have some issues, but because it’s a trailing standard, implementers are free to fix them. Do you think that a descriptive/trailing standard could be beneficial or would you still say that it could mostly hinder *roff implementations? Cheers, -- DJ Chase They, Them, Theirs ^ permalink raw reply [flat|nested] 25+ messages in thread
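As an illustration of the slash problem described in the message above, here is a minimal Python sketch. It handles only the plain "start/end" form of an ISO 8601 interval (the standard permits other forms, which this does not attempt), and the function name parse_interval and the "logs" directory are hypothetical, chosen only for the example.

```python
from datetime import datetime
from pathlib import PurePosixPath


def parse_interval(text):
    """Split an ISO 8601 'start/end' interval into two datetimes."""
    start, sep, end = text.partition("/")
    if not sep:
        raise ValueError("expected 'start/end'")
    return datetime.fromisoformat(start), datetime.fromisoformat(end)


interval = "2022-08-14T19:43:00/2022-08-15T11:59:00"
start, end = parse_interval(interval)
print(end - start)

# Used as a file name, the same string is read as a directory plus a file,
# because the slash is also the path separator.
print(PurePosixPath("logs", interval).parts)
```

The second print shows three path components rather than two, which is the reason such an interval cannot serve directly as a file name.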
* Re: Standardize roff (was: *roff `\~` support) 2022-08-14 19:43 ` DJ Chase @ 2022-08-15 11:59 ` Alejandro Colomar 2022-08-16 11:48 ` Ingo Schwarze 0 siblings, 1 reply; 25+ messages in thread From: Alejandro Colomar @ 2022-08-15 11:59 UTC (permalink / raw) To: DJ Chase; +Cc: g.branden.robinson, linux-man, groff, Ingo Schwarze [-- Attachment #1.1: Type: text/plain, Size: 2506 bytes --] Hi, On 8/14/22 21:43, DJ Chase wrote: > True; prescriptive standards can certainly make some things worse. As a > further example, ISO 8601 sucks. I mean, its core specification is > great, but there are so many different ways that are allowed that the > full standard is almost completely unparseable. It also uses a slash > between the start and end times of a period instead of something > sensible, like, I don’t know, an en-dash! Which means that periods can > be written with a slash (because that’s the standard) but also with an > en-dash (because that’s how ranges work in English), but also that one > can’t properly write a period in a file name or URI. > > Still, though, I think descriptive standards can be net-positive. The > POSIX shell utilities comes to mind. Sure, they certainly have some > issues, but because it’s a trailing standard, implementers are free to > fix them. > > Do you think that a descriptive/trailing standard could be beneficial > or would you still say that it could mostly hinder *roff > implementations? Well, a standard that truly recognizes the authority of implementations to drive the language and doesn't do anything else but describe the best already-implemented ways to achieve things is a good thing. It can't hinder future implementations, because it doesn't have the power to drive the future of the language, only describes the past. POSIX C has been doing good in that; much better than ISO C. I don't understand how POSIX works internally, though. If some entity can fund (and is interested in) such a standardization process, it could bring some good. 
But yeah, it will likely be very costly in time and money. Worth it? I don't know. But we can achieve something very similar by documenting the differences between known roff alternatives somewhere. And that's likely to be much easier. In the Linux man-pages we document when a function is in ISO C or in POSIX, but also when it's not standardized but present in other Unix systems (so that it has some degree of portability), or when it is Linux-only. Maybe having something similar in groff's manual pages would be effective. For example, for .MR, we were discussing that probably it would be good to add a note like "(since groff 1.23.0)" and maybe it could also state which other roff (or mandoc) implementations support it. Cheers, Alex -- Alejandro Colomar <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Standardize roff (was: *roff `\~` support) 2022-08-15 11:59 ` Alejandro Colomar @ 2022-08-16 11:48 ` Ingo Schwarze 0 siblings, 0 replies; 25+ messages in thread From: Ingo Schwarze @ 2022-08-16 11:48 UTC (permalink / raw) To: Alejandro Colomar; +Cc: DJ Chase, g.branden.robinson, linux-man, groff Hi Alejandro, Alejandro Colomar wrote on Mon, Aug 15, 2022 at 01:59:24PM +0200: > On 8/14/22 21:43, DJ Chase wrote: >> Do you think that a descriptive/trailing standard could be beneficial >> or would you still say that it could mostly hinder *roff >> implementations? When prepared with diligence and without falling for featurism, it might be useful because the common subset of the major roff implementations is large enough that it would likely be possible to prepare portable roff documents following such a standard. However, such a standard could likely *not* include *any* of the best features of any of the implementations: yes, implementations have diverged that much - not quite as bad as make(1), but still more than many other classical Unix programs. Consequently, only authors with modest needs could possibly consider adhering to the standard. To provide some striking examples, the standard could include neither the mom(7) macro set - which is a killer feature of groff - nor the mdoc(7) macro set - which has been an important feature of groff for more than 30 years and of mandoc for more than 10 years. This is all theoretical though - as i explained, the effort required for developing such a (necessarily seriously stunted) standard is prohibitive. [...] > But we can achieve something very similar by documenting the differences > between known roff alternatives somewhere. And that's likely to be much > easier. That's a much lower bar than a standard, but don't underestimate the effort involved even in that. A few very small parts of that already exist. 
For example, https://mandoc.bsd.lv/man/man.options.1.html documents command line options of some roff(1) and man(1) implementations, mostly intended for people who see themselves forced to invent a new command line option - which should of course be avoided if at all possible because the tangle of existing options is already terrifying. For example, https://man.openbsd.org/roff.7 documents roff requests and roff escape sequences; search for "extension" in that page. Even though this page focusses on groff, Heirloom, and mandoc and does not mention Plan 9, neatroff, or other implementations, the amount of compatibility information scattered around that page is already larger than what would seem healthy for most user-facing documentation. It's OK here because this page is geared more towards developers than towards users. Also, note that this page is already very long even though it is extremely terse - so terse that it is insufficient for learning how to use most of the features mentioned. > In the Linux man-pages we document when a function is in ISO C or in > POSIX, but also when it's not standardized but present in other Unix > systems (so that it has some degree of portability), or when it is > Linux-only. Maybe having something similar in groff's manual pages > would be effective. Except that the bulk, and in particular the core, of groff functionality is *not* described in manual pages in the first place. Would you want to litter groff.texi with compatibility information throughout? That would likely cause a significant increase in size, almost certainly a very significant decrease in maintainability, and possibly it might also somewhat decrease readability. > For example, for .MR, we were discussing that probably it would be good > to add a note like "(since groff 1.23.0)" and maybe it could also state > which other roff (or mandoc) implementations support it. But that feels like an exception rather than the rule. 
It seems warranted for this particular case because we are introducing a new feature without consideration for compatibility that will cause information loss for end-users unless something unusual is done about it. Hopefully, we are not going to turn that vice into a habit. The particular case of .MR is somewhat specific to manual pages, too. If people prepare a typeset document using many advanced features with groff or Heirloom, they are used to the fact that it won't work with the other, nor with Plan 9. That's not a major problem because most of the time, the author is the only person who really needs to typeset a document. Nowadays, the average reader will only read the PDF version, which is totally different from the situation with manual pages. Yours, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Standardize roff (was: *roff `\~` support) 2022-08-14 14:49 ` DJ Chase 2022-08-14 16:32 ` Alejandro Colomar @ 2022-08-14 22:35 ` G. Branden Robinson 2022-08-14 22:58 ` DJ Chase 1 sibling, 1 reply; 25+ messages in thread From: G. Branden Robinson @ 2022-08-14 22:35 UTC (permalink / raw) To: DJ Chase; +Cc: Ingo Schwarze, Alejandro Colomar, linux-man, groff [-- Attachment #1: Type: text/plain, Size: 2183 bytes --] At 2022-08-14T14:49:10+0000, DJ Chase wrote: > On Sun Aug 14, 2022 at 9:56 AM EDT, Ingo Schwarze wrote: > > DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000: > > > > > Have we ever considered a de jure *roff standard? > > > > No, i think that would be pure madness given the amount of working > > time available in any of the roff projects. Mark your calendars--Ingo and I are in substantial agreement. ;-) > This is very sad to hear. I think the take-away here is that the decision to formally standardize a technology, like many things, is an economic one. There are costs and benefits. Being seduced by the benefits without a full understanding of the costs often leads to remorse. (And, in many domains, fat commissions for sales personnel.) > That’s probably because *I* massively overrate the importance of > standardization (I mean I literally carry a standards binder with me). > Still, though, it’s rather annoying that end users — especially > programmers — don’t value standards as much. I think it is less that programmers value standards in the wrong amount, than that they disregard them for the wrong reasons--like "moving fast" and building fragile solutions that will cost more on the back end after higher-paid decision makers have moved on to greener pastures. Nothing succeeds like handing your successor a trash fire. > Would an informal de jure standard You just defined "de facto standard". ;-) "De jure" is Latin for "of the law". 
If something is not codified in "law", or a normative document like a formal standard, then what is "standard" is simply the intersection of prevailing practices. > be of any use? Like how TOML just has a specification, but it’s > somewhat usable as a standard because it’s been pretty stable and > because it’s written clearly enough. A purely descriptive document, mainly comprising a matrix of features with escape sequence, request, and predefined register names on one axis and the names of implementations on the other, with version numbers and commentary populating the elements, could be a useful thing to have. Regards, Branden [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
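The descriptive feature matrix suggested in the message above could start out as something as small as the following Python sketch. The layout (features on one axis, implementations on the other, version and commentary in the cells) follows that description; the names MATRIX, IMPLEMENTATIONS, and render are hypothetical. The only cell filled in is the one stated elsewhere in this thread (.MR, since groff 1.23.0); every other cell is deliberately left as "?" rather than guessed.

```python
# Implementations named in this thread; the column order is arbitrary.
IMPLEMENTATIONS = ("groff", "Heirloom", "mandoc", "Plan 9", "neatroff")

# feature -> implementation -> (first version, commentary).
# Only the groff entry for .MR is taken from this thread; the rest is unfilled.
MATRIX = {
    ".MR": {"groff": ("1.23.0", "as discussed in this thread")},
    r"\~": {},
}


def render(matrix, implementations):
    """Return the matrix as tab-separated text, one feature per row."""
    rows = ["\t".join(("feature",) + tuple(implementations))]
    for feature, cells in matrix.items():
        row = [feature]
        for impl in implementations:
            version, _comment = cells.get(impl, ("?", ""))
            row.append(version)
        rows.append("\t".join(row))
    return "\n".join(rows)


print(render(MATRIX, IMPLEMENTATIONS))
```

Keeping unknown cells explicit makes it obvious which combinations still need to be checked against each implementation's own documentation.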
* Re: Standardize roff (was: *roff `\~` support) 2022-08-14 22:35 ` G. Branden Robinson @ 2022-08-14 22:58 ` DJ Chase 0 siblings, 0 replies; 25+ messages in thread From: DJ Chase @ 2022-08-14 22:58 UTC (permalink / raw) To: G. Branden Robinson; +Cc: Ingo Schwarze, Alejandro Colomar, linux-man, groff On Sun Aug 14, 2022 at 6:35 PM EDT, G. Branden Robinson wrote: > At 2022-08-14T14:49:10+0000, DJ Chase wrote: > > On Sun Aug 14, 2022 at 9:56 AM EDT, Ingo Schwarze wrote: > > > DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000: > > > > > > > Have we ever considered a de jure *roff standard? > > > > > > No, i think that would be pure madness given the amount of working > > > time available in any of the roff projects. > > Mark your calendars--Ingo and I are in substantial agreement. ;-) > > > This is very sad to hear. > > I think the take-away here is that the decision to formally standardize > a technology, like many things, is an economic one. There are costs and > benefits. Being seduced by the benefits without a full understanding of > the costs often leads to remorse. (And, in many domains, fat > commissions for sales personnel.) > > > That’s probably because *I* massively overrate the importance of > > standardization (I mean I literally carry a standards binder with me). > > Still, though, it’s rather annoying that end users — especially > > programmers — don’t value standards as much. > > I think it is less that programmers value standards in the wrong amount, > than that they disregard them for the wrong reasons--like "moving fast" > and building fragile solutions that will cost more on the back end after > higher-paid decision makers have moved on to greener pastures. > > Nothing succeeds like handing your successor a trash fire. > > > Would an informal de jure standard > > You just defined "de facto standard". ;-) > > "De jure" is Latin for "of the law". 
If something is not codified in > "law", or a normative document like a formal standard, then what is > "standard" is simply the intersection of prevailing practices. By “informal de jure”, I meant ‘de jure, but written in an informal manner’. > > be of any use? Like how TOML just has a specification, but it’s > > somewhat usable as a standard because it’s been pretty stable and > > because it’s written clearly enough. > > A purely descriptive document, mainly comprising a matrix of features > with escape sequence, request, and predefined register names on one axis > and the names of implementations on the other, with version numbers and > commentary populating the elements, could be a useful thing to have. I’m on it (except not really, because we’re in the middle of a move, school resumes shortly, and etc. But eventually™, I’m on it). Cheers, -- DJ Chase They, Them, Theirs ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Standardize roff (was: *roff `\~` support) 2022-08-14 13:56 ` Standardize roff (was: *roff `\~` support) Ingo Schwarze 2022-08-14 14:49 ` DJ Chase @ 2022-08-15 0:20 ` Sam Varshavchik 2022-08-16 12:52 ` Standardize roff Ingo Schwarze 1 sibling, 1 reply; 25+ messages in thread From: Sam Varshavchik @ 2022-08-15 0:20 UTC (permalink / raw) To: Ingo Schwarze Cc: DJ Chase, g.branden.robinson, Alejandro Colomar, linux-man, groff [-- Attachment #1: Type: text/plain, Size: 1211 bytes --] Ingo Schwarze writes: > Hi, > > DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000: > > > Have we ever considered a de jure *roff standard? > > No, i think that would be pure madness given the amount of working > time available in any of the roff projects. > > I expect the amount of effort required to be significantly larger > than the amount of effort that would be required for rewriting > the entire groff documentation from scratch because: I tinkered with something like this some years ago, but I took a slightly different approach. I converted man pages from 'roff source to Docbook XML using a … pretty large Perl script. Once a year, or so, when I have nothing better to do I pull the current man page tarball and reconvert it. I usually need to tinker the Perl script, here and there, each time. The Docbook folks provide a stylesheet that converts Docbook XML back to 'roff. The end result you get is standardized 'roff, whatever that means. But, yes, the effort required to clean up and standardize the formatting of man pages would be mammoth. There's more inconsistency across the various man pages, from various sources, than consistency. [-- Attachment #2: Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Standardize roff 2022-08-15 0:20 ` Sam Varshavchik @ 2022-08-16 12:52 ` Ingo Schwarze 2022-08-16 23:46 ` Sam Varshavchik 0 siblings, 1 reply; 25+ messages in thread From: Ingo Schwarze @ 2022-08-16 12:52 UTC (permalink / raw) To: Sam Varshavchik Cc: DJ Chase, g.branden.robinson, Alejandro Colomar, linux-man, groff Hi Sam, Sam Varshavchik wrote on Sun, Aug 14, 2022 at 08:20:34PM -0400: > Ingo Schwarze writes: >> DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000: >>> Have we ever considered a de jure *roff standard? >> No, i think that would be pure madness given the amount of working >> time available in any of the roff projects. > I tinkered with something like this some years ago, but I took a slightly > different approach. > > I converted man pages What kind of manual pages? > from 'roff source to Docbook XML using a … pretty large Perl script. That sounds very foolish on several levels. First, and most obviously, you seem to be duplicating esr@'s work on doclifter: http://www.catb.org/~esr/doclifter/ https://gitlab.com/esr/doclifter/-/blob/master/doclifter Second, quick and dirty Perl-style parsing is usually not good enough to parse roff code, and a huge script is not particularly good for readability and maintainability. Yes, i know the same reservations would apply to esr@'s work, which is a giant Python 3 script. But at least there is some evidence that his work was able to find significant numbers of real issues in real manual pages. > Once a year, or so, when I have nothing better to do I pull the current > man page tarball and reconvert it. I usually need to tinker the Perl > script, here and there, each time. > > The Docbook folks provide a stylesheet that converts Docbook XML > back to 'roff. Yikes. That thing is by far the worst man(7) code generator existing on this planet. If at all possible, you should avoid that toolchain like the plague. 
It is so bad that for years, bogus reports caused by that totally broken toolchain have caused the majority of invalid mandoc bug reports. > The end result you get is standardized 'roff, whatever that means. Absolutely not. The result is utter crap. It is rarely even syntactically valid, let alone reasonable style. > But, yes, the effort require to clean up and standardize the formatting > of man pages would be mammoth. There's more inconsistency across the > various man pages, from various sources, than consistency. That isn't completely untrue, but all the same, mandoc copes well enough with more than 95% of valid real-world manual pages, and groff with 100%. In a nutshell, the only stuff that breaks with groff is manual pages that are completely invalid, usually coming from the official DocBook XML toolchain, and in rarer cases coming from other broken man(7) generators. All this is barely related to the question of standardizing roff(7), though. Roff is much more than manual pages. Yours, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Standardize roff 2022-08-16 12:52 ` Standardize roff Ingo Schwarze @ 2022-08-16 23:46 ` Sam Varshavchik 0 siblings, 0 replies; 25+ messages in thread From: Sam Varshavchik @ 2022-08-16 23:46 UTC (permalink / raw) To: Ingo Schwarze Cc: DJ Chase, g.branden.robinson, Alejandro Colomar, linux-man, groff [-- Attachment #1: Type: text/plain, Size: 3545 bytes --] Ingo Schwarze writes: > Hi San, > > Sam Varshavchik wrote on Sun, Aug 14, 2022 at 08:20:34PM -0400: > > Ingo Schwarze writes: > >> DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000: > > >>> Have we ever considered a de jure *roff standard? > > >> No, i think that would be pure madness given the amount of working > >> time available in any of the roff projects. > > > I tinkered with something like this some years ago, but I took a slightly > > different approach. > > > > I converted man pages > > What kind of manual pages? The ones that are the subject of discussions on linux-man@vger.kernel.org. > > from 'roff source to Docbook XML using a … pretty large Perl script. > > That sounds very foolish on several levels. Well, I had some free time the other day, and had nothing better to do. > First, and most obviously, you seem to be duplicating esr@'s work > on doclifter: > > http://www.catb.org/~esr/doclifter/ > https://gitlab.com/esr/doclifter/-/blob/master/doclifter Seems so, except that I tailored my logic to man pages, and specifically to the linux-man@vger.kernel.org manpages. > Second, quick and dirty Perl-style parsing is usually not good > enough to parse roff code, and a huge script is not particularly > good for readability and maintainability. Yes, arbitrary roff code will not fly very far. But something that's tailored can produce productive results. > Yes, i know the same resevations would apply to esr@'s work, > which is a giant Python 3 script. But at least there is some > evidence that his work was able to find significant numbers of > real issues in real manual pages. 
Yes, there are plenty of issues there. I fed quite a few patches to Mr. Kerrisk when he maintained them, based on my scripts chewing through them. There were plenty of mismatched .nf/.fi, and other things of that sort. > > Once a year, or so, when I have nothing better to do I pull the current > > man page tarball and reconvert it. I usually need to tinker the Perl > > script, here and there, each time. > > > > The Docbook folks provide a stylesheet that converts Docbook XML > > back to 'roff. > > Yikes. That thing is by far the worst man(7) code generator existing > on this planet. If at all possible, you should avoid that toolchain > like the plague. I do not view it as an authoritative source of man sources, but more of backwards compatibility. I believe that for man pages, roff should've been replaced by Docbook XML a long time ago. That was really the original impetus for my Perl hacking: to see how feasible it would be to convert the existing man pages to Docbook XML. My end result showed at least that it was doable; and I think that the Docbook XML stylesheet for man pages would've been an acceptable way to get some roff source generated from Docbook XML that's shown by the man command. > > The end result you get is standardized 'roff, whatever that means. > > Absolutely not. The result is utter crap. It is rarely even > syntactically valid, let alone reasonable style. I should've used "consistent" instead of "standardized". Different man pages from different sources use different ways of rendering the same content, i.e. function names. Sometimes it's in bold. Sometimes it's in italic. Sometimes it's something else. With consistent semantic markup a <function> in every man page would've produced the same markup in the generated roff source. [-- Attachment #2: Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2022-08-16 23:46 UTC]
Thread overview: 25+ messages
2022-07-29 11:45 [PATCH 4/6] xattr.7: wfix Štěpán Němec
2022-07-29 20:58 ` G. Branden Robinson
2022-07-30 14:15 ` Štěpán Němec
2022-07-30 17:53 ` Alejandro Colomar (man-pages)
2022-07-30 17:59 ` Alejandro Colomar (man-pages)
2022-08-01 13:28 ` Alejandro Colomar
2022-08-11 12:48 ` Ingo Schwarze
2022-08-11 20:17 ` G. Branden Robinson
2022-08-12 14:30 ` Ingo Schwarze
2022-08-12 22:10 ` *roff `\~` support (was: [PATCH 4/6] xattr.7: wfix) G. Branden Robinson
2022-08-13 4:23 ` G. Branden Robinson
2022-08-14 14:15 ` Ingo Schwarze
2022-08-14 22:21 ` G. Branden Robinson
2022-08-13 17:27 ` DJ Chase
2022-08-14 13:56 ` Standardize roff (was: *roff `\~` support) Ingo Schwarze
2022-08-14 14:49 ` DJ Chase
2022-08-14 16:32 ` Alejandro Colomar
2022-08-14 19:43 ` DJ Chase
2022-08-15 11:59 ` Alejandro Colomar
2022-08-16 11:48 ` Ingo Schwarze
2022-08-14 22:35 ` G. Branden Robinson
2022-08-14 22:58 ` DJ Chase
2022-08-15 0:20 ` Sam Varshavchik
2022-08-16 12:52 ` Standardize roff Ingo Schwarze
2022-08-16 23:46 ` Sam Varshavchik