From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
To: Alejandro Colomar <alx.manpages@gmail.com>
Cc: наб <nabijaczleweli@nabijaczleweli.xyz>, linux-man@vger.kernel.org
Subject: Re: [PATCH 4/5] tm.3type: describe tm_zone, tm_gmtoff
Date: Thu, 21 Jul 2022 22:33:53 -0500
Message-ID: <20220722033353.ap7aqxh6uhghdcxo@illithid>
In-Reply-To: <90beebd3-2636-21d5-323b-766c8d81d6d3@gmail.com>



Hi Alex,

At 2022-07-19T14:17:15+0200, Alejandro Colomar wrote:
> Hi, наб and Branden!

I'm not exactly sure what you wanted me to comment on in this patch
submission.

Keep in mind that I am a bear of little brain--please be clear what it
is you're asking of me.  ;-)

I will assume that it is my *roff/man(7) expertise (such as it is) that
you're after, and will respond on that basis.

I will also comment on English usage because I can't help myself.

> > diff --git a/man3/tm.3type b/man3/tm.3type

Oh, bother.  Bash autocompletion for "man" on my Debian bullseye is too
dumb to recognize this new man page suffix.  I trust someone reading
this is aware of the problem and is fixing it for the next Debian
release.  (Has someone filed this as a bug with the Debian BTS?)

Other distributions may have similar concerns.

> > index 1931d890d..8b6f8d9bf 100644
> > --- a/man3/tm.3type
> > +++ b/man3/tm.3type
> > @@ -25,8 +25,26 @@ Standard C library
> >   .BR "    int  tm_yday;" \
> >   "   /* Day of the year  [" 0 ", " 365 "] (Jan/01 = " 0 ") */"
> >   .BR "    int  tm_isdst;" "  /* Daylight savings flag */"
> > +
> > +.BR "    long tm_gmtoff;" " /* Seconds East of UTC */"
> > +.BR "    char*tm_zone;" "   /* Timezone abbreviation */"
> 
> Please add cosmetic whitespace (at least 1 for every member, possibly
> 2, depending on your taste) :)

Hmmm.  I'm attaching a screenshot of Okular's rendering of tm(3type), as
it currently stands in the Linux man-pages Git repository, formatted to
PostScript.

Recall the advice in groff's Texinfo manual.  This is from groff 1.22.4.

5.1.6 Input Conventions
-----------------------

[...]
   * Do not try to do any formatting in a WYSIWYG manner (i.e., don't
     try using spaces to get proper indentation).

Synopses in man pages, whether for section [168] commands or section
[23] C function calls or data types, are not typically set in a
monospaced typeface, nor do I think they should be.  A proportional
typeface generally looks better.

The price of that improved appearance is that the use of sequences of
spaces to get columnar alignment breaks as soon as there is variation in
the content.

The traditional solution to this problem in the *roff language is to set
tab stops.  However, man-pages(7) calls out tab stop manipulation as
unportable man(7) usage.

       *  Example programs should be laid out according  to  Kernighan
          and  Ritchie style, with 4‐space indents.  (Avoid the use of
          TAB characters in source code!)

Now, section 2 and 3 synopses are not _example program_ source code, so
a defense of tab usage could be made here, but a man page author simply
trying to get their stuff documented could be forgiven for feeling that
drawing such a distinction is hair-splitting.

Using spaces is, however, in my opinion, worse simply due to the effect
on rendered output for everything that isn't a terminal.

There are a few ways to address this issue.

A. Don't worry about it and let HTML/PostScript/PDF output look ugly.

B. Stick synopses, at least for section 2 and 3 man pages, in EX/EE
   blocks, which switch the typeface to Courier on typesetting output
   devices (which includes HTML if the groff project fixes grohtml to
   change font families--it's _supposed_ to, but something broke a long
   time ago).  My recollection is that Michael Kerrisk opposed this
   practice.  I too don't think it's a great idea; the average glyph
   width is lower in proportional fonts, so by using one you can fit
   more content on an output line.

C. Use tabs anyway.  For results that will actually get what you want,
   you will need to set the tab stops to ensure they're wide enough to
   achieve the desired alignment.  The use of custom tab stops requires
   invoking the `ta` request, and this is warned against in the
   "Portability" section of groff_man(7) (to be part of
   groff_man_style(7) in groff 1.23).  But by invoking the `nf` and `fi`
   requests for other reasons, this project's pages have already crossed
   that bridge.

C1. Actually selecting values for the tab stops can be tedious.  You can
    hard-code measurements, but it will be hard to maintain consistency
    among contributors (will you use ens, ems, inches, or centimeters as
    the scaling unit?) and, much worse, the size of the rendered
    typeface can vary.  groff_man(7) explicitly countenances selection
    of a 10-, 11-, or 12-point typeface.  At present, no means of
    changing the default font family for body text is exposed, but it
    might be in the future.  So I expect the temptation will be to set
    tab stops for 10-point Times (but see below), which will lead to
    ugly results for other family/size selections.

C2. Clever roff writers (sometimes too clever) reach for the \w escape
    sequence to overcome this problem.  So instead of hard-coding tab
    stop lengths, they have the formatter compute them based on sample
    inputs.  For the page under discussion, this practice would lead to
    requests that look like the following.

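    .ta \w'char*'u +\w'tm_gmtoff'u

    What's happening here is that the "longest" item within each tab
    stop is getting its length computed, and those computed lengths used
    as the tab stop values.  (The trailing `u` keeps the formatter from
    applying the `ta` request's default scaling unit to a measurement
    that \w already reports in basic units, and the `+` makes the second
    stop relative to the first.)  In practice this won't quite do
    because it will leave no space between the items in the event the
    same row has two of the longest column entries adjacent, so you more
    often see something like this.

    .ta \w'char*'u+1n +\w'tm_gmtoff'u+1n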

    This ensures that the tab stops have an extra "en" of space between
    them.  It doesn't suck, but at this point your man page renderer
    needs to be sophisticated enough to include an arithmetic expression
    evaluator.  This provokes grumbles from folks like Ingo who maintain
    non-roff man page formatters.

    It is true that we could add a macro to man(7) that conceals a bit
    of this complexity.  Like this.

    .TA char gmtoff

    This certainly looks much cleaner, and in fact it closely resembles
    Texinfo's @multitable command.  But it is just a mask over the
    `ta` request of frightening appearance above, not a silver bullet.

C3. The above has the problem that it relies upon the writer to know
    which pieces of text between the tab stops are the longest.  This
    sounds like an obvious thing that no one would ever screw up.
    I think that assumption would be swiftly overturned.

    There are two big problems.  The first is maintenance.  Considering
    potential applications in Linux man-pages, you will often have
    situations where someone adds a new function or struct member to a
    synopsis.  A contributor may already be at the limit of their man(7)
    knowledge.  They may not look far enough up the page to see the `ta`
    request, may not understand it, and may not think to consider that
    they've just added a new longest item, and thus need to update that
    `ta` request.  Because that request may be outside the scope of the
    diff context, it will be easy for reviewers to overlook, too.

    The other issue is more subtle.  I predict that contributors are
    likely to reckon widths in terms of character cells, not the
    horizontal measurement of rendered text.  Because a proportional
    font is used for rendering, the results can be surprising.

	$ groff
	.nr m \w'mmm'
	.nr i \w'iiii'
	.tm m=\nm, i=\ni
	m=23340, i=11120

    In 10-point Times, "mmm" is over twice as wide as "iiii".  I dare
    say few man page contributors are going to think of this.  Not
    having Times roman's font metrics and a full adder operating in
    their heads when they're thinking about documenting an API, they
    will frequently fail to correctly select the "longest" content
    within a particular tab stop for an argument to \w in a `ta`
    request.

    Sorting this kind of thing out is a pain.  Why don't we have
    something that recognizes when we're using a series of lines with
    tabs, then reads them all and computes the tab stops necessary to
    separate them nicely?

D. Congratulations, you've discovered tbl(1).[1]
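
To make the pass described under C3 concrete, here is a minimal Python
sketch (not groff input, and not a description of how tbl actually works
internally): read every row, measure each cell, and derive the tab stops
from the widest cell in each column.  The names GLYPH_WIDTH,
rendered_width, and compute_tab_stops are hypothetical.  The two glyph
widths are derived from the \w output shown above (23340/3 and 11120/4
basic units); the 5000-unit gap assumes that one en is half of a
10-point em at 1000 basic units per point, and kerning is ignored.

```python
# Glyph widths in groff basic units for 10-point Times, derived from the
# session above: \w'mmm' = 23340 and \w'iiii' = 11120.
GLYPH_WIDTH = {"m": 23340 // 3, "i": 11120 // 4}


def rendered_width(text):
    """Width of text as the sum of its glyph widths (kerning ignored)."""
    return sum(GLYPH_WIDTH[ch] for ch in text)


def compute_tab_stops(rows, width, gap):
    """Return absolute tab stops that separate the columns of rows.

    rows  -- list of rows, each a list of cell strings
    width -- function mapping a cell string to its width
    gap   -- extra space after each column, in the same unit as width
    """
    ncols = max(len(row) for row in rows)
    widest = [0] * ncols
    for row in rows:
        for col, cell in enumerate(row):
            widest[col] = max(widest[col], width(cell))
    stops = []
    position = 0
    # The last column needs no tab stop after it.
    for col_width in widest[:-1]:
        position += col_width + gap
        stops.append(position)
    return stops


rows = [["iiii", "mm", "i"], ["mmm", "i", "m"]]

# Reckoning in character cells, as on a terminal.
cell_stops = compute_tab_stops(rows, width=len, gap=1)
# Reckoning in basic units; the 5000-unit gap assumes 1 en at 10 points.
unit_stops = compute_tab_stops(rows, width=rendered_width, gap=5000)

print(cell_stops)  # [5, 8]
print(unit_stops)  # [28340, 48900]
```

Counting character cells, "iiii" determines the first stop; measured in
10-point Times, "mmm" does.  That is the mismatch a hand-written `ta`
request has to get right every time the content changes.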

I guess my advice is to choose your poison.  I'll advise as best I can.

> I tend to prefer the em dash to be next to (no whitespace) the
> enclosed clause.  That makes it easier to mentally associate (as in a
> set of parentheses) to the clause.  I'm not sure if it's a thing of
> mine, or if it's standard practise?

"Spacing around an em dash varies. Most newspapers insert a space before
and after the dash, and many popular magazines do the same, but most
books and journals omit spacing, closing whatever comes before and after
the em dash right up next to it. This website prefers the latter, its
style requiring the closely held em dash in running text."

https://www.merriam-webster.com/words-at-play/em-dash-en-dash-how-to-use

In the groff man pages, I too "close up" any space around em dashes, but
I freely admit that this (1) doesn't look all that great in terminal
rendering [it too closely resembles other dashes--a "fullwidth" dash
taking two character cells would be preferable on purely esthetic
grounds, and probably a nightmare to get terminal emulators to cope
with] and (2) frustrates my input style; since I don't want to use
the `\c` escape sequence, I end up putting the words immediately outside
the em-dashed aside on the "wrong" lines semantically.  Maybe I should
just get over my allergy to `\c` now that I understand how it
works.[citation needed]

> What is "&a."?  Is that documented somewhere?  I didn't know that
> abbreviature.

Having seen наб's reply, it seems of a piece with "&c.", which was
formerly (ca. 150 years ago) a common English abbreviation for the
Latin "et cetera".  Nowadays "etc." has fully supplanted "&c." while
many native English speakers are shaky on what, exactly, it abbreviates,
even spelling it "ect." because that better aligns with English language
phonotactics.

I admit never having seen "&a." before in English writing.  Like
Germans' use of "resp.", it may be a thing non-native speakers assume
"ports" into English, but doesn't.

Regards,
Branden

[1] https://git.savannah.gnu.org/cgit/groff.git/tree/src/preproc/tbl/tbl.1.man

[-- Attachment #1.2: tm.3type.ps.png --]
[-- Type: image/png, Size: 180 bytes --]


