From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: mtk.manpages@gmail.com, linux-man <linux-man@vger.kernel.org>,
	groff@gnu.org
Subject: Re: Escaping hyphens ("real" minus signs in groff)
Date: Thu, 21 Jan 2021 12:03:13 +0100
Message-ID: <a1af3f5c-f3e9-4bf3-cad5-389571c45d27@gmail.com>
In-Reply-To: <20210121061158.5ul7226fgbrmodbt@localhost.localdomain>

Hello Branden,

On 1/21/21 7:12 AM, G. Branden Robinson wrote:
> [looping in groff@ because I'm characterizing an unresolved argument and
> people may want to dispute my claims]
> 
> Hi Michael!
> 
> At 2021-01-20T22:03:12+0100, Michael Kerrisk (man-pages) wrote:
>> Hi Branden,
>>
>> I wonder if I might ask for your input...
>>
>> For some time now, man-pages(7) has had the text (mostly put there by me):
>>
>>    Generating optimal glyphs
>>        Where a real minus character is required (e.g., for  numbers  such
>>        as  -1,  for  man  page cross references such as utf-8(7), or when
>>        writing options that have a leading dash, such as in  ls -l),  use
>>        the following form in the man page source:
>>
>>            \-
>>
>>        This guideline applies also to code examples.
>>
>> (You even helped with this text a little, adding the piece about
>> manual page cross-references.)
>>
>> I'm having some doubts about this text. The doubts were triggered
>> after I noticed that many code snippets (inside .EX/.EE blocks) don't
>> follow this recommendation. I was about to apply a large patch that
>> fixed that when I began to wonder: is it even necessary?
> 
> Short answer: yes, I would do that.

I appreciate your long answer *very* much, but I'm glad you started
with the short answer :-). I have made the change.

> Long answer
> ===========
> 
> There are people who would argue (I've heard mostly from BSD people)
> that man pages should "DWIM", and always render a "-" as an ASCII 45
> hyphen-minus regardless of context, and while we're at it, it should
> stop having non-ASCII glyph mappings for `, ', ^, and ~ as well.  I
> resist this, as it's contrary to troff's semantics for these characters
> since the early 1970s.
> 
> My most recent contretemps with people about this can be found starting
> here:
> 	https://lists.gnu.org/archive/html/groff/2020-10/msg00158.html
> 
> The former groff maintainer and lead developer, Werner Lemberg, agrees
> with me on this point, but some people whose *roff horizons seem to
> extend only as far as man pages are passionately opposed.
> 
> The issue was not resolved on the groff mailing list and may not ever
> be; the instant discussion got derailed by several people's fascination
> with the Sun Gallant Demi font.  :-/
> 
> I share all this because it is a contentious issue and I cannot pretend
> to represent my view as a universal consensus.  It is, however, I think,
> the opinion shared by people with a fair knowledge of *roff systems and
> who perceive the man(7) macro language as an application of a
> typesetting system and not an isolated domain-specific language for man
> pages.
> 
> I got fatigued of the fight before I could share my findings about
> historical Unix manuals going back to Version 2.  I get the feeling
> people don't really care; they'll happily wield the club of historical
> continuity when it works in their favor, and discard it as irrelevant
> when it doesn't.  But I can't say _I've_ never been guilty of that
> inconsistency...

Thanks for the background.

>> Some thoughts/questions:
>>
>> * I believe that when rendering to a terminal, the use of "\-" is
>> equivalent to just "-"; they both render as a real minus sign (ASCII
>> 055). Right?
> 
> It depends on the capabilities of the terminal, and specifically whether
> it supports any hyphen, dash, or minus glyphs apart from ASCII 055.
> Neither ASCII nor the ISO 8859 encodings did, and Windows-1252, which
> does, is not a popular terminal encoding among Unix/Linux users.
> 
> But Unicode also does, and Unicode _is_ popular.  If you write a "raw"
> roff document and render it to a UTF-8 terminal, you will be able to see
> a difference.

Thanks for that info on Unicode/UTF-8 terminals...

> 
> Example:
> 
> $ printf "UTF-8 \\-1\n" | groff -Tutf8 | cat -s

Got it.
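
To make the visible difference concrete, here is a small Python sketch
(not part of the original exchange) that prints the three characters
involved, with their Unicode names and UTF-8 bytes. It does not run
groff. The mapping stated in the comment, "-" to U+2010 and "\-" to
U+2212 on the utf8 device when no man(7) remapping is in effect, is an
assumption that should be checked against a given installation. The
function name describe() is hypothetical.

```python
import unicodedata

# Characters a reader may meet where a "dash" is expected.
# Assumption: groff -Tutf8 without the man(7) remapping emits U+2010 for "-"
# and U+2212 for "\-"; only U+002D is the ASCII 45 that shells accept.
CANDIDATES = ["\u002d", "\u2010", "\u2212"]


def describe(ch: str) -> str:
    """Return code point, Unicode name, and UTF-8 bytes for one character."""
    return "U+%04X %s %s" % (
        ord(ch),
        unicodedata.name(ch),
        ch.encode("utf-8").hex(" "),
    )


if __name__ == "__main__":
    for ch in CANDIDATES:
        print(describe(ch))
```

Piping the output of the groff command quoted above through a hex dump
is one way to see which of these byte sequences actually appears.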

> Back when people started using UTF-8 terminals, confusion of - and \- in
> man pages was even more rampant than it is today, and groff added
> directives to the man(7) implementation[1] to deliberately degrade
> glyphs to ASCII.
> 
> .\" For UTF-8, map some characters conservatively for the sake
> .\" of easy cut and paste.
> .
> .if '\*[.T]'utf8' \{\
> .  rchar \- - ' `
> .
> .  char \- \N'45'
> .  char  - \N'45'
> .  char  ' \N'39'
> .  char  ` \N'96'
> .\}
> 
> It was intended as a stopgap measure, but thanks to development on groff
> slowing down and its maintainer retiring from the role, it's remained
> the case for about a decade, and some people now regard the stopgap as
> an eternal truth that must be preserved, lest all writers of
> documentation defect to Markdown or something.
> 
> The above probably should have been placed in the man.local file
> instead[2][3], to encourage system administrators to make transitions
> away from the stopgap as their sites or distributions deemed suitable.
> I have proposed this very thing for the next groff release, 1.23.0, but
> even that met with stiff resistance from the BSD camp.  They want cement
> poured over the code snippet above.
> 
>> * When rendering to PDF, then "\-" and "-" certainly produce different
>> results: the former produces a long dash, while the latter produces a
>> rather short dash.
> 
> Yes.  Specifically, the issue depends on whether a _font_ distinguishes
> a hyphen from a minus sign.  (To a typographer, there's _no such thing_
> as a "hyphen-minus", the ISO name for ASCII 055--or at least there
> wasn't until computer character encodings forced compromises onto the
> world.) But matters are made muddy by the fact that terminal emulators
> impose another layer between the typesetter (*roff) and the fonts used
> to draw glyphs.  groff's solution is to use the encoding of the locale
> as a proxy for font coverage, which works well only if the font has
> coverage for all the glyphs of interest to a document.  Over time this
> has become increasingly true for fonts widely used in terminal emulators
> and glyphs commonly encountered in practical documents like man
> pages.[4]
> 
>> Certainly, when writing say "-1" in running text (i.e., not in a
>> .EX/.EE code example), one should use "\-1", since without the "\",
>> the dash in front of the "1" is rather anaemically small when rendered
>> in PDF.
> 
> Yes.
> 
>> The same is true when writing options strings such as "ls -l". We
>> should use "ls \-l" to avoid an anaemic hyphen in PDF.
> 
> Yes.
> 
>> When writing man-pages xrefs (e.g., utf-8), the use of "\-" produces a
>> dash that is almost too long for my taste, but is preferable to the
>> result from using "-", where the rendered dash is too small.
> 
> I share your discomfort with the length of the dash in man page xrefs,
> and also your assessment that it's the lesser evil.
> 
> Another issue to consider is that as PDF rendering technology has
> improved on Linux, it has become possible to copy and paste from PDF
> documents into a terminal window.  In my opinion we should make this
> work as well as we can.  Expert Linux users may not ever do this,
> wondering why anyone would ever try; new Linux users will quite
> reasonably expect to be able to do it.

Agreed.

>> Inside code blocks (.EX/.EE) is there any reason to use "\-" rather
>> than just "-"? Long ago I think I convinced myself that "\-" should be
>> used, but now I am not at all sure that it's necessary. Maybe I forgot
>> something, and you might remind me why "\-" is needed (and I will make
>> sure to add the reason to man-pages(7)).
> 
> Yes; the main reason is so that copy-and-paste from code examples in
> your man pages will work if people _don't_ use the degraded character
> translations in man.local, which are marked as optional.

Got it.
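
For anyone who wants to look for the same problem in their own pages,
a rough Python sketch of such a check follows. It is not the tooling
used for the man-pages change, and the function name is hypothetical.
It assumes that examples are delimited by lines starting with .EX and
.EE, and it is deliberately naive: any "-" not directly preceded by a
backslash is reported, so other roff escapes that legitimately contain
a "-" would show up as false positives.

```python
import re

# A "-" that is not already written as "\-". Naive on purpose: it does not
# parse roff, it only looks at the character just before the hyphen.
UNESCAPED = re.compile(r"(?<!\\)-")


def unescaped_hyphens_in_examples(source: str):
    """Yield (line_number, line) for lines inside .EX/.EE with a bare "-"."""
    in_example = False
    for number, line in enumerate(source.splitlines(), start=1):
        if line.startswith(".EX"):
            in_example = True
        elif line.startswith(".EE"):
            in_example = False
        elif in_example and UNESCAPED.search(line):
            yield number, line


if __name__ == "__main__":
    page = ".EX\nls -l\nls \\-l\n.EE\nRunning text with a hy-phen.\n"
    for number, line in unescaped_hyphens_in_examples(page):
        print(number, line)
```

Only the second line of the sample is reported; the already escaped
form and the running text outside the example are left alone.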

> And I mean copy-and-paste not just from PDF but from a terminal window.

Yes, but I have a question: "\-1" renders in PDF as a long dash 
followed by a "1". This looks okay in PDF, but if I copy and paste
into a terminal, I don't get an ASCII 45. This seems to contradict
what you are saying about cut-and-paste above. What am I missing?

> .EX and .EE, originating in the Version 9 Research Unix man macros, are
> "semantic" but they don't _do_ very much.  They don't change
> character-to-glyph mappings; they change the font family (on typesetter
> devices like PDF, not terminals) and turn off filling.
> 
>> Are there any other things I've missed with respect to "\-" vs "-"?
> 
> Probably, but nothing I can think of right now.  <laugh>  It's a vexing
> issue.
> 
> To get back to the question you originally posed, I think the change you
> suggested (to consistently use \- in .EX/.EE regions) is sound, and will
> not frustrate correct rendering even on systems that flatten the
> distinction between the minus (\-) and hyphen (-) characters.
> 
> Please follow up with any further questions and I will do my best to
> answer them.

I don't really have any other questions, but I have tried to distill
the above into some text in man-pages(7) to remind myself for the
future:

[[
.PP
The use of real minus signs serves the following purposes:
.IP * 3
To provide better renderings on various targets other than
ASCII terminals,
notably in PDF and on Unicode/UTF\-8-capable terminals.
.IP *
To generate glyphs that when copied from rendered pages will
produce real minus signs when pasted into a terminal.
]]

Seem okay?

> [1] tmac/an-old.tmac
> [2] Debian does this in its /etc/groff/man.local:
> 
> [...]
>   .if n \{\
> [...]
>   .  \" Debian: Strictly, "-" is a hyphen while "\-" is a minus sign, and the
>   .  \" former may not always be rendered in the form expected for things like
>   .  \" command-line options.  Uncomment this if you want to make sure that
>   .  \" manual pages you're writing are clear of this problem.
>   .\" uncommented by Branden, 2019-06-16 --GBR
>   .   if '\*[.T]'utf8' \
>   .     char - \[hy]
>   .
>   .  \" Debian: "\-" is more commonly used for option dashes than for minus
>   .  \" signs in manual pages, so map it to plain "-" for HTML/XHTML output
>   .  \" rather than letting it be rendered as "&minus;".
>   .  ie '\*[.T]'html' \
>   .    char \- \N'45'
>   .  el \{\
>   .    if '\*[.T]'xhtml' \
>   .      char \- \N'45'
>   .  \}
>   .\}
> 
> As you can see, I uncommented my local copy so that I could see if the
> wrong glyphs were being used in man pages.  A large part of my work on
> groff upstream has been on making the man pages better examples for
> other man page writers to follow.
> 
> [3] As can be seen from the groff mailing list thread, Ingo Schwarze of
> OpenBSD rejects the notion of man.local as a file suitable for site
> administrators to customize.  I don't know enough about OpenBSD to
> rationalize this view.
> 
> [4] To check the coverage of your terminal emulator's font, try the
> command "man groff_char".  It contains a specimen of every defined groff
> "special character" and in my opinion is a reasonable test of practical
> glyph coverage[5].  For man pages, it's probably overpowered, even, but
> man pages are merely the leading application of *roff, not the only one.

Thanks for that pointer.

> [5] I've largely rewritten the page for groff 1.23.0 (forthcoming)
> because I was unhappy with what I perceived as its lack of clarity.  A
> recent snapshot at the man-pages Web site[6] is a useful preview, but
> (unless you use something like lynx or w3m) it won't tell you anything
> about the glyph coverage of your _terminal_'s font.  In any event, the
> glyph repertoire has not changed from groff 1.22.4.
> 
> [6] https://man7.org/linux/man-pages/man7/groff_char.7.html

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
