linux-man.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: mtk.manpages@gmail.com, linux-man <linux-man@vger.kernel.org>,
	groff@gnu.org
Subject: Re: Escaping hyphens ("real" minus signs in groff)
Date: Thu, 21 Jan 2021 12:03:13 +0100	[thread overview]
Message-ID: <a1af3f5c-f3e9-4bf3-cad5-389571c45d27@gmail.com> (raw)
In-Reply-To: <20210121061158.5ul7226fgbrmodbt@localhost.localdomain>

Hello Branden,

On 1/21/21 7:12 AM, G. Branden Robinson wrote:
> [looping in groff@ because I'm characterizing an unresolved argument and
> people may want to dispute my claims]
> 
> Hi Michael!
> 
> At 2021-01-20T22:03:12+0100, Michael Kerrisk (man-pages) wrote:
>> Hi Branden,
>>
>> I wonder if I might ask for your input...
>>
>> For some time now, man-pages(7) has the text (mostly put there by me):
>>
>>    Generating optimal glyphs
>>        Where a real minus character is required (e.g., for  numbers  such
>>        as  -1,  for  man  page cross references such as utf-8(7), or when
>>        writing options that have a leading dash, such as in  ls -l),  use
>>        the following form in the man page source:
>>
>>            \-
>>
>>        This guideline applies also to code examples.
>>
>> (You even helped with this text a little, adding the piece about
>> manual page cross-references.)
>>
>> I'm having some doubts about this text. The doubts were triggered
>> after I noticed that many code snippets (inside .EX/.EE blocks) don't
>> follow this recommendation. I was about to apply a large patch that
>> fixed that when I began to wonder: is it even necessary?
> 
> Short answer: yes, I would do that.

I appreciate your long answer *very* much. But, I'm glad you started 
with the short answer :-). I have made the change.

> Long answer
> ===========
> 
> There are people who would argue (I've heard mostly from BSD people)
> that man pages should "DWIM", and always render a "-" as an ASCII 45
> hyphen-minus regardless of context, and while we're at it, it should
> stop having non-ASCII glyph mappings for `, ', ^, and ~ as well.  I
> resist this, as it's contrary to troff's semantics for these characters
> since the early 1970s.
> 
> My most recent contretemps with people about this can be found starting
> here:
> 	https://lists.gnu.org/archive/html/groff/2020-10/msg00158.html
> 
> The former groff maintainer and lead developer, Werner Lemberg, agrees
> with me on this point, but some people whose *roff horizons seem to
> extend only as far as man pages are passionately opposed.
> 
> The issue was not resolved on the groff mailing list and may not ever
> be; the instant discussion got derailed by several peoples' fascination
> with the Sun Gallant Demi font.  :-/
> 
> I share all this because it is a contentious issue and I cannot pretend
> to represent my view as a universal consensus.  It is, however, I think,
> the opinion shared by people with a fair knowledge of *roff systems and
> who perceive the man(7) macro language as an application of a
> typesetting system and not an isolated domain-specific language for man
> pages.
> 
> I got fatigued of the fight before I could share my findings about
> historical Unix manuals going back to Version 2.  I get the feeling
> people don't really care; they'll happily wield the club of historical
> continuity when it works in their favor, and discard it as irrelevant
> when it doesn't.  But I can't say _I've_ never been guilty of that
> inconsistency...

Thanks for the background.

>> Some thoughts/questions:
>>
>> * I believe that when rendering to a terminal, the use of "\-" is
>> equivalent to just "-"; they both render as a real minus sign (ASCISS
>> 055). Right?
> 
> It depends on the capabilities of the terminal, and specifically whether
> it supports any hyphen, dash, or minus glyphs apart from ASCII 055.
> None of ASCII or the ISO 8859 encodings did, and Windows-1252, which
> does, is not a popular terminal encoding among Unix/Linux users.
> 
> But Unicode also does, and Unicode _is_ popular.  If you write a "raw"
> roff document and render it to a UTF-8 terminal, you will be able to see
> a difference.

Thanks for that info on Unicode/UTF-8 terminals...

> 
> Example:
> 
> $ printf "UTF-8 \\-1\n" | groff -Tutf8 | cat -s

GOt it.

> Back when people started using UTF-8 terminals, confusion of - and \- in
> man pages was even more rampant than it is today, and groff added
> directives to the man(7) implementation[1] to deliberately degrade
> glyphs to ASCII.
> 
> .\" For UTF-8, map some characters conservatively for the sake
> .\" of easy cut and paste.
> .
> .if '\*[.T]'utf8' \{\
> .  rchar \- - ' `
> .
> .  char \- \N'45'
> .  char  - \N'45'
> .  char  ' \N'39'
> .  char  ` \N'96'
> .\}
> 
> It was intended as a stopgap measure, but thanks to development on groff
> slowing down and its maintainer retiring from the role, it's remained
> the case for about a decade, and some people now regard the stopgap as
> an eternal truth that must be preserved, lest all writers of
> documentation defect to Markdown or something.
> 
> The above probably should have been placed in the man.local file
> instead[2][3], to encourage system administrators to make transitions
> away from the stopgap as their sites or distributions deemed suitable.
> I have proposed this very thing for the next groff release, 1.23.0, but
> even that met with stiff resistance from the BSD camp.  They want cement
> poured over the code snippet above.
> 
>> * When rendering to PDF, then "\-" and "-" certainly produce different
>> results: the former produces a long dash, while the other produces a
>> rather short dash.
> 
> Yes.  Specifically, the issue depends on whether a _font_ distinguishes
> a hyphen from a minus sign.  (To a typographer, there's _no such thing_
> as a "hyphen-minus", the ISO name for ASCII 055--or at least there
> wasn't until computer character encodings forced compromises onto the
> world.) But matters are made muddy by the fact that terminal emulators
> impose another layer between the typesetter (*roff) and the fonts used
> to draw glyphs.  groff's solution is to use the encoding of the locale
> as a proxy for font coverage, which works well only if the font has
> coverage for all the glyphs of interest to a document.  Over time this
> has become increasingly true for fonts widely used in terminal emulators
> and glyphs commonly encountered in practical documents like man
> pages.[4]
> 
>> Certainly, when writing say "-1" in running text (i.e., not in a
>> .EX/.EE code example), one should use "\-1", since without the "\",
>> the dash in front of the "1" is rather anaemically small when rendered
>> in PDF.
> 
> Yes.
> 
>> The same is true when writing options strings such as "ls -l". We
>> should use "ls \-l" to avoid an anaemic hyphen in PDF.
> 
> Yes.
> 
>> When writing man-pages xrefs (e.g., utf-8), the use of "\-" produces a
>> dash that is almost too long for my taste, but is preferable to the
>> result from using "-", where the rendered dash is too small.
> 
> I share your discomfort with the length of the dash in man page xrefs,
> and also your assessment that it's the lesser evil.
> 
> Another issue to consider is that as PDF rendering technology has
> improved on Linux, it has become possible to copy and paste from PDF
> documents into a terminal window.  In my opinion we should make this
> work as well as we can.  Expert Linux users may not ever do this,
> wondering why anyone would ever try; new Linux users will quite
> reasonably expect to be able to do it.

Agreed.

>> Inside code blocks (.EX/.EE) is there any reason to use "\-" rather
>> than just "-"? Long ago I think I convinced myself that "\-" should be
>> used, but now I am not at all sure that it's necessary. Maybe I forgot
>> something, and you might remind me why "\-" is needed (and I will make
>> sure to add the reason to man-pages(7)).
> 
> Yes; the main reason is so that copy-and-paste from code examples in
> your man pages will work if people _don't_ use the degraded character
> translations in man.local, which are marked as optional.

Got it.

> And I mean copy-and-paste not just from PDF but from a terminal window.

Yes, but I have a question: "\-1" renders in PDF as a long dash 
followed by a "1". This looks okay in PDF, but if I copy and paste
into a terminal, I don't get an ASCII 45. Seems seems to contradict
what you are saying about cut-and-paste above. What am I missing?

> .EX and .EE, originating in the Version 9 Research Unix man macros, are
> "semantic" but they don't _do_ very much.  They don't change
> character-to-glyph mappings; they change the font family (on typesetter
> devices like PDF, not terminals) and turn off filling.
> 
>> Are there any other things I've missed with respect to "\-" vs "="?
> 
> Probably, but nothing I can think of right now.  <laugh>  It's a vexing
> issue.
> 
> To get back to the question you originally posed, I think the change you
> suggested (to consistently use \- in .EX/.EE regions) is sound, and will
> not frustrate correct rendering even on systems that flatten the
> distinction between the minus (\-) and hyphen (-) characters.
> 
> Please follow up with any further questions and I will do my best to
> answer them.

I don't really have any other questions, but I have tried to distill 
the  above into some text in man-pages(7) to remind myself for the
future:

[[
.PP
The use of real minus signs serves the following purposes:
.IP * 3
To provide better renderings on various targets other than
ASCII terminals,
notably in PDF and on Unicode/UTF\-8-capable terminals.
.IP *
To generate glyphs that when copied from rendered pages will
produce real minus signs when pasted into a terminal.
]]

Seem okay?

> [1] tmac/an-old.tmac
> [2] Debian does this in its /etc/groff/man.local:
> 
> [...]
>   .if n \{\
> [...]
>   .  \" Debian: Strictly, "-" is a hyphen while "\-" is a minus sign, and the
>   .  \" former may not always be rendered in the form expected for things like
>   .  \" command-line options.  Uncomment this if you want to make sure that
>   .  \" manual pages you're writing are clear of this problem.
>   .\" uncommented by Branden, 2019-06-16 --GBR
>   .   if '\*[.T]'utf8' \
>   .     char - \[hy]
>   .
>   .  \" Debian: "\-" is more commonly used for option dashes than for minus
>   .  \" signs in manual pages, so map it to plain "-" for HTML/XHTML output
>   .  \" rather than letting it be rendered as "&minus;".
>   .  ie '\*[.T]'html' \
>   .    char \- \N'45'
>   .  el \{\
>   .    if '\*[.T]'xhtml' \
>   .      char \- \N'45'
>   .  \}
>   .\}
> 
> As you can see, I uncommented my local copy so that I could see if the
> wrong glyphs were being used in man pages.  A large part of my work on
> groff upstream has been on making the man pages better examples for
> other man page writers to follow.
> 
> [3] As can be seen from the groff mailing list thread, Ingo Schwarze of
> OpenBSD rejects the notion of man.local as a file suitable for site
> administrators to customize.  I don't know enough about OpenBSD to
> rationalize this view.
> 
> [4] To check the coverage of your terminal emulator's font, try the
> command "man groff_char".  It contains a specimen of every defined groff
> "special character" and in my opinion is a reasonable test of practical
> glyph coverage[5].  For man pages, it's probably overpowered, even, but
> man pages are merely the leading application of *roff, not the only one.

Thanks for that pointer.

> [5] I've largely rewritten the page for groff 1.23.0 (forthcoming)
> because I was unhappy with what I perceived as its lack of clarity.  A
> recent snapshot at the man-pages Web site[6] is a useful preview, but
> (unless you use something like lynx or w3m) it won't tell you anything
> about the glyph coverage of your _terminal_'s font.  In any event, the
> glyph repertoire has not changed from groff 1.22.4.
> 
> [6] https://man7.org/linux/man-pages/man7/groff_char.7.html

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

  reply	other threads:[~2021-01-21 12:12 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-20 21:03 Escaping hyphens ("real" minus signs in groff) Michael Kerrisk (man-pages)
2021-01-21  6:12 ` G. Branden Robinson
2021-01-21 11:03   ` Michael Kerrisk (man-pages) [this message]
2021-01-21 17:42     ` Deri
2021-01-22  8:08       ` Michael Kerrisk (man-pages)
2021-01-22  3:56     ` G. Branden Robinson
2021-01-22 16:27       ` Deri
2021-01-22 17:02         ` G. Branden Robinson
2021-03-07  0:06       ` Alejandro Colomar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a1af3f5c-f3e9-4bf3-cad5-389571c45d27@gmail.com \
    --to=mtk.manpages@gmail.com \
    --cc=g.branden.robinson@gmail.com \
    --cc=groff@gnu.org \
    --cc=linux-man@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).