* Escaping hyphens ("real" minus signs in groff)
@ 2021-01-20 21:03 Michael Kerrisk (man-pages)
  2021-01-21  6:12 ` G. Branden Robinson
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Kerrisk (man-pages) @ 2021-01-20 21:03 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: linux-man

Hi Branden,

I wonder if I might ask for your input...

For some time now, man-pages(7) has had the text (mostly put there by me):

   Generating optimal glyphs
       Where a real minus character is required (e.g., for  numbers  such
       as  -1,  for  man  page cross references such as utf-8(7), or when
       writing options that have a leading dash, such as in  ls -l),  use
       the following form in the man page source:

           \-

       This guideline applies also to code examples.

(You even helped with this text a little, adding the piece about
manual page cross-references.)

I'm having some doubts about this text. The doubts were triggered
after I noticed that many code snippets (inside .EX/.EE blocks) don't
follow this recommendation. I was about to apply a large patch that
fixed that when I began to wonder: is it even necessary?

Some thoughts/questions:

* I believe that when rendering to a terminal, the use of "\-" is
equivalent to just "-"; they both render as a real minus sign (ASCII
055). Right?

* When rendering to PDF, then "\-" and "-" certainly produce different
results: the former produces a long dash, while the other produces a
rather short dash.

Certainly, when writing say "-1" in running text (i.e., not in a
.EX/.EE code example), one should use "\-1", since without the "\",
the dash in front of the "1" is rather anaemically small when rendered
in PDF.

The same is true when writing options strings such as "ls -l". We
should use "ls \-l" to avoid an anaemic hyphen in PDF.

When writing man-pages xrefs (e.g., utf-8), the use of "\-" produces a
dash that is almost too long for my taste, but is preferable to the
result from using "-", where the rendered dash is too small.

Inside code blocks (.EX/.EE) is there any reason to use "\-" rather
than just "-"? Long ago I think I convinced myself that "\-" should be
used, but now I am not at all sure that it's necessary. Maybe I forgot
something, and you might remind me why "\-" is needed (and I will make
sure to add the reason to man-pages(7)).

Are there any other things I've missed with respect to "\-" vs "-"?

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: Escaping hyphens ("real" minus signs in groff)
  2021-01-20 21:03 Escaping hyphens ("real" minus signs in groff) Michael Kerrisk (man-pages)
@ 2021-01-21  6:12 ` G. Branden Robinson
  2021-01-21 11:03   ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 9+ messages in thread
From: G. Branden Robinson @ 2021-01-21  6:12 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: linux-man, groff


[looping in groff@ because I'm characterizing an unresolved argument and
people may want to dispute my claims]

Hi Michael!

At 2021-01-20T22:03:12+0100, Michael Kerrisk (man-pages) wrote:
> Hi Branden,
> 
> I wonder if I might ask for your input...
> 
> For some time now, man-pages(7) has had the text (mostly put there by me):
> 
>    Generating optimal glyphs
>        Where a real minus character is required (e.g., for  numbers  such
>        as  -1,  for  man  page cross references such as utf-8(7), or when
>        writing options that have a leading dash, such as in  ls -l),  use
>        the following form in the man page source:
> 
>            \-
> 
>        This guideline applies also to code examples.
> 
> (You even helped with this text a little, adding the piece about
> manual page cross-references.)
> 
> I'm having some doubts about this text. The doubts were triggered
> after I noticed that many code snippets (inside .EX/.EE blocks) don't
> follow this recommendation. I was about to apply a large patch that
> fixed that when I began to wonder: is it even necessary?

Short answer: yes, I would do that.

Long answer
===========

There are people who would argue (I've heard mostly from BSD people)
that man pages should "DWIM", and always render a "-" as an ASCII 45
hyphen-minus regardless of context, and while we're at it, it should
stop having non-ASCII glyph mappings for `, ', ^, and ~ as well.  I
resist this, as it's contrary to troff's semantics for these characters
since the early 1970s.

My most recent contretemps with people about this can be found starting
here:
	https://lists.gnu.org/archive/html/groff/2020-10/msg00158.html

The former groff maintainer and lead developer, Werner Lemberg, agrees
with me on this point, but some people whose *roff horizons seem to
extend only as far as man pages are passionately opposed.

The issue was not resolved on the groff mailing list and may not ever
be; the instant discussion got derailed by several people's fascination
with the Sun Gallant Demi font.  :-/

I share all this because it is a contentious issue and I cannot pretend
to represent my view as a universal consensus.  It is, however, I think,
the opinion shared by people with a fair knowledge of *roff systems and
who perceive the man(7) macro language as an application of a
typesetting system and not an isolated domain-specific language for man
pages.

I got fatigued of the fight before I could share my findings about
historical Unix manuals going back to Version 2.  I get the feeling
people don't really care; they'll happily wield the club of historical
continuity when it works in their favor, and discard it as irrelevant
when it doesn't.  But I can't say _I've_ never been guilty of that
inconsistency...

> Some thoughts/questions:
> 
> * I believe that when rendering to a terminal, the use of "\-" is
> equivalent to just "-"; they both render as a real minus sign (ASCII
> 055). Right?

It depends on the capabilities of the terminal, and specifically whether
it supports any hyphen, dash, or minus glyphs apart from ASCII 055.
None of ASCII or the ISO 8859 encodings did, and Windows-1252, which
does, is not a popular terminal encoding among Unix/Linux users.

But Unicode also does, and Unicode _is_ popular.  If you write a "raw"
roff document and render it to a UTF-8 terminal, you will be able to see
a difference.

Example:

$ printf "UTF-8 \\-1\n" | groff -Tutf8 | cat -s
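
To make the characters at stake concrete, here is a minimal Python sketch (an editorial illustration, not part of groff). It prints the code point, Unicode name, and UTF-8 bytes of the three characters under discussion. It assumes, as noted in the comments, that a UTF-8 output device without the man(7) remapping emits U+2010 for an unescaped "-" and U+2212 for "\-".

```python
import unicodedata

# Characters involved in the "-" versus "\-" discussion.
# Assumption: on a UTF-8 device without the man(7) remapping, groff emits
# U+2010 for an unescaped "-" and U+2212 for "\-".
CANDIDATES = {
    "ASCII 055 (what shells and option parsers expect)": "\u002d",
    "typographic hyphen": "\u2010",
    "typographic minus": "\u2212",
}


def describe(ch):
    """Return code point, Unicode name, and UTF-8 bytes for one character."""
    utf8 = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} [{utf8}]"


for label, ch in CANDIDATES.items():
    print(f"{label}: {describe(ch)}")
```

Piping the groff output above through a hex dump and comparing the bytes against this table is one way to check which glyph a given setup actually produces.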

Back when people started using UTF-8 terminals, confusion of - and \- in
man pages was even more rampant than it is today, and groff added
directives to the man(7) implementation[1] to deliberately degrade
glyphs to ASCII.

.\" For UTF-8, map some characters conservatively for the sake
.\" of easy cut and paste.
.
.if '\*[.T]'utf8' \{\
.  rchar \- - ' `
.
.  char \- \N'45'
.  char  - \N'45'
.  char  ' \N'39'
.  char  ` \N'96'
.\}

It was intended as a stopgap measure, but thanks to development on groff
slowing down and its maintainer retiring from the role, it's remained
the case for about a decade, and some people now regard the stopgap as
an eternal truth that must be preserved, lest all writers of
documentation defect to Markdown or something.

The above probably should have been placed in the man.local file
instead[2][3], to encourage system administrators to make transitions
away from the stopgap as their sites or distributions deemed suitable.
I have proposed this very thing for the next groff release, 1.23.0, but
even that met with stiff resistance from the BSD camp.  They want cement
poured over the code snippet above.

> * When rendering to PDF, then "\-" and "-" certainly produce different
> results: the former produces a long dash, while the other produces a
> rather short dash.

Yes.  Specifically, the issue depends on whether a _font_ distinguishes
a hyphen from a minus sign.  (To a typographer, there's _no such thing_
as a "hyphen-minus", the ISO name for ASCII 055--or at least there
wasn't until computer character encodings forced compromises onto the
world.) But matters are made muddy by the fact that terminal emulators
impose another layer between the typesetter (*roff) and the fonts used
to draw glyphs.  groff's solution is to use the encoding of the locale
as a proxy for font coverage, which works well only if the font has
coverage for all the glyphs of interest to a document.  Over time this
has become increasingly true for fonts widely used in terminal emulators
and glyphs commonly encountered in practical documents like man
pages.[4]
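
The point about encodings can be sketched in Python as follows. This is only an illustration of whether a character encoding can represent a given dash-like character. It says nothing about font coverage, and it is not how groff itself makes the decision. The choice of characters and encodings is an assumption made for the example.

```python
# Dash-like characters to probe (an illustrative selection).
DASHES = {
    "HYPHEN-MINUS": "\u002d",
    "HYPHEN": "\u2010",
    "EN DASH": "\u2013",
    "MINUS SIGN": "\u2212",
}


def representable(ch, encoding):
    """True if `ch` can be encoded in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def coverage(encoding):
    """Names of the probed characters that `encoding` can represent."""
    return [name for name, ch in DASHES.items() if representable(ch, encoding)]


for enc in ("ascii", "iso-8859-1", "cp1252", "utf-8"):
    print(enc, coverage(enc))
```

Of the four encodings tried here, only UTF-8 can represent a separate hyphen and minus sign. Windows-1252 adds an en dash, and ASCII and ISO 8859-1 offer only the hyphen-minus from this selection.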

> Certainly, when writing say "-1" in running text (i.e., not in a
> .EX/.EE code example), one should use "\-1", since without the "\",
> the dash in front of the "1" is rather anaemically small when rendered
> in PDF.

Yes.

> The same is true when writing options strings such as "ls -l". We
> should use "ls \-l" to avoid an anaemic hyphen in PDF.

Yes.

> When writing man-pages xrefs (e.g., utf-8), the use of "\-" produces a
> dash that is almost too long for my taste, but is preferable to the
> result from using "-", where the rendered dash is too small.

I share your discomfort with the length of the dash in man page xrefs,
and also your assessment that it's the lesser evil.

Another issue to consider is that as PDF rendering technology has
improved on Linux, it has become possible to copy and paste from PDF
documents into a terminal window.  In my opinion we should make this
work as well as we can.  Expert Linux users may not ever do this,
wondering why anyone would ever try; new Linux users will quite
reasonably expect to be able to do it.

> Inside code blocks (.EX/.EE) is there any reason to use "\-" rather
> than just "-"? Long ago I think I convinced myself that "\-" should be
> used, but now I am not at all sure that it's necessary. Maybe I forgot
> something, and you might remind me why "\-" is needed (and I will make
> sure to add the reason to man-pages(7)).

Yes; the main reason is so that copy-and-paste from code examples in
your man pages will work if people _don't_ use the degraded character
translations in man.local, which are marked as optional.

And I mean copy-and-paste not just from PDF but from a terminal window.
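
Why copy-and-paste breaks can be shown with a short Python sketch. It is a hypothetical helper, not something shipped with groff or man-pages, and its list of look-alike characters is illustrative rather than exhaustive. It flags dash look-alikes in a pasted command line and checks whether an argument starts with the ASCII hyphen-minus that getopt-style option parsers recognise.

```python
import unicodedata
from typing import List, Tuple

# Dash-like characters that are NOT ASCII 055 (hyphen-minus).
# Illustrative selection only; Unicode has more.
LOOKALIKES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2212"}


def find_lookalikes(command: str) -> List[Tuple[int, str]]:
    """Return (index, Unicode name) for each dash look-alike in `command`."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(command)
        if ch in LOOKALIKES
    ]


def looks_like_option(arg: str) -> bool:
    """True only if `arg` starts with ASCII hyphen-minus, as option parsers expect."""
    return len(arg) > 1 and arg.startswith("-")


pasted = "ls \u2212l"  # what a reader might paste from a rendered page
print(find_lookalikes(pasted))
print(looks_like_option(pasted.split()[1]))
```

With the pasted text above, the second word starts with U+2212 rather than ASCII 055, so a typical command would treat it as a file name instead of an option.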

.EX and .EE, originating in the Version 9 Research Unix man macros, are
"semantic" but they don't _do_ very much.  They don't change
character-to-glyph mappings; they change the font family (on typesetter
devices like PDF, not terminals) and turn off filling.

> Are there any other things I've missed with respect to "\-" vs "-"?

Probably, but nothing I can think of right now.  <laugh>  It's a vexing
issue.

To get back to the question you originally posed, I think the change you
suggested (to consistently use \- in .EX/.EE regions) is sound, and will
not frustrate correct rendering even on systems that flatten the
distinction between the minus (\-) and hyphen (-) characters.
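
A patch of this kind could be prepared with a small checker like the following Python sketch. It is a hypothetical helper, not an existing man-pages tool. It reports lines inside .EX/.EE regions that contain a "-" not preceded by a backslash. The regular expression is a simplification that does not understand other roff escapes, so its hits are candidates for review rather than certain errors.

```python
import re
from typing import List, Tuple

# A "-" that is not immediately preceded by a backslash.
# Simplification: other roff escapes (for example "\(em" or "\[hy]")
# are not parsed, so treat every hit as a candidate to review.
UNESCAPED_HYPHEN = re.compile(r"(?<!\\)-")


def unescaped_hyphens_in_examples(source: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for lines inside .EX/.EE with a bare "-"."""
    hits = []
    in_example = False
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.startswith(".EX"):
            in_example = True
        elif line.startswith(".EE"):
            in_example = False
        elif in_example and UNESCAPED_HYPHEN.search(line):
            hits.append((lineno, line))
    return hits


# Made-up man page fragment for demonstration.
SAMPLE = r"""
.PP
Compound words such as copy-and-paste keep a plain hyphen.
.EX
$ ls -l
$ ls \-l
.EE
""".lstrip("\n")

print(unescaped_hyphens_in_examples(SAMPLE))  # prints [(4, '$ ls -l')]
```

Only the region between .EX and .EE is scanned, so ordinary hyphenated words in running text are not reported.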

Please follow up with any further questions and I will do my best to
answer them.

Regards,
Branden

[1] tmac/an-old.tmac
[2] Debian does this in its /etc/groff/man.local:

[...]
  .if n \{\
[...]
  .  \" Debian: Strictly, "-" is a hyphen while "\-" is a minus sign, and the
  .  \" former may not always be rendered in the form expected for things like
  .  \" command-line options.  Uncomment this if you want to make sure that
  .  \" manual pages you're writing are clear of this problem.
  .\" uncommented by Branden, 2019-06-16 --GBR
  .   if '\*[.T]'utf8' \
  .     char - \[hy]
  .
  .  \" Debian: "\-" is more commonly used for option dashes than for minus
  .  \" signs in manual pages, so map it to plain "-" for HTML/XHTML output
  .  \" rather than letting it be rendered as "&minus;".
  .  ie '\*[.T]'html' \
  .    char \- \N'45'
  .  el \{\
  .    if '\*[.T]'xhtml' \
  .      char \- \N'45'
  .  \}
  .\}

As you can see, I uncommented my local copy so that I could see if the
wrong glyphs were being used in man pages.  A large part of my work on
groff upstream has been on making the man pages better examples for
other man page writers to follow.

[3] As can be seen from the groff mailing list thread, Ingo Schwarze of
OpenBSD rejects the notion of man.local as a file suitable for site
administrators to customize.  I don't know enough about OpenBSD to
rationalize this view.

[4] To check the coverage of your terminal emulator's font, try the
command "man groff_char".  It contains a specimen of every defined groff
"special character" and in my opinion is a reasonable test of practical
glyph coverage[5].  For man pages, it's probably overpowered, even, but
man pages are merely the leading application of *roff, not the only one.

[5] I've largely rewritten the page for groff 1.23.0 (forthcoming)
because I was unhappy with what I perceived as its lack of clarity.  A
recent snapshot at the man-pages Web site[6] is a useful preview, but
(unless you use something like lynx or w3m) it won't tell you anything
about the glyph coverage of your _terminal_'s font.  In any event, the
glyph repertoire has not changed from groff 1.22.4.

[6] https://man7.org/linux/man-pages/man7/groff_char.7.html


* Re: Escaping hyphens ("real" minus signs in groff)
  2021-01-21  6:12 ` G. Branden Robinson
@ 2021-01-21 11:03   ` Michael Kerrisk (man-pages)
  2021-01-21 17:42     ` Deri
  2021-01-22  3:56     ` G. Branden Robinson
  0 siblings, 2 replies; 9+ messages in thread
From: Michael Kerrisk (man-pages) @ 2021-01-21 11:03 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: mtk.manpages, linux-man, groff

Hello Branden,

On 1/21/21 7:12 AM, G. Branden Robinson wrote:
> [looping in groff@ because I'm characterizing an unresolved argument and
> people may want to dispute my claims]
> 
> Hi Michael!
> 
> At 2021-01-20T22:03:12+0100, Michael Kerrisk (man-pages) wrote:
>> Hi Branden,
>>
>> I wonder if I might ask for your input...
>>
>> For some time now, man-pages(7) has had the text (mostly put there by me):
>>
>>    Generating optimal glyphs
>>        Where a real minus character is required (e.g., for  numbers  such
>>        as  -1,  for  man  page cross references such as utf-8(7), or when
>>        writing options that have a leading dash, such as in  ls -l),  use
>>        the following form in the man page source:
>>
>>            \-
>>
>>        This guideline applies also to code examples.
>>
>> (You even helped with this text a little, adding the piece about
>> manual page cross-references.)
>>
>> I'm having some doubts about this text. The doubts were triggered
>> after I noticed that many code snippets (inside .EX/.EE blocks) don't
>> follow this recommendation. I was about to apply a large patch that
>> fixed that when I began to wonder: is it even necessary?
> 
> Short answer: yes, I would do that.

I appreciate your long answer *very* much. But, I'm glad you started 
with the short answer :-). I have made the change.

> Long answer
> ===========
> 
> There are people who would argue (I've heard mostly from BSD people)
> that man pages should "DWIM", and always render a "-" as an ASCII 45
> hyphen-minus regardless of context, and while we're at it, it should
> stop having non-ASCII glyph mappings for `, ', ^, and ~ as well.  I
> resist this, as it's contrary to troff's semantics for these characters
> since the early 1970s.
> 
> My most recent contretemps with people about this can be found starting
> here:
> 	https://lists.gnu.org/archive/html/groff/2020-10/msg00158.html
> 
> The former groff maintainer and lead developer, Werner Lemberg, agrees
> with me on this point, but some people whose *roff horizons seem to
> extend only as far as man pages are passionately opposed.
> 
> The issue was not resolved on the groff mailing list and may not ever
> be; the instant discussion got derailed by several people's fascination
> with the Sun Gallant Demi font.  :-/
> 
> I share all this because it is a contentious issue and I cannot pretend
> to represent my view as a universal consensus.  It is, however, I think,
> the opinion shared by people with a fair knowledge of *roff systems and
> who perceive the man(7) macro language as an application of a
> typesetting system and not an isolated domain-specific language for man
> pages.
> 
> I got fatigued of the fight before I could share my findings about
> historical Unix manuals going back to Version 2.  I get the feeling
> people don't really care; they'll happily wield the club of historical
> continuity when it works in their favor, and discard it as irrelevant
> when it doesn't.  But I can't say _I've_ never been guilty of that
> inconsistency...

Thanks for the background.

>> Some thoughts/questions:
>>
>> * I believe that when rendering to a terminal, the use of "\-" is
>> equivalent to just "-"; they both render as a real minus sign (ASCII
>> 055). Right?
> 
> It depends on the capabilities of the terminal, and specifically whether
> it supports any hyphen, dash, or minus glyphs apart from ASCII 055.
> None of ASCII or the ISO 8859 encodings did, and Windows-1252, which
> does, is not a popular terminal encoding among Unix/Linux users.
> 
> But Unicode also does, and Unicode _is_ popular.  If you write a "raw"
> roff document and render it to a UTF-8 terminal, you will be able to see
> a difference.

Thanks for that info on Unicode/UTF-8 terminals...

> 
> Example:
> 
> $ printf "UTF-8 \\-1\n" | groff -Tutf8 | cat -s

Got it.

> Back when people started using UTF-8 terminals, confusion of - and \- in
> man pages was even more rampant than it is today, and groff added
> directives to the man(7) implementation[1] to deliberately degrade
> glyphs to ASCII.
> 
> .\" For UTF-8, map some characters conservatively for the sake
> .\" of easy cut and paste.
> .
> .if '\*[.T]'utf8' \{\
> .  rchar \- - ' `
> .
> .  char \- \N'45'
> .  char  - \N'45'
> .  char  ' \N'39'
> .  char  ` \N'96'
> .\}
> 
> It was intended as a stopgap measure, but thanks to development on groff
> slowing down and its maintainer retiring from the role, it's remained
> the case for about a decade, and some people now regard the stopgap as
> an eternal truth that must be preserved, lest all writers of
> documentation defect to Markdown or something.
> 
> The above probably should have been placed in the man.local file
> instead[2][3], to encourage system administrators to make transitions
> away from the stopgap as their sites or distributions deemed suitable.
> I have proposed this very thing for the next groff release, 1.23.0, but
> even that met with stiff resistance from the BSD camp.  They want cement
> poured over the code snippet above.
> 
>> * When rendering to PDF, then "\-" and "-" certainly produce different
>> results: the former produces a long dash, while the other produces a
>> rather short dash.
> 
> Yes.  Specifically, the issue depends on whether a _font_ distinguishes
> a hyphen from a minus sign.  (To a typographer, there's _no such thing_
> as a "hyphen-minus", the ISO name for ASCII 055--or at least there
> wasn't until computer character encodings forced compromises onto the
> world.) But matters are made muddy by the fact that terminal emulators
> impose another layer between the typesetter (*roff) and the fonts used
> to draw glyphs.  groff's solution is to use the encoding of the locale
> as a proxy for font coverage, which works well only if the font has
> coverage for all the glyphs of interest to a document.  Over time this
> has become increasingly true for fonts widely used in terminal emulators
> and glyphs commonly encountered in practical documents like man
> pages.[4]
> 
>> Certainly, when writing say "-1" in running text (i.e., not in a
>> .EX/.EE code example), one should use "\-1", since without the "\",
>> the dash in front of the "1" is rather anaemically small when rendered
>> in PDF.
> 
> Yes.
> 
>> The same is true when writing options strings such as "ls -l". We
>> should use "ls \-l" to avoid an anaemic hyphen in PDF.
> 
> Yes.
> 
>> When writing man-pages xrefs (e.g., utf-8), the use of "\-" produces a
>> dash that is almost too long for my taste, but is preferable to the
>> result from using "-", where the rendered dash is too small.
> 
> I share your discomfort with the length of the dash in man page xrefs,
> and also your assessment that it's the lesser evil.
> 
> Another issue to consider is that as PDF rendering technology has
> improved on Linux, it has become possible to copy and paste from PDF
> documents into a terminal window.  In my opinion we should make this
> work as well as we can.  Expert Linux users may not ever do this,
> wondering why anyone would ever try; new Linux users will quite
> reasonably expect to be able to do it.

Agreed.

>> Inside code blocks (.EX/.EE) is there any reason to use "\-" rather
>> than just "-"? Long ago I think I convinced myself that "\-" should be
>> used, but now I am not at all sure that it's necessary. Maybe I forgot
>> something, and you might remind me why "\-" is needed (and I will make
>> sure to add the reason to man-pages(7)).
> 
> Yes; the main reason is so that copy-and-paste from code examples in
> your man pages will work if people _don't_ use the degraded character
> translations in man.local, which are marked as optional.

Got it.

> And I mean copy-and-paste not just from PDF but from a terminal window.

Yes, but I have a question: "\-1" renders in PDF as a long dash 
followed by a "1". This looks okay in PDF, but if I copy and paste
into a terminal, I don't get an ASCII 45. This seems to contradict
what you are saying about cut-and-paste above. What am I missing?

> .EX and .EE, originating in the Version 9 Research Unix man macros, are
> "semantic" but they don't _do_ very much.  They don't change
> character-to-glyph mappings; they change the font family (on typesetter
> devices like PDF, not terminals) and turn off filling.
> 
>> Are there any other things I've missed with respect to "\-" vs "-"?
> 
> Probably, but nothing I can think of right now.  <laugh>  It's a vexing
> issue.
> 
> To get back to the question you originally posed, I think the change you
> suggested (to consistently use \- in .EX/.EE regions) is sound, and will
> not frustrate correct rendering even on systems that flatten the
> distinction between the minus (\-) and hyphen (-) characters.
> 
> Please follow up with any further questions and I will do my best to
> answer them.

I don't really have any other questions, but I have tried to distill 
the  above into some text in man-pages(7) to remind myself for the
future:

[[
.PP
The use of real minus signs serves the following purposes:
.IP * 3
To provide better renderings on various targets other than
ASCII terminals,
notably in PDF and on Unicode/UTF\-8-capable terminals.
.IP *
To generate glyphs that when copied from rendered pages will
produce real minus signs when pasted into a terminal.
]]

Seem okay?

> [1] tmac/an-old.tmac
> [2] Debian does this in its /etc/groff/man.local:
> 
> [...]
>   .if n \{\
> [...]
>   .  \" Debian: Strictly, "-" is a hyphen while "\-" is a minus sign, and the
>   .  \" former may not always be rendered in the form expected for things like
>   .  \" command-line options.  Uncomment this if you want to make sure that
>   .  \" manual pages you're writing are clear of this problem.
>   .\" uncommented by Branden, 2019-06-16 --GBR
>   .   if '\*[.T]'utf8' \
>   .     char - \[hy]
>   .
>   .  \" Debian: "\-" is more commonly used for option dashes than for minus
>   .  \" signs in manual pages, so map it to plain "-" for HTML/XHTML output
>   .  \" rather than letting it be rendered as "&minus;".
>   .  ie '\*[.T]'html' \
>   .    char \- \N'45'
>   .  el \{\
>   .    if '\*[.T]'xhtml' \
>   .      char \- \N'45'
>   .  \}
>   .\}
> 
> As you can see, I uncommented my local copy so that I could see if the
> wrong glyphs were being used in man pages.  A large part of my work on
> groff upstream has been on making the man pages better examples for
> other man page writers to follow.
> 
> [3] As can be seen from the groff mailing list thread, Ingo Schwarze of
> OpenBSD rejects the notion of man.local as a file suitable for site
> administrators to customize.  I don't know enough about OpenBSD to
> rationalize this view.
> 
> [4] To check the coverage of your terminal emulator's font, try the
> command "man groff_char".  It contains a specimen of every defined groff
> "special character" and in my opinion is a reasonable test of practical
> glyph coverage[5].  For man pages, it's probably overpowered, even, but
> man pages are merely the leading application of *roff, not the only one.

Thanks for that pointer.

> [5] I've largely rewritten the page for groff 1.23.0 (forthcoming)
> because I was unhappy with what I perceived as its lack of clarity.  A
> recent snapshot at the man-pages Web site[6] is a useful preview, but
> (unless you use something like lynx or w3m) it won't tell you anything
> about the glyph coverage of your _terminal_'s font.  In any event, the
> glyph repertoire has not changed from groff 1.22.4.
> 
> [6] https://man7.org/linux/man-pages/man7/groff_char.7.html

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: Escaping hyphens ("real" minus signs in groff)
  2021-01-21 11:03   ` Michael Kerrisk (man-pages)
@ 2021-01-21 17:42     ` Deri
  2021-01-22  8:08       ` Michael Kerrisk (man-pages)
  2021-01-22  3:56     ` G. Branden Robinson
  1 sibling, 1 reply; 9+ messages in thread
From: Deri @ 2021-01-21 17:42 UTC (permalink / raw)
  To: groff; +Cc: Michael Kerrisk (man-pages), G. Branden Robinson, linux-man

On Thursday, 21 January 2021 11:03:13 GMT Michael Kerrisk (man-pages) wrote:
> > And I mean copy-and-paste not just from PDF but from a terminal window.
> 
> Yes, but I have a question: "\-1" renders in PDF as a long dash
> followed by a "1". This looks okay in PDF, but if I copy and paste
> into a terminal, I don't get an ASCII 45. This seems to contradict
> what you are saying about cut-and-paste above. What am I missing?

If I do:-

echo "- \- \[fi]"|groff -Tpdf | okular -

I see a hyphen, minus and fi ligature. Copying to a text document gives hyphen 
hyphen f i. The reason is because gropdf adds a ToUnicode CMAP entry to fonts 
which used the text.enc encoding when created with afmtodit. You can see a 
difference if you run:-

echo "- \- \[fi]"|groff -Tpdf -P-u | okular -

Which prevents the CMAP entry, and when you copy to text the minus Unicode
character is seen. (On my system the fi ligature is separated into f i still
but I suspect that is KDE being "helpful").

Cheers

Deri





* Re: Escaping hyphens ("real" minus signs in groff)
  2021-01-21 11:03   ` Michael Kerrisk (man-pages)
  2021-01-21 17:42     ` Deri
@ 2021-01-22  3:56     ` G. Branden Robinson
  2021-01-22 16:27       ` Deri
  2021-03-07  0:06       ` Alejandro Colomar
  1 sibling, 2 replies; 9+ messages in thread
From: G. Branden Robinson @ 2021-01-22  3:56 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: linux-man, groff


Hi Michael!

At 2021-01-21T12:03:13+0100, Michael Kerrisk (man-pages) wrote:
> I appreciate your long answer *very* much. But, I'm glad you started
> with the short answer :-).

Cool!  But beware, from such pressures is the practice of top-replying
born...  ;-)

> > Another issue to consider is that as PDF rendering technology has
> > improved on Linux, it has become possible to copy and paste from PDF
> > documents into a terminal window.  In my opinion we should make this
> > work as well as we can.  Expert Linux users may not ever do this,
> > wondering why anyone would ever try; new Linux users will quite
> > reasonably expect to be able to do it.
[...]
> > And I mean copy-and-paste not just from PDF but from a terminal
> > window.
> 
> Yes, but I have a question: "\-1" renders in PDF as a long dash 
> followed by a "1". This looks okay in PDF, but if I copy and paste
> into a terminal, I don't get an ASCII 45. This seems to contradict
> what you are saying about cut-and-paste above. What am I missing?

The gap between aspiration and implementation.  I don't think the
"copy-and-paste from PDF to terminal window" matter is completely sorted
out yet.

I'm a strident prescriptionist about preserving the distinction between
"-" and "\-" in roff documents, notably including man pages in part
because it affords us more room to design around this problem.

ASCII and ISO 8859 unified the hyphen and minus characters.  AT&T troff
and all of its descendants distinguished them.  Unicode also
distinguishes them.  But Unix has a habit of calling ASCII 055 (45
decimal) a "dash", and moreover, to much software, only the numerical
value of the code point is important.

It's quite possible that for man(7) documents rendering to PDF, we
should perform the following mapping (in the man macros).

.if '\*[.T]'pdf' \
.  char \- \N'45'

This didn't come up in my argument with (mostly?) BSD people because (1)
the immediate issue that raised concern had to do with the grave accent
and apostrophe instead and (2) everybody in that camp who spoke up on
the matter said they seldom, if ever, render man pages to PostScript or
PDF.  By that token, the above 2-liner may not be a controversial matter
to the people I was arguing with.  :)

Consider what would happen to the appearance of PDF-rendered man pages
if we encouraged all \- escaped hyphens to be rewritten as plain hyphens
in the source first, and did the following to mandate uniformity.

.if '\*[.T]'pdf' \{\
.  char \- \N'45'
.  char - \N'45'
.\}

...just as is currently done for the 'utf8' output driver, whose second
line I want to kill off.

I feel that responsible stewardship of the groff man macro
implementation means considering the needs of diverse audiences.

> I don't really have any other questions, but I have tried to distill 
> the  above into some text in man-pages(7) to remind myself for the
> future:
> 
> [[
> .PP
> The use of real minus signs serves the following purposes:
> .IP * 3
> To provide better renderings on various targets other than
> ASCII terminals,
> notably in PDF and on Unicode/UTF\-8-capable terminals.
> .IP *
> To generate glyphs that when copied from rendered pages will
> produce real minus signs when pasted into a terminal.
> ]]
> 
> Seem okay?

What a "real minus sign" is is a fraught issue[1], but if for the
purposes of man-pages(7) it means the ASCII/ISO hyphen-minus, then yes,
I think it's good enough.

Regards,
Branden

[1] especially in light of the \[mi] special character escape and the
    existence of U+2212 :-/


* Re: Escaping hyphens ("real" minus signs in groff)
  2021-01-21 17:42     ` Deri
@ 2021-01-22  8:08       ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 9+ messages in thread
From: Michael Kerrisk (man-pages) @ 2021-01-22  8:08 UTC (permalink / raw)
  To: Deri; +Cc: groff, G. Branden Robinson, linux-man

Hi Deri,

On Thu, 21 Jan 2021 at 18:42, Deri <deri@chuzzlewit.myzen.co.uk> wrote:
>
> On Thursday, 21 January 2021 11:03:13 GMT Michael Kerrisk (man-pages) wrote:
> > > And I mean copy-and-paste not just from PDF but from a terminal window.
> >
> > Yes, but I have a question: "\-1" renders in PDF as a long dash
> > followed by a "1". This looks okay in PDF, but if I copy and paste
> > into a terminal, I don't get an ASCII 45. This seems to contradict
> > what you are saying about cut-and-paste above. What am I missing?
>
> If I do:-
>
> echo "- \- \[fi]"|groff -Tpdf | okular -
>
> I see a hyphen, minus and fi ligature. Copying to a text document gives hyphen
> hyphen f i. The reason is because gropdf adds a ToUnicode CMAP entry to fonts
> which used the text.enc encoding when created with afmtodit. You can see a
> difference if you run:-
>
> echo "- \- \[fi]"|groff -Tpdf -P-u | okular -
>
> Which prevents the CMAP entry, and when you copy to text the Unicode minus
> character is seen. (On my system the fi ligature is separated into f i still
> but I suspect that is KDE being "helpful").

Thanks! That's a helpful explanation!

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: Escaping hyphens ("real" minus signs in groff)
  2021-01-22  3:56     ` G. Branden Robinson
@ 2021-01-22 16:27       ` Deri
  2021-01-22 17:02         ` G. Branden Robinson
  2021-03-07  0:06       ` Alejandro Colomar
  1 sibling, 1 reply; 9+ messages in thread
From: Deri @ 2021-01-22 16:27 UTC (permalink / raw)
  To: groff; +Cc: G. Branden Robinson, Michael Kerrisk (man-pages), linux-man

On Friday, 22 January 2021 03:56:00 GMT G. Branden Robinson wrote:
> The gap between aspiration and implementation.  I don't think the
> "copy-and-paste from PDF to terminal window" matter is completely sorted
> out yet.

Hi Branden,

I can't seem to make this not work. In my last email I explained how a default 
ucmap is installed in the pdfs produced by gropdf, so assuming the pdf viewer 
supports the pdf standard it should not require a change to the man macros you 
favour. I have tested using 'xpdf' as the viewer which pastes:-

- − fi	<== without ucmap
- - fi	<== with ucmap

Of course, if the pdf is produced by using grops and ghostscript the result 
will be the same as using gropdf with no ucmap, i.e. '-' and '\-' will be 
pasted as different characters.
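
One way to check which character actually arrived after a paste is to look at its bytes. The following is only an illustrative sketch: the helper name hex_of is made up here, and it assumes the pasted text is UTF-8 encoded.

```shell
#!/bin/sh
# hex_of is a hypothetical helper (not part of groff or gropdf):
# print the bytes of its first argument as one lowercase hex string.
hex_of() {
    printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
}

# ASCII hyphen-minus, the character a shell expects in front of an option.
hex_of '-'; echo

# U+2212 MINUS SIGN, built from its three UTF-8 bytes (octal escapes).
hex_of "$(printf '\342\210\222')"; echo
```

A single byte 2d means the paste produced the ASCII hyphen-minus; the three bytes e2 88 92 mean it produced U+2212, which a shell will not treat as an option prefix.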

Cheers 

Deri





* Re: Escaping hyphens ("real" minus signs in groff)
  2021-01-22 16:27       ` Deri
@ 2021-01-22 17:02         ` G. Branden Robinson
  0 siblings, 0 replies; 9+ messages in thread
From: G. Branden Robinson @ 2021-01-22 17:02 UTC (permalink / raw)
  To: Deri; +Cc: groff, Michael Kerrisk (man-pages), linux-man


Hi Deri!

At 2021-01-22T16:27:38+0000, Deri wrote:
> On Friday, 22 January 2021 03:56:00 GMT G. Branden Robinson wrote:
> > The gap between aspiration and implementation.  I don't think the
> > "copy-and-paste from PDF to terminal window" matter is completely
> > sorted out yet.
> 
> Hi Branden,
> 
> I can't seem to make this not work. In my last email I explained how a
> default ucmap is installed in the pdfs produced by gropdf, so assuming
> the pdf viewer supports the pdf standard it should not require a
> change to the man macros you favour. I have tested using 'xpdf' as the
> viewer which pastes:-
> 
> - − fi	<== without ucmap
> - - fi	<== with ucmap
> 
> Of course, if the pdf is produced by using grops and ghostscript the
> result will be the same as using gropdf with no ucmap, i.e. '-' and
> '\-' will be pasted as different characters.

You're right!  It works for me with both evince (my usual viewer) and
xpdf as it does for you.  I had had a problem with PDF man pages in the
past but couldn't remember clearly what it was, and had thought it was
this.

But I was able to copy-and-paste and run the "ls -l" from the attached
trivial man page from the PDF without trouble:

$ groff -Tpdf -man hyphen-minus.man > hm.pdf
$ evince hm.pdf

This is actually a relief to me.  I feared that special-casing the \-
would become a camel's nose that would support the recent lobbying
effort for permanent degradation of traditional *roff glyphs to ASCII
"just for man pages".

Regards,
Branden


* Re: Escaping hyphens ("real" minus signs in groff)
  2021-01-22  3:56     ` G. Branden Robinson
  2021-01-22 16:27       ` Deri
@ 2021-03-07  0:06       ` Alejandro Colomar
  1 sibling, 0 replies; 9+ messages in thread
From: Alejandro Colomar @ 2021-03-07  0:06 UTC (permalink / raw)
  To: G. Branden Robinson, Michael Kerrisk (man-pages); +Cc: linux-man, groff

Hey Michael & Branden!

On 1/22/21 4:56 AM, G. Branden Robinson wrote:
> Hi Michael!
> 
> At 2021-01-21T12:03:13+0100, Michael Kerrisk (man-pages) wrote:
>> I appreciate your long answer *very* much. But, I'm glad you started
>> with the short answer :-).
> 
> Cool!  But beware, from such pressures is the practice of top-replying
> born...  ;-)
> 
>>> Another issue to consider is that as PDF rendering technology has
>>> improved on Linux, it has become possible to copy and paste from PDF
>>> documents into a terminal window.  In my opinion we should make this
>>> work as well as we can.  Expert Linux users may not ever do this,
>>> wondering why anyone would ever try; new Linux users will quite
>>> reasonably expect to be able to do it.
> [...]
>>> And I mean copy-and-paste not just from PDF but from a terminal
>>> window.
>>
>> Yes, but I have a question: "\-1" renders in PDF as a long dash 
>> followed by a "1". This looks okay in PDF, but if I copy and paste
>> into a terminal, I don't get an ASCII 45. This seems to contradict
>> what you are saying about cut-and-paste above. What am I missing?
> 
> The gap between aspiration and implementation.  I don't think the
> "copy-and-paste from PDF to terminal window" matter is completely sorted
> out yet.
> 
> I'm a strident prescriptionist about preserving the distinction between
> "-" and "\-" in roff documents, notably including man pages in part
> because it affords us more room to design around this problem.
> 
> ASCII and ISO 8859 unified the hyphen and minus characters.  AT&T troff
> and all of its descendants distinguished them.  Unicode also
> distinguishes them.  But Unix has a habit of calling ASCII 055 (45
> decimal) a "dash", and moreover, to much software, only the numerical
> value of the code point is important.
> 
> It's quite possible that for man(7) documents rendering to PDF, we
> should perform the following mapping (in the man macros).
> 
> .if '\*[.T]'pdf' \
> .  char \- \N'45'
> 
> This didn't come up in my argument with (mostly?) BSD people because (1)
> the immediate issue that raised concern had to do with the grave accent
> and apostrophe instead and (2) everybody in that camp who spoke up on
> the matter said they seldom, if ever, render man pages to PostScript or
> PDF.  By that token, the above 2-liner may not be a controversial matter
> to the people I was arguing with.  :)
> 
> Consider what would happen to the appearance of PDF-rendered man pages
> if we encouraged all \- escaped hyphens to be rewritten as plain hyphens
> in the source first, and did the following to mandate uniformity.
> 
> .if '\*[.T]'pdf' \{\
> .  char \- \N'45'
> .  char - \N'45'
> .\}
> 
> ...just as is currently done for the 'utf8' output driver, whose second
> line I want to kill off.
> 
> I feel that responsible stewardship of the groff man macro
> implementation means considering the needs of diverse audiences.
> 
>> I don't really have any other questions, but I have tried to distill
>> the above into some text in man-pages(7) to remind myself for the
>> future:
>>
>> [[
>> .PP
>> The use of real minus signs serves the following purposes:
>> .IP * 3
>> To provide better renderings on various targets other than
>> ASCII terminals,
>> notably in PDF and on Unicode/UTF\-8-capable terminals.
>> .IP *
>> To generate glyphs that when copied from rendered pages will
>> produce real minus signs when pasted into a terminal.
>> ]]
>>
>> Seem okay?
> 
> What a "real minus sign" is is a fraught issue[1], but if for the
> purposes of man-pages(7) it means the ASCII/ISO hyphen-minus, then yes,
> I think it's good enough.
> 
> Regards,
> Branden
> 
> [1] especially in light of the \[mi] special character escape and the
>     existence of U+2212 :-/
> 

I just found another good reason to use '\-'.  I was searching for an
option of curl in their man page, and I used '/    -s', as I usually do
when I search for those.  To my surprise, it didn't find anything, in
fact, '/-' just showed two appearances of the minus sign.  However, if I
copy and paste the character from one of the options and paste it into
the pager search command line, then it finds the options.  I already
reported the bug to them.

I checked that in our pages, we can search options (see time.1).  I
wonder if there are some cases where we're producing some weird
character that can't be easily searched for.
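
The search failure can be reproduced without any man page at all. The sketch below is illustrative only: the helper name finds_option is made up, and the choice of U+2010 HYPHEN as the non-ASCII character is an assumption, since the report above does not say which character the rendered page contained.

```shell
#!/bin/sh
# Two renderings of the same option text. Which character a formatter
# really emits is an assumption here, chosen for illustration.
ascii_line='-s, --silent'
# Same text with each dash replaced by U+2010 HYPHEN (UTF-8: e2 80 90).
hyphen_line=$(printf '\342\200\220s, \342\200\220\342\200\220silent')

# finds_option is a hypothetical helper: succeed if a search for the
# ASCII string "-s" matches the given line, as typing /-s in a pager would.
finds_option() {
    printf '%s\n' "$1" | grep -q -e '-s'
}

for line in "$ascii_line" "$hyphen_line"; do
    if finds_option "$line"; then
        echo "match"
    else
        echo "no match"
    fi
done
```

The first line matches and the second does not, even though both may look the same on screen. Running a suspect line through a check like this is one way to look for the "weird character" cases mentioned above.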

Regards,

Alex

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/


end of thread, other threads:[~2021-03-07  0:06 UTC | newest]

Thread overview: 9+ messages
2021-01-20 21:03 Escaping hyphens ("real" minus signs in groff) Michael Kerrisk (man-pages)
2021-01-21  6:12 ` G. Branden Robinson
2021-01-21 11:03   ` Michael Kerrisk (man-pages)
2021-01-21 17:42     ` Deri
2021-01-22  8:08       ` Michael Kerrisk (man-pages)
2021-01-22  3:56     ` G. Branden Robinson
2021-01-22 16:27       ` Deri
2021-01-22 17:02         ` G. Branden Robinson
2021-03-07  0:06       ` Alejandro Colomar
