* .B, .I disable hyphenation?
@ 2021-09-12 12:56 Alejandro Colomar (man-pages)
  2021-09-12 14:47 ` Ingo Schwarze
  2021-09-12 17:27 ` G. Branden Robinson
  0 siblings, 2 replies; 5+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-09-12 12:56 UTC
  To: G. Branden Robinson; +Cc: groff, linux-man

Hi Branden,

When a manual page highlights a term, either in bold or
italics, it is usually a special identifier (macro, function, command
name or argument), for which hyphenation can hurt readability and, even
worse, turn it into a different valid identifier.

What about disabling hyphenation for .B and .I?
Are there any inconveniences in doing so that I can't see?


Thanks,

Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

* Re: .B, .I disable hyphenation?
  2021-09-12 12:56 .B, .I disable hyphenation? Alejandro Colomar (man-pages)
@ 2021-09-12 14:47 ` Ingo Schwarze
  2021-09-12 17:27 ` G. Branden Robinson
  1 sibling, 0 replies; 5+ messages in thread
From: Ingo Schwarze @ 2021-09-12 14:47 UTC
  To: Alejandro Colomar (man-pages); +Cc: G. Branden Robinson, linux-man, groff

Hi Alejandro,

Alejandro Colomar (man-pages) wrote on Sun, Sep 12, 2021 at 02:56:39PM +0200:

> Usually, when a manual page highlights a term, either in bold or 
> italics, it usually is a special identifier (macro, function, command 
> name or argument), for which hyphenation can hurt readability and even 
> worse, turn it into a different valid identifier.
> 
> What about disabling hyphenation for .B and .I?

I would welcome such a change.

Needless to say, that is insufficient for getting it implemented.
A change of that kind requires consensus, or at least an overwhelming
majority, among groff developers.

> Are there any inconveniences in doing so that I can't see?

I don't expect any downside at all.

For comparison, the mandoc implementation of man(1) globally disables
any kind of automatic hyphenation, even in running text not containing
any markup, even if documents explicitly request hyphenation, and
provides no way to override that global choice, neither via
compile-time nor via run-time configuration, options, or any other
means.  I don't recall user complaints about the lack of hyphenation.

In technical documentation, i think the occasional confusion that
automatic hyphenation may cause, and the occasional ugliness of
output caused by automatic hyphenation, both outweigh the potential
benefits that automatic hyphenation has in texts that are not
technical documentation (yes, i did write an implementation of
automatic hyphenation around 1984 or 1985 because i do see benefits
of automatic hyphenation for some texts outside the domain of
technical documentation).

The mandoc implementation of man(1) even goes some steps further.
It globally disables line-breaking even at *existing* hyphens
whenever the hyphen appears on (almost) *any* macro or request line,
and also if the character on either side of the hyphen is not an
ASCII letter.  Again, i do not recall complaints by users that they
desire more line-breaking at existing hyphens.

Yours,
  Ingo

* Re: .B, .I disable hyphenation?
  2021-09-12 12:56 .B, .I disable hyphenation? Alejandro Colomar (man-pages)
  2021-09-12 14:47 ` Ingo Schwarze
@ 2021-09-12 17:27 ` G. Branden Robinson
  2021-09-12 20:09   ` Alejandro Colomar (man-pages)
  1 sibling, 1 reply; 5+ messages in thread
From: G. Branden Robinson @ 2021-09-12 17:27 UTC
  To: Alejandro Colomar (man-pages); +Cc: groff, linux-man

Hi, Alex!

At 2021-09-12T14:56:39+0200, Alejandro Colomar (man-pages) wrote:
> Hi Branden,
> 
> Usually, when a manual page highlights a term, either in bold or
> italics, it usually is a special identifier (macro, function, command
> name or argument), for which hyphenation can hurt readability and even
> worse, turn it into a different valid identifier.
> 
> What about disabling hyphenation for .B and .I?
> Are there any inconveniences in doing so that I can't see?

The problem that arises is that the font styling macros are
presentational, not semantic, so it's hard to know whether someone is
using them for emphasis or to suggest syntactical information.  This is
why you made a statistical argument ("usually").

I'm hesitant to adopt your suggestion, because if we did make this
change, it would be difficult to override in the other direction, e.g.,
the page author is thinking, "yes, I'm putting emphasis here--hyphenate
the words as necessary, darn it!".  Because hyphenation had already been
suppressed, it would have to be manually added back in to every word in
the arguments to .B and .I.  Most writers of English, for good reason,
cannot be bothered to learn where the hyphenation points are and
understandably leave that chore to a computer.  (Moreover, U.S. and
Commonwealth English seem to apply different hyphenation rules.)

In my opinion it is easier, in terms of maintaining flexibility and
getting reliable results, to do what I do in the groff man page
corpus--disable hyphenation on a per-word basis when necessary.  To be
concrete, I populated the shell variable "MANS" with the man source
document files in the groff tree, and then performed this grep.

$ grep '^\.[BIR]\([BIR]\) \\%' $MANS

I got 434 matches in groff Git HEAD.  Here are 3 of them.

./src/utils/lkbib/lkbib.1.man:.IR \%@g@indxbib (@MAN1EXT@)
./src/utils/lkbib/lkbib.1.man:.IR \%@g@refer (@MAN1EXT@),
./src/utils/lkbib/lkbib.1.man:.IR \%@g@lookbib (@MAN1EXT@),

A whopping number of these are like the above: they are man page cross
references.  The `MR` macro I've been talking about for (over?) a year
now would render this usage of \% unnecessary, because MR would be
semantic and we know we _never_ want to hyphenate the name of a man
page[1].
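
To find the calls that still lack the escape, the search above can be
inverted.  The following is only a minimal sketch, not something taken
from the groff tree: the function name `unprotected_calls` is
hypothetical, it reads man(7) source on standard input, and it assumes
that each macro name is followed by exactly one space.  It prints the
font macro calls whose first argument does not begin with \%.

```shell
#!/bin/sh

# Hypothetical helper: print .B, .I, and alternating-font macro calls
# whose first argument does not start with the \% escape.
# The first grep keeps only font macro calls that have an argument;
# the second drops calls whose first argument (optionally quoted)
# already begins with \%.
unprotected_calls() {
    grep -E '^\.(B|I|BI|BR|IB|IR|RB|RI) ' |
        grep -v -E '^\.(B|I|BI|BR|IB|IR|RB|RI) "?\\%'
}
```

A call such as `unprotected_calls < lkbib.1.man` lists candidates only.
Whether a given word actually needs the escape still depends on whether
it has any automatic hyphenation points.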

The manual suppression of hyphenation is not necessary if you know a
word won't be hyphenated.  A trick that's been passed around on the
groff list is to have a shell one-liner handy that tells you all of the
automatic hyphenation points groff thinks a word has.

Here's the version of the "hyphen" script I use.

#!/bin/sh

for W
do
    printf ".hy 4\n.ll 1u\n%s\n" "$W" | nroff -Wbreak | sed '/^$/d' | tr -d '\n'
    echo
done

I don't have to remember or reason out which of "indxbib", "refer", or
"lookbib" will be hyphenated.  I can ask groff.

$ hyphen indxbib refer lookbib
in‐dxbib
re‐fer
look‐bib

Yup, they all need hyphens, so a leading \% is advised.  [Aside: What's
that "@g@" thing, you may ask?  Like the man page section number, it is
not groff syntax, but fodder for a sed script that replaces it during
make(1) with the command prefix configured by the person who builds
groff.  (When groff was first written in 1989-1991, it often had to
share a disk with a proprietary troff installation, and needed to stay
out of the latter's way.)  Since I can't know at man page maintenance
time what the builder will choose for a prefix, I have to assume that it
is something that is hyphenable, and so I suppress its hyphenation.]
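
The decision just described (add a leading \% only when groff reports a
break point) can be mechanized.  This is a hedged sketch under stated
assumptions: the function name `mark_unbreakable` is hypothetical, and
it assumes the input comes from a "hyphen"-style script on a UTF-8
terminal device, so that break points appear as U+2010 HYPHEN, as in the
output above.  ASCII hyphens are deliberately left alone because they
may be part of the word itself.

```shell
#!/bin/sh

# Hypothetical helper: read one word per line, as printed by a
# "hyphen"-style script, and print each word with a leading \% if at
# least one break point (U+2010) was reported for it.
mark_unbreakable() {
    while IFS= read -r shown
    do
        # Strip the break markers to recover the original word.
        word=$(printf '%s\n' "$shown" | sed 's/‐//g')
        if [ "$word" = "$shown" ]
        then
            # No break points reported: the word is safe as it is.
            printf '%s\n' "$word"
        else
            # Break points found: suppress them with a leading \%.
            printf '\\%%%s\n' "$word"
        fi
    done
}
```

With the script above installed as `hyphen`, something like
`hyphen indxbib groff | mark_unbreakable` would be the intended use.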

By contrast, I don't need to suppress hyphenation for the following.
[Aside: These command names also don't collide with historical troff
names, so they don't need the command prefix, either.]

./src/utils/tfmtodit/tfmtodit.1.man:.IR groff (@MAN1EXT@),
./src/utils/tfmtodit/tfmtodit.1.man:.IR grodvi (@MAN1EXT@),

In my view, this is really not much work; I spend much more time
thinking about and recasting at the sentence or paragraph level--or
realizing there's some concept that we haven't explained adequately at
all and drafting a presentation of it...and, for that matter, composing
emails like this one--than I do worrying about hyphenation points.
Nevertheless, I recognize that many contributors of man pages to the
Linux man-pages project are _profoundly_ uninterested in typography--in
fact they may have hyphenation disabled altogether in their man page
renderer[3]--and regard every single thing they are required to learn
about *roff or man(7) syntax as one more nudge in the direction of
Markdown or some other alternative that they imagine will deliver them
to an effortless utopia where documentation practically writes itself.

I acknowledge that placement of these hyphenation control escapes looks
tedious (and it is, slightly).  If we want to fix this in the man(7)
macro language, then, in my opinion, the right way is to cross the
Rubicon and add more semantic macros.  I have never forwarded a serious
proposal along these lines because I still have full-thickness burns
over 90% of my body from exposure to DocBook 25 years ago.  The problem
that mortified me is that as soon as people get their hands on a
semantic tag they have, all too often, deployed it at the highest
syntactical level of the implementation language they can locate.  In
HTML, for example, that is the element.

If we had a pair of macros that meant "my argument is a keyword" or "my
argument represents user-replaceable text", respectively, then we could
easily and reliably solve the problem at the level you're tempted to.
(Though as a matter of fact, I would _not_ disable hyphenation for
nonliterals...why should we?  They don't need to be copy and pasted
as-is--if they are, they get replaced anyway by definition--and
descriptive nonliterals are sometimes long[4], as anyone who's read a
few BNF grammars can attest.)

The smallest, tightest solution I have managed to contemplate that
does not bloat the name space of man(7) is something along these lines.

.KW keyword [tag-space]
.VA metavar [tag-space]

Here is a straw-man example.

.KW strlen function
and
.KW strnlen
return
.KW size_t type
and take an argument
.VA s variable
that is expected to be a
.KW "const char *" type

...which would render as

strlen and strnlen return size_t and take an argument s that is expected
to be a const char *

"strlen", "strnlen", "size_t" and "const char *" would be styled as
directed elsewhere (with defaults in an.tmac or an-ext.tmac, but
possibly overridden in man.local to suit distributor or site tastes).

I wanted to end the example sentence with a period, but right away we
hit one of the problems that has prevented me from advancing this
proposal, which is the question of how to handle adjacent
punctuation...add yet another macro argument for it, or encourage usage
of the output line continuation escape \c, which historically terrifies
people?  Support multiple optional arguments, and force people to learn
to quote empty macro arguments, an inconvenience that man(7) largely
already spares them from if they practice good style[5]?

In case it needs to be pointed out, I think it's impractical for
man(7)--as a macro package--to prescribe descriptors for the "tag" name
space.  mdoc(7) somewhat notoriously maintains large catalogs of a
proliferating number of BSD-descended operating system names and
releases, a source of ongoing tedium and maintainability
frustrations[6].  DocBook's attempt to boil this ocean is what drove me
away from it and I don't want to bloat groff man(7) with something
that's going to demand community consensus--and, I expect, some amount
of heated debate--to resolve.  The virtues of _having_ a tag name space
are, I trust, well understood, and their availability is a point Ingo
takes some justified pride in with the support thereof in mandoc(1).

The Linux man-pages project is much better suited than the groff project
is to design and promulgate a set of canonical tags; to point out just
one blind spot, groff doesn't ship _any_ section 2 or 3 man pages,
whereas these sections are Linux man-pages' bread and butter (though the
long-neglected section 7 is looking better all the time and at last
fulfilling its decades-old potential).

I don't have answers to the questions I've raised, so in the meantime, I
practice the discipline of using the hyphenation control escape sequence
with the font style macros.

To conclude this epistle with some possible next steps to take, I
foresee a few possibilities.

1. Despair of popularizing this knowledge.  Encourage people to continue
   to do as they have always done, and trust more detail-oriented
   contributors like yourself to clean up .B and .I calls with
   hyphenation control escapes as required.
2. Teach people about correct usage of the \% escape in man-pages(7),
   and remind contributors about this subject about as often as you have
   to do regarding semantic newlines.
3. Lobby for a change to man(7) implementations as you originally
   suggested.  I know I've voiced some resistance to this idea, but your
   bigger challenge may be getting a hold of any maintainers of
   non-groff man(7) implementations to even field the proposal.  On the
   other hand, if groff and mandoc are all you care about, you've
   already reached the right people.  :)
4. Have Linux man-pages provide its own implementations of .B and .I to
   do what you want.  (Every Linux man-pages document could use the
   `.so` request to load such overrides.)  This might represent an
   irreconcilable conflict between your project's needs and groff, and
   I'm pretty sure no one wants to see that happen, but in the spirit of
   frankness I have to point out that this is a possibility, and one
   that may not have occurred to many Linux man-pages contributors.
5. Cross the Rubicon and develop semantic macros for man(7).  The payoff
   here is huge but the effort required will not be small.
   (Implementation is not the hard part; socializing the change and
   providing a smooth transition/deployment path for umpteen
   distributors who won't ship Linux man-pages releases in synchrony
   with any other particular thing will be much more challenging, I
   predict.  And that's not even counting the issue of standardizing a
   lexicon for the tag name space.)
6. [ObIngoSchwarze: Switch to mdoc(7).]
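
For the clean-up described in option 1, a small filter could do the
mechanical part.  What follows is a sketch under assumptions, not an
existing tool: the function name `protect_single_words` is hypothetical,
and it treats only `.B` and `.I` calls with a single unquoted word as
identifiers.  Multi-word or quoted arguments (emphasized sentences, work
titles) are left untouched so that they can still be hyphenated, and
arguments that already start with a backslash are skipped.

```shell
#!/bin/sh

# Hypothetical helper: add a leading \% to the argument of .B and .I
# calls when that argument is a single unquoted word that does not
# already begin with an escape sequence.
protect_single_words() {
    # \1 is the macro name plus its separating space, \2 is the word.
    # The bracket expression rejects words starting with a backslash,
    # a double quote, or a space.
    sed 's/^\(\.[BI] \)\([^\\" ][^ ]*\)$/\1\\%\2/'
}
```

Run as `protect_single_words < page.3 > page.3.new`, this also adds \%
to words that groff would never hyphenate; that is redundant but
harmless.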

Regards,
Branden

[1] Erlang developers may disagree.[2]  :-|
[2] https://savannah.gnu.org/bugs/?43532
[3] Or would, if they knew it was possible.  See the `HY` register in
    the "Options" section of groff_man(7) or the `--nh` option of man-db
    man(1).
[4] Here's an example from groff_font(5) in groff Git HEAD.

       papersize format‐or‐dimension‐pair‐or‐file‐name
              Set the dimensions of the physical output medium according
              to  the argument, which is either a standard paper format,
              a pair of dimensions, or the name of  a  plain  text  file
              containing either of the foregoing.  Recognized paper for‐
              mats are the ISO and DIN formats A0–A7, B0–B7, C0–C7,  and
              D0–D7;  the  U.S.  formats letter, legal, tabloid, ledger,
              statement, and executive; and the envelope formats  com10,
              monarch, and DL.  Case is not significant for the argument
              if it holds predefined paper types.

              Alternatively, the argument can be a custom paper size  in
              the  format  length,width  (with no spaces before or after
              the comma).  Both length and width must have  a  unit  ap‐
              pended;  valid  units are “i” for inches, “c” for centime‐
              ters,  “p”  for  points,  and  “P”  for  picas.   Example:
              “12c,235p”.   An  argument that starts with a digit is al‐
              ways treated as a custom paper format.

              Finally, the argument can be a file name  (e.g.,  /etc/pa‐
              persize); if the file can be opened, troff reads the first
              line and attempts to match the above  forms.   No  comment
              syntax is supported.

              More  than one argument can be specified; troff scans from
              left to right and uses the first  valid  paper  specifica‐
              tion.

[5] https://man7.org/linux/man-pages/man7/groff_man_style.7.html#Notes
[6] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=867123

* Re: .B, .I disable hyphenation?
  2021-09-12 17:27 ` G. Branden Robinson
@ 2021-09-12 20:09   ` Alejandro Colomar (man-pages)
  2021-09-13  0:16     ` G. Branden Robinson
  0 siblings, 1 reply; 5+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-09-12 20:09 UTC
  To: G. Branden Robinson; +Cc: groff, linux-man, Ingo Schwarze

Hi Branden,

On 9/12/21 7:27 PM, G. Branden Robinson wrote:
> Hi, Alex!
> 
> At 2021-09-12T14:56:39+0200, Alejandro Colomar (man-pages) wrote:
>> Hi Branden,
>>
>> Usually, when a manual page highlights a term, either in bold or
>> italics, it usually is a special identifier (macro, function, command
>> name or argument), for which hyphenation can hurt readability and even
>> worse, turn it into a different valid identifier.
>>
>> What about disabling hyphenation for .B and .I?
>> Are there any inconveniences in doing so that I can't see?
> 
> The problem that arises is that the font styling macros are
> presentational, not semantic, so it's hard to know whether someone is
> using them for emphasis or to suggest syntactical information.  This is
> why you made a statistical argument ("usually").

Truly, even though most cases of .B/.I are identifiers (or literals), 
some are emphasized words or phrases.

I think no identifier should ever be hyphenated, if possible, mainly due 
to the confusion with other possibly valid identifiers.

I'd also argue that for the cases when the writer wants to emphasize a 
word, hyphenating it does the opposite.  The writer wanted it to stand 
out from the rest, but now it's broken into two incomplete pieces far 
apart from each other.

I think I really want to disable hyphenation everywhere I want a word to 
stand out from the rest, be it an identifier or just an emphasized word 
or phrase.

Ingo's option of disabling hyphenation _everywhere_ in man pages seems 
too drastic to me.  There's still a lot of prose, and it's not so 
important there (although I admit both ways have their benefits; not 
saying it's wrong).  But that adds a point against the only downside I 
can see:  disabling hyphenation may (in rare occasions where many long 
identifiers are together) produce an awkward number of spaces due to 
filling; but if no-one has complained against mandoc, I guess that's not 
so terrible or doesn't happen that much.

Regards,

Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

* Re: .B, .I disable hyphenation?
  2021-09-12 20:09   ` Alejandro Colomar (man-pages)
@ 2021-09-13  0:16     ` G. Branden Robinson
  0 siblings, 0 replies; 5+ messages in thread
From: G. Branden Robinson @ 2021-09-13  0:16 UTC
  To: Alejandro Colomar (man-pages); +Cc: groff, linux-man, Ingo Schwarze

Hi, Alex!

I realized only after sending my previous giant message on this subject,
that I may have overlooked the elementary step of ensuring that readers
know of the existence and function of the *roff hyphenation control
escape sequence.

groff_man_style(7)[1] explains it as follows.

       \%     Control hyphenation.  The location of this escape sequence
              within a word marks  a  hyphenation  point,  supplementing
              groff’s  automatic hyphenation patterns.  At the beginning
              of a word, it suppresses any automatic hyphenation  points
              within; any specified with \% are still honored.

This escape sequence is ultra-portable and has provenance all the way
back to Bell Labs in the 1970s.

At 2021-09-12T22:09:37+0200, Alejandro Colomar (man-pages) wrote:
> On 9/12/21 7:27 PM, G. Branden Robinson wrote:
> > At 2021-09-12T14:56:39+0200, Alejandro Colomar (man-pages) wrote:
> > > Usually, when a manual page highlights a term, either in bold or
> > > italics, it usually is a special identifier (macro, function,
> > > command name or argument), for which hyphenation can hurt
> > > readability and even worse, turn it into a different valid
> > > identifier.
> > > 
> > > What about disabling hyphenation for .B and .I?
> > > Are there any inconveniences in doing so that I can't see?
> > 
> > The problem that arises is that the font styling macros are
> > presentational, not semantic, so it's hard to know whether someone
> > is using them for emphasis or to suggest syntactical information.
> > This is why you made a statistical argument ("usually").
> 
> Truly, even though most cases of .B/.I are identifiers (or literals),
> some are emphasized words or phrases.
> 
> I think no identifier should ever be hyphenated, if possible, mainly
> due to the confusion with other possibly valid identifiers.

Agreed.  This is especially true for those who view man pages using the
legacy 8-bit output terminal devices (ascii, latin1, cp1047), none of which
have a distinct hyphen glyph, and even for users of UTF-8 terminals
where the distributor has configured a remapping of the *roff hyphen
character - to the HYPHEN-MINUS code point, making it indistinguishable
from the *roff minus sign special character \- (which is also what has
been used for the Unix option dash since Ossanna first wrote troff).

> I'd also argue that for the cases when the writer wants to emphasize a
> word, hyphenating it does the opposite.  The writer wanted it to stand
> out from the rest, but now it's broken into two incomplete pieces far
> apart from each other.

I think that's only sometimes true.  If I'm emphasizing a long passage
or an entire sentence, switching off hyphenation can be superfluous.

For example:

man2/execve.2:.B "Do not take advantage of this nonstandard and nonportable misfeature!"
man2/getunwind.2:.I Note: this system call is obsolete.
man2/ptrace.2:.I This operation is deprecated; do not use it!
man2/sysctl.2:.B This system call no longer exists on current kernels!
man2/timerfd_create.2:.I "is successfully rearmed"
man3/mallinfo.3:.B Information is returned for only the main memory allocation area.
man3/random.3:.I Numerical Recipes in C: The Art of Scientific Computing
man7/mailaddr.7:.I "This behavior is deprecated."
man7/spufs.7:.I The Cell Broadband Engine Architecture (CBEA) specification

While I was performing the above search, I came across something
unfortunate, but on a different subject.[2]

> I think I really want to disable hyphenation everywhere I want a word
> to stand out from the rest, be it an identifier or just an emphasized
> word or phrase.

That's not my preference, particularly in cases of work titles or cases
where an entire sentence is emphasized, both of which are exemplified
above.

> Ingo's option of disabling hyphenation _everywhere_ in man pages seems
> too drastic to me.  There's still a lot of prose, and it's not so
> important there (although I admit both ways have their benefits; not
> saying it's wrong).  But that adds a point against the only downside I
> can see: disabling hyphenation may (in rare occasions where many long
> identifiers are together) produce an awkward number of spaces due to
> filling; but if no-one has complained against mandoc, I guess that's
> not so terrible or doesn't happen that much.

mandoc's audience may share the proclivities of its maintainer; Ingo's
made no secret that the only two output devices he prioritizes are
terminals and HTML.  (Ingo, please correct me on this if necessary.)
groff, like troff before it, is a typesetting engine, and PDF is one of
the formats in which I proofread our documentation.  With a little bit
of care, man pages can be written to a professional quality of
typesetting.  I'd like to preserve that trait.

Is it too much to ask man page writers to remember to type \% before
language keywords when they're using them with the font style macros?  I
dare say that the groff man(7) implementation is much better documented
than it was five years ago.  What else, in your opinion, needs to happen
to improve its ease of acquisition?

Regards,
Branden

[1] https://man7.org/linux/man-pages/man7/groff_man_style.7.html

[2] There's some eye-watering stuff in bpf-helpers(7), like this:

:.B \fBvoid *bpf_map_lookup_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP

Unsurprisingly to me, there's a comment at the top of the file:

.\" Man page generated from reStructuredText.

...but it doesn't identify the precise tool used in generation.  Do you
know if it was rst2man, or something else?
