From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
To: Alejandro Colomar <alx.manpages@gmail.com>,
	Ingo Schwarze <schwarze@usta.de>,
	linux-man <linux-man@vger.kernel.org>, groff <groff@gnu.org>
Subject: Re: All caps .TH page title
Date: Thu, 21 Jul 2022 20:34:35 -0500
Message-ID: <20220722013435.mkzzfscdgtechzgx@illithid>
In-Reply-To: <Ytnt4dPmkrPmL1Sh@riva.ucam.org>

At 2022-07-22T01:22:57+0100, Colin Watson wrote:
> On Fri, Jul 22, 2022 at 01:16:49AM +0200, Alejandro Colomar wrote:
> > On 7/21/22 20:36, G. Branden Robinson wrote:
> > > At 2022-07-21T16:29:21+0200, Alejandro Colomar wrote:
> > > > Also, does it have any functional implications?  I'm especially
> > > > interested in knowing if that may affect in any way the ability
> > > > of man(1) to find a page when invoked as `man TIMESPEC` for
> > > > example.
> > > 
> > > My understanding is that mandb(8) indexes based solely on the
> > > second argument to the `TH` macro call and (what it interprets as)
> > > the contents of the "Name" (or "NAME") section of the page.  It
> > > parses *roff itself as best it can to determine this.  So the fact
> > > that the _first_ argument to `TH` might be in full caps doesn't
> > > deter it.  (It might in fact have made mandb(8) authors' job
> > > easier if an "honest lettercase" practice had arisen back in the
> > > day--but it didn't).
[...]
> > > Since he's a mandb(8) author/maintainer, I would again defer to
> > > Colin Watson's knowledge and expertise in this area.
[...]
> 
> The above is not quite correct.  man-db doesn't index on the .TH
> section at all, and I don't believe I've encountered the practice of
> doing so in other indexers (I could be wrong, but I think that's
> something I would have remembered if I'd noticed it).  Rather, it
> parses the "NAME" (or "Name", or a number of localized variants)
> section of pages using the man macro set for "foo \- description"
> lines and uses the left-hand side of those for page names, or
> equivalently looks for .Nm requests in pages using the mdoc macro set.

Ah, thanks, Colin.  A quick consultation of ncurses man pages reveals
that mandb(8)'s idea of the manual section comes from its place in the
directory hierarchy, not from parsing the arguments to the `TH` call.
My error!
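
To make the traditional approach Colin describes concrete, here is a
minimal sketch in Python.  It is not man-db's code: the function name
`parse_name_section` is hypothetical, only the "NAME"/"Name" headings
are recognized (no localized variants), and macro lines inside the
section are simply skipped rather than interpreted.

```python
# Headings that open the name section.  man-db also accepts localized
# variants; this sketch deliberately omits them.
NAME_HEADINGS = {"NAME", "Name"}


def parse_name_section(source):
    """Return (names, description) taken from a man(7) page's NAME section.

    Illustrative only: looks for a "foo \\- description" line and uses
    the left-hand side as the list of page names.
    """
    in_name = False
    text = []
    for line in source.splitlines():
        if line.startswith(".SH"):
            if in_name:
                break  # any following section ends the NAME section
            heading = line[3:].strip().strip('"')
            in_name = heading in NAME_HEADINGS
            continue
        # Keep only text lines; requests and macro calls are skipped.
        if in_name and not line.startswith((".", "'")):
            text.append(line.strip())
    joined = " ".join(text)
    # Names sit to the left of the first "\-", the summary to the right.
    left, sep, right = joined.partition("\\-")
    if not sep:
        return [], ""
    names = [name.strip() for name in left.split(",") if name.strip()]
    return names, right.strip()
```

Note that nothing here consults the `TH` line at all, which matches
Colin's description; the section number would come from the file's
location, not from the page source.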

> With the exception of handling localized variants of that section
> name, which is a pretty ugly pile of special cases, I believe this to
> be fairly traditional behaviour.  I can't say I would have done it
> that way if I'd been designing the system from scratch since it really
> involves far too much half-arsed parsing, but it seemed to be the
> usual thing to do when I came on the scene.

We could have groff man(7) and mdoc(7) recognize a register, named
`INDEX`, `DB`, or `SUMMARIZE` or something, which would cause the
package(s) to emit the required information, derived solely from page
content, in a desirable format.  Say, JSON, maybe.  Upon seeing this
register and reporting the data, the package could then invoke `nx` to
move to the next input file.

Thus, potentially, the indexing data could be generated with great
speed--you could call groff (or nroff, it wouldn't matter) with as many
man page file arguments as desired, specifying no preprocessor options
(except maybe those for preconv), and a large percentage of page content
would never even be read, let alone formatted.
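
As a rough picture of what such an invocation might look like from an
indexer's side, here is a sketch in Python that only builds the command
line.  The `-r` option for setting a register is real; the `INDEX`
register is the hypothetical one floated above and does not exist in
groff today, and `build_index_command` is a made-up helper name.

```python
def build_index_command(pages, formatter="nroff"):
    """Build an argv list for a single formatter run over many pages.

    Assumption: the man package honors a register named INDEX, which is
    only a proposal in this thread, not an existing groff feature.
    """
    # -man loads the man macro package; -rINDEX=1 sets the register.
    # No preprocessor options are given, as suggested above.
    return [formatter, "-man", "-rINDEX=1", *pages]
```

One process would then handle every page named on the command line,
rather than one formatter run per page.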

Why, I wonder, was the thing not done this way in the first place?
Possibly because what follows "Name" can be arbitrary roff language
input.  However...

The "Name" section's contents can be stored in a diversion.  In normal
circumstances, this diversion's contents would be emitted immediately
upon any other `SH` call (or, for degenerate pages that declare no
sections after "Name", when the page's end macro is called[1]).

Once in a diversion, these contents are subject to "sanitization", a
feature I'm chewing over adding to the formatter.[2]  The gist is that
all the garbage (font changes, special character escape sequences) you
currently spend time parsing or stripping away is already removed or
transformed for you, leaving clean, printable ASCII or UTF-8.
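
For a sense of the stripping work an indexer has to do by hand today,
here is a small Python sketch.  It is not the proposed groff feature
and not any indexer's real code: `sanitize` is a hypothetical name, and
the table covers only a handful of special characters.

```python
import re

# A tiny, illustrative subset of groff special character names.
SPECIAL_CHARS = {"aq": "'", "dq": '"', "co": "\u00a9", "en": "\u2013"}

# Font escapes: \fB, \f(CW, \f[BI], \f[]
FONT_ESCAPE = re.compile(r"\\f(?:\[[^\]]*\]|\(..|.)")
# Special character escapes: \[co] and the two-letter form \(co
SPECIAL_ESCAPE = re.compile(r"\\(?:\[([^\]]+)\]|\((..))")


def sanitize(text):
    """Drop font changes and map a few special characters to plain text."""
    text = FONT_ESCAPE.sub("", text)

    def replace_special(match):
        name = match.group(1) or match.group(2)
        # Unknown special characters are dropped in this sketch.
        return SPECIAL_CHARS.get(name, "")

    text = SPECIAL_ESCAPE.sub(replace_special, text)
    text = text.replace("\\-", "-")  # minus sign escape
    text = text.replace("\\&", "")   # non-printing input break
    return text
```

Even this much ignores string and register interpolation, conditional
input, and user-defined macros, which is the "arbitrary roff language
input" problem noted earlier.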

At this point I pause to let the wave of horror break over my audience.

Regards,
Branden

[1] andoc.tmac contrives for this to be the case when rendering multiple
    pages.
[2] https://savannah.gnu.org/bugs/?62787
