linux-man.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Playground pager lsp(1)
@ 2023-03-25 20:37 Dirk Gouders
  2023-03-25 20:47 ` Dirk Gouders
  0 siblings, 1 reply; 73+ messages in thread
From: Dirk Gouders @ 2023-03-25 20:37 UTC (permalink / raw)
  To: Alejandro Colomar, linux-man

Hi Alejandro,

first of all, chances are that you consider this post as spam, because
this list is about linux manual pages and not pagers.  In that case
please accept my apologies and ignore this post.

My reasoning was that readers here have some interest in manual pages
and therefore probably also in pagers that claim to "understand" manual
pages.  My hope is that even if you consider this post inappropriate you
will perhaps suggest some more appropriate place for such discussion.

Not long ago, I noticed a discussion [1] about what pagers can and
cannot do.  That was interesting to me, because I am currently playing
with a pager that claims to have a focus on manual pages.

I will try to not waste your time and attach the manual page and a link
to a short (3:50) demo video.  To me it is absolutely OK should you just
ignore this spam post, but perhaps you find lsp(1) interesting enough
for further discussion.

Best regards,

Dirk

[1] https://www.spinics.net/lists/linux-man/index.html#24494
[2] https://youtu.be/syGT4POgTAw

LSP(1)                           User commands                          LSP(1)

NAME
       lsp - list pages (or least significant pager)

SYNOPSIS
       lsp [options] [file_name]...

       lsp -h

       lsp -v

DESCRIPTION
       lsp is a terminal pager that assists in paging through data, usually
       text — no more(1), no less(1).

       The given files are opened if file names are given as options.
       Otherwise lsp assumes input from stdin and tries to read from there.

       In addition to it’s ability to aid in paging through text files lsp has
       limited knowledge about manual pages and offers some help in viewing
       them:

       •   Manual pages usually refer to other manual pages and lsp allows to
           navigate those references and to visit them as new files with the
           ability to also navigate through all opened manual pages or other
           files.

           Here, lsp tries to minimize frustration caused by unavailable
           references and verifies their existance before offering them as
           references that can be visited.

       •   In windowing environments lsp does complete resizes when windows
           get resized. This means it also reloads the manual page to fit the
           new window size.

       •   Search for manual pages using apropos(1); in the current most basic
           form it lists all known manual pages ready for text search and
           visiting referenced manual pages.

       •   lsp has an experimental TOC mode.

           This is a three-level folding mode trying to list only section and
           sub-section names for quick navigation in manual pages.

           The TOC is created using naive heuristics which works well to some
           extend, but it might be incomplete. Users should keep that in mind.

OPTIONS
       All options can be given on the command line or via the environment
       variable LSP_OPTIONS. The short version of toggles can also be used as
       commands, e.g. you can input -i while paging through a file to toggle
       case sensitivity for searches.

       -a, --load-apropos
           Create an apropos pseudo-file.

       -c, --chop-lines
           Toggle chopping of lines that do not fit the current screen width.

       -h, --help
           Output help and exit.

       -i, --no-case
           Toggle case sensitivity in searches.

       -I, --man-case
           Turn on case sensitivity for names of manual pages.

           This is used for example to verify references to other manual
           pages.

       -l, --log-file
           Specify a path to where write debugging output.

       -n, --line-numbers
           Toggle visible line numbers.

       -s, --search-string
           Specify an initial search string.

       -v, --version
           Output version information of lsp and exit.

       --no-color
           Disable colored output.

       --reload-command
           Specify command to load manual pages. Default is man.

       --verify-command
           Specify command to verify the existance of references. Default is
           man -w.

       --verify-with-apropos
           Use the entries of the apropos pseudo-file for validation of
           references.

COMMANDS
       Pg-Down / Pg-Up
           Forward/backward one page, respectively.

       Key-Down / Key-Up / Mouse-Wheel down/up
           Forward/backward one line, respectively.

       CTRL-l
           In search mode: bring current match to top of the page.

       ESC
           Turn off current highlighting of matches.

       TAB / S-TAB
           Navigate to next/previous reference respectively.

       ENTER

           •   If previous command was TAB or S-TAB:

               Open reference at point, i.e. call `man <reference>'.

           •   In TOC-mode:

               Go to currently selected position in file.

       /
           Start a forward search for regular expression.

       ?
           Start a backward search for regular expression.

       B
           Change buffer; choose from list.

       a
           Create a pseudo-file with the output of `apropos .'.

           That pseudo-file contains short descriptions for all manual pages
           known to the system; those manual pages can also be opened with TAB
           / S-TAB and ENTER commands.

       b
           Backward one page

       c
           Close file currently paged.

           Exits lsp if it was the only/last file being paged.

       f
           Forward one page

       h
           Show online help with command summary.

       m
           Open another manual page.

       n
           Find next match in search.

       p
           Find previous match in search.

       q

           •   Exit lsp.

           •   In TOC-mode: switch back to normal view.

           •   In help-mode: close help file.

ENVIRONMENT
       LSP_OPTIONS
           All command line options can also be specified using this variable.

       LSP_OPEN / LESSOPEN
           Analogical to less(1), lsp supports an input preprocessor but
           currently just the two basic forms:

           One that provides the path to a replacement file and the one that
           writes the content to be paged to a pipe.

SEE ALSO
       apropos(1), less(1), man(1), more(1), pg(1)

BUGS
       Report bugs at https://github.com/dgouders/lsp

alpha-1.0e-42                     03/25/2023                            LSP(1)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Playground pager lsp(1)
  2023-03-25 20:37 Playground pager lsp(1) Dirk Gouders
@ 2023-03-25 20:47 ` Dirk Gouders
  2023-04-04 23:45   ` Alejandro Colomar
  0 siblings, 1 reply; 73+ messages in thread
From: Dirk Gouders @ 2023-03-25 20:47 UTC (permalink / raw)
  To: Alejandro Colomar, linux-man

Hi Alejandro,

first of all, chances are that you consider this post as spam, because
this list is about linux manual pages and not pagers.  In that case
please accept my apologies and ignore this post.

My reasoning was that readers here have some interest in manual pages
and therefore probably also in pagers that claim to "understand" manual
pages.  My hope is that even if you consider this post inappropriate you
will perhaps suggest some more appropriate place for such discussion.

Not long ago, I noticed a discussion [1] about what pagers can and
cannot do.  That was interesting to me, because I am currently playing
with a pager that claims to have a focus on manual pages.

I will try to not waste your time and attach the manual page and a link
to a short (3:50) demo video.  To me it is absolutely OK should you just
ignore this spam post, but perhaps you find lsp(1) interesting enough
for further discussion.

Best regards,

Dirk

[1] https://www.spinics.net/lists/linux-man/index.html#24494
[2] https://youtu.be/syGT4POgTAw

LSP(1)                           User commands                          LSP(1)

NAME
       lsp - list pages (or least significant pager)

SYNOPSIS
       lsp [options] [file_name]...

       lsp -h

       lsp -v

DESCRIPTION
       lsp is a terminal pager that assists in paging through data, usually
       text — no more(1), no less(1).

       The given files are opened if file names are given as options.
       Otherwise lsp assumes input from stdin and tries to read from there.

       In addition to it’s ability to aid in paging through text files lsp has
       limited knowledge about manual pages and offers some help in viewing
       them:

       •   Manual pages usually refer to other manual pages and lsp allows to
           navigate those references and to visit them as new files with the
           ability to also navigate through all opened manual pages or other
           files.

           Here, lsp tries to minimize frustration caused by unavailable
           references and verifies their existance before offering them as
           references that can be visited.

       •   In windowing environments lsp does complete resizes when windows
           get resized. This means it also reloads the manual page to fit the
           new window size.

       •   Search for manual pages using apropos(1); in the current most basic
           form it lists all known manual pages ready for text search and
           visiting referenced manual pages.

       •   lsp has an experimental TOC mode.

           This is a three-level folding mode trying to list only section and
           sub-section names for quick navigation in manual pages.

           The TOC is created using naive heuristics which works well to some
           extend, but it might be incomplete. Users should keep that in mind.

OPTIONS
       All options can be given on the command line or via the environment
       variable LSP_OPTIONS. The short version of toggles can also be used as
       commands, e.g. you can input -i while paging through a file to toggle
       case sensitivity for searches.

       -a, --load-apropos
           Create an apropos pseudo-file.

       -c, --chop-lines
           Toggle chopping of lines that do not fit the current screen width.

       -h, --help
           Output help and exit.

       -i, --no-case
           Toggle case sensitivity in searches.

       -I, --man-case
           Turn on case sensitivity for names of manual pages.

           This is used for example to verify references to other manual
           pages.

       -l, --log-file
           Specify a path to where write debugging output.

       -n, --line-numbers
           Toggle visible line numbers.

       -s, --search-string
           Specify an initial search string.

       -v, --version
           Output version information of lsp and exit.

       --no-color
           Disable colored output.

       --reload-command
           Specify command to load manual pages. Default is man.

       --verify-command
           Specify command to verify the existance of references. Default is
           man -w.

       --verify-with-apropos
           Use the entries of the apropos pseudo-file for validation of
           references.

COMMANDS
       Pg-Down / Pg-Up
           Forward/backward one page, respectively.

       Key-Down / Key-Up / Mouse-Wheel down/up
           Forward/backward one line, respectively.

       CTRL-l
           In search mode: bring current match to top of the page.

       ESC
           Turn off current highlighting of matches.

       TAB / S-TAB
           Navigate to next/previous reference respectively.

       ENTER

           •   If previous command was TAB or S-TAB:

               Open reference at point, i.e. call `man <reference>'.

           •   In TOC-mode:

               Go to currently selected position in file.

       /
           Start a forward search for regular expression.

       ?
           Start a backward search for regular expression.

       B
           Change buffer; choose from list.

       a
           Create a pseudo-file with the output of `apropos .'.

           That pseudo-file contains short descriptions for all manual pages
           known to the system; those manual pages can also be opened with TAB
           / S-TAB and ENTER commands.

       b
           Backward one page

       c
           Close file currently paged.

           Exits lsp if it was the only/last file being paged.

       f
           Forward one page

       h
           Show online help with command summary.

       m
           Open another manual page.

       n
           Find next match in search.

       p
           Find previous match in search.

       q

           •   Exit lsp.

           •   In TOC-mode: switch back to normal view.

           •   In help-mode: close help file.

ENVIRONMENT
       LSP_OPTIONS
           All command line options can also be specified using this variable.

       LSP_OPEN / LESSOPEN
           Analogical to less(1), lsp supports an input preprocessor but
           currently just the two basic forms:

           One that provides the path to a replacement file and the one that
           writes the content to be paged to a pipe.

SEE ALSO
       apropos(1), less(1), man(1), more(1), pg(1)

BUGS
       Report bugs at https://github.com/dgouders/lsp

alpha-1.0e-42                     03/25/2023                            LSP(1)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-03-25 20:47 ` Dirk Gouders
@ 2023-04-04 23:45   ` Alejandro Colomar
  2023-04-05  5:35     ` Eli Zaretskii
  2023-04-05 10:02     ` Playground pager lsp(1) Dirk Gouders
  0 siblings, 2 replies; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-04 23:45 UTC (permalink / raw)
  To: Dirk Gouders, linux-man; +Cc: help-texinfo


[-- Attachment #1.1: Type: text/plain, Size: 9290 bytes --]

Hi Dirk.

On 3/25/23 21:47, Dirk Gouders wrote:
> Hi Alejandro,
> 
> first of all, chances are that you consider this post as spam, because
> this list is about linux manual pages and not pagers.

No, I don't.

>  In that case
> please accept my apologies and ignore this post.
> 
> My reasoning was that readers here have some interest in manual pages
> and therefore probably also in pagers that claim to "understand" manual
> pages.  My hope is that even if you consider this post inappropriate you
> will perhaps suggest some more appropriate place for such discussion.
> 
> Not long ago, I noticed a discussion [1] about what pagers can and
> cannot do.  That was interesting to me, because I am currently playing
> with a pager that claims to have a focus on manual pages.
> 
> I will try to not waste your time and attach the manual page and a link
> to a short (3:50) demo video.  To me it is absolutely OK should you just
> ignore this spam post, but perhaps you find lsp(1) interesting enough
> for further discussion.

If you had a Debian package, I might try it :)

Or maybe a Makefile to build from source...  What is this meson.build?

> 
> Best regards,
> 
> Dirk
> 
> [1] https://www.spinics.net/lists/linux-man/index.html#24494
> [2] https://youtu.be/syGT4POgTAw
> 
> LSP(1)                           User commands                          LSP(1)
> 
> NAME
>        lsp - list pages (or least significant pager)
> 
> SYNOPSIS
>        lsp [options] [file_name]...
> 
>        lsp -h
> 
>        lsp -v
> 
> DESCRIPTION
>        lsp is a terminal pager that assists in paging through data, usually
>        text — no more(1), no less(1).

I'd say it does quite a lot more than paging...  We could say this is some
info(1) equivalent for manual pages.

With the benefit that you don't need to implement such a system from scratch,
but just reusing the existing tools (apropos, man, whatis, ...).  It seems
something like what I would have written if I had to implement info(1) from
scratch.  I wish GNU guys had thought of this instead of developing their
own incompatible system.

> 
>        The given files are opened if file names are given as options.
>        Otherwise lsp assumes input from stdin and tries to read from there.
> 
>        In addition to it’s ability to aid in paging through text files lsp has
>        limited knowledge about manual pages and offers some help in viewing
>        them:
> 
>        •   Manual pages usually refer to other manual pages and lsp allows to
>            navigate those references and to visit them as new files with the
>            ability to also navigate through all opened manual pages or other
>            files.

Out of curiosity, is this implemented with heuristics?  Or do you rely on
semantic mdoc(7) macros?

If it's the first, how do you handle exit(1)?  Is it a reference, or is it
just code (with the meaning exit(EXIT_FAILURE))?

If it's the second, I guess it doesn't support that in man(7), right?  At
least until MR is released.

> 
>            Here, lsp tries to minimize frustration caused by unavailable
>            references and verifies their existance before offering them as
>            references that can be visited.

Do you mark these as broken references?  It is interesting to know that
there's a reference which you don't have installed.  It may prompt you to
install it and read it.  When I see a broken reference, I usually find it
with `apt-file find man3/page.3`, and then install the relevant package.

> 
>        •   In windowing environments lsp does complete resizes when windows
>            get resized. This means it also reloads the manual page to fit the
>            new window size.

Good.  This I miss it in less(1) often.  Not sure if they had any strong
reason to not support that.

> 
>        •   Search for manual pages using apropos(1); in the current most basic
>            form it lists all known manual pages ready for text search and
>            visiting referenced manual pages.

What does it bring that `apropos * | less` can't do?  If you're going the
of info(1) with full-blown system, it seems reasonable, but I never really
liked all that if it's just a new terminal and a command away from me.

> 
>        •   lsp has an experimental TOC mode.
> 
>            This is a three-level folding mode trying to list only section and
>            sub-section names for quick navigation in manual pages.

Nice, and this an important feature missing feature in info(1), as I
reported recently.  :)  Maybe they are interested in something similar.

> 
>            The TOC is created using naive heuristics which works well to some
>            extend, but it might be incomplete. Users should keep that in mind.

I guess the heuristics are just `^[^ ]` for SH and `^   [^ ]` for SS, right?
I tipically use something similar for searching for command flags, and as
you say, these just work.

Cheers,
Alex

> 
> OPTIONS
>        All options can be given on the command line or via the environment
>        variable LSP_OPTIONS. The short version of toggles can also be used as
>        commands, e.g. you can input -i while paging through a file to toggle
>        case sensitivity for searches.
> 
>        -a, --load-apropos
>            Create an apropos pseudo-file.
> 
>        -c, --chop-lines
>            Toggle chopping of lines that do not fit the current screen width.
> 
>        -h, --help
>            Output help and exit.
> 
>        -i, --no-case
>            Toggle case sensitivity in searches.
> 
>        -I, --man-case
>            Turn on case sensitivity for names of manual pages.
> 
>            This is used for example to verify references to other manual
>            pages.
> 
>        -l, --log-file
>            Specify a path to where write debugging output.
> 
>        -n, --line-numbers
>            Toggle visible line numbers.
> 
>        -s, --search-string
>            Specify an initial search string.
> 
>        -v, --version
>            Output version information of lsp and exit.
> 
>        --no-color
>            Disable colored output.
> 
>        --reload-command
>            Specify command to load manual pages. Default is man.
> 
>        --verify-command
>            Specify command to verify the existance of references. Default is
>            man -w.
> 
>        --verify-with-apropos
>            Use the entries of the apropos pseudo-file for validation of
>            references.
> 
> COMMANDS
>        Pg-Down / Pg-Up
>            Forward/backward one page, respectively.
> 
>        Key-Down / Key-Up / Mouse-Wheel down/up
>            Forward/backward one line, respectively.
> 
>        CTRL-l
>            In search mode: bring current match to top of the page.
> 
>        ESC
>            Turn off current highlighting of matches.
> 
>        TAB / S-TAB
>            Navigate to next/previous reference respectively.
> 
>        ENTER
> 
>            •   If previous command was TAB or S-TAB:
> 
>                Open reference at point, i.e. call `man <reference>'.
> 
>            •   In TOC-mode:
> 
>                Go to currently selected position in file.
> 
>        /
>            Start a forward search for regular expression.
> 
>        ?
>            Start a backward search for regular expression.
> 
>        B
>            Change buffer; choose from list.
> 
>        a
>            Create a pseudo-file with the output of `apropos .'.
> 
>            That pseudo-file contains short descriptions for all manual pages
>            known to the system; those manual pages can also be opened with TAB
>            / S-TAB and ENTER commands.
> 
>        b
>            Backward one page
> 
>        c
>            Close file currently paged.
> 
>            Exits lsp if it was the only/last file being paged.
> 
>        f
>            Forward one page
> 
>        h
>            Show online help with command summary.
> 
>        m
>            Open another manual page.
> 
>        n
>            Find next match in search.
> 
>        p
>            Find previous match in search.
> 
>        q
> 
>            •   Exit lsp.
> 
>            •   In TOC-mode: switch back to normal view.
> 
>            •   In help-mode: close help file.
> 
> ENVIRONMENT
>        LSP_OPTIONS
>            All command line options can also be specified using this variable.
> 
>        LSP_OPEN / LESSOPEN
>            Analogical to less(1), lsp supports an input preprocessor but
>            currently just the two basic forms:
> 
>            One that provides the path to a replacement file and the one that
>            writes the content to be paged to a pipe.
> 
> SEE ALSO
>        apropos(1), less(1), man(1), more(1), pg(1)
> 
> BUGS
>        Report bugs at https://github.com/dgouders/lsp
> 
> alpha-1.0e-42                     03/25/2023                            LSP(1)

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-04 23:45   ` Alejandro Colomar
@ 2023-04-05  5:35     ` Eli Zaretskii
  2023-04-06  1:10       ` Alejandro Colomar
  2023-04-05 10:02     ` Playground pager lsp(1) Dirk Gouders
  1 sibling, 1 reply; 73+ messages in thread
From: Eli Zaretskii @ 2023-04-05  5:35 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: dirk, linux-man, help-texinfo

> Date: Wed, 5 Apr 2023 01:45:46 +0200
> Cc: help-texinfo@gnu.org
> From: Alejandro Colomar <alx.manpages@gmail.com>
> 
> With the benefit that you don't need to implement such a system from scratch,
> but just reusing the existing tools (apropos, man, whatis, ...).  It seems
> something like what I would have written if I had to implement info(1) from
> scratch.  I wish GNU guys had thought of this instead of developing their
> own incompatible system.

This last sentence is a misunderstanding.  The goal of Texinfo is not
to improve the man pages.  Texinfo is a completely different approach
to software documentation, which allows to write large books and then
produce various on-line and off-line formats to read and efficiently
search those books.

Man pages have no means of specifying structure and hyper-links except
by loosely-coupling pages via "SEE ALSO" cross-references at the end;
they have no means of quickly and efficiently finding some specific
subject except by text search (which usually produces a lot of false
positives).

By contrast, Texinfo documents have sectioning structure, have
cross-references that can appear where you need them and point
anywhere else in the document (or into another document).  They also
have indexing and commands that allow the reader to use the index in
order to find the subject he/she is interested in very quickly and
accurately, even if the text of the index entry doesn't appear
anywhere in the manual.

How can you document a large and flexible software package, such as
GDB or Texinfo or Emacs, in man pages?

It is a mistake to even compare these two documentation systems,
certainly in this way.

> >        •   In windowing environments lsp does complete resizes when windows
> >            get resized. This means it also reloads the manual page to fit the
> >            new window size.
> 
> Good.  This I miss it in less(1) often.  Not sure if they had any strong
> reason to not support that.

??? Why do you say 'less' doesn't support window resizing?  It does
here.

> >        •   lsp has an experimental TOC mode.
> > 
> >            This is a three-level folding mode trying to list only section and
> >            sub-section names for quick navigation in manual pages.
> 
> Nice, and this an important feature missing feature in info(1), as I
> reported recently.  :)

It isn't missing.  The TOC is presented as top-level menu in each
manual, and large manuals have also the "detailed menu" with all the
sub-nodes spelled out.  In addition, the Emacs Info reader has the
Info-toc command, which presents a structured menu with all the
sectioning levels of a manual even if the manual didn't produce it.

There are also more focused commands which present TOC-like lists
across all the manuals, which you can then navigate to read what you
deem appropriate.  See the description of "--all" command-line option
of the stand-alone Info reader.  For example, try this command:

  $ info --all e --index-search "init file"

There's also the index-apropos command from inside the stand-alone
reader (and the matching info-apropos in the Emacs Info reader).

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-04 23:45   ` Alejandro Colomar
  2023-04-05  5:35     ` Eli Zaretskii
@ 2023-04-05 10:02     ` Dirk Gouders
  2023-04-05 14:19       ` Arsen Arsenović
  2023-04-06  1:31       ` Playground pager lsp(1) Alejandro Colomar
  1 sibling, 2 replies; 73+ messages in thread
From: Dirk Gouders @ 2023-04-05 10:02 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: linux-man, help-texinfo

[-- Attachment #1: Type: text/plain, Size: 5054 bytes --]

Hi Alex,

>> first of all, chances are that you consider this post as spam, because
>> this list is about linux manual pages and not pagers.
>
> No, I don't.

that's fine, thank you for taking the time give me feedback.

>> I will try to not waste your time and attach the manual page and a link
>> to a short (3:50) demo video.  To me it is absolutely OK should you just
>> ignore this spam post, but perhaps you find lsp(1) interesting enough
>> for further discussion.
>
> If you had a Debian package, I might try it :)
>
> Or maybe a Makefile to build from source...  What is this meson.build?

If you want to take a look at it: there is a branch "next" which you
might prefer as it closer resembles my current work.  There is a new
toggle "-V" that can be used to completely turn off validation.

I tried to assemble a Makefile that might work without a configure
script and attach it to the end.  A prefix /usr is the default value, if
your system prefers /usr/local you can use `make prefix=/usr/local
install`.  I hope I prepared some reasonable Makefile...

Concerning meson.build: I decided to have a look at meson as the
autobuild tool for lsp.  I am just gathering experiences with it and if
you have meson(1) installed you could use thes steps to (un)install lsp:

$ # cd to lsp directory
$ meson setup --prefix=/usr builddir ; cd builddir
$ ninja install # or uninstall

>>        •   Manual pages usually refer to other manual pages and lsp allows to
>>            navigate those references and to visit them as new files with the
>>            ability to also navigate through all opened manual pages or other
>>            files.
>
> Out of curiosity, is this implemented with heuristics?  Or do you rely on
> semantic mdoc(7) macros?

This is purely based on heuristics (regex) which is one reason for
validation of the found references.

> If it's the first, how do you handle exit(1)?  Is it a reference, or is it
> just code (with the meaning exit(EXIT_FAILURE))?

exit(1) gets recognized as a possible reference but validation will fail.

> If it's the second, I guess it doesn't support that in man(7), right?  At
> least until MR is released.

>> 
>>            Here, lsp tries to minimize frustration caused by unavailable
>>            references and verifies their existance before offering them as
>>            references that can be visited.
>
> Do you mark these as broken references?  It is interesting to know that
> there's a reference which you don't have installed.  It may prompt you to
> install it and read it.  When I see a broken reference, I usually find it
> with `apt-file find man3/page.3`, and then install the relevant package.

No, broken references aren't marked.  Usually those unavailable
references make sense, e.g. if a manual page references some program
that not everyone uses.

One example that I couldn't resolve so far is a reference to
getconf(1) for example in fpatchconf(3).  Up to now I was not able to
find out which package contains getconf(1)...

>> 
>>        •   In windowing environments lsp does complete resizes when windows
>>            get resized. This means it also reloads the manual page to fit the
>>            new window size.
>
> Good.  This I miss it in less(1) often.  Not sure if they had any strong
> reason to not support that.

Unfortunately, info(1) also doesn't do full resizes (on my system).

>> 
>>        •   Search for manual pages using apropos(1); in the current most basic
>>            form it lists all known manual pages ready for text search and
>>            visiting referenced manual pages.
>
> What does it bring that `apropos * | less` can't do?  If you're going the
> of info(1) with full-blown system, it seems reasonable, but I never really
> liked all that if it's just a new terminal and a command away from me.

You get a pseudo-file from where you can reach any manual page on the
system.  Originally I thought this to help novice users but since lsp is
my system's PAGER I use it more often than expected.  I'm missing the
ability to give keywords to apropos but that's just a matter of time to
get fixed.

>> 
>>        •   lsp has an experimental TOC mode.
>> 
>>            This is a three-level folding mode trying to list only section and
>>            sub-section names for quick navigation in manual pages.
>
> Nice, and this an important feature missing feature in info(1), as I
> reported recently.  :)  Maybe they are interested in something similar.
>
>> 
>>            The TOC is created using naive heuristics which works well to some
>>            extend, but it might be incomplete. Users should keep that in mind.
>
> I guess the heuristics are just `^[^ ]` for SH and `^   [^ ]` for SS, right?
> I tipically use something similar for searching for command flags, and as
> you say, these just work.

Yes, that is correct.  Only level 2 (0-based) does some additional
look-ahead.

Cheers,

Dirk


[-- Attachment #2: Makefile --]
[-- Type: application/octet-stream, Size: 576 bytes --]

version=\"$(shell cat .version)\"
CFLAGS := $(shell pkg-config --cflags ncursesw)
CFLAGS += -DLSP_VERSION=$(version)
LDFLAGS := $(shell pkg-config --libs ncursesw)

ifeq ($(prefix),)
	prefix := /usr
endif

lsp: lsp.c
	gcc $(CFLAGS) $(LDFLAGS) -o $@ $<

doc/lsp.1: doc/lsp.adoc
	a2x --doctype manpage --format manpage -a lsp-version=$(version) $<

.PHONY: uninstall install

install: lsp doc/lsp.1 doc/lsp-help.1
	install lsp $(prefix)/bin
	install doc/lsp.1 doc/lsp-help.1 $(prefix)/share/man/man1/

uninstall:
	rm $(prefix)/bin/lsp
	rm $(prefix)/share/man/man1/lsp{,-help}.1

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-05 10:02     ` Playground pager lsp(1) Dirk Gouders
@ 2023-04-05 14:19       ` Arsen Arsenović
  2023-04-05 18:01         ` Dirk Gouders
  2023-04-06  1:31       ` Playground pager lsp(1) Alejandro Colomar
  1 sibling, 1 reply; 73+ messages in thread
From: Arsen Arsenović @ 2023-04-05 14:19 UTC (permalink / raw)
  To: Dirk Gouders; +Cc: Alejandro Colomar, linux-man, help-texinfo

[-- Attachment #1: Type: text/plain, Size: 5475 bytes --]


Dirk Gouders <dirk@gouders.net> writes:

> Hi Alex,
>
>>> first of all, chances are that you consider this post as spam, because
>>> this list is about linux manual pages and not pagers.
>>
>> No, I don't.
>
> that's fine, thank you for taking the time give me feedback.
>
>>> I will try to not waste your time and attach the manual page and a link
>>> to a short (3:50) demo video.  To me it is absolutely OK should you just
>>> ignore this spam post, but perhaps you find lsp(1) interesting enough
>>> for further discussion.
>>
>> If you had a Debian package, I might try it :)
>>
>> Or maybe a Makefile to build from source...  What is this meson.build?
>
> If you want to take a look at it: there is a branch "next" which you
> might prefer as it closer resembles my current work.  There is a new
> toggle "-V" that can be used to completely turn off validation.
>
> I tried to assemble a Makefile that might work without a configure
> script and attach it to the end.  A prefix /usr is the default value, if
> your system prefers /usr/local you can use `make prefix=/usr/local
> install`.  I hope I prepared some reasonable Makefile...
>
> Concerning meson.build: I decided to have a look at meson as the
> autobuild tool for lsp.  I am just gathering experiences with it and if
> you have meson(1) installed you could use thes steps to (un)install lsp:
>
> $ # cd to lsp directory
> $ meson setup --prefix=/usr builddir ; cd builddir
> $ ninja install # or uninstall
>
>>>        •   Manual pages usually refer to other manual pages and lsp allows to
>>>            navigate those references and to visit them as new files with the
>>>            ability to also navigate through all opened manual pages or other
>>>            files.
>>
>> Out of curiosity, is this implemented with heuristics?  Or do you rely on
>> semantic mdoc(7) macros?
>
> This is purely based on heuristics (regex) which is one reason for
> validation of the found references.
>
>> If it's the first, how do you handle exit(1)?  Is it a reference, or is it
>> just code (with the meaning exit(EXIT_FAILURE))?
>
> exit(1) gets recognized as a possible reference but validation will fail.
>
>> If it's the second, I guess it doesn't support that in man(7), right?  At
>> least until MR is released.
>
>>> 
>>>            Here, lsp tries to minimize frustration caused by unavailable
>>>            references and verifies their existance before offering them as
>>>            references that can be visited.
>>
>> Do you mark these as broken references?  It is interesting to know that
>> there's a reference which you don't have installed.  It may prompt you to
>> install it and read it.  When I see a broken reference, I usually find it
>> with `apt-file find man3/page.3`, and then install the relevant package.
>
> No, broken references aren't marked.  Usually those unavailable
> references make sense, e.g. if a manual page references some program
> that not everyone uses.
>
> One example that I couldn't resolve so far is a reference to
> getconf(1) for example in fpatchconf(3).  Up to now I was not able to
> find out which package contains getconf(1)...
>
>>> 
>>>        •   In windowing environments lsp does complete resizes when windows
>>>            get resized. This means it also reloads the manual page to fit the
>>>            new window size.
>>
>> Good.  This I miss it in less(1) often.  Not sure if they had any strong
>> reason to not support that.
>
> Unfortunately, info(1) also doesn't do full resizes (on my system).

Do you mean the info pages' column limit or that the viewer itself
doesn't resize to fit the frame?  The latter would be a bug.

>>> 
>>>        •   Search for manual pages using apropos(1); in the current most basic
>>>            form it lists all known manual pages ready for text search and
>>>            visiting referenced manual pages.
>>
>> What does it bring that `apropos * | less` can't do?  If you're going the
>> of info(1) with full-blown system, it seems reasonable, but I never really
>> liked all that if it's just a new terminal and a command away from me.
>
> You get a pseudo-file from where you can reach any manual page on the
> system.  Originally I thought this to help novice users but since lsp is
> my system's PAGER I use it more often than expected.  I'm missing the
> ability to give keywords to apropos but that's just a matter of time to
> get fixed.
>
>>> 
>>>        •   lsp has an experimental TOC mode.
>>> 
>>>            This is a three-level folding mode trying to list only section and
>>>            sub-section names for quick navigation in manual pages.
>>
>> Nice, and this an important feature missing feature in info(1), as I
>> reported recently.  :)  Maybe they are interested in something similar.
>>
>>> 
>>>            The TOC is created using naive heuristics which works well to some
>>>            extend, but it might be incomplete. Users should keep that in mind.
>>
>> I guess the heuristics are just `^[^ ]` for SH and `^   [^ ]` for SS, right?
>> I tipically use something similar for searching for command flags, and as
>> you say, these just work.
>
> Yes, that is correct.  Only level 2 (0-based) does some additional
> look-ahead.
>
> Cheers,
>
> Dirk
>
> [2. Makefile --- application/octet-stream; Makefile.new]...


-- 
Arsen Arsenović

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-05 14:19       ` Arsen Arsenović
@ 2023-04-05 18:01         ` Dirk Gouders
  2023-04-05 19:07           ` Eli Zaretskii
  0 siblings, 1 reply; 73+ messages in thread
From: Dirk Gouders @ 2023-04-05 18:01 UTC (permalink / raw)
  To: Arsen Arsenović; +Cc: Alejandro Colomar, linux-man, help-texinfo

Arsen Arsenović <arsen@aarsen.me> writes:

>>>>        •   In windowing environments lsp does complete resizes when windows
>>>>            get resized. This means it also reloads the manual page to fit the
>>>>            new window size.
>>>
>>> Good.  This I miss it in less(1) often.  Not sure if they had any strong
>>> reason to not support that.
>>
>> Unfortunately, info(1) also doesn't do full resizes (on my system).
>
> Do you mean the info pages' column limit or that the viewer itself
> doesn't resize to fit the frame?  The latter would be a bug.

Yes, I meant the column limit.  Sorry for not having expressed this very
clear.

Dirk

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-05 18:01         ` Dirk Gouders
@ 2023-04-05 19:07           ` Eli Zaretskii
  2023-04-05 19:56             ` Dirk Gouders
  2023-04-05 20:38             ` A less presumptive .info? (was: Re: Playground pager lsp(1)) Arsen Arsenović
  0 siblings, 2 replies; 73+ messages in thread
From: Eli Zaretskii @ 2023-04-05 19:07 UTC (permalink / raw)
  To: Dirk Gouders; +Cc: arsen, alx.manpages, linux-man, help-texinfo

> From: Dirk Gouders <dirk@gouders.net>
> Cc: Alejandro Colomar <alx.manpages@gmail.com>, linux-man@vger.kernel.org,
>  help-texinfo@gnu.org
> Date: Wed, 05 Apr 2023 20:01:56 +0200
> 
> Arsen Arsenović <arsen@aarsen.me> writes:
> 
> >>>>        •   In windowing environments lsp does complete resizes when windows
> >>>>            get resized. This means it also reloads the manual page to fit the
> >>>>            new window size.
> >>>
> >>> Good.  This I miss it in less(1) often.  Not sure if they had any strong
> >>> reason to not support that.
> >>
> >> Unfortunately, info(1) also doesn't do full resizes (on my system).
> >
> > Do you mean the info pages' column limit or that the viewer itself
> > doesn't resize to fit the frame?  The latter would be a bug.
> 
> Yes, I meant the column limit.  Sorry for not having expressed this very
> clear.

Info files are formatted already, you cannot ask the reader to
reformat them for a different line length.

With man pages this is only possible if you never keep the formatted
pages and reuse them once they were produced.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-05 19:07           ` Eli Zaretskii
@ 2023-04-05 19:56             ` Dirk Gouders
  2023-04-05 20:38             ` A less presumptive .info? (was: Re: Playground pager lsp(1)) Arsen Arsenović
  1 sibling, 0 replies; 73+ messages in thread
From: Dirk Gouders @ 2023-04-05 19:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: arsen, alx.manpages, linux-man, help-texinfo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Dirk Gouders <dirk@gouders.net>
>> Cc: Alejandro Colomar <alx.manpages@gmail.com>, linux-man@vger.kernel.org,
>>  help-texinfo@gnu.org
>> Date: Wed, 05 Apr 2023 20:01:56 +0200
>> 
>> Arsen Arsenović <arsen@aarsen.me> writes:
>> 
>> >>>>        •   In windowing environments lsp does complete resizes when windows
>> >>>>            get resized. This means it also reloads the manual page to fit the
>> >>>>            new window size.
>> >>>
>> >>> Good.  This I miss it in less(1) often.  Not sure if they had any strong
>> >>> reason to not support that.
>> >>
>> >> Unfortunately, info(1) also doesn't do full resizes (on my system).
>> >
>> > Do you mean the info pages' column limit or that the viewer itself
>> > doesn't resize to fit the frame?  The latter would be a bug.
>> 
>> Yes, I meant the column limit.  Sorry for not having expressed this very
>> clear.
>
> Info files are formatted already, you cannot ask the reader to
> reformat them for a different line length.

Thank you for that explanation; I didn't know that and now understand
info(1)'s behavior.

Dirk

> With man pages this is only possible if you never keep the formatted
> pages and reuse them once they were produced.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* A less presumptive .info? (was: Re: Playground pager lsp(1))
  2023-04-05 19:07           ` Eli Zaretskii
  2023-04-05 19:56             ` Dirk Gouders
@ 2023-04-05 20:38             ` Arsen Arsenović
  2023-04-06  8:14               ` Eli Zaretskii
  1 sibling, 1 reply; 73+ messages in thread
From: Arsen Arsenović @ 2023-04-05 20:38 UTC (permalink / raw)
  To: Eli Zaretskii, Gavin Smith
  Cc: Dirk Gouders, alx.manpages, linux-man, help-texinfo

[-- Attachment #1: Type: text/plain, Size: 2296 bytes --]


Eli Zaretskii <eliz@gnu.org> writes:

> Info files are formatted already, you cannot ask the reader to
> reformat them for a different line length.
>
> With man pages this is only possible if you never keep the formatted
> pages and reuse them once they were produced.

I've been casually wondering if creating a new format that can host more
formatting options and uses more precise syntax than 'plaintext with
some binary tags' would be a decent thing to work on.

My thoughts were brief and undeveloped as this was thought of on the
commute, but something that retains the binary offsets for indices and
tags, but stores formatted data (perhaps as s-exprs, those would be easy
to parse).  It is always easier to remove information than to
reintroduce it.

Such a structure should resemble the input language, but with far less
complexity (e.g. something at the level of abstraction that HTML5 sits
at, so, macros would be expanded, and we'd be dealing with lists of
paragraphs and formatted blocks, etc.).

This would allow for the reflowing that was talked about in this thread,
and provide more readable output in graphical contexts, as it wouldn't
be data generated with the assumption of a monospace font (rather, the
format could store whether your context wants monospace or proportional
fonts at a given point), or data generated for a given screen size, or
with a given indentation size, or with the assumption of a lack of
features like italics, etc.

For instance, info2html used by the KDE info viewer currently produces
quite terrible results, because it fails to implement the heuristics the
Info viewers have properly.  This problem would be hard to have with a
better "at-rest" format for Info pages.

The alternative is, of course, bringing HTML up to par feature-wise
(wrt. indices etc), but that'd be on the other end of the extreme, where
instead of being too easy to parse and lacking important information,
it'd be oververbose with and difficult to parse (not that such a thing
should not be done too, so that folks using ordinary browsers can enjoy
documentation, and so that projects can provide more accessible
documentation by the merit of more people having HTML than Info
viewers).

WDYT folks?
-- 
Arsen Arsenović

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 381 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-05  5:35     ` Eli Zaretskii
@ 2023-04-06  1:10       ` Alejandro Colomar
  2023-04-06  8:11         ` Eli Zaretskii
  2023-04-07  2:18         ` Playground pager lsp(1) G. Branden Robinson
  0 siblings, 2 replies; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-06  1:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: dirk, linux-man, help-texinfo


[-- Attachment #1.1: Type: text/plain, Size: 7586 bytes --]

Hi Eli!

On 4/5/23 07:35, Eli Zaretskii wrote:
>> Date: Wed, 5 Apr 2023 01:45:46 +0200
>> Cc: help-texinfo@gnu.org
>> From: Alejandro Colomar <alx.manpages@gmail.com>
>>
>> With the benefit that you don't need to implement such a system from scratch,
>> but just reusing the existing tools (apropos, man, whatis, ...).  It seems
>> something like what I would have written if I had to implement info(1) from
>> scratch.  I wish GNU guys had thought of this instead of developing their
>> own incompatible system.
> 
> This last sentence is a misunderstanding.  The goal of Texinfo is not
> to improve the man pages.  Texinfo is a completely different approach
> to software documentation, which allows to write large books and then
> produce various on-line and off-line formats to read and efficiently
> search those books.

"The manual was intended to be typeset; some detail is sacrificed on
terminals." (man(1), _Unix Time-Sharing System Programmer's Manual_,
Eighth Edition, Volume 1, February 1985)

You mean books like this one?  Courtesy of groff(1)'s Deri James =)
<https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages-6.04.01.pdf>

Or maybe you prefer HTML?
<https://man7.org/linux/man-pages/man1/intro.1.html>

As to efficiency, I'm not going to open that melon, because we're
both very biased to be efficient on the formats we each maintain.
I'll just say that I don't see an objective winner in those terms.

About variety of output formats, anything that can be produced by
groff(1), man(7) can be translated.  And groff(1) can do many formats.

> 
> Man pages have no means of specifying structure

.SH, .SS, .TP, .TQ, and very soon (hopefully weeks not months) .MR

Those can be used to produce very precise links such as this one
(one of my favourite references when reviewing man-pages patches):
<https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages-6.04.01.pdf#pdf%3Abm11886>

And there's still room for improvement over what you'll see in that
PDF, or what you can see in <man7.org>.

> and hyper-links except
> by loosely-coupling pages via "SEE ALSO" cross-references at the end;
> they have no means of quickly and efficiently finding some specific
> subject except by text search (which usually produces a lot of false
> positives).

I guess you mean searching from the command line by the name of the
parameter to a function, or what?  I would be interested in a more
detailed description of what you want to be able to search in current
pages (hopefully ones that I maintain, so I can speak of them) that
you can't find easily?  Maybe I can help making something more
accessible.

lsp(1) helps a little bit making the structure of man pages navigable,
and it's currently implemented using heuristics, but if it worked
together with groff(1) to get the real source of truth, it could get
precise data without needing heuristics.

> 
> By contrast, Texinfo documents have sectioning structure, have
> cross-references that can appear where you need them and point
> anywhere else in the document (or into another document).

This was discussed as a possible extension to '.MR'.  We're just not
sure that there's a real need for that in manual pages (although
there's not a consensus on that regard, and Branden, which I'm sure
is reading this, may jump in at any moment :).

>  They also
> have indexing and commands that allow the reader to use the index in
> order to find the subject he/she is interested in very quickly and

You mean whatis(1) and apropos(1)?  lsp(1) makes use of those to be
able to navigate all pages in the system (I guess this is more or
less what info(1) does; with the obvious differences due to how
nodes are organized).

> accurately, even if the text of the index entry doesn't appear
> anywhere in the manual.

man pages have several ways:

-  Including keywords in the NAME section.
-  Link pages.
-  TH line.

Of course, this is for the terminal.  For PDF or HTML, you can
get hyperlinks to any subsection (and in the future maybe even
tagged paragraphs) within a page.

> 
> How can you document a large and flexible software package, such as
> GDB or Texinfo or Emacs, in man pages?

git is a huge program, yet its man pages are quite useful.
Just split your documentation at the right boundary, which
usually requires a good design for your software that allows
such division.

$ apt-file show git-man | wc -l
190

> 
> It is a mistake to even compare these two documentation systems,
> certainly in this way.

The fact that current man(1) implementations don't exploit
the whole power of man(7) doesn't mean you can't design a
software that does.

I'm sure you could build something similar to info(1) that
got man(7) pages as its input.

That PDF linked above is just a starter of what we want to
do in the not far future.  Hopefully we can also get some
time to work on HTML.

> 
>>>        •   In windowing environments lsp does complete resizes when windows
>>>            get resized. This means it also reloads the manual page to fit the
>>>            new window size.
>>
>> Good.  This I miss it in less(1) often.  Not sure if they had any strong
>> reason to not support that.
> 
> ??? Why do you say 'less' doesn't support window resizing?  It does
> here.

Hmm, now that I think, it's probably an issue of coordinating
man(1) and less(1).  I sometimes wish that when I resize a
window where I'm reading a man page, it would reformat the page
from source.  Of course, that might be a problem for keeping
track of where I was, since lines moved around.  Not sure how
good lsp(1) is at that.

> 
>>>        •   lsp has an experimental TOC mode.
>>>
>>>            This is a three-level folding mode trying to list only section and
>>>            sub-section names for quick navigation in manual pages.
>>
>> Nice, and this an important feature missing feature in info(1), as I
>> reported recently.  :)
> 
> It isn't missing.  The TOC is presented as top-level menu in each
> manual, and large manuals have also the "detailed menu" with all the
> sub-nodes spelled out.  In addition, the Emacs Info reader has the
> Info-toc command, which presents a structured menu with all the
> sectioning levels of a manual even if the manual didn't produce it.

Ahh, yes, this is true.  What I found missing is a kind of a map for
knowing what I have available for navigating (also the fact that I
don't usually run info(1) makes me be a bit fuzzy on detailing what
is it that I miss from it).  So, info(1) has a map of the sections
available in a page, and does it also have a map of all the pages
in the system (or whatever you call your pages, I don't yet really
understand the organization of info manuals).

> 
> There are also more focused commands which present TOC-like lists
> across all the manuals, which you can then navigate to read what you
> deem appropriate.  See the description of "--all" command-line option
> of the stand-alone Info reader.  For example, try this command:
> 
>   $ info --all e --index-search "init file"
> 
> There's also the index-apropos command from inside the stand-alone
> reader (and the matching info-apropos in the Emacs Info reader).

It's nice to talk to you, even if we usually disagree in how we
find documentation more accessible :)

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-05 10:02     ` Playground pager lsp(1) Dirk Gouders
  2023-04-05 14:19       ` Arsen Arsenović
@ 2023-04-06  1:31       ` Alejandro Colomar
  2023-04-06  6:01         ` Dirk Gouders
  1 sibling, 1 reply; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-06  1:31 UTC (permalink / raw)
  To: Dirk Gouders; +Cc: linux-man, help-texinfo


[-- Attachment #1.1: Type: text/plain, Size: 4614 bytes --]

Hi Dirk,

On 4/5/23 12:02, Dirk Gouders wrote:
> Hi Alex,
> 
>>> first of all, chances are that you consider this post as spam, because
>>> this list is about linux manual pages and not pagers.
>>
>> No, I don't.
> 
> that's fine, thank you for taking the time give me feedback.
> 

:)

> If you want to take a look at it: there is a branch "next" which you
> might prefer as it closer resembles my current work.  There is a new
> toggle "-V" that can be used to completely turn off validation.
> 
> I tried to assemble a Makefile that might work without a configure
> script and attach it to the end.  A prefix /usr is the default value, if
> your system prefers /usr/local you can use `make prefix=/usr/local

The default prefix in GNU should be /usr/local
<https://www.gnu.org/prep/standards/html_node/Directory-Variables.html>

> install`.  I hope I prepared some reasonable Makefile...

I'll have a look.

> 
> Concerning meson.build: I decided to have a look at meson as the
> autobuild tool for lsp.  I am just gathering experiences with it and if
> you have meson(1) installed you could use thes steps to (un)install lsp:
> 
> $ # cd to lsp directory
> $ meson setup --prefix=/usr builddir ; cd builddir
> $ ninja install # or uninstall

>> If it's the first, how do you handle exit(1)?  Is it a reference, or is it
>> just code (with the meaning exit(EXIT_FAILURE))?
> 
> exit(1) gets recognized as a possible reference but validation will fail.

`man 'exit(1)'` works for me.  It brings the exit(1posix) page, from
manpages-posix.


> No, broken references aren't marked.  Usually those unavailable
> references make sense, e.g. if a manual page references some program
> that not everyone uses.
> 
> One example that I couldn't resolve so far is a reference to
> getconf(1) for example in fpatchconf(3).  Up to now I was not able to
> find out which package contains getconf(1)...

$ apt-file find /getconf.1
glibc-source: /usr/src/glibc/debian/local/manpages/getconf.1
libc-bin: /usr/share/man/man1/getconf.1.gz
manpages-fr: /usr/share/man/fr/man1/getconf.1.gz

It's in libc-bin.

BTW, did you mean fpathconf(3)?

> 
>>>
>>>        •   In windowing environments lsp does complete resizes when windows
>>>            get resized. This means it also reloads the manual page to fit the
>>>            new window size.
>>
>> Good.  This I miss it in less(1) often.  Not sure if they had any strong
>> reason to not support that.
> 
> Unfortunately, info(1) also doesn't do full resizes (on my system).
> 
>>>
>>>        •   Search for manual pages using apropos(1); in the current most basic
>>>            form it lists all known manual pages ready for text search and
>>>            visiting referenced manual pages.
>>
>> What does it bring that `apropos * | less` can't do?  If you're going the
>> of info(1) with full-blown system, it seems reasonable, but I never really
>> liked all that if it's just a new terminal and a command away from me.
> 
> You get a pseudo-file from where you can reach any manual page on the
> system.  Originally I thought this to help novice users but since lsp is
> my system's PAGER I use it more often than expected.  I'm missing the
> ability to give keywords to apropos but that's just a matter of time to
> get fixed.

I guess that's a matter of preferring navigation in some interactive
program (to me, info(1) style), vs standalone simple commands where you
first find what you want, then run it.

I don't find that magic much more comfortable than

$ apropos sysctl
... oh, I find many freebsd pages, let's grep them out ...
$ apropos sysctl | grep -v freebsd
... hmm, let's see system ...
$ apropos system | grep -v freebsd
... okay, now this shows a lot of stuff, let's remove man1 ...
$ apropos system | grep -v -e freebsd -e '(1'
... I don't want systemd either ...
$ apropos system | grep -v -e freebsd -e '(1' -e systemd
... let's sort by section and navigate through that list ...
$ apropos system | grep -v -e freebsd -e '(1' -e systemd | sort -k2 | less

Find some pages that may be interesting, note them down, and open
them one by one, in different tabs, until I find I wanted to read
proc(5), and close everything else.

Which brings us to a valid point Eli raised.  Some pages are an
unreadable mess, and I think proc(5) is one of those that needs
a big split into smaller pages such as proc_pid_attr(5).

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-06  1:31       ` Playground pager lsp(1) Alejandro Colomar
@ 2023-04-06  6:01         ` Dirk Gouders
  0 siblings, 0 replies; 73+ messages in thread
From: Dirk Gouders @ 2023-04-06  6:01 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: linux-man, help-texinfo

[-- Attachment #1: Type: text/plain, Size: 5064 bytes --]

Hi Alex,

Alejandro Colomar <alx.manpages@gmail.com> writes:

>> If you want to take a look at it: there is a branch "next" which you
>> might prefer as it closer resembles my current work.  There is a new
>> toggle "-V" that can be used to completely turn off validation.
>> 
>> I tried to assemble a Makefile that might work without a configure
>> script and attach it to the end.  A prefix /usr is the default value, if
>> your system prefers /usr/local you can use `make prefix=/usr/local
>
> The default prefix in GNU should be /usr/local
> <https://www.gnu.org/prep/standards/html_node/Directory-Variables.html>
>
>> install`.  I hope I prepared some reasonable Makefile...
>
> I'll have a look.

Perhaps, I messed up the Makefile.  Some time ago, I test-compiled lsp
on Rasbpian and CentOS and -lutil was missing.  That got fixed in
meson.build but not in the Makefile I sent you.  I'll attach a new
one -- this time as plain text ;-)

>>> If it's the first, how do you handle exit(1)?  Is it a reference, or is it
>>> just code (with the meaning exit(EXIT_FAILURE))?
>> 
>> exit(1) gets recognized as a possible reference but validation will fail.
>
> `man 'exit(1)'` works for me.  It brings the exit(1posix) page, from
> manpages-posix.

Oh yes, I didn't have the POSIX manual pages installed -- now, exit(1)
gets recognized as a reference.  Thank you.

>> No, broken references aren't marked.  Usually those unavailable
>> references make sense, e.g. if a manual page references some program
>> that not everyone uses.
>> 
>> One example that I couldn't resolve so far is a reference to
>> getconf(1) for example in fpatchconf(3).  Up to now I was not able to
>> find out which package contains getconf(1)...
>
> $ apt-file find /getconf.1
> glibc-source: /usr/src/glibc/debian/local/manpages/getconf.1
> libc-bin: /usr/share/man/man1/getconf.1.gz
> manpages-fr: /usr/share/man/fr/man1/getconf.1.gz
>
> It's in libc-bin.
>
> BTW, did you mean fpathconf(3)?

Yes, that was a typo.  I'm on Gentoo and there is no libc-bin.  But now
I have a direction to search.  Thank you, again.

>
>> 
>>>>
>>>>        •   In windowing environments lsp does complete resizes when windows
>>>>            get resized. This means it also reloads the manual page to fit the
>>>>            new window size.
>>>
>>> Good.  This I miss it in less(1) often.  Not sure if they had any strong
>>> reason to not support that.
>> 
>> Unfortunately, info(1) also doesn't do full resizes (on my system).
>> 
>>>>
>>>>        •   Search for manual pages using apropos(1); in the current most basic
>>>>            form it lists all known manual pages ready for text search and
>>>>            visiting referenced manual pages.
>>>
>>> What does it bring that `apropos * | less` can't do?  If you're going the
>>> of info(1) with full-blown system, it seems reasonable, but I never really
>>> liked all that if it's just a new terminal and a command away from me.
>> 
>> You get a pseudo-file from where you can reach any manual page on the
>> system.  Originally I thought this to help novice users but since lsp is
>> my system's PAGER I use it more often than expected.  I'm missing the
>> ability to give keywords to apropos but that's just a matter of time to
>> get fixed.
>
> I guess that's a matter of preferring navigation in some interactive
> program (to me, info(1) style), vs standalone simple commands where you
> first find what you want, then run it.
>
> I don't find that magic much more comfortable than
>
> $ apropos sysctl
> ... oh, I find many freebsd pages, let's grep them out ...
> $ apropos sysctl | grep -v freebsd
> ... hmm, let's see system ...
> $ apropos system | grep -v freebsd
> ... okay, now this shows a lot of stuff, let's remove man1 ...
> $ apropos system | grep -v -e freebsd -e '(1'
> ... I don't want systemd either ...
> $ apropos system | grep -v -e freebsd -e '(1' -e systemd
> ... let's sort by section and navigate through that list ...
> $ apropos system | grep -v -e freebsd -e '(1' -e systemd | sort -k2 | less
>
> Find some pages that may be interesting, note them down, and open
> them one by one, in different tabs, until I find I wanted to read
> proc(5), and close everything else.

As I wrote: I (also) had novice users in mind when I implented the
Apropos pseudo-file (it can also be used for verification, that's
another use of it).  I often watch novice users getting frustrated about
all the typing that is needed to get useful results.  I know, first of
all, they need to train their "keyboard abilities" but some help here
and there could perhaps help to keep them on board or minimize
frustration...

> Which brings us to a valid point Eli raised.  Some pages are an
> unreadable mess, and I think proc(5) is one of those that needs
> a big split into smaller pages such as proc_pid_attr(5).

Yes, one of the points where I thought pagers with additional
features could help...

Cheers,

Dirk


[-- Attachment #2: Makefile for lsp(1) --]
[-- Type: text/plain, Size: 600 bytes --]

version=\"$(shell cat .version)\"
CFLAGS := $(shell pkg-config --cflags ncursesw)
CFLAGS += -DLSP_VERSION=$(version)
LDFLAGS := $(shell pkg-config --libs ncursesw)
LDFLAGS += -lutil

ifeq ($(prefix),)
	prefix := /usr/local
endif

lsp: lsp.c
	gcc $(CFLAGS) $(LDFLAGS) -o $@ $<

doc/lsp.1: doc/lsp.adoc
	a2x --doctype manpage --format manpage -a lsp-version=$(version) $<

.PHONY: uninstall install

install: lsp doc/lsp.1 doc/lsp-help.1
	install lsp $(prefix)/bin
	install doc/lsp.1 doc/lsp-help.1 $(prefix)/share/man/man1/

uninstall:
	rm $(prefix)/bin/lsp
	rm $(prefix)/share/man/man1/lsp{,-help}.1

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-06  1:10       ` Alejandro Colomar
@ 2023-04-06  8:11         ` Eli Zaretskii
  2023-04-06  8:48           ` Gavin Smith
  2023-04-07 22:01           ` Alejandro Colomar
  2023-04-07  2:18         ` Playground pager lsp(1) G. Branden Robinson
  1 sibling, 2 replies; 73+ messages in thread
From: Eli Zaretskii @ 2023-04-06  8:11 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: dirk, linux-man, help-texinfo

> Date: Thu, 6 Apr 2023 03:10:59 +0200
> Cc: dirk@gouders.net, linux-man@vger.kernel.org, help-texinfo@gnu.org
> From: Alejandro Colomar <alx.manpages@gmail.com>
> 
> > This last sentence is a misunderstanding.  The goal of Texinfo is not
> > to improve the man pages.  Texinfo is a completely different approach
> > to software documentation, which allows to write large books and then
> > produce various on-line and off-line formats to read and efficiently
> > search those books.
> 
> "The manual was intended to be typeset; some detail is sacrificed on
> terminals." (man(1), _Unix Time-Sharing System Programmer's Manual_,
> Eighth Edition, Volume 1, February 1985)
> 
> You mean books like this one?  Courtesy of groff(1)'s Deri James =)
> <https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages-6.04.01.pdf>
> 
> Or maybe you prefer HTML?
> <https://man7.org/linux/man-pages/man1/intro.1.html>

No, I mean books like "GNU Emacs Manual" or "Debugging with GDB"
(https://shop.fsf.org/collection/books-docs).  Or "War and Peace", for
that matter.

> As to efficiency, I'm not going to open that melon, because we're
> both very biased to be efficient on the formats we each maintain.
> I'll just say that I don't see an objective winner in those terms.

How do you find the description of, say, "dereference symbolic link"
(to take just a random example from the Emacs manual) when the actual
text of the manual include neither this string nor matches for any
related regular expressions, like "dereference.*link"?

The way Info does it is to use the index (which should be present in
any respectable reference document) to find description of the
corresponding subject.  The indexing, which is done by the author of
the document, if it's a good indexing, should include index entries
that specify subjects the reader could have in mind when he/she is
looking for this kind of information.

The corresponding index-searching commands of Info readers are a
primary means for finding information quickly and efficiently,
avoiding too many false positives and also avoiding frustrating
misses, i.e., searches that fail to find anything pertinent.

So this is not about objectivity, this is about features that either
are present in the documentation system or are absent.  I prefer the
Info format to the HTML format of the same manual for this single
reason: HTML browsers don't have the index searching capabilities
(this is hopefully about to change, I hope, see the JS support in
latest Texinfo), and that issue alone was enough to avert me from
HTML, because I cannot afford wasting time on looking up information I
cannot find instantly.

> About variety of output formats, anything that can be produced by
> groff(1), man(7) can be translated.  And groff(1) can do many formats.

Groff (and any other typesetting program) can be used for writing any
kind of documents.  I'm not talking about the processors, I'm talking
about the design of the documentation system as a whole and about what
the products actually look like.  IOW, I'm talking about the man pages
produced by the typesetter, not about what can be done with the
typesetter.

> > Man pages have no means of specifying structure
> 
> .SH, .SS, .TP, .TQ, and very soon (hopefully weeks not months) .MR

These provide just one level.

And how frequently are they used in actual man pages out there, even
when available?

> Those can be used to produce very precise links such as this one
> (one of my favourite references when reviewing man-pages patches):
> <https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages-6.04.01.pdf#pdf%3Abm11886>

It's full of mojibake when I try reading it here.  But anyway: what
structure do you have there?  It looks just a long sequence of
separate man pages.

> > and hyper-links except
> > by loosely-coupling pages via "SEE ALSO" cross-references at the end;
> > they have no means of quickly and efficiently finding some specific
> > subject except by text search (which usually produces a lot of false
> > positives).
> 
> I guess you mean searching from the command line by the name of the
> parameter to a function, or what?

No, I mean looking a specific subject of interest without having to
search/read through the entire document.

> I would be interested in a more detailed description of what you
> want to be able to search in current pages (hopefully ones that I
> maintain, so I can speak of them) that you can't find easily?  Maybe
> I can help making something more accessible.

See above, the example of using index-searching commands.

> > By contrast, Texinfo documents have sectioning structure, have
> > cross-references that can appear where you need them and point
> > anywhere else in the document (or into another document).
> 
> This was discussed as a possible extension to '.MR'.  We're just not
> sure that there's a real need for that in manual pages (although
> there's not a consensus on that regard, and Branden, which I'm sure
> is reading this, may jump in at any moment :).

Cannot say about man pages, but in a serious documentation of any
computer software you always need cross-references, because you cannot
make any description self-contained without repeating the same stuff
over and over and over again.

Here's a short examples from a random place in the Emacs Lisp
Reference manual:

     When an editing command returns to the editor command loop, Emacs
  automatically calls ‘set-buffer’ on the buffer shown in the selected
  window (*note Selecting Windows::).  This is to prevent confusion: it
  ensures that the buffer that the cursor is in, when Emacs reads a
  command, is the buffer to which that command applies (*note Command
  Loop::).  Thus, you should not use ‘set-buffer’ to switch visibly to a
  different buffer; for that, use the functions described in *note
  Switching Buffers::.

The three places which say with "see SOMETHING" are cross-references
to other parts of the manual.  Without being able to cross-reference
there, the text would have to explain what it means by "selected
window", what it means by "commands" and "command loop", and mention
explicitly the functions to switch to a buffer which are already
described in detail elsewhere.  This allows readers who already know
about those subjects to read the text without having to skip large
amounts of unnecessary information, while also allowing readers who
are not sure they know about that to be able to follow the link, read
there, and then come back to the same place to continue reading.

> >  They also
> > have indexing and commands that allow the reader to use the index in
> > order to find the subject he/she is interested in very quickly and
> 
> You mean whatis(1) and apropos(1)?

No.  These perform text searches on the titles of the man pages, and
are therefore limited to what is in the title.  Indexing is much more
powerful, and works on the topics in the index (which, as explained
above, could contain text not present anywhere else in the document).
And every respectful Info manual has an index (some have several
indices).  See above about the commands which use the index.

> > accurately, even if the text of the index entry doesn't appear
> > anywhere in the manual.
> 
> man pages have several ways:
> 
> -  Including keywords in the NAME section.
> -  Link pages.
> -  TH line.

This is not enough, IME.  You need a way of "tagging" a chunk of text
as describing, or being pertinent to, a particular subject, even if
that subject does not appear literally in the text the reader will
see.  That's because when readers are after some specific material,
they don't always have in mind the exact words used in the manual for
describing that material, they could have some alternative phrases in
their minds.  Good indexing anticipates this in advance, and provides
index entries for those alternative phrases, allowing readers to find
stuff quickly.

> Of course, this is for the terminal.  For PDF or HTML, you can
> get hyperlinks to any subsection (and in the future maybe even
> tagged paragraphs) within a page.

In Info, references to any paragraph are available since long ago.
They are invaluable in some situations, especially when some section
is very long and you want to point to a very specific part thereof.

> > How can you document a large and flexible software package, such as
> > GDB or Texinfo or Emacs, in man pages?
> 
> git is a huge program, yet its man pages are quite useful.

Git is a huge heap of separate commands, with very little to glue them
together in terms of documented functionalities.  Still, even in Git,
there's the stuff that belongs to neither command in particular, and
thus is documented in man pages with invented names like
"gitrevisions", which is impossible to guess in advance for a newbie
who needs this information.

Moreover, the introduction material and the explanation of basic
concepts is not in man pages, but in a separate HTML document ("The
Git User's Manual"), and likewise the API documentation, which in
itself is a telltale sign.

While something like a huge heap of man pages is perhaps borderline
reasonable for Git, it isn't reasonable for programs which are not
easily broken into separate independent "pages", like GDB and Emacs.
The more complex is the system of objects and concepts manipulated by
the software, the less appropriate is the man-page format for
describing it.

> Just split your documentation at the right boundary, which
> usually requires a good design for your software that allows
> such division.

Whether the manual is split or not is immaterial.  Info manuals can
also be split.  The relevant issue is what the viewer allows the
reader to do to read these chunks in a reasonable way, using efficient
commands and features to find related information quickly.

> The fact that current man(1) implementations don't exploit
> the whole power of man(7) doesn't mean you can't design a
> software that does.

Indeed, it doesn't mean that.  But we are discussing what is there,
not what could be there in some distant future.

> I'm sure you could build something similar to info(1) that
> got man(7) pages as its input.

No!  The information about subsections, cross-references, and indices
is missing.  That information must be there to begin with, otherwise
it cannot be recreated, because it's inserted by humans, not by
programs.

> > It isn't missing.  The TOC is presented as top-level menu in each
> > manual, and large manuals have also the "detailed menu" with all the
> > sub-nodes spelled out.  In addition, the Emacs Info reader has the
> > Info-toc command, which presents a structured menu with all the
> > sectioning levels of a manual even if the manual didn't produce it.
> 
> Ahh, yes, this is true.  What I found missing is a kind of a map for
> knowing what I have available for navigating (also the fact that I
> don't usually run info(1) makes me be a bit fuzzy on detailing what
> is it that I miss from it).  So, info(1) has a map of the sections
> available in a page, and does it also have a map of all the pages
> in the system (or whatever you call your pages, I don't yet really
> understand the organization of info manuals).

Yes, it does.  If you invoke 'info' with no arguments, it will show
the "directory" of all the installed manuals -- a large menu where
each manual has at least one line explaining what the manual
describes.  Some manuals have much more than one line; examples
include Coreutils and Binutils (which have a line for each individual
command) and glibc (which has a line for every _function_).

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: A less presumptive .info? (was: Re: Playground pager lsp(1))
  2023-04-05 20:38             ` A less presumptive .info? (was: Re: Playground pager lsp(1)) Arsen Arsenović
@ 2023-04-06  8:14               ` Eli Zaretskii
  2023-04-06  8:56                 ` Gavin Smith
  2023-04-07 13:14                 ` Arsen Arsenović
  0 siblings, 2 replies; 73+ messages in thread
From: Eli Zaretskii @ 2023-04-06  8:14 UTC (permalink / raw)
  To: Arsen Arsenović
  Cc: GavinSmith0123, dirk, alx.manpages, linux-man, help-texinfo

> From: Arsen Arsenović <arsen@aarsen.me>
> Cc: Dirk Gouders <dirk@gouders.net>, alx.manpages@gmail.com,
>  linux-man@vger.kernel.org, help-texinfo@gnu.org
> Date: Wed, 05 Apr 2023 22:38:12 +0200
> 
> I've been casually wondering if creating a new format that can host more
> formatting options and uses more precise syntax than 'plaintext with
> some binary tags' would be a decent thing to work on.
> 
> My thoughts were brief and undeveloped as this was thought of on the
> commute, but something that retains the binary offsets for indices and
> tags, but stores formatted data (perhaps as s-exprs, those would be easy
> to parse).  It is always easier to remove information than to
> reintroduce it.
> 
> Such a structure should resemble the input language, but with far less
> complexity (e.g. something at the level of abstraction that HTML5 sits
> at, so, macros would be expanded, and we'd be dealing with lists of
> paragraphs and formatted blocks, etc.).
> 
> This would allow for the reflowing that was talked about in this thread,
> and provide more readable output in graphical contexts, as it wouldn't
> be data generated with the assumption of a monospace font (rather, the
> format could store whether your context wants monospace or proportional
> fonts at a given point), or data generated for a given screen size, or
> with a given indentation size, or with the assumption of a lack of
> features like italics, etc.
> 
> For instance, info2html used by the KDE info viewer currently produces
> quite terrible results, because it fails to implement the heuristics the
> Info viewers have properly.  This problem would be hard to have with a
> better "at-rest" format for Info pages.
> 
> The alternative is, of course, bringing HTML up to par feature-wise
> (wrt. indices etc), but that'd be on the other end of the extreme, where
> instead of being too easy to parse and lacking important information,
> it'd be oververbose with and difficult to parse (not that such a thing
> should not be done too, so that folks using ordinary browsers can enjoy
> documentation, and so that projects can provide more accessible
> documentation by the merit of more people having HTML than Info
> viewers).
> 
> WDYT folks?

Gavin will tell, but AFAIU our plan is to develop js as the means
towards the goals you mentioned.  That will allow using HTML browsers
to read Texinfo documentation without losing the functionalities of
the Info readers we value.  HTML rendering reflows as integral part of
its workings, so that problem is not an issue if this plan succeeds.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-06  8:11         ` Eli Zaretskii
@ 2023-04-06  8:48           ` Gavin Smith
  2023-04-07 22:01           ` Alejandro Colomar
  1 sibling, 0 replies; 73+ messages in thread
From: Gavin Smith @ 2023-04-06  8:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Alejandro Colomar, dirk, linux-man, help-texinfo

On Thu, Apr 06, 2023 at 11:11:40AM +0300, Eli Zaretskii wrote:
> How do you find the description of, say, "dereference symbolic link"
> (to take just a random example from the Emacs manual) when the actual
> text of the manual include neither this string nor matches for any
> related regular expressions, like "dereference.*link"?
> 
> The way Info does it is to use the index (which should be present in
> any respectable reference document) to find description of the
> corresponding subject.  The indexing, which is done by the author of
> the document, if it's a good indexing, should include index entries
> that specify subjects the reader could have in mind when he/she is
> looking for this kind of information.
> 
> The corresponding index-searching commands of Info readers are a
> primary means for finding information quickly and efficiently,
> avoiding too many false positives and also avoiding frustrating
> misses, i.e., searches that fail to find anything pertinent.

In the future, there should be a local documentation search driven
by AI algorithms which handles synonyms and rewordings, so that if
the user searched for "dereference", they also found text about
"following a reference" even if the word "dereference" wasn't used.
Think of it like a version of G**gle running on your own machine.
Implementing such a thing is beyond me, though.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: A less presumptive .info? (was: Re: Playground pager lsp(1))
  2023-04-06  8:14               ` Eli Zaretskii
@ 2023-04-06  8:56                 ` Gavin Smith
  2023-04-07 13:14                 ` Arsen Arsenović
  1 sibling, 0 replies; 73+ messages in thread
From: Gavin Smith @ 2023-04-06  8:56 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: Arsen Arsenović, dirk, alx.manpages, linux-man, help-texinfo

On Thu, Apr 06, 2023 at 11:14:01AM +0300, Eli Zaretskii wrote:
> > The alternative is, of course, bringing HTML up to par feature-wise
> > (wrt. indices etc), but that'd be on the other end of the extreme, where
> > instead of being too easy to parse and lacking important information,
> > it'd be oververbose with and difficult to parse (not that such a thing
> > should not be done too, so that folks using ordinary browsers can enjoy
> > documentation, and so that projects can provide more accessible
> > documentation by the merit of more people having HTML than Info
> > viewers).
> > 
> > WDYT folks?
> 
> Gavin will tell, but AFAIU our plan is to develop js as the means
> towards the goals you mentioned.  That will allow using HTML browsers
> to read Texinfo documentation without losing the functionalities of
> the Info readers we value.  HTML rendering reflows as integral part of
> its workings, so that problem is not an issue if this plan succeeds.

Progress on this issue is described in the TODO.HTML file in the Texinfo
repository.

https://git.savannah.gnu.org/cgit/texinfo.git/tree/TODO.HTML

In short, the main avenue of progress appears to be the documentation
browser using the embedded WebkitGTK browser.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-06  1:10       ` Alejandro Colomar
  2023-04-06  8:11         ` Eli Zaretskii
@ 2023-04-07  2:18         ` G. Branden Robinson
  2023-04-07  6:36           ` Eli Zaretskii
  2023-04-07 21:26           ` reformatting man pages at SIGWINCH " Alejandro Colomar
  1 sibling, 2 replies; 73+ messages in thread
From: G. Branden Robinson @ 2023-04-07  2:18 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Eli Zaretskii, dirk, linux-man, help-texinfo

[-- Attachment #1: Type: text/plain, Size: 3547 bytes --]

At 2023-04-06T03:10:59+0200, Alejandro Colomar wrote:
> Hmm, now that I think, it's probably an issue of coordinating man(1)
> and less(1).  I sometimes wish that when I resize a window where I'm
> reading a man page, it would reformat the page from source.

Seems like it shouldn't be impossible to me, but what I imagine would
require a little reëngineering of man(1), perhaps to spawn a little
custom program to manage zcat/nroff pipeline it constructs.  This little
program's sole job could be to be aware of this pipeline and listen for
SIGWINCH; if it happens, kill the rest of the pipeline and reëxecute it.

Maybe I thought of it this way because (I suspect) it aligns with my
vision I've expressed elsewhere of man(1) having unfortunately
aggregated two separate functions: librarian vs. renderer.
Historically, of course the latter function was almost vestigial, since
early Unix systems had no pager program and their man pages required
little to no preprocessing; man(1) slowly accreted into a larger thing.

> Of course, that might be a problem for keeping track of where I was,
> since lines moved around.

That seems like a harder problem to me.  You'd need a way for the pager
to communicate position information back to the mini-man renderer
program I envision.  Two challenges here: (1) what part of the screen
was the reader actually looking at?  (2) how is the pager supposed to
know how to map any given location on the screen back to a place in the
unrendered source document so it can be accurately found when the
document is rerendered?  These feel nearly intractable to me.  But maybe
I have a poor imagination.

> Ahh, yes, this is true.  What I found missing is a kind of a map for
> knowing what I have available for navigating (also the fact that I
> don't usually run info(1) makes me be a bit fuzzy on detailing what
> is it that I miss from it).  So, info(1) has a map of the sections
> available in a page, and does it also have a map of all the pages
> in the system (or whatever you call your pages, I don't yet really
> understand the organization of info manuals).

The "install-info" program is run by packages that install info
documents to the system.  This creates or updates a file called "dir".

For instance, when I "make install" an everyday groff build, the
following shows up.

/home/branden/groff/share/info/dir
/home/branden/groff/share/info/groff.info
/home/branden/groff/share/info/groff.info-1
/home/branden/groff/share/info/groff.info-2
/home/branden/groff/share/info/groff.info-3

Since help-texinfo is on the distribution list of this mail, I'll take
this opportunity to note something from groff's INSTALL.extra file,
explaining how to uninstall the package.

  ... Run the command 'sudo make uninstall'.  (If you successfully used
  'make install', simply run 'make uninstall'.)  At a minimum, some
  directories not particular to groff, like 'bin' and (depending on
  configuration) an X11 'app-defaults' directory will remain, as will
  one plain file called 'dir', created by GNU Texinfo's 'install-info'
  command.  (As of this writing, 'install-info' offers no provision for
  removing an effectively empty 'dir' file, and groff does not attempt
  to parse this file to determine whether it can be safely removed.)
  All other groff artifacts will be deleted from the installation
  hierarchy.

Any chance 'install-info' could get savvy as noted above?  (Maybe it
already has--I'm running 6.7.0.)

Regards,
Branden

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-07  2:18         ` Playground pager lsp(1) G. Branden Robinson
@ 2023-04-07  6:36           ` Eli Zaretskii
  2023-04-07 11:03             ` Gavin Smith
  2023-04-07 14:43             ` man page rendering speed (was: Playground pager lsp(1)) G. Branden Robinson
  2023-04-07 21:26           ` reformatting man pages at SIGWINCH " Alejandro Colomar
  1 sibling, 2 replies; 73+ messages in thread
From: Eli Zaretskii @ 2023-04-07  6:36 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: alx.manpages, dirk, linux-man, help-texinfo

> Date: Thu, 6 Apr 2023 21:18:22 -0500
> From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, dirk@gouders.net,
> 	linux-man@vger.kernel.org, help-texinfo@gnu.org
> 
> > Hmm, now that I think, it's probably an issue of coordinating man(1)
> > and less(1).  I sometimes wish that when I resize a window where I'm
> > reading a man page, it would reformat the page from source.
> 
> Seems like it shouldn't be impossible to me, but what I imagine would
> require a little reëngineering of man(1), perhaps to spawn a little
> custom program to manage zcat/nroff pipeline it constructs.  This little
> program's sole job could be to be aware of this pipeline and listen for
> SIGWINCH; if it happens, kill the rest of the pipeline and reëxecute it.

This should be possible, but it flies in the face of the feature
whereby formatted man pages are kept for future perusal, which is
therefore faster: if the formatted pages reflect the particular size
of the pager's window, it is meaningless to cache them.

>   ... Run the command 'sudo make uninstall'.  (If you successfully used
>   'make install', simply run 'make uninstall'.)  At a minimum, some
>   directories not particular to groff, like 'bin' and (depending on
>   configuration) an X11 'app-defaults' directory will remain, as will
>   one plain file called 'dir', created by GNU Texinfo's 'install-info'
>   command.  (As of this writing, 'install-info' offers no provision for
>   removing an effectively empty 'dir' file, and groff does not attempt
>   to parse this file to determine whether it can be safely removed.)
>   All other groff artifacts will be deleted from the installation
>   hierarchy.
> 
> Any chance 'install-info' could get savvy as noted above?  (Maybe it
> already has--I'm running 6.7.0.)

Why does it make sense to do that?  An "empty" DIR file is not really
empty: it has instructions at its beginning, which are important for
newbies.  Also, on well-maintained system, DIR will rarely become
empty, and if it does, it will soon enough become non-empty again,
since all the Info manuals installed on the system should be mentioned
there, and why would we want to imagine a system which has no Info
manuals at all, not even an Info manual that describes how to use Info
(which comes with the Texinfo distribution)?

So I think Groff should remove that paragraph from its instructions,
because (IMO) it is misleading and unnecessary.

Of course, mine is not the authoritative opinion about how the Texinfo
project should develop its programs, it is just one opinion.  So wait
for Gavin to chime in.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-07  6:36           ` Eli Zaretskii
@ 2023-04-07 11:03             ` Gavin Smith
  2023-04-07 14:43             ` man page rendering speed (was: Playground pager lsp(1)) G. Branden Robinson
  1 sibling, 0 replies; 73+ messages in thread
From: Gavin Smith @ 2023-04-07 11:03 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: G. Branden Robinson, alx.manpages, dirk, linux-man, help-texinfo

On Fri, Apr 07, 2023 at 09:36:10AM +0300, Eli Zaretskii wrote:
> This should be possible, but it flies in the face of the feature
> whereby formatted man pages are kept for future perusal, which is
> therefore faster: if the formatted pages reflect the particular size
> of the pager's window, it is meaningless to cache them.
> 
> >   ... Run the command 'sudo make uninstall'.  (If you successfully used
> >   'make install', simply run 'make uninstall'.)  At a minimum, some
> >   directories not particular to groff, like 'bin' and (depending on
> >   configuration) an X11 'app-defaults' directory will remain, as will
> >   one plain file called 'dir', created by GNU Texinfo's 'install-info'
> >   command.  (As of this writing, 'install-info' offers no provision for
> >   removing an effectively empty 'dir' file, and groff does not attempt
> >   to parse this file to determine whether it can be safely removed.)
> >   All other groff artifacts will be deleted from the installation
> >   hierarchy.
> > 
> > Any chance 'install-info' could get savvy as noted above?  (Maybe it
> > already has--I'm running 6.7.0.)
> 
> Why does it make sense to do that?  An "empty" DIR file is not really
> empty: it has instructions at its beginning, which are important for
> newbies.  Also, on well-maintained system, DIR will rarely become
> empty, and if it does, it will soon enough become non-empty again,
> since all the Info manuals installed on the system should be mentioned
> there, and why would we want to imagine a system which has no Info
> manuals at all, not even an Info manual that describes how to use Info
> (which comes with the Texinfo distribution)?

It falls under the same category as the "directories not particular
to groff" mentioned in the instructions.  You want install-info (or
Automake rules) to remove an empty dir file; you could equally claim
that install-info should remove the empty 'info' directory that contains
that dir file.

What are the benefits of removing the file?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: A less presumptive .info? (was: Re: Playground pager lsp(1))
  2023-04-06  8:14               ` Eli Zaretskii
  2023-04-06  8:56                 ` Gavin Smith
@ 2023-04-07 13:14                 ` Arsen Arsenović
  1 sibling, 0 replies; 73+ messages in thread
From: Arsen Arsenović @ 2023-04-07 13:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: GavinSmith0123, dirk, alx.manpages, linux-man, help-texinfo

[-- Attachment #1: Type: text/plain, Size: 1685 bytes --]


Eli Zaretskii <eliz@gnu.org> writes:

> Gavin will tell, but AFAIU our plan is to develop js as the means
> towards the goals you mentioned.  That will allow using HTML browsers
> to read Texinfo documentation without losing the functionalities of
> the Info readers we value.  HTML rendering reflows as integral part of
> its workings, so that problem is not an issue if this plan succeeds.

Sure, but how will this work with the standalone and/or Emacs viewers?

In Emacs, doing so places a strain on the HTML generator to work around
eww, and presuming we choose to do that, it requires the user to have
Emacs.

In the non-Emacs case, it requires that the implementor implement at
least a subset of HTML, or places a demand on the user to have a web
browser (in which, there are two extremes: either the 'underimplemented
and insufficient' ones for which JS as glue won't work, or full browsers
which aren't accessible in many scenarios).

On the other hand, having a more advanced format based on s-exprs for
info at rest storage could let us have complete information about the
intended markup of the text to be displayed with only two syntactic
elements (lists and strings).  That should be rather easy to parse.

I don't see it as very viable to replace an implementable info storage
format with only HTML for that reason.

I have TODO.HTML open on my workstation to take a look through some of
those when I get back home.  I do believe that it's a high priority
target, as it is very important to newcommers to GNU who are viewing GNU
documentation from remote servers, but I doubt it can replace a native
Info format.
-- 
Arsen Arsenović

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* man page rendering speed (was: Playground pager lsp(1))
  2023-04-07  6:36           ` Eli Zaretskii
  2023-04-07 11:03             ` Gavin Smith
@ 2023-04-07 14:43             ` G. Branden Robinson
  2023-04-07 15:06               ` Eli Zaretskii
                                 ` (2 more replies)
  1 sibling, 3 replies; 73+ messages in thread
From: G. Branden Robinson @ 2023-04-07 14:43 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: alx.manpages, dirk, cjwatson, linux-man, help-texinfo, groff


[-- Attachment #1.1: Type: text/plain, Size: 4511 bytes --]

[adding Colin Watson to CC to solicit his man(1) implementation
knowledge; adding the groff list as this sub-discussion is relevant to
its interests]

At 2023-04-07T09:36:10+0300, Eli Zaretskii wrote:
> > From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
[re-running *roff when a viewing a man page and resizing the terminal]
> > Seems like it shouldn't be impossible to me, but what I imagine
> > would require a little reëngineering of man(1), perhaps to spawn a
> > little custom program to manage zcat/nroff pipeline it constructs.
> > This little program's sole job could be to be aware of this pipeline
> > and listen for SIGWINCH; if it happens, kill the rest of the
> > pipeline and reëxecute it.
> 
> This should be possible, but it flies in the face of the feature
> whereby formatted man pages are kept for future perusal, which is
> therefore faster:

You're referring to cat pages.  As far as I know, these are on their way
out if not already gone.  Colin Watson, who has maintained the man-db
implementation of man(1)[1] for something like 20 years, can speak more
authoritatively to this, but as I understand it, the advent of resizable
xterm windows started to kill the utility of cat pages decades ago and
the increasing importance of desktop environments accelerated their
demise.  If a cat page wasn't pre-rendered at the width of your
terminal, or for your terminal type[1], man pages were formatted from
scratch for you anyway.  You could of course cache pages at a variety of
widths (and for multiple terminal types), but doing so for any width
encountered was a space concern--or even a DoS vector if some
undergraduate rapscallion decided to try rendering every page on the
system at every terminal width from 1 to INT_MAX--in the years when man
page rendering time was also noticeable.

...which brings me to the other factor, of which I'm more confident: man
page rendering times are much lower than they were in Unix's early days.

On my system, all groff man pages but one render in between a tenth and
a fortieth of a second.  The really huge pages like groff(7),
groff_char(7), and groff_diff(7) are toward the upper end of this range,
because they are long, at ~20-25 U.S. letter pages when formatted for
PostScript or PDF, or have many large tables so the tbl(1) preprocessor
produces a lot of output.

The outlier is groff_mdoc(7) at just over one-third of a second.  It is
written in its own macro language, not man(7), and also a lengthy
document (31 U.S.  letter pages).  mdoc has always been slow; its
original implementers warned of this.  (I believe this is mainly due to
an aspect of its design: the typical mdoc(7) document has a large number
of recursive macro calls arising from a decision to help the document
author avoid having to start new control lines to call them.)

While not statistically rigorous, mainly because I didn't undertake a
large number of trials under various system loads, I attempted fair
measurements by (A) always running the 3 preprocessors pic(1), eqn(1),
and tbl(1) on _all_ input documents even though this is pointless most
of the time (only tbl(1) sees use more than rarely), and (B) formatting
both with and without operation of the output driver grotty(1) in the
pipeline, in case "cheating" by having groff(1) discard its standard
output stream artificially deflated the time consumption.  It appears
not to have.

The bottom line is that, even on BSD systems (where mdoc(7) is preferred
to man(7)), a user can expect a man page to render from *roff source in
less than, say, half a second in the worst case, and the median
GNU/Linux user can expect to start reading a man page "instantaneously":

  Human subjects need a minimum of about 0.1 second of visual experience
  or about .01 to .02 second of auditory experience to perceive
  duration; any shorter experiences are called instantaneous.
  -- Encyclopædia Britannica[2]

My findings are attached.

I'll respond to the "uninstall-info" topic in a separate subthread.

Regards,
Branden

[1] Once upon a time, Unix time-sharing systems had to support shell
    sessions originating from a wide variety of terminals; at Purdue, I
    never saw a real DEC VT in use (to my regret), but plenty of Zenith
    Z29s, Wyse 50s, Sun SPARC IPCs in console mode, and the occasional
    really retro Lear Siegler ADM-5.

[2] https://www.britannica.com/science/time-perception/Perceived-duration

[-- Attachment #1.2: TIMING --]
[-- Type: text/plain, Size: 4186 bytes --]

for m in $(find -name "*.[157]" | sort); do echo; echo $m; time ./test-groff -Ez -pet -mandoc $m; done

./contrib/chem/chem.1

real	0m0.039s
user	0m0.043s
sys	0m0.000s

./contrib/eqn2graph/eqn2graph.1

real	0m0.025s
user	0m0.028s
sys	0m0.000s

./contrib/gdiffmk/gdiffmk.1

real	0m0.026s
user	0m0.023s
sys	0m0.007s

./contrib/glilypond/glilypond.1

real	0m0.032s
user	0m0.036s
sys	0m0.000s

./contrib/gperl/gperl.1

real	0m0.028s
user	0m0.031s
sys	0m0.000s

./contrib/gpinyin/gpinyin.1

real	0m0.027s
user	0m0.028s
sys	0m0.002s

./contrib/grap2graph/grap2graph.1

real	0m0.025s
user	0m0.026s
sys	0m0.002s

./contrib/hdtbl/groff_hdtbl.7

real	0m0.035s
user	0m0.032s
sys	0m0.006s

./contrib/mm/groff_mm.7

real	0m0.087s
user	0m0.092s
sys	0m0.009s

./contrib/mm/groff_mmse.7

real	0m0.025s
user	0m0.028s
sys	0m0.000s

./contrib/mm/mmroff.1

real	0m0.024s
user	0m0.018s
sys	0m0.010s

./contrib/mom/groff_mom.7

real	0m0.058s
user	0m0.053s
sys	0m0.010s

./contrib/pdfmark/pdfroff.1

real	0m0.033s
user	0m0.036s
sys	0m0.000s

./contrib/pic2graph/pic2graph.1

real	0m0.026s
user	0m0.029s
sys	0m0.000s

./contrib/rfc1345/groff_rfc1345.7

real	0m0.026s
user	0m0.026s
sys	0m0.004s

./man/groff.7

real	0m0.099s
user	0m0.110s
sys	0m0.000s

./man/groff_char.7

real	0m0.086s
user	0m0.109s
sys	0m0.000s

./man/groff_diff.7

real	0m0.082s
user	0m0.081s
sys	0m0.010s

./man/groff_font.5

real	0m0.033s
user	0m0.037s
sys	0m0.000s

./man/groff_out.5

real	0m0.042s
user	0m0.041s
sys	0m0.005s

./man/groff_tmac.5

real	0m0.037s
user	0m0.035s
sys	0m0.006s

./man/roff.7

real	0m0.047s
user	0m0.052s
sys	0m0.000s

./src/devices/grodvi/grodvi.1

real	0m0.029s
user	0m0.030s
sys	0m0.002s

./src/devices/grohtml/grohtml.1

real	0m0.030s
user	0m0.029s
sys	0m0.004s

./src/devices/grolbp/grolbp.1

real	0m0.029s
user	0m0.027s
sys	0m0.006s

./src/devices/grolj4/grolj4.1

real	0m0.033s
user	0m0.036s
sys	0m0.000s

./src/devices/gropdf/gropdf.1

real	0m0.041s
user	0m0.045s
sys	0m0.000s

./src/devices/gropdf/pdfmom.1

real	0m0.025s
user	0m0.028s
sys	0m0.000s

./src/devices/grops/grops.1

real	0m0.045s
user	0m0.049s
sys	0m0.000s

./src/devices/grotty/grotty.1

real	0m0.031s
user	0m0.032s
sys	0m0.002s

./src/devices/xditview/gxditview.1

real	0m0.035s
user	0m0.036s
sys	0m0.002s

./src/preproc/eqn/eqn.1

real	0m0.047s
user	0m0.052s
sys	0m0.000s

./src/preproc/eqn/neqn.1

real	0m0.024s
user	0m0.025s
sys	0m0.002s

./src/preproc/grn/grn.1

real	0m0.036s
user	0m0.030s
sys	0m0.010s

./src/preproc/pic/pic.1

real	0m0.036s
user	0m0.040s
sys	0m0.000s

./src/preproc/preconv/preconv.1

real	0m0.028s
user	0m0.031s
sys	0m0.000s

./src/preproc/refer/refer.1

real	0m0.051s
user	0m0.047s
sys	0m0.008s

./src/preproc/soelim/soelim.1

real	0m0.026s
user	0m0.030s
sys	0m0.000s

./src/preproc/tbl/tbl.1

real	0m0.043s
user	0m0.047s
sys	0m0.002s

./src/roff/groff/groff.1

real	0m0.050s
user	0m0.053s
sys	0m0.002s

./src/roff/nroff/nroff.1

real	0m0.026s
user	0m0.025s
sys	0m0.004s

./src/roff/troff/troff.1

real	0m0.035s
user	0m0.037s
sys	0m0.002s

./src/utils/addftinfo/addftinfo.1

real	0m0.025s
user	0m0.028s
sys	0m0.000s

./src/utils/afmtodit/afmtodit.1

real	0m0.029s
user	0m0.030s
sys	0m0.002s

./src/utils/grog/grog.1

real	0m0.028s
user	0m0.028s
sys	0m0.004s

./src/utils/hpftodit/hpftodit.1

real	0m0.030s
user	0m0.033s
sys	0m0.000s

./src/utils/indxbib/indxbib.1

real	0m0.029s
user	0m0.026s
sys	0m0.006s

./src/utils/lkbib/lkbib.1

real	0m0.027s
user	0m0.030s
sys	0m0.000s

./src/utils/lookbib/lookbib.1

real	0m0.026s
user	0m0.028s
sys	0m0.002s

./src/utils/pfbtops/pfbtops.1

real	0m0.025s
user	0m0.017s
sys	0m0.011s

./src/utils/tfmtodit/tfmtodit.1

real	0m0.027s
user	0m0.026s
sys	0m0.004s

./src/utils/xtotroff/xtotroff.1

real	0m0.025s
user	0m0.022s
sys	0m0.006s

./tmac/groff_man.7

real	0m0.049s
user	0m0.043s
sys	0m0.012s

./tmac/groff_man_style.7

real	0m0.066s
user	0m0.070s
sys	0m0.004s

./tmac/groff_mdoc.7

real	0m0.379s
user	0m0.379s
sys	0m0.010s

./tmac/groff_me.7

real	0m0.044s
user	0m0.039s
sys	0m0.013s

./tmac/groff_ms.7

real	0m0.065s
user	0m0.060s
sys	0m0.013s

./tmac/groff_trace.7

real	0m0.027s
user	0m0.026s
sys	0m0.004s

./tmac/groff_www.7

real	0m0.035s
user	0m0.030s
sys	0m0.009s

[-- Attachment #1.3: TIMING2 --]
[-- Type: text/plain, Size: 4203 bytes --]

for m in $(find -name "*.[157]" | sort); do echo; echo $m; time ./test-groff -E -pet -mandoc -Tutf8 $m >/dev/null; done

./contrib/chem/chem.1

real	0m0.051s
user	0m0.062s
sys	0m0.008s

./contrib/eqn2graph/eqn2graph.1

real	0m0.018s
user	0m0.019s
sys	0m0.006s

./contrib/gdiffmk/gdiffmk.1

real	0m0.018s
user	0m0.026s
sys	0m0.000s

./contrib/glilypond/glilypond.1

real	0m0.027s
user	0m0.038s
sys	0m0.000s

./contrib/gperl/gperl.1

real	0m0.021s
user	0m0.021s
sys	0m0.009s

./contrib/gpinyin/gpinyin.1

real	0m0.020s
user	0m0.026s
sys	0m0.002s

./contrib/grap2graph/grap2graph.1

real	0m0.018s
user	0m0.025s
sys	0m0.000s

./contrib/hdtbl/groff_hdtbl.7

real	0m0.031s
user	0m0.044s
sys	0m0.002s

./contrib/mm/groff_mm.7

real	0m0.089s
user	0m0.129s
sys	0m0.004s

./contrib/mm/groff_mmse.7

real	0m0.018s
user	0m0.026s
sys	0m0.000s

./contrib/mm/mmroff.1

real	0m0.023s
user	0m0.029s
sys	0m0.002s

./contrib/mom/groff_mom.7

real	0m0.067s
user	0m0.093s
sys	0m0.000s

./contrib/pdfmark/pdfroff.1

real	0m0.033s
user	0m0.040s
sys	0m0.006s

./contrib/pic2graph/pic2graph.1

real	0m0.020s
user	0m0.022s
sys	0m0.007s

./contrib/rfc1345/groff_rfc1345.7

real	0m0.021s
user	0m0.028s
sys	0m0.001s

./man/groff.7

real	0m0.116s
user	0m0.169s
sys	0m0.000s

./man/groff_char.7

real	0m0.069s
user	0m0.111s
sys	0m0.000s

./man/groff_diff.7

real	0m0.093s
user	0m0.134s
sys	0m0.002s

./man/groff_font.5

real	0m0.029s
user	0m0.031s
sys	0m0.011s

./man/groff_out.5

real	0m0.042s
user	0m0.058s
sys	0m0.002s

./man/groff_tmac.5

real	0m0.034s
user	0m0.044s
sys	0m0.005s

./man/roff.7

real	0m0.049s
user	0m0.075s
sys	0m0.000s

./src/devices/grodvi/grodvi.1

real	0m0.023s
user	0m0.031s
sys	0m0.002s

./src/devices/grohtml/grohtml.1

real	0m0.025s
user	0m0.025s
sys	0m0.011s

./src/devices/grolbp/grolbp.1

real	0m0.022s
user	0m0.030s
sys	0m0.002s

./src/devices/grolj4/grolj4.1

real	0m0.027s
user	0m0.033s
sys	0m0.006s

./src/devices/gropdf/gropdf.1

real	0m0.046s
user	0m0.063s
sys	0m0.002s

./src/devices/gropdf/pdfmom.1

real	0m0.018s
user	0m0.021s
sys	0m0.004s

./src/devices/grops/grops.1

real	0m0.038s
user	0m0.055s
sys	0m0.000s

./src/devices/grotty/grotty.1

real	0m0.025s
user	0m0.033s
sys	0m0.004s

./src/devices/xditview/gxditview.1

real	0m0.026s
user	0m0.028s
sys	0m0.009s

./src/preproc/eqn/eqn.1

real	0m0.039s
user	0m0.055s
sys	0m0.000s

./src/preproc/eqn/neqn.1

real	0m0.018s
user	0m0.021s
sys	0m0.002s

./src/preproc/grn/grn.1

real	0m0.032s
user	0m0.042s
sys	0m0.004s

./src/preproc/pic/pic.1

real	0m0.032s
user	0m0.046s
sys	0m0.000s

./src/preproc/preconv/preconv.1

real	0m0.028s
user	0m0.039s
sys	0m0.001s

./src/preproc/refer/refer.1

real	0m0.050s
user	0m0.067s
sys	0m0.004s

./src/preproc/soelim/soelim.1

real	0m0.019s
user	0m0.028s
sys	0m0.000s

./src/preproc/tbl/tbl.1

real	0m0.040s
user	0m0.061s
sys	0m0.000s

./src/roff/groff/groff.1

real	0m0.048s
user	0m0.067s
sys	0m0.002s

./src/roff/nroff/nroff.1

real	0m0.020s
user	0m0.028s
sys	0m0.000s

./src/roff/troff/troff.1

real	0m0.032s
user	0m0.044s
sys	0m0.000s

./src/utils/addftinfo/addftinfo.1

real	0m0.019s
user	0m0.024s
sys	0m0.002s

./src/utils/afmtodit/afmtodit.1

real	0m0.023s
user	0m0.021s
sys	0m0.012s

./src/utils/grog/grog.1

real	0m0.026s
user	0m0.031s
sys	0m0.005s

./src/utils/hpftodit/hpftodit.1

real	0m0.026s
user	0m0.036s
sys	0m0.000s

./src/utils/indxbib/indxbib.1

real	0m0.021s
user	0m0.029s
sys	0m0.000s

./src/utils/lkbib/lkbib.1

real	0m0.019s
user	0m0.020s
sys	0m0.006s

./src/utils/lookbib/lookbib.1

real	0m0.019s
user	0m0.025s
sys	0m0.000s

./src/utils/pfbtops/pfbtops.1

real	0m0.019s
user	0m0.021s
sys	0m0.004s

./src/utils/tfmtodit/tfmtodit.1

real	0m0.023s
user	0m0.028s
sys	0m0.004s

./src/utils/xtotroff/xtotroff.1

real	0m0.020s
user	0m0.021s
sys	0m0.007s

./tmac/groff_man.7

real	0m0.044s
user	0m0.061s
sys	0m0.002s

./tmac/groff_man_style.7

real	0m0.068s
user	0m0.098s
sys	0m0.004s

./tmac/groff_mdoc.7

real	0m0.383s
user	0m0.418s
sys	0m0.006s

./tmac/groff_me.7

real	0m0.031s
user	0m0.033s
sys	0m0.013s

./tmac/groff_ms.7

real	0m0.059s
user	0m0.082s
sys	0m0.005s

./tmac/groff_trace.7

real	0m0.019s
user	0m0.023s
sys	0m0.005s

./tmac/groff_www.7

real	0m0.026s
user	0m0.036s
sys	0m0.002s

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: man page rendering speed (was: Playground pager lsp(1))
  2023-04-07 14:43             ` man page rendering speed (was: Playground pager lsp(1)) G. Branden Robinson
@ 2023-04-07 15:06               ` Eli Zaretskii
  2023-04-07 15:08                 ` Larry McVoy
                                   ` (2 more replies)
  2023-04-07 16:08               ` Colin Watson
  2023-04-08 11:24               ` Ralph Corderoy
  2 siblings, 3 replies; 73+ messages in thread
From: Eli Zaretskii @ 2023-04-07 15:06 UTC (permalink / raw)
  To: G. Branden Robinson
  Cc: alx.manpages, dirk, cjwatson, linux-man, help-texinfo, groff

> Date: Fri, 7 Apr 2023 09:43:19 -0500
> From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: alx.manpages@gmail.com, dirk@gouders.net, cjwatson@debian.org,
> 	linux-man@vger.kernel.org, help-texinfo@gnu.org, groff@gnu.org
> 
> ...which brings me to the other factor, of which I'm more confident: man
> page rendering times are much lower than they were in Unix's early days.
> 
> On my system, all groff man pages but one render in between a tenth and
> a fortieth of a second.  The really huge pages like groff(7),
> groff_char(7), and groff_diff(7) are toward the upper end of this range,
> because they are long, at ~20-25 U.S. letter pages when formatted for
> PostScript or PDF, or have many large tables so the tbl(1) preprocessor
> produces a lot of output.
> 
> The outlier is groff_mdoc(7) at just over one-third of a second.

Some people consider 0.1 sec, let alone 0.3 sec, to be long enough to
be annoying.

Also, did you try with libpng.3 or gcc.1?

>   Human subjects need a minimum of about 0.1 second of visual experience
>   or about .01 to .02 second of auditory experience to perceive
>   duration; any shorter experiences are called instantaneous.
>   -- Encyclopædia Britannica[2]

IME, 0.05 sec of visual experiences is closer to reality.

Anyway, I won't argue.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: man page rendering speed (was: Playground pager lsp(1))
  2023-04-07 15:06               ` Eli Zaretskii
@ 2023-04-07 15:08                 ` Larry McVoy
  2023-04-07 17:07                 ` man page rendering speed Ingo Schwarze
  2023-04-07 19:04                 ` man page rendering speed (was: Playground pager lsp(1)) Alejandro Colomar
  2 siblings, 0 replies; 73+ messages in thread
From: Larry McVoy @ 2023-04-07 15:08 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: G. Branden Robinson, alx.manpages, dirk, cjwatson, linux-man,
	help-texinfo, groff

On Fri, Apr 07, 2023 at 06:06:39PM +0300, Eli Zaretskii wrote:
> > Date: Fri, 7 Apr 2023 09:43:19 -0500
> > From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> > Cc: alx.manpages@gmail.com, dirk@gouders.net, cjwatson@debian.org,
> > 	linux-man@vger.kernel.org, help-texinfo@gnu.org, groff@gnu.org
> > 
> > ...which brings me to the other factor, of which I'm more confident: man
> > page rendering times are much lower than they were in Unix's early days.
> > 
> > On my system, all groff man pages but one render in between a tenth and
> > a fortieth of a second.  The really huge pages like groff(7),
> > groff_char(7), and groff_diff(7) are toward the upper end of this range,
> > because they are long, at ~20-25 U.S. letter pages when formatted for
> > PostScript or PDF, or have many large tables so the tbl(1) preprocessor
> > produces a lot of output.
> > 
> > The outlier is groff_mdoc(7) at just over one-third of a second.
> 
> Some people consider 0.1 sec, let alone 0.3 sec, to be long enough to
> be annoying.

True but try and balance that with what they are trying to do, clean 
things up.  I'm retired so my opinion doesn't count but I think they
are on the right path.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: man page rendering speed (was: Playground pager lsp(1))
  2023-04-07 14:43             ` man page rendering speed (was: Playground pager lsp(1)) G. Branden Robinson
  2023-04-07 15:06               ` Eli Zaretskii
@ 2023-04-07 16:08               ` Colin Watson
  2023-04-08 11:24               ` Ralph Corderoy
  2 siblings, 0 replies; 73+ messages in thread
From: Colin Watson @ 2023-04-07 16:08 UTC (permalink / raw)
  To: G. Branden Robinson
  Cc: Eli Zaretskii, alx.manpages, dirk, linux-man, help-texinfo, groff

On Fri, Apr 07, 2023 at 09:43:19AM -0500, G. Branden Robinson wrote:
> At 2023-04-07T09:36:10+0300, Eli Zaretskii wrote:
> > > From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> [re-running *roff when a viewing a man page and resizing the terminal]
> > > Seems like it shouldn't be impossible to me, but what I imagine
> > > would require a little reëngineering of man(1), perhaps to spawn a
> > > little custom program to manage zcat/nroff pipeline it constructs.
> > > This little program's sole job could be to be aware of this pipeline
> > > and listen for SIGWINCH; if it happens, kill the rest of the
> > > pipeline and reëxecute it.

I didn't see the rest of the thread, but one significant complexity here
would be interacting with the pager to arrange for the viewing position
to be returned to where it was pre-SIGWINCH; bear in mind that the pager
is user-configurable (less(1) is common but not universal) and isn't
directly part of man(1).

> > This should be possible, but it flies in the face of the feature
> > whereby formatted man pages are kept for future perusal, which is
> > therefore faster:
> 
> You're referring to cat pages.  As far as I know, these are on their way
> out if not already gone.  Colin Watson, who has maintained the man-db
> implementation of man(1)[1] for something like 20 years, can speak more
> authoritatively to this, but as I understand it, the advent of resizable
> xterm windows started to kill the utility of cat pages decades ago and
> the increasing importance of desktop environments accelerated their
> demise.

Another major change in that period was the general though gradual move
to UTF-8, making it somewhat unclear for some time which encoding should
be preferred when rendering cat pages.  (Since 2010, man-db always saves
cat pages in UTF-8 and converts to the proper encoding at display time,
but it took a while to settle on this approach and in the meantime there
were perhaps four or five years when cat pages were commonly unavailable
in practice.  Even then, very few people cared enough to complain.)

Furthermore, the traditional approach to saving system-wide cat pages
involved having man(1) be set-id.  From a modern standpoint, this was
obviously problematic, and it caused both security vulnerabilities and
more ordinary bugs.  There are ways in which this might have been
rearranged to be less of a serious problem, but if you can avoid
bothering with set-id at all then that's clearly safer.

My general approach to cat pages for at least the last ten years has
been to put as little effort into them as possible.  This has so far
included not outright removing support for them (since dealing with the
resulting support load, even if small, would itself be effort), but if
an improvement to man(1) has some kind of degradation of cat pages as a
side-effect then I usually won't hesitate to make it anyway.

> ...which brings me to the other factor, of which I'm more confident: man
> page rendering times are much lower than they were in Unix's early days.

Indeed, and it's been the case for at least a decade that rendering
times have been short enough that they can largely be considered
negligible.  (For most of that time my own equipment has not been
particularly on the bleeding edge of performance.)

> The bottom line is that, even on BSD systems (where mdoc(7) is preferred
> to man(7)), a user can expect a man page to render from *roff source in
> less than, say, half a second in the worst case, and the median
> GNU/Linux user can expect to start reading a man page "instantaneously":

The other thing to note explicitly here is that what normally matters
most is the time to _start_ reading, not the time to render the whole
page.  My usual example for where this makes a difference is zshall(1),
which is a concatenation of several other pages and comes to about 30000
lines of 80-column rendered output; on my system this takes about 0.6
seconds to render in its entirety, but typing "man zshall" nevertheless
shows the first page subjectively instantaneously.

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: man page rendering speed
  2023-04-07 15:06               ` Eli Zaretskii
  2023-04-07 15:08                 ` Larry McVoy
@ 2023-04-07 17:07                 ` Ingo Schwarze
  2023-04-07 19:04                 ` man page rendering speed (was: Playground pager lsp(1)) Alejandro Colomar
  2 siblings, 0 replies; 73+ messages in thread
From: Ingo Schwarze @ 2023-04-07 17:07 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: g.branden.robinson, alx.manpages, dirk, cjwatson, linux-man,
	help-texinfo, groff

Hi Eli,

Eli Zaretskii wrote on Fri, Apr 07, 2023 at 06:06:39PM +0300:
> G. Branden Robinson wrote on Date: Fri, 7 Apr 2023 09:43:19 -0500

>> ...which brings me to the other factor, of which I'm more confident: man
>> page rendering times are much lower than they were in Unix's early days.
>> 
>> On my system, all groff man pages but one render in between a tenth and
>> a fortieth of a second.  The really huge pages like groff(7),
>> groff_char(7), and groff_diff(7) are toward the upper end of this range,
>> because they are long, at ~20-25 U.S. letter pages when formatted for
>> PostScript or PDF, or have many large tables so the tbl(1) preprocessor
>> produces a lot of output.
>> 
>> The outlier is groff_mdoc(7) at just over one-third of a second.

> Some people consider 0.1 sec, let alone 0.3 sec, to be long enough to
> be annoying.
> 
> Also, did you try with libpng.3 or gcc.1?

For what it's worth, on my notebook the largest page is ffmpeg-all(1)
at about 1.6 Megabyte man(1) source code, 42k lines, 182k words,
1.65 Megabyte rendered to UTF-8 terminal output.

Rendering that beast takes three and a half seconds on my notebook
with groff and two thirds of a second with mandoc(1), i.e. mandoc
is more than five times faster on this page than groff.

The largest mdoc(7) page here happens to be openssl(1) at 193 Kilobyte
of mdoc(7) source code, 5k lines, 27k words, 265 Kilobyte of UTF-8
terminal output in rendered form.  It takes 1.3 seconds with groff and
on tenth of a second with mandoc, so mandoc is faster by a factor of
thirteen in this case.  In general, the speed gain of mandoc is much
larger for mdoc(7) than for man(7) input because mandoc refrains from
using recursion in the implementation of the mdoc(7) language.

Relative speed gains also tend to be larger for large pages than for
small ones, so these factors of five and thirteen are on the upper
end of the spectrum.  Then again, who cares about rendering speeds
for small pages, apart from Michael Stapelberg when he pre-renders
stuff he is planning to serve on manpages.debian.org?

In fact, speed was among the design goals of mandoc when development
started about 15 years ago (though the goal was secondary to the goals
of BSD licensing, ease of use, and security, and in the meantime,
the goal of high-quality HTML output has also become more important).

Consequently, people who highly value speed in manual page display
might consider mandoc as an option for a manual page searching,
formatting and display system.  Several Linux distributions nowadays
offer the configuration option of using it out of the box (including
Fedora, openSUSE and Arch), and some even use it by default (including
Alpine, Void, illumos and, of course, almost all BSD systems).

Of course, it is *not* a replacement for groff.  Mandoc only provides
rather poor PDF output and it can only format manual pages, not
general-purpose roff(7) documents.

Yours,
  Ingo

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: man page rendering speed (was: Playground pager lsp(1))
  2023-04-07 15:06               ` Eli Zaretskii
  2023-04-07 15:08                 ` Larry McVoy
  2023-04-07 17:07                 ` man page rendering speed Ingo Schwarze
@ 2023-04-07 19:04                 ` Alejandro Colomar
  2023-04-07 19:28                   ` Gavin Smith
  2 siblings, 1 reply; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-07 19:04 UTC (permalink / raw)
  To: Eli Zaretskii, G. Branden Robinson
  Cc: dirk, cjwatson, linux-man, help-texinfo, groff


[-- Attachment #1.1: Type: text/plain, Size: 4143 bytes --]

Hi!

On 4/7/23 17:06, Eli Zaretskii wrote:
>> Date: Fri, 7 Apr 2023 09:43:19 -0500
>> From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
>> Cc: alx.manpages@gmail.com, dirk@gouders.net, cjwatson@debian.org,
>> 	linux-man@vger.kernel.org, help-texinfo@gnu.org, groff@gnu.org
>>
>> ...which brings me to the other factor, of which I'm more confident: man
>> page rendering times are much lower than they were in Unix's early days.
>>
>> On my system, all groff man pages but one render in between a tenth and
>> a fortieth of a second.  The really huge pages like groff(7),
>> groff_char(7), and groff_diff(7) are toward the upper end of this range,
>> because they are long, at ~20-25 U.S. letter pages when formatted for
>> PostScript or PDF, or have many large tables so the tbl(1) preprocessor
>> produces a lot of output.
>>
>> The outlier is groff_mdoc(7) at just over one-third of a second.
> 
> Some people consider 0.1 sec, let alone 0.3 sec, to be long enough to
> be annoying.
> 
> Also, did you try with libpng.3 or gcc.1?

$ time man -w gcc | xargs zcat | groff -man -Tutf8 2>/dev/null >/dev/null

real	0m0.406s
user	0m0.534s
sys	0m0.042s

But as others said, I don't really care about the time it takes to format
the entire document, but rather the first 24 lines, which is more like
instantaneous (per your own definition of ~0.5 s).

$ time man -w gcc | xargs zcat | groff -man -Tutf8 2>/dev/null | head -n24 >/dev/null
xargs: zcat: terminated by signal 13

real	0m0.064s
user	0m0.051s
sys	0m0.030s


As a curiosity, mandoc(1) seems to be faster for rendering the entire document, but slower to "start reading".

$ time man -w gcc | xargs zcat | mandoc >/dev/null

real	0m0.270s
user	0m0.218s
sys	0m0.057s

$ time man -w gcc | xargs zcat | mandoc | head -n24 >/dev/null

real	0m0.136s
user	0m0.119s
sys	0m0.023s


As a disclaimer, I do sometimes care about reading entire documents,
but even in that case, it's not so bad.  I can read the few thousand man
pages in the Linux man-pages in about a few seconds, or a minute.  [1]


> 
>>   Human subjects need a minimum of about 0.1 second of visual experience
>>   or about .01 to .02 second of auditory experience to perceive
>>   duration; any shorter experiences are called instantaneous.
>>   -- Encyclopædia Britannica[2]
> 
> IME, 0.05 sec of visual experiences is closer to reality.

This is the time to load the first 24 lines of almost any page.
gcc(1), which is one of the longest I have, takes 0.6 s.  MAX(3),
which is one of the shortest I have, takes 0.4 s.

> 
> Anyway, I won't argue.

Cheers,

Alex


[1]:  Here's why I do care about time to lead entire pages.  I know
      I can optimize this pipeline by calling groff(1) directly, or even
      better, mandoc(1), now that I know it's faster for entire docs,
      but since I haven't used this function for a long time, I didn't
      spend time optimizing it.

man_lsfunc()
{
	if [ $# -lt 1 ]; then
		>&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
		return $EX_USAGE;
	fi

	for arg in "$@"; do
		man_section "$arg" 'SYNOPSIS';
	done \
	|sed_rm_ccomments \
	|pcregrep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(...)?\s*\); *$' \
	|grep '^[0-9]' \
	|sed -E 's/syscall\(SYS_(\w*),?/\1(/' \
	|sed -E 's/^[^(]+ \**(\w+)\(.*/\1/' \
	|uniq;
}


man_section()
{
	if [ $# -lt 2 ]; then
		>&2 echo "Usage: ${FUNCNAME[0]} <dir> <section>...";
		return $EX_USAGE;
	fi

	local page="$1";
	shift;
	local sect="$*";

	find "$page" -type f \
	|xargs wc -l \
	|grep -v -e '\b1 ' -e '\btotal\b' \
	|awk '{ print $2 }' \
	|sort \
	|while read -r manpage; do
		(sed -n '/^\.TH/,/^\.SH/{/^\.SH/!p}' <"$manpage";
		 for s in $sect; do
			<"$manpage" \
			sed -n \
				-e "/^\.SH $s/p" \
				-e "/^\.SH $s/,/^\.SH/{/^\.SH/!p}";
		 done;) \
		|man -P cat -l - 2>/dev/null;
	done;
}


man_lsfunc() is quite slow, but it's acceptable to me, since I only
run it sporadically.

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: man page rendering speed (was: Playground pager lsp(1))
  2023-04-07 19:04                 ` man page rendering speed (was: Playground pager lsp(1)) Alejandro Colomar
@ 2023-04-07 19:28                   ` Gavin Smith
  2023-04-07 20:43                     ` Alejandro Colomar
  0 siblings, 1 reply; 73+ messages in thread
From: Gavin Smith @ 2023-04-07 19:28 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Eli Zaretskii, G. Branden Robinson, dirk, cjwatson, linux-man,
	help-texinfo, groff

On Fri, Apr 07, 2023 at 09:04:03PM +0200, Alejandro Colomar wrote:
> $ time man -w gcc | xargs zcat | groff -man -Tutf8 2>/dev/null >/dev/null
> 
> real	0m0.406s
> user	0m0.534s
> sys	0m0.042s
> 
> But as others said, I don't really care about the time it takes to format
> the entire document, but rather the first 24 lines, which is more like
> instantaneous (per your own definition of ~0.5 s).

Here's a sample comparison of "man" versus "info" on my system
(relevant as help-texinfo@gnu.org is being copied into this
discussion):

$ time info gcc > temp

real    0m0.112s
user    0m0.085s
sys     0m0.017s
$ ls -l temp
-rw-rw-r-- 1 g g 3.0M Apr  7 20:14 temp
$ time man gcc > temp
troff: <standard input>:11612: warning [p 111, 6.0i]: can't break line
troff: <standard input>:11660: warning [p 111, 13.8i]: can't break line

real    0m0.620s
user    0m1.004s
sys     0m0.114s
$ ls -l temp
-rw-rw-r-- 1 g g 1.2M Apr  7 20:16 temp

I find the startup of "info" to be instantaneous, whereas man pages often
have a noticeable delay.

Doubtless man would have more comparable runtimes were cat pages being used.

Being able to reformat the text for arbitrary widths is of limited use,
in my opinion, as text becomes more unreadable at long line lengths.  I
suppose cat pages could be provided in a series of sensible widths.  (The
same is true in theory for Info, but I've never heard of anybody using
widths for Info output other than the default 72 columns.)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: man page rendering speed (was: Playground pager lsp(1))
  2023-04-07 19:28                   ` Gavin Smith
@ 2023-04-07 20:43                     ` Alejandro Colomar
  0 siblings, 0 replies; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-07 20:43 UTC (permalink / raw)
  To: Gavin Smith
  Cc: Eli Zaretskii, G. Branden Robinson, dirk, cjwatson, linux-man,
	help-texinfo, groff, Ingo Schwarze


[-- Attachment #1.1: Type: text/plain, Size: 2868 bytes --]

Hi Gavin,

On 4/7/23 21:28, Gavin Smith wrote:
> On Fri, Apr 07, 2023 at 09:04:03PM +0200, Alejandro Colomar wrote:
>> $ time man -w gcc | xargs zcat | groff -man -Tutf8 2>/dev/null >/dev/null
>>
>> real	0m0.406s
>> user	0m0.534s
>> sys	0m0.042s
>>
>> But as others said, I don't really care about the time it takes to format
>> the entire document, but rather the first 24 lines, which is more like
>> instantaneous (per your own definition of ~0.5 s).
> 
> Here's a sample comparison of "man" versus "info" on my system
> (relevant as help-texinfo@gnu.org is being copied into this
> discussion):
> 
> $ time info gcc > temp
> 
> real    0m0.112s
> user    0m0.085s
> sys     0m0.017s
> $ ls -l temp
> -rw-rw-r-- 1 g g 3.0M Apr  7 20:14 temp
> $ time man gcc > temp
> troff: <standard input>:11612: warning [p 111, 6.0i]: can't break line
> troff: <standard input>:11660: warning [p 111, 13.8i]: can't break line
> 
> real    0m0.620s
> user    0m1.004s
> sys     0m0.114s
> $ ls -l temp
> -rw-rw-r-- 1 g g 1.2M Apr  7 20:16 temp
> 
> I find the startup of "info" to be instantaneous, whereas man pages often
> have a noticeable delay.

The times you showed are not _startup_ times, but rather the time for
formatting the _entire_ documents.  Remember that less(1) already shows you
the first lines when they are ready, without waiting for the rest of the
pipe.

I've optimized a moment ago the functions I had for listing all the
functions that appear in the Linux man-pages' SYNOPSIS sections, and got it
down from 55 s (calling man(1)) to just 14 s (calling groff(1)) and further
to 4 s (calling mandoc(1)).

That's parsing around a thousand pages, extracting the SYNOPSIS with sed(1),
formatting it, and parsing that to find function prototypes.

I guess that's one of the worst cases of when one would care about the time
it takes to format a man page, and it's a very reasonable one.


> 
> Doubtless man would have more comparable runtimes were cat pages being used.

The startup times don't really change.  It's around 0.5 s.  However, the
time to show the entire page is the same (i.e., virtually all the time is
spent in finding and opening the page)

> 
> Being able to reformat the text for arbitrary widths is of limited use,
> in my opinion, as text becomes more unreadable at long line lengths.

I often want it for the opposite reason: I want to make the terminal
narrower (e.g., for pasting contents into an email, at 72 or 66 columns).

>  I
> suppose cat pages could be provided in a series of sensible widths.  (The
> same is true in theory for Info, but I've never heard of anybody using
> widths for Info output other than the default 72 columns.)

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* reformatting man pages at SIGWINCH (was: Playground pager lsp(1))
  2023-04-07  2:18         ` Playground pager lsp(1) G. Branden Robinson
  2023-04-07  6:36           ` Eli Zaretskii
@ 2023-04-07 21:26           ` Alejandro Colomar
  2023-04-07 22:09             ` reformatting man pages at SIGWINCH Dirk Gouders
  1 sibling, 1 reply; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-07 21:26 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: Eli Zaretskii, dirk, linux-man, help-texinfo, groff


[-- Attachment #1.1: Type: text/plain, Size: 2143 bytes --]

Hi Branden,

On 4/7/23 04:18, G. Branden Robinson wrote:
> At 2023-04-06T03:10:59+0200, Alejandro Colomar wrote:
>> Hmm, now that I think, it's probably an issue of coordinating man(1)
>> and less(1).  I sometimes wish that when I resize a window where I'm
>> reading a man page, it would reformat the page from source.
> 
> Seems like it shouldn't be impossible to me, but what I imagine would
> require a little reëngineering of man(1), perhaps to spawn a little
> custom program to manage zcat/nroff pipeline it constructs.  This little
> program's sole job could be to be aware of this pipeline and listen for
> SIGWINCH; if it happens, kill the rest of the pipeline and reëxecute it.
> 
> Maybe I thought of it this way because (I suspect) it aligns with my
> vision I've expressed elsewhere of man(1) having unfortunately
> aggregated two separate functions: librarian vs. renderer.
> Historically, of course the latter function was almost vestigial, since
> early Unix systems had no pager program and their man pages required
> little to no preprocessing; man(1) slowly accreted into a larger thing.
> 
>> Of course, that might be a problem for keeping track of where I was,
>> since lines moved around.
> 
> That seems like a harder problem to me.  You'd need a way for the pager
> to communicate position information back to the mini-man renderer
> program I envision.  Two challenges here: (1) what part of the screen
> was the reader actually looking at?  (2) how is the pager supposed to
> know how to map any given location on the screen back to a place in the
> unrendered source document so it can be accurately found when the
> document is rerendered?  These feel nearly intractable to me.  But maybe
> I have a poor imagination.

Maybe it could be done with .SH and .SS.  The heuristics to find these
are simple.  It wouldn't be very precise, but it could try to find the
closest (only upwards) (sub)section heading.  With some luck, .TP would
also be helpful.

Cheers,
Alex


-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-06  8:11         ` Eli Zaretskii
  2023-04-06  8:48           ` Gavin Smith
@ 2023-04-07 22:01           ` Alejandro Colomar
  2023-04-08  7:05             ` Eli Zaretskii
  1 sibling, 1 reply; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-07 22:01 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: dirk, linux-man, help-texinfo, наб,
	G. Branden Robinson, groff, Colin Watson


[-- Attachment #1.1: Type: text/plain, Size: 15285 bytes --]

Hi Eli,

On 4/6/23 10:11, Eli Zaretskii wrote:
>> Date: Thu, 6 Apr 2023 03:10:59 +0200
>> Cc: dirk@gouders.net, linux-man@vger.kernel.org, help-texinfo@gnu.org
>> From: Alejandro Colomar <alx.manpages@gmail.com>
>>
>>> This last sentence is a misunderstanding.  The goal of Texinfo is not
>>> to improve the man pages.  Texinfo is a completely different approach
>>> to software documentation, which allows to write large books and then
>>> produce various on-line and off-line formats to read and efficiently
>>> search those books.
>>
>> "The manual was intended to be typeset; some detail is sacrificed on
>> terminals." (man(1), _Unix Time-Sharing System Programmer's Manual_,
>> Eighth Edition, Volume 1, February 1985)
>>
>> You mean books like this one?  Courtesy of groff(1)'s Deri James =)
>> <https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages-6.04.01.pdf>
>>
>> Or maybe you prefer HTML?
>> <https://man7.org/linux/man-pages/man1/intro.1.html>
> 
> No, I mean books like "GNU Emacs Manual" or "Debugging with GDB"
> (https://shop.fsf.org/collection/books-docs).  Or "War and Peace", for
> that matter.
> 
>> As to efficiency, I'm not going to open that melon, because we're
>> both very biased to be efficient on the formats we each maintain.
>> I'll just say that I don't see an objective winner in those terms.
> 
> How do you find the description of, say, "dereference symbolic link"
> (to take just a random example from the Emacs manual) when the actual
> text of the manual include neither this string nor matches for any
> related regular expressions, like "dereference.*link"?

$ apropos link | grep sym | head -n5
readlink (2)         - read value of a symbolic link
readlinkat (2)       - read value of a symbolic link
sln (8)              - create symbolic links
symlink (2)          - make a new name for a file
symlink (7)          - symbolic link handling

I bet you're looking for readlink(2) and symlink(7), aren't you?

> 
> The way Info does it is to use the index (which should be present in
> any respectable reference document) to find description of the
> corresponding subject.  The indexing, which is done by the author of
> the document, if it's a good indexing, should include index entries
> that specify subjects the reader could have in mind when he/she is
> looking for this kind of information.

We do that too in man(7).  For example, we improved the "index" for
proc(5) recently, after наб lost some time without finding proc(5)
in the list of pages that were interesting for the topic at hand:


commit 2e1c1a57f138eedd35b7b2a825002fddb12d240f
Author: наб <nabijaczleweli@nabijaczleweli.xyz>
Date:   Sat Apr 1 00:04:52 2023 +0200

    proc.5: NAME: Add "system information, and sysctl"
    
    procfs hosts a whole host of information about the system, as well as
    sysctls; proc(5) hosts a description of a lot of sysctls, and at present
    there's no way to find that out.
    
    Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
    Cc: Jakub Wilk <jwilk@jwilk.net>
    Signed-off-by: Alejandro Colomar <alx@kernel.org>

diff --git a/man5/proc.5 b/man5/proc.5
index 521402fe8..233cc1c9d 100644
--- a/man5/proc.5
+++ b/man5/proc.5
@@ -36,7 +36,7 @@
 .\"
 .TH proc 5 (date) "Linux man-pages (unreleased)"
 .SH NAME
-proc \- process information pseudo-filesystem
+proc \- process information, system information, and sysctl pseudo-filesystem
 .SH DESCRIPTION
 The
 .B proc


After this patch, if you apropos "system" or "sysctl", you'll see
proc(5) pop up in your list.

> 
> The corresponding index-searching commands of Info readers are a
> primary means for finding information quickly and efficiently,
> avoiding too many false positives and also avoiding frustrating
> misses, i.e., searches that fail to find anything pertinent.

That's no different than apropos(1).  The only problem is when a
man page feels like a one-page book.  But if you split the book
into several pages, then the index is useful to know which page
you want.

> 
> So this is not about objectivity, this is about features that either
> are present in the documentation system or are absent.  I prefer the
> Info format to the HTML format of the same manual for this single
> reason: HTML browsers don't have the index searching capabilities
> (this is hopefully about to change, I hope, see the JS support in
> latest Texinfo), and that issue alone was enough to avert me from
> HTML, because I cannot afford wasting time on looking up information I
> cannot find instantly.

Yep, I also prefer man(1) over HTML man pages for similar reasons :).
I can do whatis(1) and apropos(1) (although some man-pages websites
have this capability too, but then I can't grep those results in the
browser).

> 
>> About variety of output formats, anything that can be produced by
>> groff(1), man(7) can be translated.  And groff(1) can do many formats.
> 
> Groff (and any other typesetting program) can be used for writing any
> kind of documents.  I'm not talking about the processors, I'm talking
> about the design of the documentation system as a whole and about what
> the products actually look like.  IOW, I'm talking about the man pages
> produced by the typesetter, not about what can be done with the
> typesetter.
> 
>>> Man pages have no means of specifying structure
>>
>> .SH, .SS, .TP, .TQ, and very soon (hopefully weeks not months) .MR
> 
> These provide just one level.

We have many levels:

book:		/opt/local/foobar/man/
volume:		man2/, man3/, ...
chapter:	man3/, man3type/, ...
page:		sscanf(3)
section:	sscanf(3)/DESCRIPTION
subsection:	sscanf(3)/DESCRIPTION/Conversions
tags:		sscanf(3)/DESCRIPTION/Conversions/n

Branden, I now remember your wondering about MR and linking to
specific locations in a page...  Maybe we could use such a URI-like
syntax for that.  I guess it's not yet taken by any software, so we
should be free to define paths in the 'man:' schema to mean this?

> 
> And how frequently are they used in actual man pages out there, even
> when available?

Used in source man(7)?  Always.

> 
>> Those can be used to produce very precise links such as this one
>> (one of my favourite references when reviewing man-pages patches):
>> <https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages-6.04.01.pdf#pdf%3Abm11886>
> 
> It's full of mojibake when I try reading it here.  But anyway: what
> structure do you have there?  It looks just a long sequence of
> separate man pages.

There's a navigation panel in the left in most (all?) PDF readers.
You can use that to navigate to the page you want, and get hyperlinks
to pages or their contents.

> 
>>> and hyper-links except
>>> by loosely-coupling pages via "SEE ALSO" cross-references at the end;
>>> they have no means of quickly and efficiently finding some specific
>>> subject except by text search (which usually produces a lot of false
>>> positives).
>>
>> I guess you mean searching from the command line by the name of the
>> parameter to a function, or what?
> 
> No, I mean looking a specific subject of interest without having to
> search/read through the entire document.

See symlink above.

> 
>> I would be interested in a more detailed description of what you
>> want to be able to search in current pages (hopefully ones that I
>> maintain, so I can speak of them) that you can't find easily?  Maybe
>> I can help making something more accessible.
> 
> See above, the example of using index-searching commands.

Yep.  I hope my answer about symlinks satisfied you.

Cheers,
Alex

> 
>>> By contrast, Texinfo documents have sectioning structure, have
>>> cross-references that can appear where you need them and point
>>> anywhere else in the document (or into another document).
>>
>> This was discussed as a possible extension to '.MR'.  We're just not
>> sure that there's a real need for that in manual pages (although
>> there's not a consensus on that regard, and Branden, which I'm sure
>> is reading this, may jump in at any moment :).
> 
> Cannot say about man pages, but in a serious documentation of any
> computer software you always need cross-references, because you cannot
> make any description self-contained without repeating the same stuff
> over and over and over again.
> 
> Here's a short examples from a random place in the Emacs Lisp
> Reference manual:
> 
>      When an editing command returns to the editor command loop, Emacs
>   automatically calls ‘set-buffer’ on the buffer shown in the selected
>   window (*note Selecting Windows::).  This is to prevent confusion: it
>   ensures that the buffer that the cursor is in, when Emacs reads a
>   command, is the buffer to which that command applies (*note Command
>   Loop::).  Thus, you should not use ‘set-buffer’ to switch visibly to a
>   different buffer; for that, use the functions described in *note
>   Switching Buffers::.
> 
> The three places which say with "see SOMETHING" are cross-references
> to other parts of the manual.  Without being able to cross-reference
> there, the text would have to explain what it means by "selected
> window", what it means by "commands" and "command loop", and mention
> explicitly the functions to switch to a buffer which are already
> described in detail elsewhere.  This allows readers who already know
> about those subjects to read the text without having to skip large
> amounts of unnecessary information, while also allowing readers who
> are not sure they know about that to be able to follow the link, read
> there, and then come back to the same place to continue reading.
> 
>>>  They also
>>> have indexing and commands that allow the reader to use the index in
>>> order to find the subject he/she is interested in very quickly and
>>
>> You mean whatis(1) and apropos(1)?
> 
> No.  These perform text searches on the titles of the man pages, and
> are therefore limited to what is in the title.  Indexing is much more
> powerful, and works on the topics in the index (which, as explained
> above, could contain text not present anywhere else in the document).
> And every respectful Info manual has an index (some have several
> indices).  See above about the commands which use the index.
> 
>>> accurately, even if the text of the index entry doesn't appear
>>> anywhere in the manual.
>>
>> man pages have several ways:
>>
>> -  Including keywords in the NAME section.
>> -  Link pages.
>> -  TH line.
> 
> This is not enough, IME.  You need a way of "tagging" a chunk of text
> as describing, or being pertinent to, a particular subject, even if
> that subject does not appear literally in the text the reader will
> see.  That's because when readers are after some specific material,
> they don't always have in mind the exact words used in the manual for
> describing that material, they could have some alternative phrases in
> their minds.  Good indexing anticipates this in advance, and provides
> index entries for those alternative phrases, allowing readers to find
> stuff quickly.
> 
>> Of course, this is for the terminal.  For PDF or HTML, you can
>> get hyperlinks to any subsection (and in the future maybe even
>> tagged paragraphs) within a page.
> 
> In Info, references to any paragraph are available since long ago.
> They are invaluable in some situations, especially when some section
> is very long and you want to point to a very specific part thereof.
> 
>>> How can you document a large and flexible software package, such as
>>> GDB or Texinfo or Emacs, in man pages?
>>
>> git is a huge program, yet its man pages are quite useful.
> 
> Git is a huge heap of separate commands, with very little to glue them
> together in terms of documented functionalities.  Still, even in Git,
> there's the stuff that belongs to neither command in particular, and
> thus is documented in man pages with invented names like
> "gitrevisions", which is impossible to guess in advance for a newbie
> who needs this information.
> 
> Moreover, the introduction material and the explanation of basic
> concepts is not in man pages, but in a separate HTML document ("The
> Git User's Manual"), and likewise the API documentation, which in
> itself is a telltale sign.
> 
> While something like a huge heap of man pages is perhaps borderline
> reasonable for Git, it isn't reasonable for programs which are not
> easily broken into separate independent "pages", like GDB and Emacs.
> The more complex is the system of objects and concepts manipulated by
> the software, the less appropriate is the man-page format for
> describing it.
> 
>> Just split your documentation at the right boundary, which
>> usually requires a good design for your software that allows
>> such division.
> 
> Whether the manual is split or not is immaterial.  Info manuals can
> also be split.  The relevant issue is what the viewer allows the
> reader to do to read these chunks in a reasonable way, using efficient
> commands and features to find related information quickly.
> 
>> The fact that current man(1) implementations don't exploit
>> the whole power of man(7) doesn't mean you can't design a
>> software that does.
> 
> Indeed, it doesn't mean that.  But we are discussing what is there,
> not what could be there in some distant future.
> 
>> I'm sure you could build something similar to info(1) that
>> got man(7) pages as its input.
> 
> No!  The information about subsections, cross-references, and indices
> is missing.  That information must be there to begin with, otherwise
> it cannot be recreated, because it's inserted by humans, not by
> programs.
> 
>>> It isn't missing.  The TOC is presented as top-level menu in each
>>> manual, and large manuals have also the "detailed menu" with all the
>>> sub-nodes spelled out.  In addition, the Emacs Info reader has the
>>> Info-toc command, which presents a structured menu with all the
>>> sectioning levels of a manual even if the manual didn't produce it.
>>
>> Ahh, yes, this is true.  What I found missing is a kind of a map for
>> knowing what I have available for navigating (also the fact that I
>> don't usually run info(1) makes me be a bit fuzzy on detailing what
>> is it that I miss from it).  So, info(1) has a map of the sections
>> available in a page, and does it also have a map of all the pages
>> in the system (or whatever you call your pages, I don't yet really
>> understand the organization of info manuals).
> 
> Yes, it does.  If you invoke 'info' with no arguments, it will show
> the "directory" of all the installed manuals -- a large menu where
> each manual has at least one line explaining what the manual
> describes.  Some manuals have much more than one line; examples
> include Coreutils and Binutils (which have a line for each individual
> command) and glibc (which has a line for every _function_).

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: reformatting man pages at SIGWINCH
  2023-04-07 21:26           ` reformatting man pages at SIGWINCH " Alejandro Colomar
@ 2023-04-07 22:09             ` Dirk Gouders
  2023-04-07 22:16               ` Alejandro Colomar
  2023-04-08 11:40               ` Ralph Corderoy
  0 siblings, 2 replies; 73+ messages in thread
From: Dirk Gouders @ 2023-04-07 22:09 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: G. Branden Robinson, Eli Zaretskii, linux-man, help-texinfo, groff

Alejandro Colomar <alx.manpages@gmail.com> writes:

> Hi Branden,
>
> On 4/7/23 04:18, G. Branden Robinson wrote:
>> At 2023-04-06T03:10:59+0200, Alejandro Colomar wrote:
>>> Hmm, now that I think, it's probably an issue of coordinating man(1)
>>> and less(1).  I sometimes wish that when I resize a window where I'm
>>> reading a man page, it would reformat the page from source.
>> 
>> Seems like it shouldn't be impossible to me, but what I imagine would
>> require a little reëngineering of man(1), perhaps to spawn a little
>> custom program to manage zcat/nroff pipeline it constructs.  This little
>> program's sole job could be to be aware of this pipeline and listen for
>> SIGWINCH; if it happens, kill the rest of the pipeline and reëxecute it.
>> 
>> Maybe I thought of it this way because (I suspect) it aligns with my
>> vision I've expressed elsewhere of man(1) having unfortunately
>> aggregated two separate functions: librarian vs. renderer.
>> Historically, of course the latter function was almost vestigial, since
>> early Unix systems had no pager program and their man pages required
>> little to no preprocessing; man(1) slowly accreted into a larger thing.
>> 
>>> Of course, that might be a problem for keeping track of where I was,
>>> since lines moved around.
>> 
>> That seems like a harder problem to me.  You'd need a way for the pager
>> to communicate position information back to the mini-man renderer
>> program I envision.  Two challenges here: (1) what part of the screen
>> was the reader actually looking at?  (2) how is the pager supposed to
>> know how to map any given location on the screen back to a place in the
>> unrendered source document so it can be accurately found when the
>> document is rerendered?  These feel nearly intractable to me.  But maybe
>> I have a poor imagination.
>
> Maybe it could be done with .SH and .SS.  The heuristics to find these
> are simple.  It wouldn't be very precise, but it could try to find the
> closest (only upwards) (sub)section heading.  With some luck, .TP would
> also be helpful.

Yes, that should give nice results.  But for manual pages like git(1)
with large areas between those this becomes difficult, again.

Today, I experimented with one more heuristics, adjusting the current
position according to the proportional change of avg. line size and also
change of window dimension (horizontal) but all of those didn't get better
results than what I currently implemented (stay at the position).

Out of curiosity, I checked how firefox behaves on horizontal resizes
and comparing to some of those results, lsp is not the worst on earth ;-)

If time allows, I want to see if working with Levenshtein distances
could get exact results.  Perhaps this will turn out to be too expensive
but maybe the fact that the area to be checked is limited helps...

Regards,

Dirk

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: reformatting man pages at SIGWINCH
  2023-04-07 22:09             ` reformatting man pages at SIGWINCH Dirk Gouders
@ 2023-04-07 22:16               ` Alejandro Colomar
  2023-04-10 19:05                 ` Dirk Gouders
  2023-04-08 11:40               ` Ralph Corderoy
  1 sibling, 1 reply; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-07 22:16 UTC (permalink / raw)
  To: Dirk Gouders
  Cc: G. Branden Robinson, Eli Zaretskii, linux-man, help-texinfo, groff


[-- Attachment #1.1: Type: text/plain, Size: 1541 bytes --]

Hi Dirk,

On 4/8/23 00:09, Dirk Gouders wrote:
>> Maybe it could be done with .SH and .SS.  The heuristics to find these
>> are simple.  It wouldn't be very precise, but it could try to find the
>> closest (only upwards) (sub)section heading.  With some luck, .TP would
>> also be helpful.
> 
> Yes, that should give nice results.  But for manual pages like git(1)
> with large areas between those this becomes difficult, again.
> 
> Today, I experimented with one more heuristics, adjusting the current
> position according to the proportional change of avg. line size and also
> change of window dimension (horizontal) but all of those didn't get better
> results than what I currently implemented (stay at the position).
> 
> Out of curiosity, I checked how firefox behaves on horizontal resizes
> and comparing to some of those results, lsp is not the worst on earth ;-)
> 
> If time allows, I want to see if working with Levenshtein distances
> could get exact results.  Perhaps this will turn out to be too expensive
> but maybe the fact that the area to be checked is limited helps...

For something simpler, you could just count words since the start of the
section divided by total words in the section.  That should be fast, and
I expect, also quite precise.  Hyphenating might work against you on
this, but on average it shouldn't move you too much.

Cheers,
Alex

> 
> Regards,
> 
> Dirk

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Playground pager lsp(1)
  2023-04-07 22:01           ` Alejandro Colomar
@ 2023-04-08  7:05             ` Eli Zaretskii
  2023-04-08 13:02               ` Accessibility of man pages (was: Playground pager lsp(1)) Alejandro Colomar
  0 siblings, 1 reply; 73+ messages in thread
From: Eli Zaretskii @ 2023-04-08  7:05 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: dirk, linux-man, help-texinfo, nabijaczleweli,
	g.branden.robinson, groff, cjwatson

> Date: Sat, 8 Apr 2023 00:01:08 +0200
> Cc: dirk@gouders.net, linux-man@vger.kernel.org, help-texinfo@gnu.org,
>  наб <nabijaczleweli@nabijaczleweli.xyz>,
>  "G. Branden Robinson" <g.branden.robinson@gmail.com>, groff <groff@gnu.org>,
>  Colin Watson <cjwatson@debian.org>
> From: Alejandro Colomar <alx.manpages@gmail.com>
> 
> > How do you find the description of, say, "dereference symbolic link"
> > (to take just a random example from the Emacs manual) when the actual
> > text of the manual include neither this string nor matches for any
> > related regular expressions, like "dereference.*link"?
> 
> $ apropos link | grep sym | head -n5
> readlink (2)         - read value of a symbolic link
> readlinkat (2)       - read value of a symbolic link
> sln (8)              - create symbolic links
> symlink (2)          - make a new name for a file
> symlink (7)          - symbolic link handling
> 
> I bet you're looking for readlink(2) and symlink(7), aren't you?

I said "in the Emacs manual", and I said "when the actual text of the
manual doesn't include the phrase you are looking for".  So your
example is not really up to its job: it uses text that is not the
Emacs manual, and it finds only hits that literally appear in the
title text of the man pages.  For example, the above doesn't find the
man page of Find, nor the man pages of cp and ls (and quite a few of
others), all of which discuss what these utilities do with symbolic
links.  By contrast, the Info manual of Coreutils has almost 40 index
entries starting with "symbolic link", and they are all shown when the
user types "i symbolic link TAB" ('i' being the letter that invokes
index-searching command).

> diff --git a/man5/proc.5 b/man5/proc.5
> index 521402fe8..233cc1c9d 100644
> --- a/man5/proc.5
> +++ b/man5/proc.5
> @@ -36,7 +36,7 @@
>  .\"
>  .TH proc 5 (date) "Linux man-pages (unreleased)"
>  .SH NAME
> -proc \- process information pseudo-filesystem
> +proc \- process information, system information, and sysctl pseudo-filesystem
>  .SH DESCRIPTION
>  The
>  .B proc
> 
> 
> After this patch, if you apropos "system" or "sysctl", you'll see
> proc(5) pop up in your list.

This literally adds the text to what the reader will see.  It makes
the text longer and thus more difficult to read and parse, and there's
a limit to how many key phrases you can add like this.  By contrast,
Texinfo lets you add any number of index entries that point to the
same text.  A random example from the Emacs manual:

  @cindex arrow keys
  @cindex moving point
  @cindex movement
  @cindex cursor motion
  @cindex moving the cursor
    To do more than insert characters, you have to know how to move
  point (@pxref{Point}).  The keyboard commands @kbd{C-f}, @kbd{C-b},
  @kbd{C-n}, and @kbd{C-p} move point to the right, left, down, and up,
  respectively.  You can also move point using the @dfn{arrow keys}
  present on most keyboards: @key{RIGHT}, @key{LEFT},
  @key{DOWN}, and @key{UP}; however, many Emacs users find
  that it is slower to use the arrow keys than the control keys, because
  you need to move your hand to the area of the keyboard where those
  keys are located.

This paragraph has 5 index entries with different key phrases, all
pointing to it.  Different people will have different phrases in their
minds when they think about "cursor movement", thus the need for
several entries.  One of the phrases appears in the text literally,
the other don't; moreover, one of them, "movement" is a very frequent
word, so searching for it with Grep is likely to bring a lot of false
hits, whereas index-searching commands will not.

> > The corresponding index-searching commands of Info readers are a
> > primary means for finding information quickly and efficiently,
> > avoiding too many false positives and also avoiding frustrating
> > misses, i.e., searches that fail to find anything pertinent.
> 
> That's no different than apropos(1).

See above: it is very different.

> >>> Man pages have no means of specifying structure
> >>
> >> .SH, .SS, .TP, .TQ, and very soon (hopefully weeks not months) .MR
> > 
> > These provide just one level.
> 
> We have many levels:
> 
> book:		/opt/local/foobar/man/
> volume:		man2/, man3/, ...
> chapter:	man3/, man3type/, ...
> page:		sscanf(3)
> section:	sscanf(3)/DESCRIPTION
> subsection:	sscanf(3)/DESCRIPTION/Conversions
> tags:		sscanf(3)/DESCRIPTION/Conversions/n

Texinfo has:

  - chapters
  - sections
  - subsections
  - subsubsections
  - unnumbered variants of the above (unnumberedsubsec etc.)
  - appendices (appendix, appendixsubsec etc.)
  - headings (majorheading, chapheading, subheading, etc.)

More importantly, all those have meaningful names, not just standard
labels like "DESCRIPTION" or "Conversions".  So when you see them in
TOC or any similar navigation aid, you _know_, at least approximately,
what each section is about.

> >>> and hyper-links except
> >>> by loosely-coupling pages via "SEE ALSO" cross-references at the end;
> >>> they have no means of quickly and efficiently finding some specific
> >>> subject except by text search (which usually produces a lot of false
> >>> positives).
> >>
> >> I guess you mean searching from the command line by the name of the
> >> parameter to a function, or what?
> > 
> > No, I mean looking a specific subject of interest without having to
> > search/read through the entire document.
> 
> See symlink above.

Not relevant.

> >> I would be interested in a more detailed description of what you
> >> want to be able to search in current pages (hopefully ones that I
> >> maintain, so I can speak of them) that you can't find easily?  Maybe
> >> I can help making something more accessible.
> > 
> > See above, the example of using index-searching commands.
> 
> Yep.  I hope my answer about symlinks satisfied you.

No, it didn't.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: man page rendering speed (was: Playground pager lsp(1))
  2023-04-07 14:43             ` man page rendering speed (was: Playground pager lsp(1)) G. Branden Robinson
  2023-04-07 15:06               ` Eli Zaretskii
  2023-04-07 16:08               ` Colin Watson
@ 2023-04-08 11:24               ` Ralph Corderoy
  2 siblings, 0 replies; 73+ messages in thread
From: Ralph Corderoy @ 2023-04-08 11:24 UTC (permalink / raw)
  To: linux-man, groff; +Cc: Eli Zaretskii, alx.manpages, dirk, Colin Watson

Hi Branden,

> You're referring to cat pages.  As far as I know, these are on their
> way out if not already gone.

catman must die.  It was never a good solution to the problem.  As well
as ignoring different TERMs, it also didn't handle a user's variations
to a terminal's definition.  I'm glad to see Colin is open to the idea,
though accept it's initial and on-going work for him.

> On my system, all groff man pages but one render in between a tenth and
> a fortieth of a second.

Colin made the point I was going to make: how long must my eyeballs wait
to be pleasured?

    $ strace -ttt -fe read,write -o /tmp/st man ffmpeg-all
    $ cat /tmp/st
 →  19788 1680952657.119429 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0  \0\0\0\0\0\0"..., 832) = 832
    ...
    19801 1680952658.350823 write(1, "FFMPEG-ALL(1)                   "..., 1023 <unfinished ...>
    19801 1680952658.352054 <... write resumed>) = 1023
    19801 1680952658.353074 write(1, "ified by a plain output url.\33[m\n"..., 1023 <unfinished ...>
    19801 1680952658.353357 <... write resumed>) = 1023
    19801 1680952658.354272 write(1, "e command line multiple times. E"..., 1023) = 1023
    19801 1680952658.357171 write(1, "aw input files.\33[m\n\33[m\n\33[1mDETAI"..., 1009) = 1009
    19801 1680952658.357478 read(0, "--- | encoded data | <----+\n    "..., 4096) = 4096
    19801 1680952658.358752 write(1, "               | output | <-----"..., 1023) = 1023
    19801 1680952658.359556 write(1, "peg\33[0m can process raw audio an"..., 574) = 574
 →  19801 1680952658.359735 read(3,  <unfinished ...>
    ...
    19801 1680952662.323859 <... read resumed>"q", 1) = 1
    ...
    $

    1680952658.359735 - 1680952657.119429 = 1.240306

strace adds a bit of overhead.

    $ PAGER=true time -p man ffmpeg-all
    real 0.99
    user 1.07
    sys 0.15
    $

Hard to find a slower CPU.

    $ grep name /proc/cpuinfo | uniq -c
          4 model name      : Intel(R) Atom(TM) CPU D525   @ 1.80GHz

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: reformatting man pages at SIGWINCH
  2023-04-07 22:09             ` reformatting man pages at SIGWINCH Dirk Gouders
  2023-04-07 22:16               ` Alejandro Colomar
@ 2023-04-08 11:40               ` Ralph Corderoy
  1 sibling, 0 replies; 73+ messages in thread
From: Ralph Corderoy @ 2023-04-08 11:40 UTC (permalink / raw)
  To: linux-man, groff; +Cc: Eli Zaretskii, Dirk Gouders

Hi,

> > > (1) what part of the screen was the reader actually looking at?

less(1) has -j; that would be a good start.

> > > (2) how is the pager supposed to know how to map any given
> > > location on the screen back to a place in the unrendered source
> > > document so it can be accurately found when the document is
> > > rerendered?

I would assume the pager looks for the same place in its input, not in
the man-page source.  It keeps seeking forward to the best matching run
of words, jumping to the best so far.

Problems I can think of:

- the formatter's input may be ephemeral and so need buffering,
- the originator may not have intended that and limited its size,
- seeking the best match after being WINCH'd must also buffer and may
  never reach EOF,
- the input formatter may alter its output based on the terminal's size,
  e.g. a pic(1) diagram disappears, and
- a solution which re-starts the pager loses the pager's ephemeral
  settings.

I expect more would be found in practice.

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Accessibility of man pages (was: Playground pager lsp(1))
  2023-04-08  7:05             ` Eli Zaretskii
@ 2023-04-08 13:02               ` Alejandro Colomar
  2023-04-08 13:42                 ` Eli Zaretskii
                                   ` (2 more replies)
  0 siblings, 3 replies; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-08 13:02 UTC (permalink / raw)
  To: Eli Zaretskii, cjwatson
  Cc: dirk, linux-man, help-texinfo, nabijaczleweli, g.branden.robinson, groff


[-- Attachment #1.1: Type: text/plain, Size: 13257 bytes --]

Hi Eli, Colin,

On 4/8/23 09:05, Eli Zaretskii wrote:
>> Date: Sat, 8 Apr 2023 00:01:08 +0200
>> Cc: dirk@gouders.net, linux-man@vger.kernel.org, help-texinfo@gnu.org,
>>  наб <nabijaczleweli@nabijaczleweli.xyz>,
>>  "G. Branden Robinson" <g.branden.robinson@gmail.com>, groff <groff@gnu.org>,
>>  Colin Watson <cjwatson@debian.org>
>> From: Alejandro Colomar <alx.manpages@gmail.com>
>>
>>> How do you find the description of, say, "dereference symbolic link"
>>> (to take just a random example from the Emacs manual) when the actual
>>> text of the manual include neither this string nor matches for any
>>> related regular expressions, like "dereference.*link"?
>>
>> $ apropos link | grep sym | head -n5
>> readlink (2)         - read value of a symbolic link
>> readlinkat (2)       - read value of a symbolic link
>> sln (8)              - create symbolic links
>> symlink (2)          - make a new name for a file
>> symlink (7)          - symbolic link handling
>>
>> I bet you're looking for readlink(2) and symlink(7), aren't you?
> 
> I said "in the Emacs manual",

I wanted to show the man-pages equivalent.  Of course I know nothing
about the Emacs manual :)

> and I said "when the actual text of the
> manual doesn't include the phrase you are looking for".  So your
> example is not really up to its job: it uses text that is not the
> Emacs manual, and it finds only hits that literally appear in the
> title text of the man pages.

I thought you wanted to know about how dereferencing symlinks works
in general.

>  For example, the above doesn't find the
> man page of Find,

If you want how symlinks are dereferenced by find(1):

$ man find | grep sym.*link | head -n1
       The  -H,  -L  and  -P  options control the treatment of symbolic links.

$ man find | sed -n '/^       -L/,/^$/p;' | sed '/^$/,$d'
       -L     Follow symbolic links.  When find examines or prints information
              about  files, the information used shall be taken from the prop‐
              erties of the file to which the link points, not from  the  link
              itself (unless it is a broken symbolic link or find is unable to
              examine  the file to which the link points).  Use of this option
              implies -noleaf.  If you later use the -P option,  -noleaf  will
              still  be  in  effect.   If -L is in effect and find discovers a
              symbolic link to a subdirectory during its search, the subdirec‐
              tory pointed to by the symbolic link will be searched.

$ man find | sed -n '/^       -H/,/^$/p;' | sed '/^$/,$d'
       -H     Do not follow symbolic links, except while processing  the  com‐
              mand  line  arguments.  When find examines or prints information
              about files, the information used shall be taken from the  prop‐
              erties  of the symbolic link itself.  The only exception to this
              behaviour is when a file specified on the command line is a sym‐
              bolic link, and the link can be resolved.  For  that  situation,
              the  information  used is taken from whatever the link points to
              (that is, the link is followed).  The information about the link
              itself is used as a fallback if the file pointed to by the  sym‐
              bolic  link  cannot  be examined.  If -H is in effect and one of
              the paths specified on the command line is a symbolic link to  a
              directory,  the  contents  of  that  directory  will be examined
              (though of course -maxdepth 0 would prevent this).

$ man find | sed -n '/^       -P/,/^$/p;' | sed '/^$/,$d'
       -P     Never follow symbolic links.  This  is  the  default  behaviour.
              When  find  examines  or prints information about files, and the
              file is a symbolic link, the information  used  shall  be  taken
              from the properties of the symbolic link itself.

> nor the man pages of cp

If you want to know how symlinks are handled by cp(1), then:

$ man cp | grep sym.*link -B1

       -H     follow command-line symbolic links in SOURCE
--
       -L, --dereference
              always follow symbolic links in SOURCE
--
       -P, --no-dereference
              never follow symbolic links in SOURCE
--

       -s, --symbolic-link
              make symbolic links instead of copying

> and ls (and quite a few of

And similarly for ls(1):

$ man ls | grep sym.*link -C2

       -H, --dereference-command-line
              follow symbolic links listed on the command line

       --dereference-command-line-symlink-to-dir
              follow each command line symbolic link that points to  a  direc‐
              tory

--

       -L, --dereference
              when showing file information for a symbolic link, show informa‐
              tion  for  the file the link references rather than for the link
              itself


> others), all of which discuss what these utilities do with symbolic
> links.

If you want to know how other command handles symlinks, look at that
command's page, and try a few things with grep and sed.

>  By contrast, the Info manual of Coreutils has almost 40 index
> entries starting with "symbolic link", and they are all shown when the
> user types "i symbolic link TAB" ('i' being the letter that invokes
> index-searching command).
> 
>> diff --git a/man5/proc.5 b/man5/proc.5
>> index 521402fe8..233cc1c9d 100644
>> --- a/man5/proc.5
>> +++ b/man5/proc.5
>> @@ -36,7 +36,7 @@
>>  .\"
>>  .TH proc 5 (date) "Linux man-pages (unreleased)"
>>  .SH NAME
>> -proc \- process information pseudo-filesystem
>> +proc \- process information, system information, and sysctl pseudo-filesystem
>>  .SH DESCRIPTION
>>  The
>>  .B proc
>>
>>
>> After this patch, if you apropos "system" or "sysctl", you'll see
>> proc(5) pop up in your list.
> 
> This literally adds the text to what the reader will see.  It makes
> the text longer and thus more difficult to read and parse, and there's
> a limit to how many key phrases you can add like this.

If a page has too many topics, consider splitting the page (I agree
that proc(5) is asking for that job).

>  By contrast,
> Texinfo lets you add any number of index entries that point to the
> same text.  A random example from the Emacs manual:
> 
>   @cindex arrow keys
>   @cindex moving point
>   @cindex movement
>   @cindex cursor motion
>   @cindex moving the cursor

Using consistent language across pages helps for these things.

>     To do more than insert characters, you have to know how to move
>   point (@pxref{Point}).  The keyboard commands @kbd{C-f}, @kbd{C-b},
>   @kbd{C-n}, and @kbd{C-p} move point to the right, left, down, and up,
>   respectively.  You can also move point using the @dfn{arrow keys}
>   present on most keyboards: @key{RIGHT}, @key{LEFT},
>   @key{DOWN}, and @key{UP}; however, many Emacs users find
>   that it is slower to use the arrow keys than the control keys, because
>   you need to move your hand to the area of the keyboard where those
>   keys are located.
> 
> This paragraph has 5 index entries with different key phrases, all
> pointing to it.  Different people will have different phrases in their
> minds when they think about "cursor movement", thus the need for
> several entries.  One of the phrases appears in the text literally,
> the other don't; moreover, one of them, "movement" is a very frequent
> word, so searching for it with Grep is likely to bring a lot of false
> hits, whereas index-searching commands will not.
> 
>>> The corresponding index-searching commands of Info readers are a
>>> primary means for finding information quickly and efficiently,
>>> avoiding too many false positives and also avoiding frustrating
>>> misses, i.e., searches that fail to find anything pertinent.
>>
>> That's no different than apropos(1).
> 
> See above: it is very different.
> 
>>>>> Man pages have no means of specifying structure
>>>>
>>>> .SH, .SS, .TP, .TQ, and very soon (hopefully weeks not months) .MR
>>>
>>> These provide just one level.
>>
>> We have many levels:
>>
>> book:		/opt/local/foobar/man/
>> volume:		man2/, man3/, ...
>> chapter:	man3/, man3type/, ...
>> page:		sscanf(3)
>> section:	sscanf(3)/DESCRIPTION
>> subsection:	sscanf(3)/DESCRIPTION/Conversions
>> tags:		sscanf(3)/DESCRIPTION/Conversions/n
> 
> Texinfo has:
> 
>   - chapters
>   - sections
>   - subsections
>   - subsubsections
>   - unnumbered variants of the above (unnumberedsubsec etc.)
>   - appendices (appendix, appendixsubsec etc.)
>   - headings (majorheading, chapheading, subheading, etc.)
> 
> More importantly, all those have meaningful names, not just standard
> labels like "DESCRIPTION" or "Conversions".

"Conversions" is not a standard subsection.  It's about conversion
specifiers; something exclusive of sscanf(3).  However, sections and
above do be standardized, and I believe that's good, so that you can
have some a-priori expectations of the organization of a page.

>  So when you see them in
> TOC or any similar navigation aid, you _know_, at least approximately,
> what each section is about.

I know a priori that if I'm reading sscanf(3)'s SYNOPSIS, I'll find
the function prototype for it.  Or if I read printf(3)'s ATTRIBUTES
I'll find the thread-safety of the function.  So much, that I have
functions for reading a specific section of a certain page:


$ man_section man3/sscanf.3 SYNOPSIS
sscanf(3)               Library Functions Manual               sscanf(3)

SYNOPSIS
       #include <stdio.h>

       int sscanf(const char *restrict str,
                  const char *restrict format, ...);

       #include <stdarg.h>

       int vsscanf(const char *restrict str,
                  const char *restrict format, va_list ap);

   Feature    Test    Macro    Requirements    for   glibc   (see   fea‐
   ture_test_macros(7)):

       vsscanf():
           _ISOC99_SOURCE || _POSIX_C_SOURCE >= 200112L

Linux man‐pages (unreleased)     (date)                        sscanf(3)



$ man_section man3/printf.3 ATTRIBUTES
printf(3)               Library Functions Manual               printf(3)

ATTRIBUTES
       For an explanation of the terms used in this section, see attrib‐
       utes(7).
       ┌──────────────────────────────┬───────────────┬────────────────┐
       │ Interface                    │ Attribute     │ Value          │
       ├──────────────────────────────┼───────────────┼────────────────┤
       │ printf(), fprintf(),         │ Thread safety │ MT‐Safe locale │
       │ sprintf(), snprintf(),       │               │                │
       │ vprintf(), vfprintf(),       │               │                │
       │ vsprintf(), vsnprintf()      │               │                │
       └──────────────────────────────┴───────────────┴────────────────┘


Linux man‐pages (unreleased)     (date)                        printf(3)


> 
>>>>> and hyper-links except
>>>>> by loosely-coupling pages via "SEE ALSO" cross-references at the end;
>>>>> they have no means of quickly and efficiently finding some specific
>>>>> subject except by text search (which usually produces a lot of false
>>>>> positives).

text search has false positives, like anything else.  But having good
tools for handling text is the key to solving the problem.  grep(1)
and sed(1) are your friends when reading man pages.

Colin, I've had a feeling for a long time that compressed pages are
not very useful.  These days, storage is cheap.  How would you feel
about having the man pages installed uncompressed in Debian?  That
would allow running text tools directly in /usr/share/man/.  I've had
to do that several times, and lucky me that I have the source code of
the Linux man-pages checked out in my computers, but other users don't
and they might have trouble finding for example which pages talk about
RLIMIT_NOFILE.  The only way I know of is:

$ grep -rl RLIMIT_NOFILE man*
man2/dup.2
man2/pidfd_getfd.2
man2/open.2
man2/fcntl.2
man2/poll.2
man2/pidfd_open.2
man2/getrlimit.2
man2/select.2
man2/seccomp_unotify.2
man3/getdtablesize.3
man3/mq_open.3
man3/errno.3
man3/sysconf.3
man5/proc.5
man7/unix.7
man7/fanotify.7
man7/capabilities.7


I'd like to enable this ability for everyone by not compressing
system man pages.  I guess we should talk to the Debian policy
mailing list?


Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages (was: Playground pager lsp(1))
  2023-04-08 13:02               ` Accessibility of man pages (was: Playground pager lsp(1)) Alejandro Colomar
@ 2023-04-08 13:42                 ` Eli Zaretskii
  2023-04-08 16:06                   ` Alejandro Colomar
  2023-04-08 13:47                 ` Colin Watson
       [not found]                 ` <87a5zhwntt.fsf@ada>
  2 siblings, 1 reply; 73+ messages in thread
From: Eli Zaretskii @ 2023-04-08 13:42 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: cjwatson, dirk, linux-man, help-texinfo, nabijaczleweli,
	g.branden.robinson, groff

> Date: Sat, 8 Apr 2023 15:02:59 +0200
> Cc: dirk@gouders.net, linux-man@vger.kernel.org, help-texinfo@gnu.org,
>  nabijaczleweli@nabijaczleweli.xyz, g.branden.robinson@gmail.com,
>  groff@gnu.org
> From: Alejandro Colomar <alx.manpages@gmail.com>
> 
> If you want how symlinks are dereferenced by find(1):
> 
> $ man find | grep sym.*link | head -n1
>        The  -H,  -L  and  -P  options control the treatment of symbolic links.

That's because the text appears verbatim in the man page.  Suppose the
person in question doesn't think about "symbolic links", but has
something else in mind, for example, "dereference".  (Why? because
he/she just happened to see that term in some article, and wanted to
know what does Find do with that.  Or for some other reason.)  Then
they will not find the description of symlink behavior of Find by
searching for "dereference".

Do you see the crucial issue here?  Indexing can tag some text with
topics which do not appear verbatim in the text, but instead
anticipate what people could have in mind when they are searching for
that text without knowing what it says, exactly.

> >> After this patch, if you apropos "system" or "sysctl", you'll see
> >> proc(5) pop up in your list.
> > 
> > This literally adds the text to what the reader will see.  It makes
> > the text longer and thus more difficult to read and parse, and there's
> > a limit to how many key phrases you can add like this.
> 
> If a page has too many topics, consider splitting the page (I agree
> that proc(5) is asking for that job).

Indexing can tag any paragraph of text, not just the entire page.  A
page cannot usefully have too many keywords in its title, but it _can_
benefit from different keywords for different paragraphs.

> >  By contrast,
> > Texinfo lets you add any number of index entries that point to the
> > same text.  A random example from the Emacs manual:
> > 
> >   @cindex arrow keys
> >   @cindex moving point
> >   @cindex movement
> >   @cindex cursor motion
> >   @cindex moving the cursor
> 
> Using consistent language across pages helps for these things.

There's no consistency when we want to be friendly to different people
with vastly different backgrounds and cultural preferences.  Good
indexing will anticipate any "inconsistent" habits.  And, once again,
since the index entries don't appear in the text presented to the
reader, the text remains consistent even if the index entries draw
from different inconsistent sources.

> > Texinfo has:
> > 
> >   - chapters
> >   - sections
> >   - subsections
> >   - subsubsections
> >   - unnumbered variants of the above (unnumberedsubsec etc.)
> >   - appendices (appendix, appendixsubsec etc.)
> >   - headings (majorheading, chapheading, subheading, etc.)
> > 
> > More importantly, all those have meaningful names, not just standard
> > labels like "DESCRIPTION" or "Conversions".
> 
> "Conversions" is not a standard subsection.  It's about conversion
> specifiers; something exclusive of sscanf(3).  However, sections and
> above do be standardized, and I believe that's good, so that you can
> have some a-priori expectations of the organization of a page.

But it then makes it impossible to add sections with meaningful names,
if those names aren't standardized.

> >  So when you see them in
> > TOC or any similar navigation aid, you _know_, at least approximately,
> > what each section is about.
> 
> I know a priori that if I'm reading sscanf(3)'s SYNOPSIS, I'll find
> the function prototype for it.  Or if I read printf(3)'s ATTRIBUTES
> I'll find the thread-safety of the function.

SYNOPSIS is at least approximately self-describing (although some
non-native English speakers might stumble on it).  But how would a
random reader know that ATTRIBUTES will describe thread-safety, for
example?  I wouldn't.  Isn't it better to have a section named "Thread
Safety" instead?

> text search has false positives, like anything else.  But having good
> tools for handling text is the key to solving the problem.  grep(1)
> and sed(1) are your friends when reading man pages.

Modern documentation is not plain text (even if we ignore
compression), so tools which just search the text have limitations,
sometimes serious ones.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages (was: Playground pager lsp(1))
  2023-04-08 13:02               ` Accessibility of man pages (was: Playground pager lsp(1)) Alejandro Colomar
  2023-04-08 13:42                 ` Eli Zaretskii
@ 2023-04-08 13:47                 ` Colin Watson
  2023-04-08 15:42                   ` Alejandro Colomar
  2023-04-08 19:48                   ` Accessibility of man pages Dirk Gouders
       [not found]                 ` <87a5zhwntt.fsf@ada>
  2 siblings, 2 replies; 73+ messages in thread
From: Colin Watson @ 2023-04-08 13:47 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Eli Zaretskii, dirk, linux-man, help-texinfo, nabijaczleweli,
	g.branden.robinson, groff

On Sat, Apr 08, 2023 at 03:02:59PM +0200, Alejandro Colomar wrote:
> Colin, I've had a feeling for a long time that compressed pages are
> not very useful.  These days, storage is cheap.  How would you feel
> about having the man pages installed uncompressed in Debian?  That
> would allow running text tools directly in /usr/share/man/.

I'm not personally all that bothered either way, but it's a
distribution-wide policy decision rather than something I'd decide on.
I suspect there are still some people who would push back against the
space cost.

> I've had to do that several times, and lucky me that I have the source
> code of the Linux man-pages checked out in my computers, but other
> users don't and they might have trouble finding for example which
> pages talk about RLIMIT_NOFILE.  The only way I know of is:

man -Kaw RLIMIT_NOFILE

(This looks at the page source rather than the rendered output, so
sometimes it over-reports if your search term matches a groff macro,
etc.  But that's true of your approach too.)

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages (was: Playground pager lsp(1))
  2023-04-08 13:47                 ` Colin Watson
@ 2023-04-08 15:42                   ` Alejandro Colomar
  2023-04-08 19:48                   ` Accessibility of man pages Dirk Gouders
  1 sibling, 0 replies; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-08 15:42 UTC (permalink / raw)
  To: Colin Watson
  Cc: Eli Zaretskii, dirk, linux-man, nabijaczleweli,
	g.branden.robinson, groff, help-texinfo


[-- Attachment #1.1: Type: text/plain, Size: 1387 bytes --]

Hi Colin,

On 4/8/23 15:47, Colin Watson wrote:
> On Sat, Apr 08, 2023 at 03:02:59PM +0200, Alejandro Colomar wrote:
>> Colin, I've had a feeling for a long time that compressed pages are
>> not very useful.  These days, storage is cheap.  How would you feel
>> about having the man pages installed uncompressed in Debian?  That
>> would allow running text tools directly in /usr/share/man/.
> 
> I'm not personally all that bothered either way, but it's a
> distribution-wide policy decision rather than something I'd decide on.
> I suspect there are still some people who would push back against the
> space cost.
> 
>> I've had to do that several times, and lucky me that I have the source
>> code of the Linux man-pages checked out in my computers, but other
>> users don't and they might have trouble finding for example which
>> pages talk about RLIMIT_NOFILE.  The only way I know of is:
> 
> man -Kaw RLIMIT_NOFILE

Hmm, interesting; I didn't know about -K.

> 
> (This looks at the page source rather than the rendered output, so
> sometimes it over-reports if your search term matches a groff macro,
> etc.  But that's true of your approach too.)

Yeah, this should be good for most purposes.  Consider my itch scratched. :)

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages (was: Playground pager lsp(1))
  2023-04-08 13:42                 ` Eli Zaretskii
@ 2023-04-08 16:06                   ` Alejandro Colomar
  0 siblings, 0 replies; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-08 16:06 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: cjwatson, dirk, linux-man, help-texinfo, nabijaczleweli,
	g.branden.robinson, groff


[-- Attachment #1.1: Type: text/plain, Size: 5401 bytes --]

Hi Eli,

On 4/8/23 15:42, Eli Zaretskii wrote:
>> Date: Sat, 8 Apr 2023 15:02:59 +0200
>> Cc: dirk@gouders.net, linux-man@vger.kernel.org, help-texinfo@gnu.org,
>>  nabijaczleweli@nabijaczleweli.xyz, g.branden.robinson@gmail.com,
>>  groff@gnu.org
>> From: Alejandro Colomar <alx.manpages@gmail.com>
>>
>> If you want how symlinks are dereferenced by find(1):
>>
>> $ man find | grep sym.*link | head -n1
>>        The  -H,  -L  and  -P  options control the treatment of symbolic links.
> 
> That's because the text appears verbatim in the man page.  Suppose the
> person in question doesn't think about "symbolic links", but has
> something else in mind, for example, "dereference".  (Why? because
> he/she just happened to see that term in some article, and wanted to
> know what does Find do with that.  Or for some other reason.)  Then
> they will not find the description of symlink behavior of Find by
> searching for "dereference".

That's why using consistent language is important.  Searching just for
"dereference" will of course have slightly less quality, but that
should be expected.  Once you have a slightly related match, you can
find terms that will help refine your search.

$ man find | grep dereference -C1
       When  the  -H  or  -L options are in effect, any symbolic links
       listed as the argument of -newer will be dereferenced, and  the
       timestamp  will  be  taken  from the file to which the symbolic
--
       used but -follow is, any symbolic links appearing after -follow
       on  the  command line will be dereferenced, and those before it
       will not).
--
              haviour of the -newer predicate; any files listed as the
              argument of -newer will be dereferenced if they are sym‐
              bolic  links.   The  same consideration applies to -new‐
--
       -newer Supported.  If the file specified is a symbolic link, it
              is always dereferenced.  This is a change from  previous
              behaviour, which used to take the relevant time from the


This already shows "symbolic link" several times, so you probably want
to search for that.

If you want something that processes natural language, you can always
ask some AI engine to process man pages for you ;).

> 
> Do you see the crucial issue here?  Indexing can tag some text with
> topics which do not appear verbatim in the text, but instead
> anticipate what people could have in mind when they are searching for
> that text without knowing what it says, exactly.

I don't remember myself having had such issues so far.  I'd like to
see real reports of readers that struggle to find a certain search
term in a certain page.  There are, but few (the only one I remember
is this one we had recently about proc(5)).  If you ever have such a
real case with man pages, please report it, and I will try to make it
more accessible.  The intention is that a combination of man(1),
apropos(1), whatis(1), and then some grep(1) and sed(1) should be
enough 99% of the time, and we should fix the outliers.

> 
>>>> After this patch, if you apropos "system" or "sysctl", you'll see
>>>> proc(5) pop up in your list.
>>>
>>> This literally adds the text to what the reader will see.  It makes
>>> the text longer and thus more difficult to read and parse, and there's
>>> a limit to how many key phrases you can add like this.
>>
>> If a page has too many topics, consider splitting the page (I agree
>> that proc(5) is asking for that job).
> 
> Indexing can tag any paragraph of text, not just the entire page.  A
> page cannot usefully have too many keywords in its title, but it _can_
> benefit from different keywords for different paragraphs.

We can add source code comments, which would appear in `man -K`
searches, but so far I haven't seen the need in any specific page.


[...]

> 
>>>  So when you see them in
>>> TOC or any similar navigation aid, you _know_, at least approximately,
>>> what each section is about.
>>
>> I know a priori that if I'm reading sscanf(3)'s SYNOPSIS, I'll find
>> the function prototype for it.  Or if I read printf(3)'s ATTRIBUTES
>> I'll find the thread-safety of the function.
> 
> SYNOPSIS is at least approximately self-describing (although some
> non-native English speakers might stumble on it).  But how would a
> random reader know that ATTRIBUTES will describe thread-safety, for
> example?  I wouldn't.  Isn't it better to have a section named "Thread
> Safety" instead?

I don't know the origin of the name of ATTRIBUTES.  There's
attributes(7), which documents what you can find there.

> 
>> text search has false positives, like anything else.  But having good
>> tools for handling text is the key to solving the problem.  grep(1)
>> and sed(1) are your friends when reading man pages.
> 
> Modern documentation is not plain text (even if we ignore
> compression), so tools which just search the text have limitations,
> sometimes serious ones.

In some cases you need to search the man(7) source code to get
extra information that is difficult to search in formatted text,
but that's for rare cases.  So far, I find mostly everything I
need just with text tools.

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages
  2023-04-08 13:47                 ` Colin Watson
  2023-04-08 15:42                   ` Alejandro Colomar
@ 2023-04-08 19:48                   ` Dirk Gouders
  2023-04-08 20:02                     ` Eli Zaretskii
  2023-04-08 20:31                     ` Ingo Schwarze
  1 sibling, 2 replies; 73+ messages in thread
From: Dirk Gouders @ 2023-04-08 19:48 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Colin Watson, Eli Zaretskii, linux-man, help-texinfo,
	nabijaczleweli, g.branden.robinson, groff

Hi Alex,

Colin Watson <cjwatson@debian.org> writes:

> On Sat, Apr 08, 2023 at 03:02:59PM +0200, Alejandro Colomar wrote:
>> Colin, I've had a feeling for a long time that compressed pages are
>> not very useful.  These days, storage is cheap.  How would you feel
>> about having the man pages installed uncompressed in Debian?  That
>> would allow running text tools directly in /usr/share/man/.
>
> I'm not personally all that bothered either way, but it's a
> distribution-wide policy decision rather than something I'd decide on.
> I suspect there are still some people who would push back against the
> space cost.
>
>> I've had to do that several times, and lucky me that I have the source
>> code of the Linux man-pages checked out in my computers, but other
>> users don't and they might have trouble finding for example which
>> pages talk about RLIMIT_NOFILE.  The only way I know of is:
>>
>> $ grep -rl RLIMIT_NOFILE man*
>> man2/dup.2
>> man2/pidfd_getfd.2
>> man2/open.2
>> man2/fcntl.2
>> man2/poll.2
>> man2/pidfd_open.2
>> man2/getrlimit.2
>> man2/select.2
>> man2/seccomp_unotify.2
>> man3/getdtablesize.3
>> man3/mq_open.3
>> man3/errno.3
>> man3/sysconf.3
>> man5/proc.5
>> man7/unix.7
>> man7/fanotify.7
>> man7/capabilities.7
>
> man -Kaw RLIMIT_NOFILE

Sometimes it is good to have options and one would be bzgrep(1).
As far as I know it doesn't understand "-r" but:

$ find /usr/share/man -type f -exec bzgrep -l RLIMIT_NOFILE {} \;
/usr/share/man/man1/runuser.1.bz2
/usr/share/man/man1/su.1.bz2
/usr/share/man/man1/nghttpx.1.bz2
/usr/share/man/man3/getdtablesize.3.bz2
/usr/share/man/man3/mq_open.3.bz2
/usr/share/man/man3/errno.3.bz2
/usr/share/man/man3/sysconf.3.bz2
/usr/share/man/man3p/getrlimit.3p.bz2
/usr/share/man/man3p/sysconf.3p.bz2
/usr/share/man/man3p/posix_spawn_file_actions_addclose.3p.bz2
/usr/share/man/man0p/sys_resource.h.0p.bz2
/usr/share/man/man2/pidfd_open.2.bz2
/usr/share/man/man2/poll.2.bz2
/usr/share/man/man2/getrlimit.2.bz2
/usr/share/man/man2/open.2.bz2
/usr/share/man/man2/select.2.bz2
/usr/share/man/man2/fcntl.2.bz2
/usr/share/man/man2/seccomp_unotify.2.bz2
/usr/share/man/man2/dup.2.bz2
/usr/share/man/man2/pidfd_getfd.2.bz2
/usr/share/man/man7/fanotify.7.bz2
/usr/share/man/man7/capabilities.7.bz2
/usr/share/man/man7/unix.7.bz2
/usr/share/man/man5/proc.5.bz2

Yes, it's very slow but close to `man -K`:

find...             man -K...

real 107.45         real 96.34
user 117.06         user 70.11
sys 14.43           sys 26.86

[a thought later]

Oh, I found something much faster:

$ time -p find /usr/share/man -type f | xargs bzgrep -l RLIMIT_NOFILE
[snip]

real 24.30
user 32.34
sys 6.84

Hmm, perhaps, someone has an explanation for this?

Cheers,

Dirk

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages
  2023-04-08 19:48                   ` Accessibility of man pages Dirk Gouders
@ 2023-04-08 20:02                     ` Eli Zaretskii
  2023-04-08 20:46                       ` Dirk Gouders
  2023-04-09 10:28                       ` Ralph Corderoy
  2023-04-08 20:31                     ` Ingo Schwarze
  1 sibling, 2 replies; 73+ messages in thread
From: Eli Zaretskii @ 2023-04-08 20:02 UTC (permalink / raw)
  To: Dirk Gouders
  Cc: alx.manpages, cjwatson, linux-man, help-texinfo, nabijaczleweli,
	g.branden.robinson, groff

> From: Dirk Gouders <dirk@gouders.net>
> Cc: Colin Watson <cjwatson@debian.org>, Eli Zaretskii <eliz@gnu.org>,
>         linux-man@vger.kernel.org, help-texinfo@gnu.org,
>         nabijaczleweli@nabijaczleweli.xyz, g.branden.robinson@gmail.com,
>         groff@gnu.org
> Date: Sat, 08 Apr 2023 21:48:13 +0200
> 
> $ find /usr/share/man -type f -exec bzgrep -l RLIMIT_NOFILE {} \;
> /usr/share/man/man1/runuser.1.bz2
> /usr/share/man/man1/su.1.bz2
> /usr/share/man/man1/nghttpx.1.bz2
> /usr/share/man/man3/getdtablesize.3.bz2
> /usr/share/man/man3/mq_open.3.bz2
> /usr/share/man/man3/errno.3.bz2
> /usr/share/man/man3/sysconf.3.bz2
> /usr/share/man/man3p/getrlimit.3p.bz2
> /usr/share/man/man3p/sysconf.3p.bz2
> /usr/share/man/man3p/posix_spawn_file_actions_addclose.3p.bz2
> /usr/share/man/man0p/sys_resource.h.0p.bz2
> /usr/share/man/man2/pidfd_open.2.bz2
> /usr/share/man/man2/poll.2.bz2
> /usr/share/man/man2/getrlimit.2.bz2
> /usr/share/man/man2/open.2.bz2
> /usr/share/man/man2/select.2.bz2
> /usr/share/man/man2/fcntl.2.bz2
> /usr/share/man/man2/seccomp_unotify.2.bz2
> /usr/share/man/man2/dup.2.bz2
> /usr/share/man/man2/pidfd_getfd.2.bz2
> /usr/share/man/man7/fanotify.7.bz2
> /usr/share/man/man7/capabilities.7.bz2
> /usr/share/man/man7/unix.7.bz2
> /usr/share/man/man5/proc.5.bz2
> 
> Yes, it's very slow but close to `man -K`:
> 
> find...             man -K...
> 
> real 107.45         real 96.34
> user 117.06         user 70.11
> sys 14.43           sys 26.86
> 
> [a thought later]
> 
> Oh, I found something much faster:
> 
> $ time -p find /usr/share/man -type f | xargs bzgrep -l RLIMIT_NOFILE
> [snip]
> 
> real 24.30
> user 32.34
> sys 6.84
> 
> Hmm, perhaps, someone has an explanation for this?

Multiprocessing, obviously.  Your CPU has more than one execution
unit, so the pipe via xargs runs 'find' and 'bzgrep' in parallel on
two different execution units.  By contrast, "find -exec" runs them
sequentially, in a single thread.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages
  2023-04-08 19:48                   ` Accessibility of man pages Dirk Gouders
  2023-04-08 20:02                     ` Eli Zaretskii
@ 2023-04-08 20:31                     ` Ingo Schwarze
  2023-04-08 20:59                       ` Dirk Gouders
  1 sibling, 1 reply; 73+ messages in thread
From: Ingo Schwarze @ 2023-04-08 20:31 UTC (permalink / raw)
  To: Dirk Gouders
  Cc: Alejandro Colomar, Colin Watson, Eli Zaretskii, linux-man,
	help-texinfo, nabijaczleweli, g.branden.robinson, groff

Hi Dirk,

Dirk Gouders wrote on Sat, Apr 08, 2023 at 09:48:13PM +0200:

> Yes, it's very slow but close to `man -K`:
> 
> find...             man -K...
> 
> real 107.45         real 96.34
> user 117.06         user 70.11
> sys 14.43           sys 26.86
> 
> [a thought later]
> 
> Oh, I found something much faster:
> 
> $ time -p find /usr/share/man -type f | xargs bzgrep -l RLIMIT_NOFILE
> [snip]
> 
> real 24.30
> user 32.34
> sys 6.84
> 
> Hmm, perhaps, someone has an explanation for this?

These are all terribly slow IMHO.

For comparison, this happens on my OpenBSD notebook, with more than
five hundred optional software packages installed in addition to the
complete default installation:

   $ time man -k any=RLIMIT_NOFILE
  dup, dup2, dup3(2) - duplicate an existing file descriptor
  getrlimit, setrlimit(2) - control maximum system resource consumption
  sudoers(5) - default sudo security policy plugin
    0m00.21s real     0m00.00s user     0m00.03s system

   $ time man -k 'any=rlimit'       
  ps(1) - display process status
  brk, sbrk(2) - change data segment size
  dup, dup2, dup3(2) - duplicate an existing file descriptor
  execve(2) - execute a file
  fork(2) - create a new process
  getdtablecount(2) - get descriptor table count
  getrlimit, setrlimit(2) - control maximum system resource consumption
  mlock, munlock(2) - lock (unlock) physical pages in memory
  mlockall, munlockall(2) - lock (unlock) the address space of a process
  pledge(2) - restrict system operations
  poll, ppoll(2) - synchronous I/O multiplexing
  quotactl(2) - manipulate filesystem quotas
  sigaction(2) - software signal facilities
  getdtablesize(3) - get descriptor table size
  login_cap, login_getclass, login_close, login_getcapbool, login_getcapnum, login_getcapsize, login_getcapstr, login_getcaptime, login_getstyle, setclasscontext, setusercontext(3) - query login.conf database about a user class
  signal, bsd_signal(3) - simplified software signal facilities
  sigvec(3) - software signal facilities
  core(5) - memory image file format
  login.conf(5) - login class capability database
  sudoers(5) - default sudo security policy plugin
  fork1(9) - create a new process
  mi_switch, cpu_switchto(9) - switch to another process context
      0m00.05s real     0m00.01s user     0m00.00s system

   $ time man -k any=RLIMIT_NOFILE 
  dup, dup2, dup3(2) - duplicate an existing file descriptor
  getrlimit, setrlimit(2) - control maximum system resource consumption
  sudoers(5) - default sudo security policy plugin
    0m00.01s real     0m00.01s user     0m00.01s system

The effect that the time goes down from 210 milliseconds to 10
milliseconds when doing the search a second time is due to the fact
that the kernel now has the required information in the buffer cache
and no longer needs to read from the rotating disk.  The machine in
question has i5 2.3 GHz processors and 8 GB of RAM, so it's hardly
a high-end machine.

Yours,
  Ingo

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages
  2023-04-08 20:02                     ` Eli Zaretskii
@ 2023-04-08 20:46                       ` Dirk Gouders
  2023-04-08 21:53                         ` Alejandro Colomar
  2023-04-09 10:28                       ` Ralph Corderoy
  1 sibling, 1 reply; 73+ messages in thread
From: Dirk Gouders @ 2023-04-08 20:46 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: alx.manpages, cjwatson, linux-man, help-texinfo, nabijaczleweli,
	g.branden.robinson, groff

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Dirk Gouders <dirk@gouders.net>
>> Cc: Colin Watson <cjwatson@debian.org>, Eli Zaretskii <eliz@gnu.org>,
>>         linux-man@vger.kernel.org, help-texinfo@gnu.org,
>>         nabijaczleweli@nabijaczleweli.xyz, g.branden.robinson@gmail.com,
>>         groff@gnu.org
>> Date: Sat, 08 Apr 2023 21:48:13 +0200
>> 
>> $ find /usr/share/man -type f -exec bzgrep -l RLIMIT_NOFILE {} \;
>> /usr/share/man/man1/runuser.1.bz2
>> /usr/share/man/man1/su.1.bz2
>> /usr/share/man/man1/nghttpx.1.bz2
>> /usr/share/man/man3/getdtablesize.3.bz2
>> /usr/share/man/man3/mq_open.3.bz2
>> /usr/share/man/man3/errno.3.bz2
>> /usr/share/man/man3/sysconf.3.bz2
>> /usr/share/man/man3p/getrlimit.3p.bz2
>> /usr/share/man/man3p/sysconf.3p.bz2
>> /usr/share/man/man3p/posix_spawn_file_actions_addclose.3p.bz2
>> /usr/share/man/man0p/sys_resource.h.0p.bz2
>> /usr/share/man/man2/pidfd_open.2.bz2
>> /usr/share/man/man2/poll.2.bz2
>> /usr/share/man/man2/getrlimit.2.bz2
>> /usr/share/man/man2/open.2.bz2
>> /usr/share/man/man2/select.2.bz2
>> /usr/share/man/man2/fcntl.2.bz2
>> /usr/share/man/man2/seccomp_unotify.2.bz2
>> /usr/share/man/man2/dup.2.bz2
>> /usr/share/man/man2/pidfd_getfd.2.bz2
>> /usr/share/man/man7/fanotify.7.bz2
>> /usr/share/man/man7/capabilities.7.bz2
>> /usr/share/man/man7/unix.7.bz2
>> /usr/share/man/man5/proc.5.bz2
>> 
>> Yes, it's very slow but close to `man -K`:
>> 
>> find...             man -K...
>> 
>> real 107.45         real 96.34
>> user 117.06         user 70.11
>> sys 14.43           sys 26.86
>> 
>> [a thought later]
>> 
>> Oh, I found something much faster:
>> 
>> $ time -p find /usr/share/man -type f | xargs bzgrep -l RLIMIT_NOFILE
>> [snip]
>> 
>> real 24.30
>> user 32.34
>> sys 6.84
>> 
>> Hmm, perhaps, someone has an explanation for this?
>
> Multiprocessing, obviously.  Your CPU has more than one execution
> unit, so the pipe via xargs runs 'find' and 'bzgrep' in parallel on
> two different execution units.  By contrast, "find -exec" runs them
> sequentially, in a single thread.

Yes, that must be it, thanks.  I noticed `man -K...` uses up to four
CPUs in parallel and therefore was unsure.

With your explanation, we can get even faster:

$ time -p find /usr/share/man -type f | xargs -P 6 bzgrep -l RLIMIT_NOFILE
[snip]

real 7.56
user 32.97
sys 7.02

Dirk

PS: Colin, too late, I noticed a Mail-Followup-To Header in your mail.
    For the future: Is it correct that in such a case one should use
    that recipient list (without your address) -- even if he replies to
    something you wrote?  In that case: I'm sorry I did that wrong.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages
  2023-04-08 20:31                     ` Ingo Schwarze
@ 2023-04-08 20:59                       ` Dirk Gouders
  2023-04-08 22:39                         ` Ingo Schwarze
  0 siblings, 1 reply; 73+ messages in thread
From: Dirk Gouders @ 2023-04-08 20:59 UTC (permalink / raw)
  To: Ingo Schwarze
  Cc: Alejandro Colomar, Colin Watson, Eli Zaretskii, linux-man,
	help-texinfo, nabijaczleweli, g.branden.robinson, groff

Hi Ingo,

Ingo Schwarze <schwarze@usta.de> writes:

> Hi Dirk,
>
> Dirk Gouders wrote on Sat, Apr 08, 2023 at 09:48:13PM +0200:
>
>> Yes, it's very slow but close to `man -K`:
>> 
>> find...             man -K...
>> 
>> real 107.45         real 96.34
>> user 117.06         user 70.11
>> sys 14.43           sys 26.86
>> 
>> [a thought later]
>> 
>> Oh, I found something much faster:
>> 
>> $ time -p find /usr/share/man -type f | xargs bzgrep -l RLIMIT_NOFILE
>> [snip]
>> 
>> real 24.30
>> user 32.34
>> sys 6.84
>> 
>> Hmm, perhaps, someone has an explanation for this?
>
> These are all terribly slow IMHO.
>
> For comparison, this happens on my OpenBSD notebook, with more than
> five hundred optional software packages installed in addition to the
> complete default installation:
>
>    $ time man -k any=RLIMIT_NOFILE
>   dup, dup2, dup3(2) - duplicate an existing file descriptor
>   getrlimit, setrlimit(2) - control maximum system resource consumption
>   sudoers(5) - default sudo security policy plugin
>     0m00.21s real     0m00.00s user     0m00.03s system

Yes, this is really fast and would allow for quite interesting ways to
work with manual pages.

But, OpenBSD's `man -k` operates on a makewhatis(8) database and not on every
single manual page or am I wrong?

Regards,

Dirk

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages
  2023-04-08 20:46                       ` Dirk Gouders
@ 2023-04-08 21:53                         ` Alejandro Colomar
  2023-04-08 22:33                           ` Alejandro Colomar
  0 siblings, 1 reply; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-08 21:53 UTC (permalink / raw)
  To: Dirk Gouders, Eli Zaretskii, cjwatson, Ingo Schwarze
  Cc: linux-man, help-texinfo, nabijaczleweli, g.branden.robinson, groff


[-- Attachment #1.1: Type: text/plain, Size: 3777 bytes --]

Hi Dirk, Ingo, Eli, Colin,

I prepared some (hopefully) fair comparison:

$ sudo make install-man prefix=/opt/local/man/compressed -j LINK_PAGES=symlink Z=.gz >/dev/null
$ sudo make install-man prefix=/opt/local/man/expanded__ -j LINK_PAGES=symlink       >/dev/null


I don't know what kind of magic man(1) does to be so fast reading compressed pages:


$ export MANPATH=/opt/local/man/compressed/share/man
$ time man -Kaw RLIMIT_NOFILE | wc -l
17

real	0m0.330s
user	0m0.261s
sys	0m0.074s
$ time find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l
17

real	0m3.732s
user	0m4.776s
sys	0m0.703s
$ time find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l
17

real	0m3.403s
user	0m4.706s
sys	0m0.699s
$ time find $MANPATH -type f | while read f; do zcat $f | grep -l RLIMIT_NOFILE >/dev/null && echo "$f"; done | wc -l
17

real	0m3.730s
user	0m4.769s
sys	0m1.973s


man(1) seems to be faster than reading uncompressed pages!  See:


$ export MANPATH=/opt/local/man/expanded__/share/man
$ time man -Kaw RLIMIT_NOFILE | wc -l
35

real	0m1.138s
user	0m0.669s
sys	0m0.470s
$ time find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l
17

real	0m0.018s
user	0m0.007s
sys	0m0.015s


Having the pages uncompressed seems to be an important advantage for
searching through the sources.  0.018 (with the manual search) is
more than 10x faster than what man(1) can get from compressed pages.
And it allows using more complex tools, like pcre2grep(1), or sed(1)
for more complex searches.

Colin, did I do anything wrong to have this slowness in man(1) with
uncompressed pages?  Also, it's finding some repeated lines; did we
find a bug?


$ man -Kaw RLIMIT_NOFILE
/opt/local/man/expanded__/share/man/man3/errno.3
/opt/local/man/expanded__/share/man/man2/select.2
/opt/local/man/expanded__/share/man/man2/select.2
/opt/local/man/expanded__/share/man/man2/select.2
/opt/local/man/expanded__/share/man/man2/select.2
/opt/local/man/expanded__/share/man/man3/getdtablesize.3
/opt/local/man/expanded__/share/man/man3/mq_open.3
/opt/local/man/expanded__/share/man/man3/sysconf.3
/opt/local/man/expanded__/share/man/man2/fcntl.2
/opt/local/man/expanded__/share/man/man2/fcntl.2
/opt/local/man/expanded__/share/man/man2/open.2
/opt/local/man/expanded__/share/man/man2/open.2
/opt/local/man/expanded__/share/man/man2/open.2
/opt/local/man/expanded__/share/man/man2/poll.2
/opt/local/man/expanded__/share/man/man2/poll.2
/opt/local/man/expanded__/share/man/man2/seccomp_unotify.2
/opt/local/man/expanded__/share/man/man2/pidfd_getfd.2
/opt/local/man/expanded__/share/man/man2/dup.2
/opt/local/man/expanded__/share/man/man2/dup.2
/opt/local/man/expanded__/share/man/man2/dup.2
/opt/local/man/expanded__/share/man/man2/getrlimit.2
/opt/local/man/expanded__/share/man/man2/getrlimit.2
/opt/local/man/expanded__/share/man/man2/getrlimit.2
/opt/local/man/expanded__/share/man/man2/getrlimit.2
/opt/local/man/expanded__/share/man/man2/getrlimit.2
/opt/local/man/expanded__/share/man/man2/pidfd_open.2
/opt/local/man/expanded__/share/man/man2/select.2
/opt/local/man/expanded__/share/man/man2/select.2
/opt/local/man/expanded__/share/man/man2/select.2
/opt/local/man/expanded__/share/man/man2/select.2
/opt/local/man/expanded__/share/man/man5/proc.5
/opt/local/man/expanded__/share/man/man5/proc.5
/opt/local/man/expanded__/share/man/man7/capabilities.7
/opt/local/man/expanded__/share/man/man7/fanotify.7
/opt/local/man/expanded__/share/man/man7/unix.7

$ grep -n RLIMIT_NOFILE /opt/local/man/expanded__/share/man/man2/select.2
412:.B RLIMIT_NOFILE


Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages
  2023-04-08 21:53                         ` Alejandro Colomar
@ 2023-04-08 22:33                           ` Alejandro Colomar
  0 siblings, 0 replies; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-08 22:33 UTC (permalink / raw)
  To: cjwatson
  Cc: linux-man, help-texinfo, nabijaczleweli, g.branden.robinson,
	groff, Dirk Gouders, Eli Zaretskii, Ingo Schwarze


[-- Attachment #1.1: Type: text/plain, Size: 2533 bytes --]



On 4/8/23 23:53, Alejandro Colomar wrote:
> Colin, did I do anything wrong to have this slowness in man(1) with
> uncompressed pages?  Also, it's finding some repeated lines; did we
> find a bug?
> 
> 
> $ man -Kaw RLIMIT_NOFILE
> /opt/local/man/expanded__/share/man/man3/errno.3
> /opt/local/man/expanded__/share/man/man2/select.2
> /opt/local/man/expanded__/share/man/man2/select.2
> /opt/local/man/expanded__/share/man/man2/select.2
> /opt/local/man/expanded__/share/man/man2/select.2
> /opt/local/man/expanded__/share/man/man3/getdtablesize.3
> /opt/local/man/expanded__/share/man/man3/mq_open.3
> /opt/local/man/expanded__/share/man/man3/sysconf.3
> /opt/local/man/expanded__/share/man/man2/fcntl.2
> /opt/local/man/expanded__/share/man/man2/fcntl.2
> /opt/local/man/expanded__/share/man/man2/open.2
> /opt/local/man/expanded__/share/man/man2/open.2
> /opt/local/man/expanded__/share/man/man2/open.2
> /opt/local/man/expanded__/share/man/man2/poll.2
> /opt/local/man/expanded__/share/man/man2/poll.2
> /opt/local/man/expanded__/share/man/man2/seccomp_unotify.2
> /opt/local/man/expanded__/share/man/man2/pidfd_getfd.2
> /opt/local/man/expanded__/share/man/man2/dup.2
> /opt/local/man/expanded__/share/man/man2/dup.2
> /opt/local/man/expanded__/share/man/man2/dup.2
> /opt/local/man/expanded__/share/man/man2/getrlimit.2
> /opt/local/man/expanded__/share/man/man2/getrlimit.2
> /opt/local/man/expanded__/share/man/man2/getrlimit.2
> /opt/local/man/expanded__/share/man/man2/getrlimit.2
> /opt/local/man/expanded__/share/man/man2/getrlimit.2
> /opt/local/man/expanded__/share/man/man2/pidfd_open.2
> /opt/local/man/expanded__/share/man/man2/select.2
> /opt/local/man/expanded__/share/man/man2/select.2
> /opt/local/man/expanded__/share/man/man2/select.2
> /opt/local/man/expanded__/share/man/man2/select.2
> /opt/local/man/expanded__/share/man/man5/proc.5
> /opt/local/man/expanded__/share/man/man5/proc.5
> /opt/local/man/expanded__/share/man/man7/capabilities.7
> /opt/local/man/expanded__/share/man/man7/fanotify.7
> /opt/local/man/expanded__/share/man/man7/unix.7
> 
> $ grep -n RLIMIT_NOFILE /opt/local/man/expanded__/share/man/man2/select.2
> 412:.B RLIMIT_NOFILE

Ahh, it seems to be following symlinks as if they were actual pages.
But for some reason this only happens for uncompressed pages, and not
for .gz pages.  Bug here :)

> 
> 
> Cheers,
> Alex
> 

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages
  2023-04-08 20:59                       ` Dirk Gouders
@ 2023-04-08 22:39                         ` Ingo Schwarze
  2023-04-09  9:50                           ` Dirk Gouders
  0 siblings, 1 reply; 73+ messages in thread
From: Ingo Schwarze @ 2023-04-08 22:39 UTC (permalink / raw)
  To: Dirk Gouders
  Cc: Alejandro Colomar, Colin Watson, Eli Zaretskii, linux-man,
	help-texinfo, nabijaczleweli, g.branden.robinson, groff

Hi Dirk,

Dirk Gouders wrote on Sat, Apr 08, 2023 at 10:59:32PM +0200:
> Ingo Schwarze <schwarze@usta.de> writes:
>> Dirk Gouders wrote on Sat, Apr 08, 2023 at 09:48:13PM +0200:

>>> Yes, it's very slow but close to `man -K`:
>>> 
>>> find...             man -K...
>>> 
>>> real 107.45         real 96.34
>>> user 117.06         user 70.11
>>> sys 14.43           sys 26.86
>>> 
>>> [a thought later]
>>> 
>>> Oh, I found something much faster:
>>> 
>>> $ time -p find /usr/share/man -type f | xargs bzgrep -l RLIMIT_NOFILE
>>> [snip]
>>> 
>>> real 24.30
>>> user 32.34
>>> sys 6.84
>>> 
>>> Hmm, perhaps, someone has an explanation for this?

>> These are all terribly slow IMHO.
>>
>> For comparison, this happens on my OpenBSD notebook, with more than
>> five hundred optional software packages installed in addition to the
>> complete default installation:
>>
>>    $ time man -k any=RLIMIT_NOFILE
>>   dup, dup2, dup3(2) - duplicate an existing file descriptor
>>   getrlimit, setrlimit(2) - control maximum system resource consumption
>>   sudoers(5) - default sudo security policy plugin
>>     0m00.21s real     0m00.00s user     0m00.03s system

> Yes, this is really fast and would allow for quite interesting ways to
> work with manual pages.
> 
> But, OpenBSD's `man -k` operates on a makewhatis(8) database and not
> on every single manual page or am I wrong?

Yes, you are completely correct about that.
The database format is documented here:

  https://man.openbsd.org/mandoc.db.5

And the search syntax here:

  https://man.openbsd.org/apropos.1

The concept works very well because in contrast to man(7), mdoc(7)
provides substatial semantic markup (without being harder to write
or maintain).

The comparison seemed relevant to me because as far as i understood the
intention of the thread, participants were looking for ideas to make
searching for content in manual pages more powerful and more efficient.
The combination of semantic markup and indexing of marked up content
is one way to make progress in that direction, and the combination
of mdoc(7) with mandoc(1) is an example of a system demonstrating
the concept.

I understand people familiar with GNU info(1) pointed out that
providing index entries that do not correspond to marked up
content is also occasionally useful.  I do not completely disagree
with that, and the mdoc(7) language as implemented by mandoc(1)
provides a dedicated macro to do just that:

  https://man.openbsd.org/mdoc.7#Tg

Then again, practical experience shows that manual tagging is needed
only in extremely rare cases and completely automatic tagging produces
completely satisfactory index entries for the vast majority of cases.

Yours,
  Ingo

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages
  2023-04-08 22:39                         ` Ingo Schwarze
@ 2023-04-09  9:50                           ` Dirk Gouders
  2023-04-09 10:35                             ` Dirk Gouders
  0 siblings, 1 reply; 73+ messages in thread
From: Dirk Gouders @ 2023-04-09  9:50 UTC (permalink / raw)
  To: Ingo Schwarze
  Cc: Alejandro Colomar, Colin Watson, Eli Zaretskii, linux-man,
	help-texinfo, nabijaczleweli, g.branden.robinson, groff

Hi Ingo,

Ingo Schwarze <schwarze@usta.de> writes:
> Dirk Gouders wrote on Sat, Apr 08, 2023 at 10:59:32PM +0200:
>> Ingo Schwarze <schwarze@usta.de> writes:
>>> Dirk Gouders wrote on Sat, Apr 08, 2023 at 09:48:13PM +0200:
>
>>>> Yes, it's very slow but close to `man -K`:
>>>> 
>>>> find...             man -K...
>>>> 
>>>> real 107.45         real 96.34
>>>> user 117.06         user 70.11
>>>> sys 14.43           sys 26.86
>>>> 
>>>> [a thought later]
>>>> 
>>>> Oh, I found something much faster:
>>>> 
>>>> $ time -p find /usr/share/man -type f | xargs bzgrep -l RLIMIT_NOFILE
>>>> [snip]
>>>> 
>>>> real 24.30
>>>> user 32.34
>>>> sys 6.84
>>>> 
>>>> Hmm, perhaps, someone has an explanation for this?
>
>>> These are all terribly slow IMHO.
>>>
>>> For comparison, this happens on my OpenBSD notebook, with more than
>>> five hundred optional software packages installed in addition to the
>>> complete default installation:
>>>
>>>    $ time man -k any=RLIMIT_NOFILE
>>>   dup, dup2, dup3(2) - duplicate an existing file descriptor
>>>   getrlimit, setrlimit(2) - control maximum system resource consumption
>>>   sudoers(5) - default sudo security policy plugin
>>>     0m00.21s real     0m00.00s user     0m00.03s system
>
>> Yes, this is really fast and would allow for quite interesting ways to
>> work with manual pages.
>> 
>> But, OpenBSD's `man -k` operates on a makewhatis(8) database and not
>> on every single manual page or am I wrong?
>
> Yes, you are completely correct about that.
> The database format is documented here:
>
>   https://man.openbsd.org/mandoc.db.5
>
> And the search syntax here:
>
>   https://man.openbsd.org/apropos.1
>
> The concept works very well because in contrast to man(7), mdoc(7)
> provides substatial semantic markup (without being harder to write
> or maintain).
>
> The comparison seemed relevant to me because as far as i understood the
> intention of the thread, participants were looking for ideas to make
> searching for content in manual pages more powerful and more efficient.
> The combination of semantic markup and indexing of marked up content
> is one way to make progress in that direction, and the combination
> of mdoc(7) with mandoc(1) is an example of a system demonstrating
> the concept.

Very interesting.  I gues that makewhatis(8) then has to cope both
formats (man(7) and mdoc(7)) and from between the lines I read that it
is not really a problem.

Are there any outstanding queries mdoc(7) enables that man(7) cannot?
From what I read so far with mdoc(7) it should be very easy (by querying
.Xr), for example to get an answer to the question "Which manual pages
are referencing me?" (From inside a pager, for example).

> I understand people familiar with GNU info(1) pointed out that
> providing index entries that do not correspond to marked up
> content is also occasionally useful.  I do not completely disagree
> with that, and the mdoc(7) language as implemented by mandoc(1)
> provides a dedicated macro to do just that:
>
>   https://man.openbsd.org/mdoc.7#Tg

My role in this thread is not an experts one but the one of a naive guy
who plays with an experimental pager (lsp(1)) that tries to offer some
additional features for handling manual pages.

I read that with .Tg tags are passed to the PAGER and with less(1) one
could use :t to navigate to them.  I tried to see how this works and
wonder how the user knows which tags are available -- maybe man-db's
man(1) doesn't support this...

If your time allows and it's not too off-topic, perhaps you could
provide more detail, e.g. if I can make use of the .Tg tags on a
non-OpenBSD system.

Regards,

Dirk

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages
  2023-04-08 20:02                     ` Eli Zaretskii
  2023-04-08 20:46                       ` Dirk Gouders
@ 2023-04-09 10:28                       ` Ralph Corderoy
  1 sibling, 0 replies; 73+ messages in thread
From: Ralph Corderoy @ 2023-04-09 10:28 UTC (permalink / raw)
  To: linux-man, groff

Hi,

(Colin, something for you near the end; search ‘interesting’.)

Eli wrote:
> Dirk wrote:
> > $ find /usr/share/man -type f -exec bzgrep -l RLIMIT_NOFILE {} \;
...
> > find...             man -K...
> >
> > real 107.45         real 96.34
> > user 117.06         user 70.11
> > sys 14.43           sys 26.86
...
> > $ time -p find /usr/share/man -type f | xargs bzgrep -l RLIMIT_NOFILE
...
> > real 24.30
> > user 32.34
> > sys 6.84
>
> Multiprocessing, obviously.  Your CPU has more than one execution
> unit, so the pipe via xargs runs 'find' and 'bzgrep' in parallel on
> two different execution units.  By contrast, "find -exec" runs them
> sequentially, in a single thread.

No, I don't think it's that.

With the first, find(1) does stop whilst waiting for bzgrep to grep a
single file.  bzgrep may or may not run on the same core.  The important
thing is the one bzgrep per file and its fork() and exec() overhead.

The second has find fill a pipe's buffer with paths and when that's
full, xargs's read can return.  This continues until xargs either reads
end-of-file or reaches the argv[] limits.  It then runs a single bzgrep
with many filenames.  The fork+exec overhead is much reduced.

bzgrep is a shell script and has overhead before it gets to the
argument-processing loop.  That overhead is suffered many times if
bzgrep is run once per file.

The *zgrep scripts are a poor option in general due to this
one-grep-per-file overhead.  Better than nothing, but a grep which can
internally decompress all the different compression formats avoids this
shell overhead.

Here is an example.  260 files causes eight times as many clone(2)s,
i.e. forks.  I've added an extra ‘×...’ column.  The ls and xargs will
complete their work nearly instantly.  All the wall-clock time is the
single run of zgrep.

    $ pwd
    /usr/share/man/man7
    $ ls *.gz | wc -l
    260
    $
    $ ls *.gz | LC_ALL=C strace -fc xargs -rd\\n zgrep -H not-to-be-found
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- ---------    ------ ----------------
     93.70   27.510039        7555      3641 ×14  1560 wait4
      0.85    0.248763          26      9389           mmap
      0.68    0.198674          11     17166           rt_sigprocmask
      0.56    0.165702          35      4691           mprotect
      0.52    0.153146           6     22637           rt_sigaction
      0.50    0.146029          21      6780           read
      0.43    0.125451          10     12235      1040 close
      0.31    0.091715          29      3132           openat
      0.25    0.073542          11      6513       522 fcntl
      0.24    0.070822          12      5728      2080 stat
      0.23    0.068825          16      4171           fstat
      0.20    0.057703          24      2348           brk
      0.19    0.054838          69       786         4 execve
      0.18    0.052849          25      2081 ×8        clone
      0.17    0.051089          17      2862       782 access
      0.15    0.043284          55       782           munmap
      0.14    0.040012          48       819           write
      0.11    0.031992          11      2870       260 lseek
      0.11    0.031393           8      3902           dup2
      0.08    0.023038          22      1041           pipe
      0.07    0.021190          13      1560           rt_sigreturn
      0.06    0.018363          11      1564       782 arch_prctl
      0.05    0.016013           7      2081 ×8        getgid
      0.05    0.015314           7      2081 ×8        getegid
      0.05    0.015251           7      2081 ×8        getuid
      0.05    0.014780           7      2081 ×8        geteuid
      0.02    0.004703          18       260 ×1        sigaltstack
      0.01    0.003685          13       264 ×1        prlimit64
      0.01    0.003523           1      2084 ×8        getpid
      0.01    0.003443          13       260 ×1        set_tid_address
      0.01    0.003425          13       260 ×1        set_robust_list
      0.00    0.000094          23         4           getdents64
      0.00    0.000053          17         3         2 ioctl
      0.00    0.000033          16         2           poll
      0.00    0.000018          18         1           sysinfo
      0.00    0.000014          14         1           getppid
      0.00    0.000013          13         1           uname
      0.00    0.000013          13         1           getpgrp
    ------ ----------- ----------- --------- --------- ----------------
    100.00   29.358834                128163      7032 total

Compare with running sh(1) to run zcat and grep on each bunch of xargs's files.

    $ ls *.gz | LC_ALL=C strace -fc sh -c 'xargs -rd\\n zcat | grep not-to-be-found'
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- ---------    ------ ----------------
     82.18    0.150049       37512         4         1 wait4
      5.92    0.010814          12       881           read
      2.25    0.004116          14       286 ×1        openat
      2.17    0.003966          13       299 ×1        write
      1.33    0.002432           8       301 ×1      2 close
      1.24    0.002263          30        74           mmap
      1.19    0.002166           7       285 ×1        fstat
      0.58    0.001060          17        62        16 stat
      0.52    0.000954          34        28           mprotect
      0.42    0.000766          12        63           rt_sigaction
      0.35    0.000642          27        23         6 access
      0.30    0.000543          24        22           rt_sigprocmask
      0.19    0.000346          17        20         1 lseek
      0.17    0.000310          62         5           munmap
      0.14    0.000250          13        18           getuid
      0.13    0.000242          18        13         4 ioctl
      0.13    0.000238          13        18           getegid
      0.13    0.000236          13        18           geteuid
      0.11    0.000204          11        18           brk
      0.11    0.000199          11        18           getgid
      0.06    0.000116          11        10         5 arch_prctl
      0.06    0.000106           9        11         1 fcntl
      0.05    0.000083          10         8           getpid
      0.04    0.000082          13         6           prlimit64
      0.04    0.000076          38         2           pipe
      0.03    0.000061          20         3           clone
      0.03    0.000047          23         2           getpgrp
      0.02    0.000044          22         2           sysinfo
      0.02    0.000033           3         9         4 execve
      0.02    0.000031          10         3           dup2
      0.02    0.000030          30         1           set_tid_address
      0.01    0.000023          11         2           uname
      0.01    0.000022          22         1           set_robust_list
      0.01    0.000014           7         2           poll
      0.01    0.000014          14         1           rt_sigreturn
      0.01    0.000010           5         2           getppid
      0.00    0.000007           1         4           getdents64
      0.00    0.000000           0         1           sigaltstack
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.182595                  2526        40 total
    $

Fifty times fewer system calls.  Especially expensive ones.

Now, here's the interesting bit.  man here also forks once per file.
Presumably so the child can decompress the file and write to a pipe with
the existing search code in the parent reading from the other end
without caring the file is compressed.  Removing the pipe and fork could
speed things up a bit.  A function pointer would be one way.

    $ LC_ALL=C strace -fc man -Ks7 not-to-be-found
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- ---------    ------ ----------------
     61.90    1.890948          14    132630           read
     32.05    0.979069        1882       520 ×2    260 wait4
      1.68    0.051233          98       520 ×2    260 seccomp
      0.90    0.027443          13      2096 ×8        close
      0.61    0.018720          18      1024           write
      0.59    0.017941          17      1040 ×4        rt_sigprocmask
      0.53    0.016137          27       585        37 stat
      0.49    0.015120          27       551        15 openat
      0.26    0.007975          30       260 ×1        pipe
      0.15    0.004705          18       260 ×1        clone
      0.14    0.004345          16       260 ×1        rt_sigreturn
      0.14    0.004254          16       263 ×1        ioctl
      0.13    0.003824          14       262 ×1        getpid
      0.10    0.003067           1      1573           rt_sigaction
      0.08    0.002581           9       273           fstat
      0.07    0.002052           3       520 ×2        prctl
      0.07    0.002040           7       263 ×1        lseek
      0.06    0.001959           7       260 ×1        dup
      0.04    0.001115           2       520 ×2        dup2
      0.01    0.000204          17        12           brk
      0.00    0.000000           0       145           lstat
      0.00    0.000000           0        31           mmap
      0.00    0.000000           0        12           mprotect
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0         1         1 access
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         3           fcntl
      0.00    0.000000           0         6           readlink
      0.00    0.000000           0         1           umask
      0.00    0.000000           0         1           sysinfo
      0.00    0.000000           0         1           getuid
      0.00    0.000000           0         1           getgid
      0.00    0.000000           0         1           geteuid
      0.00    0.000000           0         1           getegid
      0.00    0.000000           0         3           fstatfs
      0.00    0.000000           0         2         1 arch_prctl
      0.00    0.000000           0         8           getdents64
    ------ ----------- ----------- --------- --------- ----------------
    100.00    3.054732                143911       574 total
    $

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Accessibility of man pages
  2023-04-09  9:50                           ` Dirk Gouders
@ 2023-04-09 10:35                             ` Dirk Gouders
  0 siblings, 0 replies; 73+ messages in thread
From: Dirk Gouders @ 2023-04-09 10:35 UTC (permalink / raw)
  To: Ingo Schwarze
  Cc: Alejandro Colomar, Colin Watson, Eli Zaretskii, linux-man,
	help-texinfo, nabijaczleweli, g.branden.robinson, groff

Dirk Gouders <dirk@gouders.net> writes:

> Hi Ingo,
>
> Ingo Schwarze <schwarze@usta.de> writes:
>> Dirk Gouders wrote on Sat, Apr 08, 2023 at 10:59:32PM +0200:
>>> Ingo Schwarze <schwarze@usta.de> writes:
>>>> Dirk Gouders wrote on Sat, Apr 08, 2023 at 09:48:13PM +0200:
>>
>>>>> Yes, it's very slow but close to `man -K`:
>>>>> 
>>>>> find...             man -K...
>>>>> 
>>>>> real 107.45         real 96.34
>>>>> user 117.06         user 70.11
>>>>> sys 14.43           sys 26.86
>>>>> 
>>>>> [a thought later]
>>>>> 
>>>>> Oh, I found something much faster:
>>>>> 
>>>>> $ time -p find /usr/share/man -type f | xargs bzgrep -l RLIMIT_NOFILE
>>>>> [snip]
>>>>> 
>>>>> real 24.30
>>>>> user 32.34
>>>>> sys 6.84
>>>>> 
>>>>> Hmm, perhaps, someone has an explanation for this?
>>
>>>> These are all terribly slow IMHO.
>>>>
>>>> For comparison, this happens on my OpenBSD notebook, with more than
>>>> five hundred optional software packages installed in addition to the
>>>> complete default installation:
>>>>
>>>>    $ time man -k any=RLIMIT_NOFILE
>>>>   dup, dup2, dup3(2) - duplicate an existing file descriptor
>>>>   getrlimit, setrlimit(2) - control maximum system resource consumption
>>>>   sudoers(5) - default sudo security policy plugin
>>>>     0m00.21s real     0m00.00s user     0m00.03s system
>>
>>> Yes, this is really fast and would allow for quite interesting ways to
>>> work with manual pages.
>>> 
>>> But, OpenBSD's `man -k` operates on a makewhatis(8) database and not
>>> on every single manual page or am I wrong?
>>
>> Yes, you are completely correct about that.
>> The database format is documented here:
>>
>>   https://man.openbsd.org/mandoc.db.5
>>
>> And the search syntax here:
>>
>>   https://man.openbsd.org/apropos.1
>>
>> The concept works very well because in contrast to man(7), mdoc(7)
>> provides substatial semantic markup (without being harder to write
>> or maintain).
>>
>> The comparison seemed relevant to me because as far as i understood the
>> intention of the thread, participants were looking for ideas to make
>> searching for content in manual pages more powerful and more efficient.
>> The combination of semantic markup and indexing of marked up content
>> is one way to make progress in that direction, and the combination
>> of mdoc(7) with mandoc(1) is an example of a system demonstrating
>> the concept.
>
> Very interesting.  I gues that makewhatis(8) then has to cope both
> formats (man(7) and mdoc(7)) and from between the lines I read that it
> is not really a problem.
>
> Are there any outstanding queries mdoc(7) enables that man(7) cannot?
> From what I read so far with mdoc(7) it should be very easy (by querying
> .Xr), for example to get an answer to the question "Which manual pages
> are referencing me?" (From inside a pager, for example).
>
>> I understand people familiar with GNU info(1) pointed out that
>> providing index entries that do not correspond to marked up
>> content is also occasionally useful.  I do not completely disagree
>> with that, and the mdoc(7) language as implemented by mandoc(1)
>> provides a dedicated macro to do just that:
>>
>>   https://man.openbsd.org/mdoc.7#Tg
>
> My role in this thread is not an experts one but the one of a naive guy
> who plays with an experimental pager (lsp(1)) that tries to offer some
> additional features for handling manual pages.
>
> I read that with .Tg tags are passed to the PAGER and with less(1) one
> could use :t to navigate to them.  I tried to see how this works and
> wonder how the user knows which tags are available -- maybe man-db's
> man(1) doesn't support this...
>
> If your time allows and it's not too off-topic, perhaps you could
> provide more detail, e.g. if I can make use of the .Tg tags on a
> non-OpenBSD system.

Hmm, I already learned that I have all those commands available with an
'm' prefixed, i.e. mapropos, mman, mmakewhatis...

So, I built the makewhatis databases:

# find / -name mandoc.db -ls
   659416      4 -rw-r--r--   1 root     root         3984 Apr  9 12:25 /usr/lib/rust/1.66.1/share/man/mandoc.db
   659419      8 -rw-r--r--   1 root     root         4456 Apr  9 12:25 /usr/lib/llvm/15/share/man/mandoc.db
   954004   1812 -rw-r--r--   1 root     root      1848712 Apr  9 12:25 /usr/share/man/mandoc.db
   954003      4 -rw-r--r--   1 root     root         1864 Apr  9 12:24 /usr/share/binutils-data/x86_64-pc-linux-gnu/2.39/man/mandoc.db
   787032      4 -rw-r--r--   1 root     root         1164 Apr  9 12:24 /usr/share/gcc-data/x86_64-pc-linux-gnu/12/man/mandoc.db
      732     12 -rw-r--r--   1 root     root         8444 Apr  9 12:24 /usr/lib64/icedtea8/man/mandoc.db

But your example query gives not matches:

$ mman -k any=RLIMIT_NOFILE
mman: nothing appropriate

It's very fast, though:

real 0.00
user 0.00
sys 0.00

;-)

Dirk

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
       [not found]                 ` <87a5zhwntt.fsf@ada>
@ 2023-04-09 12:05                   ` Alejandro Colomar
  2023-04-09 12:17                     ` Alejandro Colomar
                                       ` (2 more replies)
  0 siblings, 3 replies; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-09 12:05 UTC (permalink / raw)
  To: Alexis, groff, linux-man
  Cc: Ingo Schwarze, Dirk Gouders, Colin Watson, Sam James, Ralph Corderoy


[-- Attachment #1.1: Type: text/plain, Size: 5185 bytes --]

[Added back linux-man@, and people that commented on this (sub)topic]
[Added Sam, I've got a question for you]

Hi Alexis,

Please keep (at least) linux-man@ in the loop.

On 4/9/23 08:44, Alexis wrote:
> 
> As a related data point, i'd like to mention Gentoo's position on 
> this, i.e. that man pages will continue to be bzip2-compressed by 
> default:
> 
> "app-text/mandoc bzip2 support"
> https://bugs.gentoo.org/854267
> 
> "Remove /usr/share/man from default inclusion list for docompress"
> https://bugs.gentoo.org/836367

As Ingo said[1] 3 years ago, I don't think in this year it makes any
sense to compress pages anymore.  However, since it's simple for me
to add support for that, and it can be interesting for testing
purposes, I added support for installing the Linux man-pages
compressed with bzip2 using the Makefile[2].  While I was at it, I
also added support for generating .tar.bz2 release tarballs[3].

With this, I was able to test a bit more than what I did yesterday:


$ sudo rm -rf /opt/local/man/
$ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
2570
$ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
2570
$ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
2570
$ du -sh /opt/local/man/*
5.4M	/opt/local/man/bz2
5.5M	/opt/local/man/gz_
9.4M	/opt/local/man/man


$ export MANPATH=/opt/local/man/gz_/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
0.31
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
17
1.56
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
17
1.56
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.24
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.14


$ export MANPATH=/opt/local/man/bz2/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
10.90
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
17
1.33
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
17
1.31
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.21
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.22


$ export MANPATH=/opt/local/man/man/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
0.56
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
17
0.01
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
17
0.01

Weird thing: today, the symlink bug in man(1) was reproducible in
all kinds of pages, while yesterday it only reproduced in
uncompressed ones.

Another weird thing: times today changed considerably for the
find(1) pipelines (half of yesterday's).  It's not a thing of
using dash(1), because I get similar times with bash(1) and its
builtin time(1).

Important note: Sam, are you sure you want your pages compressed
with bz2?  Have you seen the 10 seconds it takes man-db's man(1) to
find a word in the pages?  I suggest that at least you try to
reproduce these tests in your machine, and see if it's just me or
man-db's man(1) is pretty bad at non-gz pages.

Test results:

-  man-db's man(1) is slower with plain man(7) source than with .gz
   pages for some misterious reason.

-  man-db's man(1) is turtle slow with .bz2 pages.

-  xargs -P0 doesn't affect significantly.  As Ralph said, this is
   probably because the main issue with find(1) was having the
   bottleneck in clone/fork+exec, and xargs(1) already solves that.

   Expanding the pipeline to use zcat(1) instead of zgrep(1)
   improves a little bit more, because the zgrep(1) script is
   probably quite inefficient, while zcat(1) is just a simple
   wrapper around gzip(1).  We see that zgrep(1) is more
   inefficient than running ourselves a few programs per file in a
   pipeline!

   Calling gzip(1) directly is even faster, since we avoid invoking
   a shell for such a small script.

   Expanding the bzgrep(1) pipeline into one using bzcat(1) has
   similar improvements.  However, since bzcat(1) is a binary, we
   don't get further improvement from calling bzip2(1) directly.


Cheers,
Alex

> 
> 
> Alexis.
> 


[1]:  <https://marc.info/?l=mandoc-discuss&m=160668087317110&w=2>

[2]:  <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=6a828d5b6879ef19c3f59034fe1d0850d25d0056>

[3]:  <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=e5b23b9c5b318d69ee78af0906e3bf0c665f9ae5>

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
  2023-04-09 12:05                   ` Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1))) Alejandro Colomar
@ 2023-04-09 12:17                     ` Alejandro Colomar
  2023-04-09 18:55                       ` G. Branden Robinson
  2023-04-09 12:29                     ` Colin Watson
  2023-04-12  8:13                     ` Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1))) Sam James
  2 siblings, 1 reply; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-09 12:17 UTC (permalink / raw)
  To: groff, linux-man
  Cc: Ingo Schwarze, Dirk Gouders, Colin Watson, Sam James,
	Ralph Corderoy, Alexis


[-- Attachment #1.1: Type: text/plain, Size: 5931 bytes --]



On 4/9/23 14:05, Alejandro Colomar wrote:
> [Added back linux-man@, and people that commented on this (sub)topic]
> [Added Sam, I've got a question for you]
> 
> Hi Alexis,
> 
> Please keep (at least) linux-man@ in the loop.
> 
> On 4/9/23 08:44, Alexis wrote:
>>
>> As a related data point, i'd like to mention Gentoo's position on 
>> this, i.e. that man pages will continue to be bzip2-compressed by 
>> default:
>>
>> "app-text/mandoc bzip2 support"
>> https://bugs.gentoo.org/854267
>>
>> "Remove /usr/share/man from default inclusion list for docompress"
>> https://bugs.gentoo.org/836367
> 
> As Ingo said[1] 3 years ago, I don't think in this year it makes any
> sense to compress pages anymore.  However, since it's simple for me
> to add support for that, and it can be interesting for testing
> purposes, I added support for installing the Linux man-pages
> compressed with bzip2 using the Makefile[2].  While I was at it, I
> also added support for generating .tar.bz2 release tarballs[3].
> 
> With this, I was able to test a bit more than what I did yesterday:
> 
> 
> $ sudo rm -rf /opt/local/man/
> $ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
> 2570
> $ du -sh /opt/local/man/*
> 5.4M	/opt/local/man/bz2
> 5.5M	/opt/local/man/gz_
> 9.4M	/opt/local/man/man
> 
> 
> $ export MANPATH=/opt/local/man/gz_/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.24
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.14
> 
> 
> $ export MANPATH=/opt/local/man/bz2/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 10.90
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.33
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.21
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.22
> 
> 
> $ export MANPATH=/opt/local/man/man/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
> 
> Weird thing: today, the symlink bug in man(1) was reproducible in
> all kinds of pages, while yesterday it only reproduced in
> uncompressed ones.
> 
> Another weird thing: times today changed considerably for the
> find(1) pipelines (half of yesterday's).  It's not a thing of
> using dash(1), because I get similar times with bash(1) and its
> builtin time(1).
> 
> Important note: Sam, are you sure you want your pages compressed
> with bz2?  Have you seen the 10 seconds it takes man-db's man(1) to
> find a word in the pages?  I suggest that at least you try to
> reproduce these tests in your machine, and see if it's just me or
> man-db's man(1) is pretty bad at non-gz pages.
> 
> Test results:
> 
> -  man-db's man(1) is slower with plain man(7) source than with .gz
>    pages for some misterious reason.
> 
> -  man-db's man(1) is turtle slow with .bz2 pages.
> 
> -  xargs -P0 doesn't affect significantly.  As Ralph said, this is
>    probably because the main issue with find(1) was having the
>    bottleneck in clone/fork+exec, and xargs(1) already solves that.
> 
>    Expanding the pipeline to use zcat(1) instead of zgrep(1)
>    improves a little bit more, because the zgrep(1) script is
>    probably quite inefficient, while zcat(1) is just a simple
>    wrapper around gzip(1).  We see that zgrep(1) is more
>    inefficient than running ourselves a few programs per file in a
>    pipeline!
> 
>    Calling gzip(1) directly is even faster, since we avoid invoking
>    a shell for such a small script.
> 
>    Expanding the bzgrep(1) pipeline into one using bzcat(1) has
>    similar improvements.  However, since bzcat(1) is a binary, we
>    don't get further improvement from calling bzip2(1) directly.

And I forgot the obvious one:

-  Using plain man(7) source is blazingly fast.  So much that I
   don't miss mdoc(7)'s indexability so much.

However, I must admit that I do miss mdoc(7)'s power sometimes.
The man_lsfunc() and man_lsvar() functions for finding function
prototypes and variable declarations in man(7) source would be
much simpler using mdoc(1), and I could even use mandoc(1) to
find such things.

> 
> 
> Cheers,
> Alex
> 
>>
>>
>> Alexis.
>>
> 
> 
> [1]:  <https://marc.info/?l=mandoc-discuss&m=160668087317110&w=2>
> 
> [2]:  <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=6a828d5b6879ef19c3f59034fe1d0850d25d0056>
> 
> [3]:  <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=e5b23b9c5b318d69ee78af0906e3bf0c665f9ae5>
> 

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
  2023-04-09 12:05                   ` Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1))) Alejandro Colomar
  2023-04-09 12:17                     ` Alejandro Colomar
@ 2023-04-09 12:29                     ` Colin Watson
  2023-04-09 13:36                       ` Alejandro Colomar
  2023-04-12  8:13                     ` Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1))) Sam James
  2 siblings, 1 reply; 73+ messages in thread
From: Colin Watson @ 2023-04-09 12:29 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Alexis, groff, linux-man, Ingo Schwarze, Dirk Gouders, Sam James,
	Ralph Corderoy

On Sun, Apr 09, 2023 at 02:05:08PM +0200, Alejandro Colomar wrote:
> Important note: Sam, are you sure you want your pages compressed
> with bz2?  Have you seen the 10 seconds it takes man-db's man(1) to
> find a word in the pages?  I suggest that at least you try to
> reproduce these tests in your machine, and see if it's just me or
> man-db's man(1) is pretty bad at non-gz pages.

man-db is significantly slower with bzip2 than gzip these days, because
much of the performance work I did in 2.10.0 only applies to gzip:
there's in-process support for decompressing gzip, but we use
subprocesses for bzip2.  IMO the relatively small difference in
compressed size doesn't justify the effort of building in-process
support for multiple compression algorithms.  I recommend that
distributions just use gzip; but if distributions _really_ want to use
something else for whatever reason, then perhaps they should contribute
code to man-db to ensure similar performance to gzip.  I'm happy to give
pointers if there's a sufficiently compelling reason to make it worth
the effort.

> -  man-db's man(1) is slower with plain man(7) source than with .gz
>    pages for some misterious reason.

Maybe CPU is sufficiently cheaper than I/O that the fact of reading less
data from disk dominates.


(Can I request that any concrete actions that need to be taken based on
this thread be split out to separate bug reports or something, please?
This thread is long and I don't really want to have lots of meandering
discourse in my inbox going back over the tired old man vs. info debate
or whatever, but if there are actual things I need to fix in man-db then
I'd rather not miss them.)

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
  2023-04-09 12:29                     ` Colin Watson
@ 2023-04-09 13:36                       ` Alejandro Colomar
  2023-04-09 13:47                         ` Compressed man pages Ralph Corderoy
  0 siblings, 1 reply; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-09 13:36 UTC (permalink / raw)
  To: Colin Watson, Sam James
  Cc: Alexis, groff, linux-man, Ingo Schwarze, Ralph Corderoy, Dirk Gouders


[-- Attachment #1.1: Type: text/plain, Size: 5842 bytes --]



On 4/9/23 14:29, Colin Watson wrote:
> On Sun, Apr 09, 2023 at 02:05:08PM +0200, Alejandro Colomar wrote:
>> Important note: Sam, are you sure you want your pages compressed
>> with bz2?  Have you seen the 10 seconds it takes man-db's man(1) to
>> find a word in the pages?  I suggest that at least you try to
>> reproduce these tests in your machine, and see if it's just me or
>> man-db's man(1) is pretty bad at non-gz pages.
> 
> man-db is significantly slower with bzip2 than gzip these days, because
> much of the performance work I did in 2.10.0 only applies to gzip:
> there's in-process support for decompressing gzip, but we use
> subprocesses for bzip2.  IMO the relatively small difference in
> compressed size doesn't justify the effort of building in-process
> support for multiple compression algorithms.

Agree.

>  I recommend that
> distributions just use gzip;

I don't agree here.  gzip vs man source is 5M vs 9M.  However, a
simple pipeline searching for a word in gzip pages takes ~114x the
time it takes to perform the same search on man(7) source.  I don't
think that small benefit in size doesn't justify the slowness.

Of course, this is only about theoretical maximum performance.
Current man(1) has other issues so it doesn't benefit from this
performance advantage.


> but if distributions _really_ want to use
> something else for whatever reason, then perhaps they should contribute
> code to man-db to ensure similar performance to gzip.  I'm happy to give
> pointers if there's a sufficiently compelling reason to make it worth
> the effort.
> 
>> -  man-db's man(1) is slower with plain man(7) source than with .gz
>>    pages for some misterious reason.
> 
> Maybe CPU is sufficiently cheaper than I/O that the fact of reading less
> data from disk dominates.

My CPU is powerful, but so is my SSD.  I wouldn't expect decompressing
to be faster than I/O.  I have a Samsung 960 PRO, which is quite fast[1].

$ lscpu
[...]
  Model name:            Intel(R) Core(TM) i7-5775C CPU @ 3.30GHz
    CPU family:          6
    Model:               71
    Thread(s) per core:  1
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            1
    CPU(s) scaling MHz:  44%
    CPU max MHz:         3700.0000
    CPU min MHz:         800.0000
[...]
Caches (sum of all):     
  L1d:                   128 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    1 MiB (4 instances)
  L3:                    6 MiB (1 instance)
  L4:                    128 MiB (1 instance)
[...]

$ lspci | grep -i samsung
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961/SM963

$ lsblk -o NAME,FSTYPE,MOUNTPOINT,SIZE,MODEL
NAME                                FSTYPE   MOUNTPOINT              SIZE MODEL
[...]
nvme0n1                                                            953.9G Samsung SSD 960 PRO
├─nvme0n1p1                         vfat     /boot/efi              1023M 
├─nvme0n1p2                         ext4     /boot                     4G 
└─nvme0n1p3                         crypto_L                         948G 
  └─nvme0n1p3_crypt                 ext4     /                       948G


Also, a manual loop should have similar problems, but it doesn't have
them; if I loop manually over the files and grep them, it takes 0.01 s,
which is the lowest that /bin/time can measure on my system.


I repeated the tests on a tmpfs just to check.  The times are almost the
same (except that bzip goes down from 10 s to 9 s :).


$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,noatime,inode64)
$ sudo rm -r /tmp/man
$ sudo make install-man prefix=/tmp/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
2570
$ sudo make install-man prefix=/tmp/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
2570
$ sudo make install-man prefix=/tmp/man/man -j LINK_PAGES=symlink Z= | wc -l
2570
$ du -sh /tmp/man/*
5.3M	/tmp/man/bz2
5.4M	/tmp/man/gz_
9.3M	/tmp/man/man


$ export MANPATH=/tmp/man/gz_/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
0.30
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.14


This is quite optimized.  I can't beat man(1) with a shell pipeline
for .gz pages.  :)


$ export MANPATH=/tmp/man/bz2/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
9.22
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.22


Sam, really consider not using .bz2 for Gentoo's pages.  :)


$ export MANPATH=/tmp/man/man/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
0.52
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
17
0.01


man(1) is ~52x slower than my loop.  Similar results from RAM and NVMe,
so I/O is not the issue here.

> 
> 
> (Can I request that any concrete actions that need to be taken based on
> this thread be split out to separate bug reports or something, please?
> This thread is long and I don't really want to have lots of meandering
> discourse in my inbox going back over the tired old man vs. info debate
> or whatever, but if there are actual things I need to fix in man-db then
> I'd rather not miss them.)

Sure; do you have a mailing list, or should I send them to you and CC
linux-man@?  I have at least one bug report for you.

Cheers,
Alex

[1]:  <https://www.anandtech.com/show/10754/samsung-960-pro-ssd-review>

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages
  2023-04-09 13:36                       ` Alejandro Colomar
@ 2023-04-09 13:47                         ` Ralph Corderoy
  0 siblings, 0 replies; 73+ messages in thread
From: Ralph Corderoy @ 2023-04-09 13:47 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Colin Watson, groff, linux-man

Hi Alejandro,

> Sure; do you have a mailing list, or should I send them to you and
> CC linux-man@?  I have at least one bug report for you.

Start from https://man-db.gitlab.io/man-db/,
which is the home page according to Arch Linux's package,
and you'll end up in all the typical places:
mailing list, issue tracker, etc.

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
  2023-04-09 12:17                     ` Alejandro Colomar
@ 2023-04-09 18:55                       ` G. Branden Robinson
  0 siblings, 0 replies; 73+ messages in thread
From: G. Branden Robinson @ 2023-04-09 18:55 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: groff, linux-man, Dirk Gouders, Sam James, Alexis

[-- Attachment #1: Type: text/plain, Size: 1398 bytes --]

[dropping some people I recognize from the groff list from CC]

At 2023-04-09T14:17:57+0200, Alejandro Colomar wrote:
> -  Using plain man(7) source is blazingly fast.  So much that I
>    don't miss mdoc(7)'s indexability so much.
> 
> However, I must admit that I do miss mdoc(7)'s power sometimes.
> The man_lsfunc() and man_lsvar() functions for finding function
> prototypes and variable declarations in man(7) source would be
> much simpler using mdoc(1), and I could even use mandoc(1) to
> find such things.

I must point out that I have sketched a solution for solving the problem
of semantic tagging in man(7).

https://lists.gnu.org/archive/html/groff/2022-12/msg00075.html

...though perhaps I should add some detail to that sketch.  My ideas are
firming up, so I may mail a proposal to groff@ and linux-man@ in the
near future.

I'm happy to report that all the man(7) extension macros I have in mind,
except for `Q` for quotation[1], will be trivially ignorable; i.e., an
implementation (like mandoc(1)) that doesn't recognize them can ignore
them (treating them as comment lines) without doing damage to the
rendered text of a page.

Regards,
Branden

[1] https://lists.gnu.org/archive/html/groff/2022-12/msg00078.html

    ...and even that admits a one-line fallback definition.  I suspect
    you could even get away with defining it as a string.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: reformatting man pages at SIGWINCH
  2023-04-07 22:16               ` Alejandro Colomar
@ 2023-04-10 19:05                 ` Dirk Gouders
  2023-04-10 19:57                   ` Alejandro Colomar
  2023-04-10 20:24                   ` G. Branden Robinson
  0 siblings, 2 replies; 73+ messages in thread
From: Dirk Gouders @ 2023-04-10 19:05 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: G. Branden Robinson, Eli Zaretskii, linux-man, help-texinfo, groff

Hi Alex,

Alejandro Colomar <alx.manpages@gmail.com> writes:

> On 4/8/23 00:09, Dirk Gouders wrote:
>>> Maybe it could be done with .SH and .SS.  The heuristics to find these
>>> are simple.  It wouldn't be very precise, but it could try to find the
>>> closest (only upwards) (sub)section heading.  With some luck, .TP would
>>> also be helpful.
>> 
>> Yes, that should give nice results.  But for manual pages like git(1)
>> with large areas between those this becomes difficult, again.
>> 
>> Today, I experimented with one more heuristics, adjusting the current
>> position according to the proportional change of avg. line size and also
>> change of window dimension (horizontal) but all of those didn't get better
>> results than what I currently implemented (stay at the position).
>> 
>> Out of curiosity, I checked how firefox behaves on horizontal resizes
>> and comparing to some of those results, lsp is not the worst on earth ;-)
>> 
>> If time allows, I want to see if working with Levenshtein distances
>> could get exact results.  Perhaps this will turn out to be too expensive
>> but maybe the fact that the area to be checked is limited helps...
>
> For something simpler, you could just count words since the start of the
> section divided by total words in the section.  That should be fast, and
> I expect, also quite precise.  Hyphenating might work against you on
> this, but on average it shouldn't move you too much.

very pragmatic -- very effective, thanks for that suggestion.  I
started with implementing a simpler version of that (no counting of all
words in the section):

- Backwards count words until we reach an empty line, the section
  header or the beginning of the document

        Stop if it was the section header or beginning of the document

        Continue and just count empty lines until we reach the
        section header or the beginning of the document

This relies on the assumption that horizontal resizes don't create or
delete emty lines and it still has the weakness that manual pages
(e.g. bash(1)) contain large areas without empty lines but it's
definitely better than just staying at the position as it was before.

If it turns out to still be too weak, I could count all words between
two empty lines and set that in relation to the words from the
preceeding empty line.

But perhaps, I now learn that empty lines are by no means that constant
value that I assume...

Dirk

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: reformatting man pages at SIGWINCH
  2023-04-10 19:05                 ` Dirk Gouders
@ 2023-04-10 19:57                   ` Alejandro Colomar
  2023-04-10 20:24                   ` G. Branden Robinson
  1 sibling, 0 replies; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-10 19:57 UTC (permalink / raw)
  To: Dirk Gouders
  Cc: G. Branden Robinson, Eli Zaretskii, linux-man, help-texinfo, groff


[-- Attachment #1.1: Type: text/plain, Size: 2282 bytes --]

Hi Dirk,

On 4/10/23 21:05, Dirk Gouders wrote:
>> For something simpler, you could just count words since the start of the
>> section divided by total words in the section.  That should be fast, and
>> I expect, also quite precise.  Hyphenating might work against you on
>> this, but on average it shouldn't move you too much.
> 
> very pragmatic -- very effective, thanks for that suggestion.  I
> started with implementing a simpler version of that (no counting of all
> words in the section):
> 
> - Backwards count words until we reach an empty line, the section
>   header or the beginning of the document
> 
>         Stop if it was the section header or beginning of the document
> 
>         Continue and just count empty lines until we reach the
>         section header or the beginning of the document

Hmmmm, good idea.

$ man gcc 2>/dev/null | grep "^$" | wc -l
5462
$ man gcc 2>/dev/null | grep "^$" | wc -l
5462
$ man gcc 2>/dev/null | grep "^$" | wc -l
5464

$ man tzset 2>/dev/null | grep "^$" | wc -l
41
$ man tzset 2>/dev/null | grep "^$" | wc -l
41
$ man tzset 2>/dev/null | grep "^$" | wc -l
41

$ man bash 2>/dev/null | grep "^$" | wc -l
657
$ man bash 2>/dev/null | grep "^$" | wc -l
657
$ man bash 2>/dev/null | grep "^$" | wc -l
658


Of course there were important resizes between those invocations. 

> 
> This relies on the assumption that horizontal resizes don't create or
> delete emty lines and it still has the weakness that manual pages
> (e.g. bash(1)) contain large areas without empty lines but it's
> definitely better than just staying at the position as it was before.

 
That should give you a quite precise idea of where you were.

> 
> If it turns out to still be too weak, I could count all words between
> two empty lines and set that in relation to the words from the
> preceeding empty line.
> 
> But perhaps, I now learn that empty lines are by no means that constant
> value that I assume...

They seem to be constant.  Only with the shortest terminal size I can
have, that number changes, and only by one or two per entire page.

> 
> Dirk

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: reformatting man pages at SIGWINCH
  2023-04-10 19:05                 ` Dirk Gouders
  2023-04-10 19:57                   ` Alejandro Colomar
@ 2023-04-10 20:24                   ` G. Branden Robinson
  2023-04-11  9:20                     ` Ralph Corderoy
  2023-04-11  9:39                     ` Dirk Gouders
  1 sibling, 2 replies; 73+ messages in thread
From: G. Branden Robinson @ 2023-04-10 20:24 UTC (permalink / raw)
  To: Dirk Gouders
  Cc: Alejandro Colomar, Eli Zaretskii, linux-man, help-texinfo, groff

[-- Attachment #1: Type: text/plain, Size: 4103 bytes --]

Hi Dirk,

At 2023-04-10T21:05:24+0200, Dirk Gouders wrote:
> This relies on the assumption that horizontal resizes don't create or
> delete emty lines and it still has the weakness that manual pages
> (e.g. bash(1)) contain large areas without empty lines but it's
> definitely better than just staying at the position as it was before.

I think this assumption should hold for man and mdoc documents rendered
by a *roff--I'm not sure about mandoc(1), but it probably will for
reasons I'll elaborate below.

Vertical space in *roff documents might get reduced at page breaks, but
not to zero, except at page breaks.

There are a few reasons that I think reinforce the assumption holding:

1.  man(7) and mdoc(7) don't offer macros for just sticking an arbitrary
amount of vertical space into a document.  If you want that, you'll need
to go down to formatter requests, which is seldom done by human man page
authors, but a bit more frequently by automated generators of man(7) or
mdoc(7) from other formats.

2.  Even in traditional *roff, if you issued an ".sp 6" request
(demanding 6 blank lines), then if you were within 6 lines of a "trap"
(usually a page footer trap or the actual bottom of the page), the
result would be that you'd get blank lines until the trap sprung, and
any excess would be thrown away.  So if there were only 4 lines of
distance to the page footer, the leftover two would be discarded and
_not_ appear after the header of the next page.[1]

3.  mandoc(1) and groff's man(7) and mdoc(7) macro packages both
implement "continuous rendering" for terminal output.  This means that
they contrive to arrange for an effectively infinite page length, so
there are no page breaks.  (Except when you render multiple man pages at
a time, a use case groff 1.23.0 will support.) Since pager programs are
applicable only to terminal output in the first place, this should
address your use case.  (You _can_ turn off continuous rendering in
groff, and see man pages as they would have formatted for Western
Electric Teletype machines, which printed to long spools of paper with
66 lines to the nominal page.)

4.  A habit has grown up among man(1) programs and pagers to call for
and support, respectively, a "blank line squeezing" feature: any runs of
more than one blank line are condensed to 1 blank line each.  In groff
1.23.0, this will no longer be necessary when continuously rendering.
(Historically, this squeezing feature was used to "tighten up" vertical
space after the page header, prior to the "NAME" section heading of the
document.)  In my opinion, pager programs should perform as few
transformations as possible on the output of grotty(1), the groff output
driver that supports terminal devices.  The long-time author and
maintainer of less(1) does not agree, so you have to call that program
with its "-R" flag to view grotty(1) output as groff intends it.  (To
see what those intentions are, format the document without paging it.)

> If it turns out to still be too weak, I could count all words between
> two empty lines and set that in relation to the words from the
> preceeding empty line.

You might do this only as a fallback, if there were no blank lines on
the screen at the old window size when the resizing event happened.

> But perhaps, I now learn that empty lines are by no means that
> constant value that I assume...

In my opinion, the presence or absence of a single blank line in
formatted output is important.  groff 1.23.0 will feature some bug fixes
with respect to their handling within and adjacent to tbl(1) input.[2]

Since I flogged groff 1.23.0 three times in this email, I suppose I
should point people to where they can get the 1.23.0.rc3 release
candidate source archive.  Feedback would be appreciated.

https://alpha.gnu.org/gnu/groff/

Regards,
Branden

[1] For example, give the following input to "nroff | cat -n".

--snip--
.pl 10v
.nf
The page length is set to 10 vees.
2
3
4
5
Asking for 6 vees of space now.
.sp 6
How many appeared?
--end snip--

[2] https://savannah.gnu.org/bugs/?57665
    https://savannah.gnu.org/bugs/?49390

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: reformatting man pages at SIGWINCH
  2023-04-10 20:24                   ` G. Branden Robinson
@ 2023-04-11  9:20                     ` Ralph Corderoy
  2023-04-11  9:39                     ` Dirk Gouders
  1 sibling, 0 replies; 73+ messages in thread
From: Ralph Corderoy @ 2023-04-11  9:20 UTC (permalink / raw)
  To: linux-man, groff

Hi Branden,

> see man pages as they would have formatted for Western Electric
> Teletype machines, which printed to long spools of paper with 66 lines
> to the nominal page.

In case it isn't obvious, it was normal for teletypes and line printers
to print six lines per inch onto letter-height fan-fold paper perforated
every eleven inches giving 66 lines per real page, not nominal.

As long as the paper was positioned so it started printing just after a
perforation, the page breaks occurred over a perforation.  To allow for
a bit of leeway, the page often started and ended with blank lines.

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: reformatting man pages at SIGWINCH
  2023-04-10 20:24                   ` G. Branden Robinson
  2023-04-11  9:20                     ` Ralph Corderoy
@ 2023-04-11  9:39                     ` Dirk Gouders
  2023-04-17  6:23                       ` G. Branden Robinson
  1 sibling, 1 reply; 73+ messages in thread
From: Dirk Gouders @ 2023-04-11  9:39 UTC (permalink / raw)
  To: G. Branden Robinson
  Cc: Alejandro Colomar, Eli Zaretskii, linux-man, help-texinfo, groff

Hi Branden,

"G. Branden Robinson" <g.branden.robinson@gmail.com> writes:

> At 2023-04-10T21:05:24+0200, Dirk Gouders wrote:
>> This relies on the assumption that horizontal resizes don't create or
>> delete emty lines and it still has the weakness that manual pages
>> (e.g. bash(1)) contain large areas without empty lines but it's
>> definitely better than just staying at the position as it was before.
>
> I think this assumption should hold for man and mdoc documents rendered
> by a *roff--I'm not sure about mandoc(1), but it probably will for
> reasons I'll elaborate below.
>
> Vertical space in *roff documents might get reduced at page breaks, but
> not to zero, except at page breaks.
>
> There are a few reasons that I think reinforce the assumption holding:
>
> 1.  man(7) and mdoc(7) don't offer macros for just sticking an arbitrary
> amount of vertical space into a document.  If you want that, you'll need
> to go down to formatter requests, which is seldom done by human man page
> authors, but a bit more frequently by automated generators of man(7) or
> mdoc(7) from other formats.
>
> 2.  Even in traditional *roff, if you issued an ".sp 6" request
> (demanding 6 blank lines), then if you were within 6 lines of a "trap"
> (usually a page footer trap or the actual bottom of the page), the
> result would be that you'd get blank lines until the trap sprung, and
> any excess would be thrown away.  So if there were only 4 lines of
> distance to the page footer, the leftover two would be discarded and
> _not_ appear after the header of the next page.[1]
>
> 3.  mandoc(1) and groff's man(7) and mdoc(7) macro packages both
> implement "continuous rendering" for terminal output.  This means that
> they contrive to arrange for an effectively infinite page length, so
> there are no page breaks.  (Except when you render multiple man pages at
> a time, a use case groff 1.23.0 will support.) Since pager programs are
> applicable only to terminal output in the first place, this should
> address your use case.  (You _can_ turn off continuous rendering in
> groff, and see man pages as they would have formatted for Western
> Electric Teletype machines, which printed to long spools of paper with
> 66 lines to the nominal page.)
>
> 4.  A habit has grown up among man(1) programs and pagers to call for
> and support, respectively, a "blank line squeezing" feature: any runs of
> more than one blank line are condensed to 1 blank line each.  In groff
> 1.23.0, this will no longer be necessary when continuously rendering.
> (Historically, this squeezing feature was used to "tighten up" vertical
> space after the page header, prior to the "NAME" section heading of the
> document.)  In my opinion, pager programs should perform as few
> transformations as possible on the output of grotty(1), the groff output
> driver that supports terminal devices.  The long-time author and
> maintainer of less(1) does not agree, so you have to call that program
> with its "-R" flag to view grotty(1) output as groff intends it.  (To
> see what those intentions are, format the document without paging it.)

Thank you for the detailled assessment.  Perhaps my misunderstanding is
because I'm not a native speaker but which document should I format to
see what those intentions are?

>> If it turns out to still be too weak, I could count all words between
>> two empty lines and set that in relation to the words from the
>> preceeding empty line.
>
> You might do this only as a fallback, if there were no blank lines on
> the screen at the old window size when the resizing event happened.

Yes, such a fallback would be good to have.  I am again about to
implement a suggestion with some modifications: I would make using the
section-word-count (which is expensive) dependent on _how many_ words I
found while searching for an empty line or the section header.  My
motivation for this is that an increasing number of continuous words
also increases the possibility for hyphenation working against the
heuristics.  Saying that, I probably also need to consider the number of
lines that contain those words.  I have to think more about this.

>> But perhaps, I now learn that empty lines are by no means that
>> constant value that I assume...
>
> In my opinion, the presence or absence of a single blank line in
> formatted output is important.  groff 1.23.0 will feature some bug fixes
> with respect to their handling within and adjacent to tbl(1) input.[2]
>
> Since I flogged groff 1.23.0 three times in this email, I suppose I
> should point people to where they can get the 1.23.0.rc3 release
> candidate source archive.  Feedback would be appreciated.

Oh well, I didn't measure it but I spent quite some time to work on
doc/lsp-help.1 and try to find a solution for that "nasty empty line"
that appeared in of the tables that I use for the online help -- I was
convinced it was my fault.

Gentoo already has an ebuild for groff-1.23.0-rc3 and simply using this
fixes that problem in the table.  So, from now on all my testing happens
with groff-1.23.0-rc3 and I will report should I recognize problems.

Regards,

Dirk

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
  2023-04-09 12:05                   ` Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1))) Alejandro Colomar
  2023-04-09 12:17                     ` Alejandro Colomar
  2023-04-09 12:29                     ` Colin Watson
@ 2023-04-12  8:13                     ` Sam James
  2023-04-12  8:32                       ` Compressed man pages Ralph Corderoy
  2023-04-12 13:04                       ` Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1))) Kerin Millar
  2 siblings, 2 replies; 73+ messages in thread
From: Sam James @ 2023-04-12  8:13 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Alexis, groff, linux-man, Ingo Schwarze, Dirk Gouders,
	Colin Watson, Ralph Corderoy, Kerin Millar

[-- Attachment #1: Type: text/plain, Size: 4996 bytes --]


Alejandro Colomar <alx.manpages@gmail.com> writes:

> [[PGP Signed Part:Undecided]]
> [Added back linux-man@, and people that commented on this (sub)topic]
> [Added Sam, I've got a question for you]
>
> Hi Alexis,
>
> Please keep (at least) linux-man@ in the loop.
>
> On 4/9/23 08:44, Alexis wrote:
>> 
>> As a related data point, i'd like to mention Gentoo's position on 
>> this, i.e. that man pages will continue to be bzip2-compressed by
>> default:
>> 
>> "app-text/mandoc bzip2 support"
>> https://bugs.gentoo.org/854267
>> 
>> "Remove /usr/share/man from default inclusion list for docompress"
>> https://bugs.gentoo.org/836367
>
> As Ingo said[1] 3 years ago, I don't think in this year it makes any
> sense to compress pages anymore.  However, since it's simple for me
> to add support for that, and it can be interesting for testing
> purposes, I added support for installing the Linux man-pages
> compressed with bzip2 using the Makefile[2].  While I was at it, I
> also added support for generating .tar.bz2 release tarballs[3].
>
> With this, I was able to test a bit more than what I did yesterday:
>
>
> $ sudo rm -rf /opt/local/man/
> $ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
> 2570
> $ du -sh /opt/local/man/*
> 5.4M	/opt/local/man/bz2
> 5.5M	/opt/local/man/gz_
> 9.4M	/opt/local/man/man
>
>
> $ export MANPATH=/opt/local/man/gz_/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.24
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.14
>
>
> $ export MANPATH=/opt/local/man/bz2/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 10.90
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.33
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.21
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.22
>
>
> $ export MANPATH=/opt/local/man/man/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
>
> Weird thing: today, the symlink bug in man(1) was reproducible in
> all kinds of pages, while yesterday it only reproduced in
> uncompressed ones.
>
> Another weird thing: times today changed considerably for the
> find(1) pipelines (half of yesterday's).  It's not a thing of
> using dash(1), because I get similar times with bash(1) and its
> builtin time(1).
>
> Important note: Sam, are you sure you want your pages compressed
> with bz2?  Have you seen the 10 seconds it takes man-db's man(1) to
> find a word in the pages?  I suggest that at least you try to
> reproduce these tests in your machine, and see if it's just me or
> man-db's man(1) is pretty bad at non-gz pages.
>
> Test results:
>
> -  man-db's man(1) is slower with plain man(7) source than with .gz
>    pages for some misterious reason.
>
> -  man-db's man(1) is turtle slow with .bz2 pages.

I started looking into changing to xz (or just.. not bz2, anyway),
partially motivated by https://gitlab.com/man-db/man-db/-/issues/4 /
just interest locally (without having done measurements to see if it
would be worth a global change) and the xz maintainer ended up
recommending a different implementation to how man-db currently handles
external utilties entirely (which I have a draft of).

The xz author had some suggestions on the best parameters to use
for man pages too which I need to look into and dig up...

https://bugs.gentoo.org/169260 was an interesting discussion
about our choice of bz2 (it came up a bit in
https://bugs.gentoo.org/372653 too).

(I'll get back and read the rest of the thread later, but wanted
to add this tidbit.)

Definitely surprised to learn bz2 is *that* bad though!

best,
sam

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 377 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages
  2023-04-12  8:13                     ` Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1))) Sam James
@ 2023-04-12  8:32                       ` Ralph Corderoy
  2023-04-12 10:35                         ` Mingye Wang
  2023-04-12 13:04                       ` Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1))) Kerin Millar
  1 sibling, 1 reply; 73+ messages in thread
From: Ralph Corderoy @ 2023-04-12  8:32 UTC (permalink / raw)
  To: Sam James; +Cc: groff, linux-man

Hi Sam,

> I started looking into changing to xz (or just.. not bz2, anyway)

If you're putting effort into researching another compressor then
consider lzip(1).  https://www.nongnu.org/lzip/lzip.html

Its author compares it against xz in particular.
https://www.nongnu.org/lzip/xz_inadequate.html

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages
  2023-04-12  8:32                       ` Compressed man pages Ralph Corderoy
@ 2023-04-12 10:35                         ` Mingye Wang
  2023-04-12 10:55                           ` Ralph Corderoy
  0 siblings, 1 reply; 73+ messages in thread
From: Mingye Wang @ 2023-04-12 10:35 UTC (permalink / raw)
  To: Ralph Corderoy; +Cc: Sam James, groff, linux-man

On Wed, Apr 12, 2023 at 4:36 PM Ralph Corderoy <ralph@inputplus.co.uk> wrote:
> If you're putting effort into researching another compressor then
> consider lzip(1).  https://www.nongnu.org/lzip/lzip.html
>
> Its author compares it against xz in particular.
> https://www.nongnu.org/lzip/xz_inadequate.html

lzip is cool and all, but the thing is we are talking about storage
for distribution on every single person's computer in single-file
form, not archiving into a tarball. We are looking at a world where
almost every system has xz installed because of some past decisions,
unfortunate or not.

Regards,
Mingye

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages
  2023-04-12 10:35                         ` Mingye Wang
@ 2023-04-12 10:55                           ` Ralph Corderoy
  0 siblings, 0 replies; 73+ messages in thread
From: Ralph Corderoy @ 2023-04-12 10:55 UTC (permalink / raw)
  To: groff, linux-man; +Cc: Sam James

Hi Mingye,

> the thing is we are talking about storage for distribution on every
> single person's computer

No, I was talking to sam@gentoo.org so I assumed Gentoo as the target.

> We are looking at a world where almost every system has xz installed
> because of some past decisions, unfortunate or not.

That's not the kind of thing I expect to bother Gentoo.  :-)

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
  2023-04-12  8:13                     ` Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1))) Sam James
  2023-04-12  8:32                       ` Compressed man pages Ralph Corderoy
@ 2023-04-12 13:04                       ` Kerin Millar
  2023-04-12 14:24                         ` Alejandro Colomar
  1 sibling, 1 reply; 73+ messages in thread
From: Kerin Millar @ 2023-04-12 13:04 UTC (permalink / raw)
  To: Sam James
  Cc: Alejandro Colomar, Alexis, groff, linux-man, Ingo Schwarze,
	Dirk Gouders, Colin Watson, Ralph Corderoy

On Wed, 12 Apr 2023 09:13:13 +0100
Sam James <sam@gentoo.org> wrote:

> 
> Alejandro Colomar <alx.manpages@gmail.com> writes:
> 
> > [[PGP Signed Part:Undecided]]
> > [Added back linux-man@, and people that commented on this (sub)topic]
> > [Added Sam, I've got a question for you]
> >
> > Hi Alexis,
> >
> > Please keep (at least) linux-man@ in the loop.
> >
> > On 4/9/23 08:44, Alexis wrote:
> >> 
> >> As a related data point, i'd like to mention Gentoo's position on 
> >> this, i.e. that man pages will continue to be bzip2-compressed by
> >> default:
> >> 
> >> "app-text/mandoc bzip2 support"
> >> https://bugs.gentoo.org/854267
> >> 
> >> "Remove /usr/share/man from default inclusion list for docompress"
> >> https://bugs.gentoo.org/836367
> >
> > As Ingo said[1] 3 years ago, I don't think in this year it makes any
> > sense to compress pages anymore.  However, since it's simple for me
> > to add support for that, and it can be interesting for testing
> > purposes, I added support for installing the Linux man-pages
> > compressed with bzip2 using the Makefile[2].  While I was at it, I
> > also added support for generating .tar.bz2 release tarballs[3].
> >
> > With this, I was able to test a bit more than what I did yesterday:
> >
> >
> > $ sudo rm -rf /opt/local/man/
> > $ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
> > 2570
> > $ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
> > 2570
> > $ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
> > 2570
> > $ du -sh /opt/local/man/*
> > 5.4M	/opt/local/man/bz2
> > 5.5M	/opt/local/man/gz_
> > 9.4M	/opt/local/man/man
> >
> >
> > $ export MANPATH=/opt/local/man/gz_/share/man
> > $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> > 37
> > 0.31
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 1.56
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 1.56
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> > 17
> > 1.24
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> > 17
> > 1.14
> >
> >
> > $ export MANPATH=/opt/local/man/bz2/share/man
> > $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> > 37
> > 10.90
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 1.33
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 1.31
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> > 17
> > 1.21
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> > 17
> > 1.22
> >
> >
> > $ export MANPATH=/opt/local/man/man/share/man
> > $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> > 37
> > 0.56
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 0.01
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 0.01
> >
> > Weird thing: today, the symlink bug in man(1) was reproducible in
> > all kinds of pages, while yesterday it only reproduced in
> > uncompressed ones.
> >
> > Another weird thing: times today changed considerably for the
> > find(1) pipelines (half of yesterday's).  It's not a thing of
> > using dash(1), because I get similar times with bash(1) and its
> > builtin time(1).
> >
> > Important note: Sam, are you sure you want your pages compressed
> > with bz2?  Have you seen the 10 seconds it takes man-db's man(1) to
> > find a word in the pages?  I suggest that at least you try to
> > reproduce these tests in your machine, and see if it's just me or
> > man-db's man(1) is pretty bad at non-gz pages.
> >
> > Test results:
> >
> > -  man-db's man(1) is slower with plain man(7) source than with .gz
> >    pages for some misterious reason.
> >
> > -  man-db's man(1) is turtle slow with .bz2 pages.
> 
> I started looking into changing to xz (or just.. not bz2, anyway),
> partially motivated by https://gitlab.com/man-db/man-db/-/issues/4 /
> just interest locally (without having done measurements to see if it
> would be worth a global change) and the xz maintainer ended up
> recommending a different implementation to how man-db currently handles
> external utilties entirely (which I have a draft of).
> 
> The xz author had some suggestions on the best parameters to use
> for man pages too which I need to look into and dig up...
> 
> https://bugs.gentoo.org/169260 was an interesting discussion
> about our choice of bz2 (it came up a bit in
> https://bugs.gentoo.org/372653 too).

Oh, I remember this. Soon after #372653 was closed, I experimented further and found xz --lzma2=preset=6e,pb=0 to be more effective than bzip -9, both in terms of compression ratio and subsequent decompression performance, so I used those settings for a time. Nowadays, I would be more concerned with the time taken to render a man page than in reducing the footprint of the installed documentation.

> 
> (I'll get back and read the rest of the thread later, but wanted
> to add this tidbit.)
> 
> Definitely surprised to learn bz2 is *that* bad though!
> 
> best,
> sam

-- 
Kerin Millar

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
  2023-04-12 13:04                       ` Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1))) Kerin Millar
@ 2023-04-12 14:24                         ` Alejandro Colomar
  2023-04-12 18:52                           ` Mingye Wang
  0 siblings, 1 reply; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-12 14:24 UTC (permalink / raw)
  To: linux-man
  Cc: Alexis, groff, Ingo Schwarze, Dirk Gouders, Colin Watson,
	Ralph Corderoy, Mingye Wang, Kerin Millar, Sam James


[-- Attachment #1.1: Type: text/plain, Size: 7495 bytes --]

Hi all,

After the suggestion by Ralph of trying .lz, Sam's comment about .xz),
and Kerin's comment about tuning the compression parameters, I decided
to try out everything at once, so we can see the effects of the
alternatives.

TL;DR:  For manual pages, use uncompressed source, or gzip(1).
        Everything else is unreasonably slow.


Here go the numbers.  Below, will be a conclusion I get from them.
The following tests have been produced with man-db's man(1) built
from source, since Colin fixed an relevant bug a few days ago[1].
This improves performance considerably compared to the latest
release.


$ sudo make install-man prefix=/opt/local/man/bz2_1 -j LINK_PAGES=symlink Z=.bz2 BZIP2FLAGS=-1 | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/bz2_9 -j LINK_PAGES=symlink Z=.bz2 BZIP2FLAGS=-9 | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/bz2__ -j LINK_PAGES=symlink Z=.bz2 BZIP2FLAGS=   | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/gz__1 -j LINK_PAGES=symlink Z=.gz  GZIPFLAGS=-1  | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/gz__9 -j LINK_PAGES=symlink Z=.gz  GZIPFLAGS=-9  | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/gz___ -j LINK_PAGES=symlink Z=.gz  GZIPFLAGS=    | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/lz__1 -j LINK_PAGES=symlink Z=.lz  LZIPFLAGS=-1  | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/lz__9 -j LINK_PAGES=symlink Z=.lz  LZIPFLAGS=-9  | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/lz___ -j LINK_PAGES=symlink Z=.lz  LZIPFLAGS=    | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/xz__1 -j LINK_PAGES=symlink Z=.xz  XZFLAGS=-1    | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/xz__9 -j LINK_PAGES=symlink Z=.xz  XZFLAGS=-9    | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/xz___ -j LINK_PAGES=symlink Z=.xz  XZFLAGS=      | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/man__ -j LINK_PAGES=symlink Z=                   | wc -l
2571
$ du -sh /opt/local/man/*
5.4M	/opt/local/man/bz2_1
5.4M	/opt/local/man/bz2_9
5.4M	/opt/local/man/bz2__
5.7M	/opt/local/man/gz__1
5.5M	/opt/local/man/gz__9
5.5M	/opt/local/man/gz___
5.5M	/opt/local/man/lz__1
5.4M	/opt/local/man/lz__9
5.4M	/opt/local/man/lz___
9.4M	/opt/local/man/man__
5.5M	/opt/local/man/xz__1
5.4M	/opt/local/man/xz__9
5.4M	/opt/local/man/xz___


$ export MANPATH=/opt/local/man/bz2_1/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.15
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.22

$ export MANPATH=/opt/local/man/bz2_9/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.15
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.23

$ export MANPATH=/opt/local/man/bz2__/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.19
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.23


$ export MANPATH=/opt/local/man/gz__1/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 0.21
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.16

$ export MANPATH=/opt/local/man/gz__9/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 0.20
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.17

$ export MANPATH=/opt/local/man/gz___/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 0.20
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.15


$ export MANPATH=/opt/local/man/lz__1/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.95
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do lzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.40

$ export MANPATH=/opt/local/man/lz__9/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.93
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do lzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.40

$ export MANPATH=/opt/local/man/lz___/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.94
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do lzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.40


$ export MANPATH=/opt/local/man/xz__1/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.43
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do xz -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.24

$ export MANPATH=/opt/local/man/xz__9/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 4.21
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do xz -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.55

$ export MANPATH=/opt/local/man/xz___/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 4.17
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do xz -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.55


$ export MANPATH=/opt/local/man/man__/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 0.55
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 0.01


Conclussions:

Any compression formats other than .gz are unreasonably slow.
I'd say either use .gz, or plain text, or prepare to
contribute code yourself to man-db to optimize for your favourite
compression format.

.bz2, .lz, and .xz have similar times, and tuning the compression
doesn't produce important changes in speed (except slightly for
.xz, but I don't see any advantage of using .xz).

Similarly, tuning the compression of .gz doesn't produce
important changes in speed.

Plain text has the advantage that you can use all the power of
Unix tools to search through the source code of the pages
instantaneously, without being restricted to what man(1) allows.


I hope this was useful.

Cheers,
Alex


[1]:  <https://lists.nongnu.org/archive/html/man-db-devel/2023-04/msg00000.html>

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
  2023-04-12 14:24                         ` Alejandro Colomar
@ 2023-04-12 18:52                           ` Mingye Wang
  2023-04-12 20:23                             ` Compressed man pages Alejandro Colomar
  2023-04-13 10:09                             ` Ralph Corderoy
  0 siblings, 2 replies; 73+ messages in thread
From: Mingye Wang @ 2023-04-12 18:52 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: linux-man, Alexis, groff, Ingo Schwarze, Dirk Gouders,
	Colin Watson, Ralph Corderoy, Kerin Millar, Sam James

On Wed, Apr 12, 2023 at 10:24 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
> $ sudo make install-man prefix=/opt/local/man/xz___ -j LINK_PAGES=symlink Z=.xz  XZFLAGS=      | wc -l

Small nitpick here as Kerin's recommended pb=0 isn't actually used.
https://bugs.gentoo.org/169260#c19 (from Kerin) suggests that we might
get one-third more.

I'm having trouble getting the Makefile to behave on MSYS2, but it
does shrink a manual copy of man*/ totalling 7.2 M (probably because
`exit` and `nan` didn't get checked out by git -- case-insensitivity
issues) down to 2.8 M (both `du --apparent-size -sh`).

> .bz2, .lz, and .xz have similar times, and tuning the compression
> doesn't produce important changes in speed

Or size. This is to be expected, since man pages are really tiny
files, to the point that compressors can't see much context. [Zstd and
brotli each have a "dictionary mode" to deal with this, but (a) Zstd
dict-file requires an extra flag on decompress (b) nobody has brotli,
which has a default dictionary, installed.]

> .xz, but I don't see any advantage of using .xz).

Going for `xz -9` only unnecessarily inflates the dictionary size
beyond the file size and therefore the mem requirement. The dictionary
size at -0 is 256 KiB, already enough for almost every man page in
existence. (gz -9 is 32 KiB, if I recall correctly.)

> Conclussions:
>
> Any compression formats other than .gz are unreasonably slow.
> I'd say either use .gz, or plain text, or prepare to
> contribute code yourself to man-db to optimize for your favourite
> compression format.

For every compression format someone adds, there's going to be one
more optional dependency...

Cheers,
Mingye

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages
  2023-04-12 18:52                           ` Mingye Wang
@ 2023-04-12 20:23                             ` Alejandro Colomar
  2023-04-13 10:09                             ` Ralph Corderoy
  1 sibling, 0 replies; 73+ messages in thread
From: Alejandro Colomar @ 2023-04-12 20:23 UTC (permalink / raw)
  To: Mingye Wang
  Cc: linux-man, Alexis, groff, Ingo Schwarze, Dirk Gouders,
	Colin Watson, Ralph Corderoy, Kerin Millar, Sam James


[-- Attachment #1.1: Type: text/plain, Size: 2467 bytes --]

Hi Mingye,

On 4/12/23 20:52, Mingye Wang wrote:
> On Wed, Apr 12, 2023 at 10:24 PM Alejandro Colomar
> <alx.manpages@gmail.com> wrote:
>> $ sudo make install-man prefix=/opt/local/man/xz___ -j LINK_PAGES=symlink Z=.xz  XZFLAGS=      | wc -l
> 
> Small nitpick here as Kerin's recommended pb=0 isn't actually used.
> https://bugs.gentoo.org/169260#c19 (from Kerin) suggests that we might
> get one-third more.

Hmm, might be interesting to try at some point,  but for now, since
man(1) is very unoptimized for non-gz, as we saw, I don't thinks it's
worth trying now.

> 
> I'm having trouble getting the Makefile to behave on MSYS2, but it
> does shrink a manual copy of man*/ totalling 7.2 M (probably because
> `exit` and `nan` didn't get checked out by git -- case-insensitivity
> issues) down to 2.8 M (both `du --apparent-size -sh`).

I didn't push the changes needed to use .lz and .xz.  Maybe that was
the issue?


* bc15c1d7b - Wed, 12 Apr 2023 16:54:01 +0200 (5 hours ago) (tar)
|           Makefile: tfix - Alejandro Colomar
* db5795531 - Wed, 12 Apr 2023 16:53:32 +0200 (5 hours ago)
|           *.mk: $Z: Support installing xz(1) compressed pages - Alejandro Colomar
* c2fffefba - Wed, 12 Apr 2023 16:46:16 +0200 (6 hours ago)
|           *.mk: Add *FLAGS variables for compression commands - Alejandro Colomar
* b220bc5b0 - Wed, 12 Apr 2023 14:43:00 +0200 (8 hours ago)
|           *.mk: $Z: Support installing lzip(1) compressed pages - Alejandro Colomar
* 69ad95988 - Wed, 12 Apr 2023 14:37:08 +0200 (8 hours ago)
|           *.mk: dist, dist-lz: Create tarballs compressed with lzip(1) - Alejandro Colomar
* 254fe38b2 - Tue, 11 Apr 2023 22:33:44 +0200 (24 hours ago) (tag: man-pages-6.05-a1)
|           dist.mk, version.mk: Create reproducible tarballs - Alejandro Colomar
| * c7e9f0ffe - Tue, 11 Apr 2023 22:13:00 +0200 (24 hours ago) (set)
|/            build-catman.mk: Use .set suffix for troff(1) output - Alejandro Colomar
* 121c8de01 - Tue, 11 Apr 2023 16:55:17 +0200 (29 hours ago) (HEAD -> master, korg/master)
|           fts.3: SYNOPSIS: Fix nullability - Alejandro Colomar


I'll push in a moment so you can try that (already done at the time of
sending this email).  Or did you see different issues about the Makefile?
Please report anything uncomfortable about it.

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Compressed man pages
  2023-04-12 18:52                           ` Mingye Wang
  2023-04-12 20:23                             ` Compressed man pages Alejandro Colomar
@ 2023-04-13 10:09                             ` Ralph Corderoy
  1 sibling, 0 replies; 73+ messages in thread
From: Ralph Corderoy @ 2023-04-13 10:09 UTC (permalink / raw)
  To: linux-man, groff

Hi Mingye,

> [Zstd and brotli each have a "dictionary mode" to deal with this, but
> (a) Zstd dict-file requires an extra flag on decompress (b) nobody has
> brotli, which has a default dictionary, installed.]

I found brotli was already installed here.
So here's some numbers, just for the lists' info.

    $ ls | grep '\.gz$' | shuf -n10 |
    > while read -r f; do
    >     printf '%32s  %5d  %5d\n' "$f" `wc -c <"$f"` \
    >         `zcat "$f" | brotli | wc -c`
    > done |
    > awk '{print $0 "  " $3/$2}'
			postmap.1.gz   4125   3333  0.808
	       gnutls-cli-debug.1.gz   2627   2108  0.802436
			  cwebp.1.gz   5074   4106  0.809223
			findsmb.1.gz   1810   1474  0.814365
			ppmntsc.1.gz   1282    973  0.75897
			  libuv.1.gz  76363  62274  0.8155
			  xmlwf.1.gz   3486   2760  0.791738
			  users.1.gz    763    572  0.749672
		   gpgparsemail.1.gz    294    231  0.785714
	       perl561delta.1perl.gz  51764  42957  0.829862
    $

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: reformatting man pages at SIGWINCH
  2023-04-11  9:39                     ` Dirk Gouders
@ 2023-04-17  6:23                       ` G. Branden Robinson
  0 siblings, 0 replies; 73+ messages in thread
From: G. Branden Robinson @ 2023-04-17  6:23 UTC (permalink / raw)
  To: Dirk Gouders; +Cc: Alejandro Colomar, linux-man, groff

[-- Attachment #1: Type: text/plain, Size: 3234 bytes --]

[CC list trimmed of Texinfo people/lists]

At 2023-04-11T11:39:11+0200, Dirk Gouders wrote:
[I wrote:]
> > 4.  A habit has grown up among man(1) programs and pagers to call
> > for and support, respectively, a "blank line squeezing" feature: any
> > runs of more than one blank line are condensed to 1 blank line each.
> > In groff 1.23.0, this will no longer be necessary when continuously
> > rendering.  (Historically, this squeezing feature was used to
> > "tighten up" vertical space after the page header, prior to the
> > "NAME" section heading of the document.)  In my opinion, pager
> > programs should perform as few transformations as possible on the
> > output of grotty(1), the groff output driver that supports terminal
> > devices.  The long-time author and maintainer of less(1) does not
> > agree, so you have to call that program with its "-R" flag to view
> > grotty(1) output as groff intends it.  (To see what those intentions
> > are, format the document without paging it.)
> 
> Thank you for the detailled assessment.  Perhaps my misunderstanding
> is because I'm not a native speaker but which document should I format
> to see what those intentions are?

Just about any man page will do.  By "intentions" I mean things like
typeface changes and, in the forthcoming groff 1.23.0,[1] OSC 8 escape
sequences to encode hyperlinks.

For instance, if I want to look at groff_man(7)'s man page without the
intermediation of man(1) or a pager, I can do this.

$ man -w groff_man # to tell me where the document is installed
/usr/share/man/man7/groff_man.7.gz
$ zcat $(!!) | nroff -t -mandoc

I recommend the above as an early troubleshooting step with rendering
problems, though your terminal emulator may need a lot of scrollback
buffer, depending on the document.

(On rare occasions, a document may require a preprocessor other than
tbl(1), but the parts that use them generally won't produce good (eqn)
or any (pic) results on terminal devices.  "-t -mandoc" should suffice
for well over 95% of man pages.)

> > Since I flogged groff 1.23.0 three times in this email, I suppose I
> > should point people to where they can get the 1.23.0.rc3 release
> > candidate source archive.  Feedback would be appreciated.
> 
> Oh well, I didn't measure it but I spent quite some time to work on
> doc/lsp-help.1 and try to find a solution for that "nasty empty line"
> that appeared in of the tables that I use for the online help -- I was
> convinced it was my fault.

I am sure a lot of people thought that.  I was quite pleased to track
down and stomp that bug.

> Gentoo already has an ebuild for groff-1.23.0-rc3 and simply using
> this fixes that problem in the table.  So, from now on all my testing
> happens with groff-1.23.0-rc3 and I will report should I recognize
> problems.

Please do.  Bruno Haible has found a passel of portability problems to
non-GNU/Linux systems, and helped us to resolve several of them; I am
hopeful that 1.23.0 will be the most easily deployed groff in quite some
time.

Regards,
Branden

[1] We just tagged and put out 1.23.0.rc4 this past weekend.

https://lists.gnu.org/archive/html/groff/2023-04/msg00135.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2023-04-17  6:23 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-25 20:37 Playground pager lsp(1) Dirk Gouders
2023-03-25 20:47 ` Dirk Gouders
2023-04-04 23:45   ` Alejandro Colomar
2023-04-05  5:35     ` Eli Zaretskii
2023-04-06  1:10       ` Alejandro Colomar
2023-04-06  8:11         ` Eli Zaretskii
2023-04-06  8:48           ` Gavin Smith
2023-04-07 22:01           ` Alejandro Colomar
2023-04-08  7:05             ` Eli Zaretskii
2023-04-08 13:02               ` Accessibility of man pages (was: Playground pager lsp(1)) Alejandro Colomar
2023-04-08 13:42                 ` Eli Zaretskii
2023-04-08 16:06                   ` Alejandro Colomar
2023-04-08 13:47                 ` Colin Watson
2023-04-08 15:42                   ` Alejandro Colomar
2023-04-08 19:48                   ` Accessibility of man pages Dirk Gouders
2023-04-08 20:02                     ` Eli Zaretskii
2023-04-08 20:46                       ` Dirk Gouders
2023-04-08 21:53                         ` Alejandro Colomar
2023-04-08 22:33                           ` Alejandro Colomar
2023-04-09 10:28                       ` Ralph Corderoy
2023-04-08 20:31                     ` Ingo Schwarze
2023-04-08 20:59                       ` Dirk Gouders
2023-04-08 22:39                         ` Ingo Schwarze
2023-04-09  9:50                           ` Dirk Gouders
2023-04-09 10:35                             ` Dirk Gouders
     [not found]                 ` <87a5zhwntt.fsf@ada>
2023-04-09 12:05                   ` Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1))) Alejandro Colomar
2023-04-09 12:17                     ` Alejandro Colomar
2023-04-09 18:55                       ` G. Branden Robinson
2023-04-09 12:29                     ` Colin Watson
2023-04-09 13:36                       ` Alejandro Colomar
2023-04-09 13:47                         ` Compressed man pages Ralph Corderoy
2023-04-12  8:13                     ` Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1))) Sam James
2023-04-12  8:32                       ` Compressed man pages Ralph Corderoy
2023-04-12 10:35                         ` Mingye Wang
2023-04-12 10:55                           ` Ralph Corderoy
2023-04-12 13:04                       ` Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1))) Kerin Millar
2023-04-12 14:24                         ` Alejandro Colomar
2023-04-12 18:52                           ` Mingye Wang
2023-04-12 20:23                             ` Compressed man pages Alejandro Colomar
2023-04-13 10:09                             ` Ralph Corderoy
2023-04-07  2:18         ` Playground pager lsp(1) G. Branden Robinson
2023-04-07  6:36           ` Eli Zaretskii
2023-04-07 11:03             ` Gavin Smith
2023-04-07 14:43             ` man page rendering speed (was: Playground pager lsp(1)) G. Branden Robinson
2023-04-07 15:06               ` Eli Zaretskii
2023-04-07 15:08                 ` Larry McVoy
2023-04-07 17:07                 ` man page rendering speed Ingo Schwarze
2023-04-07 19:04                 ` man page rendering speed (was: Playground pager lsp(1)) Alejandro Colomar
2023-04-07 19:28                   ` Gavin Smith
2023-04-07 20:43                     ` Alejandro Colomar
2023-04-07 16:08               ` Colin Watson
2023-04-08 11:24               ` Ralph Corderoy
2023-04-07 21:26           ` reformatting man pages at SIGWINCH " Alejandro Colomar
2023-04-07 22:09             ` reformatting man pages at SIGWINCH Dirk Gouders
2023-04-07 22:16               ` Alejandro Colomar
2023-04-10 19:05                 ` Dirk Gouders
2023-04-10 19:57                   ` Alejandro Colomar
2023-04-10 20:24                   ` G. Branden Robinson
2023-04-11  9:20                     ` Ralph Corderoy
2023-04-11  9:39                     ` Dirk Gouders
2023-04-17  6:23                       ` G. Branden Robinson
2023-04-08 11:40               ` Ralph Corderoy
2023-04-05 10:02     ` Playground pager lsp(1) Dirk Gouders
2023-04-05 14:19       ` Arsen Arsenović
2023-04-05 18:01         ` Dirk Gouders
2023-04-05 19:07           ` Eli Zaretskii
2023-04-05 19:56             ` Dirk Gouders
2023-04-05 20:38             ` A less presumptive .info? (was: Re: Playground pager lsp(1)) Arsen Arsenović
2023-04-06  8:14               ` Eli Zaretskii
2023-04-06  8:56                 ` Gavin Smith
2023-04-07 13:14                 ` Arsen Arsenović
2023-04-06  1:31       ` Playground pager lsp(1) Alejandro Colomar
2023-04-06  6:01         ` Dirk Gouders

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).