linux-man.vger.kernel.org archive mirror
From: Alejandro Colomar <alx.manpages@gmail.com>
To: groff@gnu.org, linux-man <linux-man@vger.kernel.org>
Cc: Ingo Schwarze <schwarze@usta.de>, Dirk Gouders <dirk@gouders.net>,
	Colin Watson <cjwatson@debian.org>, Sam James <sam@gentoo.org>,
	Ralph Corderoy <ralph@inputplus.co.uk>,
	Alexis <flexibeast@gmail.com>
Subject: Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
Date: Sun, 9 Apr 2023 14:17:57 +0200
Message-ID: <53b0f991-7187-07ed-b2f8-4b6d8d7ffc3a@gmail.com>
In-Reply-To: <c6e9eb6a-a2ba-1de1-211f-bc6ccc3f7a9a@gmail.com>


On 4/9/23 14:05, Alejandro Colomar wrote:
> [Added back linux-man@, and people that commented on this (sub)topic]
> [Added Sam, I've got a question for you]
> 
> Hi Alexis,
> 
> Please keep (at least) linux-man@ in the loop.
> 
> On 4/9/23 08:44, Alexis wrote:
>>
>> As a related data point, i'd like to mention Gentoo's position on 
>> this, i.e. that man pages will continue to be bzip2-compressed by 
>> default:
>>
>> "app-text/mandoc bzip2 support"
>> https://bugs.gentoo.org/854267
>>
>> "Remove /usr/share/man from default inclusion list for docompress"
>> https://bugs.gentoo.org/836367
> 
> As Ingo said[1] 3 years ago, I don't think it makes any sense to
> compress pages anymore these days.  However, since it's simple for me
> to add support for that, and it can be interesting for testing
> purposes, I added support for installing the Linux man-pages
> compressed with bzip2 using the Makefile[2].  While I was at it, I
> also added support for generating .tar.bz2 release tarballs[3].
> 
> With this, I was able to test a bit more than what I did yesterday:
> 
> 
> $ sudo rm -rf /opt/local/man/
> $ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
> 2570
> $ du -sh /opt/local/man/*
> 5.4M	/opt/local/man/bz2
> 5.5M	/opt/local/man/gz_
> 9.4M	/opt/local/man/man
> 
> 
> $ export MANPATH=/opt/local/man/gz_/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.24
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.14
> 
> 
> $ export MANPATH=/opt/local/man/bz2/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 10.90
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.33
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.21
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.22
> 
> 
> $ export MANPATH=/opt/local/man/man/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
> 
> Weird thing: today, the symlink bug in man(1) was reproducible in
> all kinds of pages, while yesterday it only reproduced in
> uncompressed ones.
> 
> Another weird thing: times today changed considerably for the
> find(1) pipelines (half of yesterday's).  It's not caused by
> using dash(1), because I get similar times with bash(1) and its
> built-in time keyword.
> 
> Important note: Sam, are you sure you want your pages compressed
> with bz2?  Have you seen the 10 seconds it takes man-db's man(1) to
> find a word in the pages?  I suggest that you at least try to
> reproduce these tests on your machine, and see if it's just me or
> if man-db's man(1) is pretty bad at non-gz pages.
> 
> Test results:
> 
> -  man-db's man(1) is slower with plain man(7) source than with .gz
>    pages for some mysterious reason.
> 
> -  man-db's man(1) is turtle slow with .bz2 pages.
> 
> -  xargs -P0 doesn't affect the times significantly.  As Ralph said,
>    this is probably because the main bottleneck with find(1) was
>    clone/fork+exec, and xargs(1) already solves that.
> 
>    Expanding the pipeline to use zcat(1) instead of zgrep(1)
>    improves things a little more, because the zgrep(1) script is
>    probably quite inefficient, while zcat(1) is just a simple
>    wrapper around gzip(1).  We see that zgrep(1) is less
>    efficient than running a few programs per file ourselves in a
>    pipeline!
> 
>    Calling gzip(1) directly is even faster, since we avoid invoking
>    a shell for such a small script.
> 
>    Expanding the bzgrep(1) pipeline into one using bzcat(1) has
>    similar improvements.  However, since bzcat(1) is a binary, we
>    don't get further improvement from calling bzip2(1) directly.
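
The per-file pipelines timed above can be wrapped in a small script
for anyone who wants to reproduce the comparison.  This is only a
sketch: page_cat and search_pages are hypothetical names, not part of
man-pages or man-db, and the suffix-based dispatch assumes that pages
are named *.gz or *.bz2, as they are when installed with Z=.gz or
Z=.bz2.

```shell
#!/bin/sh
# Sketch only: page_cat and search_pages are made-up helper names.

# page_cat FILE: write the uncompressed text of FILE to stdout,
# calling the decompressor binary directly (no zcat/zgrep wrapper).
page_cat() {
    case $1 in
        *.gz)  gzip -dc <"$1" ;;
        *.bz2) bzip2 -dc <"$1" ;;
        *)     cat <"$1" ;;
    esac
}

# search_pages DIR PATTERN: print every regular file under DIR whose
# uncompressed text matches PATTERN.  Symlinks are skipped, as in the
# find(1) pipelines above.
search_pages() {
    find "$1" -type f | while IFS= read -r f; do
        if page_cat "$f" | grep -e "$2" >/dev/null; then
            printf '%s\n' "$f"
        fi
    done
}

# Example use (path taken from the tests above):
#   search_pages /opt/local/man/bz2/share/man RLIMIT_NOFILE | wc -l
```

This still forks a decompressor and a grep(1) per page, so it should
behave like the faster per-file loops, not like plain grep(1) on
uncompressed source.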

And I forgot the obvious one:

-  Using plain man(7) source is blazingly fast.  So much so that I
   don't miss mdoc(7)'s indexability that much.

However, I must admit that I do miss mdoc(7)'s power sometimes.
The man_lsfunc() and man_lsvar() functions for finding function
prototypes and variable declarations in man(7) source would be
much simpler using mdoc(7), and I could even use mandoc(1) to
find such things.
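
To illustrate why this would be simpler: in mdoc(7) source, function
names in the SYNOPSIS are marked explicitly with the Fn or Fo macros,
so listing them takes a few lines of awk(1).  A rough sketch, where
mdoc_lsfunc is a hypothetical name (it is not the man_lsfunc()
mentioned above), and which assumes each macro starts its own line:

```shell
#!/bin/sh
# Sketch only: mdoc_lsfunc is a made-up name for illustration.

# mdoc_lsfunc FILE...: print the function names declared with .Fn or
# .Fo in the SYNOPSIS section of each mdoc(7) source file.
mdoc_lsfunc() {
    awk '
        # .Sh starts a new section; remember whether it is SYNOPSIS.
        $1 == ".Sh" { in_synopsis = ($2 == "SYNOPSIS"); next }
        # In SYNOPSIS, the first argument of .Fn or .Fo is the name.
        in_synopsis && ($1 == ".Fn" || $1 == ".Fo") { print $2 }
    ' "$@"
}

# Example use (hypothetical file name):
#   mdoc_lsfunc strlcpy.3
```

Doing the same for man(7) source means parsing the C declarations in
the SYNOPSIS, since nothing there marks which word is the function
name.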

> 
> 
> Cheers,
> Alex
> 
>>
>>
>> Alexis.
>>
> 
> 
> [1]:  <https://marc.info/?l=mandoc-discuss&m=160668087317110&w=2>
> 
> [2]:  <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=6a828d5b6879ef19c3f59034fe1d0850d25d0056>
> 
> [3]:  <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=e5b23b9c5b318d69ee78af0906e3bf0c665f9ae5>
> 

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
