linux-man.vger.kernel.org archive mirror
From: Alejandro Colomar <alx.manpages@gmail.com>
To: Alexis <flexibeast@gmail.com>,
	groff@gnu.org, linux-man <linux-man@vger.kernel.org>
Cc: Ingo Schwarze <schwarze@usta.de>, Dirk Gouders <dirk@gouders.net>,
	Colin Watson <cjwatson@debian.org>, Sam James <sam@gentoo.org>,
	Ralph Corderoy <ralph@inputplus.co.uk>
Subject: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
Date: Sun, 9 Apr 2023 14:05:08 +0200
Message-ID: <c6e9eb6a-a2ba-1de1-211f-bc6ccc3f7a9a@gmail.com>
In-Reply-To: <87a5zhwntt.fsf@ada>



[Added back linux-man@, and people that commented on this (sub)topic]
[Added Sam, I've got a question for you]

Hi Alexis,

Please keep (at least) linux-man@ in the loop.

On 4/9/23 08:44, Alexis wrote:
> 
> As a related data point, i'd like to mention Gentoo's position on 
> this, i.e. that man pages will continue to be bzip2-compressed by 
> default:
> 
> "app-text/mandoc bzip2 support"
> https://bugs.gentoo.org/854267
> 
> "Remove /usr/share/man from default inclusion list for docompress"
> https://bugs.gentoo.org/836367

As Ingo said[1] 3 years ago, I don't think it makes any sense to
compress pages anymore these days.  However, since it's simple for me
to add support for that, and it can be interesting for testing
purposes, I added support to the Makefile for installing the Linux
man-pages compressed with bzip2[2].  While I was at it, I also added
support for generating .tar.bz2 release tarballs[3].

With this, I was able to test a bit more than what I did yesterday:


$ sudo rm -rf /opt/local/man/
$ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
2570
$ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
2570
$ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
2570
$ du -sh /opt/local/man/*
5.4M	/opt/local/man/bz2
5.5M	/opt/local/man/gz_
9.4M	/opt/local/man/man


$ export MANPATH=/opt/local/man/gz_/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
0.31
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
17
1.56
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
17
1.56
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.24
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.14


$ export MANPATH=/opt/local/man/bz2/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
10.90
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
17
1.33
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
17
1.31
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.21
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.22


$ export MANPATH=/opt/local/man/man/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
0.56
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
17
0.01
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
17
0.01

Weird thing: today, the symlink bug in man(1) was reproducible with
all kinds of pages, while yesterday it only reproduced with
uncompressed ones.

Another weird thing: the times for the find(1) pipelines changed
considerably today (half of yesterday's).  It isn't caused by
using dash(1), because I get similar times with bash(1) and its
built-in time.

Important note: Sam, are you sure you want your pages compressed
with bzip2?  Have you seen the 10 seconds it takes man-db's man(1) to
find a word in the pages?  I suggest that you at least try to
reproduce these tests on your machine, to see whether it's just me
or whether man-db's man(1) is pretty bad at non-gz pages.

Test results:

-  man-db's man(1) is slower with plain man(7) source than with .gz
   pages for some mysterious reason.

-  man-db's man(1) is turtle slow with .bz2 pages.

-  xargs -P0 doesn't make a significant difference.  As Ralph said,
   this is probably because the main issue with find(1) was the
   bottleneck in clone/fork+exec, and xargs(1) already solves that.

   Expanding the pipeline to use zcat(1) instead of zgrep(1)
   improves things a little more, because the zgrep(1) script is
   probably quite inefficient, while zcat(1) is just a simple
   wrapper around gzip(1).  We see that zgrep(1) is less efficient
   than running a few programs per file ourselves in a pipeline!

   Calling gzip(1) directly is even faster, since we avoid invoking
   a shell for such a small script.

   Expanding the bzgrep(1) pipeline into one using bzcat(1) has
   similar improvements.  However, since bzcat(1) is a binary, we
   don't get further improvement from calling bzip2(1) directly.
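
As a rough sketch of the last two points (it is not one of the tests
above), the per-file pipelines can be wrapped in a small shell
function that calls the decompressor directly, chosen by file suffix.
The names page_cat and grep_pages are made up for illustration.  It
assumes that gzip(1) and bzip2(1) are installed, that the directory
argument is a single path rather than a colon-separated MANPATH, and
that the search term is a fixed string (hence grep -F):

```shell
#!/bin/sh

# Hypothetical helper: write the uncompressed text of one page to stdout,
# calling the decompressor binary directly (no zcat/zgrep wrapper script).
page_cat()
{
	case $1 in
	*.gz)	gzip -dc <"$1" ;;
	*.bz2)	bzip2 -dc <"$1" ;;
	*)	cat <"$1" ;;
	esac
}

# Hypothetical helper: print every regular file under directory $2
# whose uncompressed text contains the fixed string $1.
grep_pages()
{
	find "$2" -type f | while IFS= read -r f; do
		if page_cat "$f" | grep -qF -e "$1"; then
			printf '%s\n' "$f"
		fi
	done
}
```

With the MANPATH values used above (each a single directory),
grep_pages RLIMIT_NOFILE "$MANPATH" | wc -l should count the same
regular files as the find(1) pipelines.  It still forks a couple of
processes per file, so it is only expected to behave like the
hand-expanded pipelines, and no timing is claimed for it.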


Cheers,
Alex

> 
> 
> Alexis.
> 


[1]:  <https://marc.info/?l=mandoc-discuss&m=160668087317110&w=2>

[2]:  <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=6a828d5b6879ef19c3f59034fe1d0850d25d0056>

[3]:  <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=e5b23b9c5b318d69ee78af0906e3bf0c665f9ae5>

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

