From: Alejandro Colomar <alx.manpages@gmail.com>
To: linux-man <linux-man@vger.kernel.org>
Cc: Alexis <flexibeast@gmail.com>,
	groff@gnu.org, Ingo Schwarze <schwarze@usta.de>,
	Dirk Gouders <dirk@gouders.net>,
	Colin Watson <cjwatson@debian.org>,
	Ralph Corderoy <ralph@inputplus.co.uk>,
	Mingye Wang <arthur200126@gmail.com>,
	Kerin Millar <kfm@plushkava.net>, Sam James <sam@gentoo.org>
Subject: Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
Date: Wed, 12 Apr 2023 16:24:42 +0200
Message-ID: <44768e26-ed92-0562-2318-68fec781126b@gmail.com>
In-Reply-To: <20230412140451.f03a6c19983694fe844bbb5a@plushkava.net>


Hi all,

After Ralph's suggestion to try .lz, Sam's comment about .xz,
and Kerin's comment about tuning the compression parameters, I decided
to try out everything at once, so we can see the effects of the
alternatives.

TL;DR:  For manual pages, use uncompressed source, or gzip(1).
        Everything else is unreasonably slow.


Here are the numbers; my conclusions follow below.
The following tests were run with man-db's man(1) built
from source, since Colin fixed a relevant bug a few days ago[1].
That fix improves performance considerably compared to the latest
release.


$ sudo make install-man prefix=/opt/local/man/bz2_1 -j LINK_PAGES=symlink Z=.bz2 BZIP2FLAGS=-1 | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/bz2_9 -j LINK_PAGES=symlink Z=.bz2 BZIP2FLAGS=-9 | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/bz2__ -j LINK_PAGES=symlink Z=.bz2 BZIP2FLAGS=   | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/gz__1 -j LINK_PAGES=symlink Z=.gz  GZIPFLAGS=-1  | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/gz__9 -j LINK_PAGES=symlink Z=.gz  GZIPFLAGS=-9  | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/gz___ -j LINK_PAGES=symlink Z=.gz  GZIPFLAGS=    | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/lz__1 -j LINK_PAGES=symlink Z=.lz  LZIPFLAGS=-1  | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/lz__9 -j LINK_PAGES=symlink Z=.lz  LZIPFLAGS=-9  | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/lz___ -j LINK_PAGES=symlink Z=.lz  LZIPFLAGS=    | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/xz__1 -j LINK_PAGES=symlink Z=.xz  XZFLAGS=-1    | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/xz__9 -j LINK_PAGES=symlink Z=.xz  XZFLAGS=-9    | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/xz___ -j LINK_PAGES=symlink Z=.xz  XZFLAGS=      | wc -l
2571
$ sudo make install-man prefix=/opt/local/man/man__ -j LINK_PAGES=symlink Z=                   | wc -l
2571
$ du -sh /opt/local/man/*
5.4M	/opt/local/man/bz2_1
5.4M	/opt/local/man/bz2_9
5.4M	/opt/local/man/bz2__
5.7M	/opt/local/man/gz__1
5.5M	/opt/local/man/gz__9
5.5M	/opt/local/man/gz___
5.5M	/opt/local/man/lz__1
5.4M	/opt/local/man/lz__9
5.4M	/opt/local/man/lz___
9.4M	/opt/local/man/man__
5.5M	/opt/local/man/xz__1
5.4M	/opt/local/man/xz__9
5.4M	/opt/local/man/xz___
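
To put the du(1) figures in perspective, here is a minimal sketch that
expresses each tree as a percentage of the uncompressed one.  The sizes
are copied from the listing above; the helper name "ratio" is only
illustrative.  du -h rounds its output and counts allocated blocks, so
treat the percentages as approximate.

```shell
# ratio SIZE BASE: print SIZE as a percentage of BASE, rounded to an integer.
ratio() {
    awk -v s="$1" -v b="$2" 'BEGIN { printf "%.0f%%\n", 100 * s / b }'
}

base=9.4    # man__ (uncompressed), as reported by du -sh above
for pair in bz2_9:5.4 gz__1:5.7 gz__9:5.5 lz__9:5.4 xz__9:5.4; do
    # ${pair%%:*} is the tree name, ${pair#*:} is its size.
    printf '%s\t%s\n' "${pair%%:*}" "$(ratio "${pair#*:}" "$base")"
done
```

With these inputs, the trees at 5.4M come out at 57% of the plain-text
tree, gz__9 at 59%, and gz__1 at 61%, so the formats differ by only a
few percentage points in size.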


$ export MANPATH=/opt/local/man/bz2_1/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.15
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.22

$ export MANPATH=/opt/local/man/bz2_9/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.15
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.23

$ export MANPATH=/opt/local/man/bz2__/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.19
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.23


$ export MANPATH=/opt/local/man/gz__1/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 0.21
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.16

$ export MANPATH=/opt/local/man/gz__9/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 0.20
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.17

$ export MANPATH=/opt/local/man/gz___/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 0.20
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.15


$ export MANPATH=/opt/local/man/lz__1/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.95
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do lzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.40

$ export MANPATH=/opt/local/man/lz__9/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.93
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do lzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.40

$ export MANPATH=/opt/local/man/lz___/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.94
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do lzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.40


$ export MANPATH=/opt/local/man/xz__1/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 3.43
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do xz -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.24

$ export MANPATH=/opt/local/man/xz__9/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 4.21
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do xz -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.55

$ export MANPATH=/opt/local/man/xz___/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 4.17
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do xz -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
17; 1.55


$ export MANPATH=/opt/local/man/man__/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 0.55
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
17; 0.01
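
The decompress-and-grep pipelines above differ only in the tree and in
the decompressor, so they can be folded into one helper.  The following
is a sketch, not the commands that produced the timings: the function
name "search_tree" is made up for illustration, it uses "-dc" reading
from standard input (which gzip, bzip2, xz, and lzip all accept), and
it quotes file names and uses "read -r", which the one-liners above do
not.

```shell
# search_tree DIR CMD PATTERN: print the files under DIR whose
# decompressed contents match PATTERN.  CMD is invoked as "CMD -dc",
# reading the compressed file on standard input.
search_tree() {
    find "$1" -type f | while IFS= read -r f; do
        if "$2" -dc <"$f" | grep -q -e "$3"; then
            printf '%s\n' "$f"
        fi
    done
}

# Small self-contained demonstration on a throwaway directory.
tmp=$(mktemp -d)
printf 'RLIMIT_NOFILE\n' | gzip >"$tmp/a.2.gz"
printf 'nothing here\n'  | gzip >"$tmp/b.2.gz"
search_tree "$tmp" gzip RLIMIT_NOFILE    # lists only a.2.gz
rm -rf "$tmp"
```

Like the originals, this starts one decompressor and one grep per file,
so much of the measured time is likely process start-up rather than
decompression itself.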


Conclusions:

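Every compression format other than .gz is unreasonably slow with
man(1): 'man -K' took between 3.15 and 4.21 seconds with .bz2, .lz,
and .xz, compared to about 0.20 s with .gz and 0.55 s with plain text.
I'd say either use .gz, or plain text, or prepare to
contribute code yourself to man-db to optimize for your favourite
compression format.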

.bz2, .lz, and .xz have similar times, and tuning the compression
level doesn't change the speed significantly (except for .xz, where
-1 is somewhat faster than -9 or the default, but I don't see any
advantage in using .xz).

Similarly, tuning the compression of .gz doesn't produce
important changes in speed.

Plain text has the advantage that you can use all the power of
Unix tools to search through the source code of the pages
instantaneously, without being restricted to what man(1) allows.
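
As an illustration of that point, here is a minimal sketch of a query
that is easy to express with grep(1) on uncompressed pages: listing the
pages that mention two strings at once.  The function name
"pages_mentioning_both" and the sample file names are hypothetical, and
"grep -r" is assumed to be available (it is in GNU and BSD grep, but it
is not required by POSIX).

```shell
# pages_mentioning_both DIR A B: print the files under DIR that contain
# both fixed strings A and B.
pages_mentioning_both() {
    grep -rlF -e "$2" "$1" | while IFS= read -r f; do
        if grep -qF -e "$3" "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Small self-contained demonstration on a throwaway directory.
tmp=$(mktemp -d)
printf 'RLIMIT_NOFILE and EMFILE\n' >"$tmp/getrlimit.2"
printf 'RLIMIT_NOFILE only\n'       >"$tmp/open.2"
printf 'EMFILE only\n'              >"$tmp/dup.2"
pages_mentioning_both "$tmp" RLIMIT_NOFILE EMFILE    # lists only getrlimit.2
rm -rf "$tmp"
```

On a real installation, DIR would be something like the share/man
directory of the uncompressed tree used in the tests above.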


I hope this was useful.

Cheers,
Alex


[1]:  <https://lists.nongnu.org/archive/html/man-db-devel/2023-04/msg00000.html>

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
