From: Kerin Millar <kfm@plushkava.net>
To: Sam James <sam@gentoo.org>
Cc: Alejandro Colomar <alx.manpages@gmail.com>,
Alexis <flexibeast@gmail.com>,
groff@gnu.org, linux-man <linux-man@vger.kernel.org>,
Ingo Schwarze <schwarze@usta.de>, Dirk Gouders <dirk@gouders.net>,
Colin Watson <cjwatson@debian.org>,
Ralph Corderoy <ralph@inputplus.co.uk>
Subject: Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
Date: Wed, 12 Apr 2023 14:04:51 +0100
Message-ID: <20230412140451.f03a6c19983694fe844bbb5a@plushkava.net>
In-Reply-To: <875ya1ecq1.fsf@gentoo.org>

On Wed, 12 Apr 2023 09:13:13 +0100
Sam James <sam@gentoo.org> wrote:
>
> Alejandro Colomar <alx.manpages@gmail.com> writes:
>
> > [Added back linux-man@, and people that commented on this (sub)topic]
> > [Added Sam, I've got a question for you]
> >
> > Hi Alexis,
> >
> > Please keep (at least) linux-man@ in the loop.
> >
> > On 4/9/23 08:44, Alexis wrote:
> >>
> >> As a related data point, i'd like to mention Gentoo's position on
> >> this, i.e. that man pages will continue to be bzip2-compressed by
> >> default:
> >>
> >> "app-text/mandoc bzip2 support"
> >> https://bugs.gentoo.org/854267
> >>
> >> "Remove /usr/share/man from default inclusion list for docompress"
> >> https://bugs.gentoo.org/836367
> >
> > As Ingo said[1] 3 years ago, I don't think in this year it makes any
> > sense to compress pages anymore. However, since it's simple for me
> > to add support for that, and it can be interesting for testing
> > purposes, I added support for installing the Linux man-pages
> > compressed with bzip2 using the Makefile[2]. While I was at it, I
> > also added support for generating .tar.bz2 release tarballs[3].
> >
> > With this, I was able to test a bit more than what I did yesterday:
> >
> >
> > $ sudo rm -rf /opt/local/man/
> > $ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
> > 2570
> > $ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
> > 2570
> > $ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
> > 2570
> > $ du -sh /opt/local/man/*
> > 5.4M /opt/local/man/bz2
> > 5.5M /opt/local/man/gz_
> > 9.4M /opt/local/man/man
> >
> >
> > $ export MANPATH=/opt/local/man/gz_/share/man
> > $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> > 37
> > 0.31
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 1.56
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 1.56
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> > 17
> > 1.24
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> > 17
> > 1.14
> >
> >
> > $ export MANPATH=/opt/local/man/bz2/share/man
> > $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> > 37
> > 10.90
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 1.33
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 1.31
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> > 17
> > 1.21
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> > 17
> > 1.22
> >
> >
> > $ export MANPATH=/opt/local/man/man/share/man
> > $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> > 37
> > 0.56
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 0.01
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 0.01
> >
> > Weird thing: today, the symlink bug in man(1) was reproducible in
> > all kinds of pages, while yesterday it only reproduced in
> > uncompressed ones.
> >
> > Another weird thing: times today changed considerably for the
> > find(1) pipelines (half of yesterday's). It's not a thing of
> > using dash(1), because I get similar times with bash(1) and its
> > builtin time(1).
> >
> > Important note: Sam, are you sure you want your pages compressed
> > with bz2? Have you seen the 10 seconds it takes man-db's man(1) to
> > find a word in the pages? I suggest that at least you try to
> > reproduce these tests in your machine, and see if it's just me or
> > man-db's man(1) is pretty bad at non-gz pages.
> >
> > Test results:
> >
> > - man-db's man(1) is slower with plain man(7) source than with .gz
> > pages for some mysterious reason.
> >
> > - man-db's man(1) is turtle slow with .bz2 pages.
>
> I started looking into changing to xz (or just.. not bz2, anyway),
> partially motivated by https://gitlab.com/man-db/man-db/-/issues/4 /
> just interest locally (without having done measurements to see if it
> would be worth a global change) and the xz maintainer ended up
> recommending a different implementation to how man-db currently handles
> external utilities entirely (which I have a draft of).
>
> The xz author had some suggestions on the best parameters to use
> for man pages too which I need to look into and dig up...
>
> https://bugs.gentoo.org/169260 was an interesting discussion
> about our choice of bz2 (it came up a bit in
> https://bugs.gentoo.org/372653 too).

Oh, I remember this. Soon after #372653 was closed, I experimented
further and found xz --lzma2=preset=6e,pb=0 to be more effective than
bzip2 -9, both in terms of compression ratio and subsequent
decompression performance, so I used those settings for a time.
Nowadays, I would be more concerned with the time taken to render a
man page than with reducing the footprint of the installed
documentation.

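A minimal sketch of how the size side of that comparison might be
repeated on a single uncompressed page. The helper names
compressed_size and compare_sizes are hypothetical and purely
illustrative, and the sketch assumes bzip2 and xz are available on
PATH. It is not the procedure used for the original experiment.

```shell
# compressed_size FILE CMD [ARGS...]
# Feed FILE to CMD on standard input and print how many bytes CMD
# writes to standard output. (Hypothetical helper, for illustration.)
compressed_size() {
    input=$1
    shift
    # tr strips the leading padding some wc implementations emit.
    "$@" <"$input" | wc -c | tr -d ' '
}

# compare_sizes FILE
# Print the raw size of FILE next to its size under the two settings
# mentioned above. Assumes bzip2 and xz are installed.
compare_sizes() {
    printf 'raw    %s\n' "$(compressed_size "$1" cat)"
    printf 'bzip2  %s\n' "$(compressed_size "$1" bzip2 -9 -c)"
    printf 'xz     %s\n' "$(compressed_size "$1" xz --lzma2=preset=6e,pb=0 -c)"
}
```

Usage would be something like "compare_sizes path/to/page.1", where
the path is a placeholder for any uncompressed man page. Results for
one page say little about a whole tree, and decompression time would
need to be measured separately, for example by timing "xz -dc" and
"bzip2 -dc" over the installed pages.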
>
> (I'll get back and read the rest of the thread later, but wanted
> to add this tidbit.)
>
> Definitely surprised to learn bz2 is *that* bad though!
>
> best,
> sam
--
Kerin Millar