* man -K finds repeated entries for each symlink page
@ 2023-04-09 13:58 Alejandro Colomar
  2023-04-09 14:55 ` Colin Watson
From: Alejandro Colomar @ 2023-04-09 13:58 UTC
  To: Colin Watson, man-db-devel; +Cc: linux-man



Hi Colin,

To reproduce, run the following commands from a clone of the Linux
man-pages repository (although you should be able to reproduce it on any
Debian installation, I guess).

$ sudo rm -r /opt/local/man/
$ sudo make install-man2 prefix=/opt/local/man LINK_PAGES=symlink -j | wc -l
503
$ export MANPATH=/opt/local/man/share/man
$ man -Kaw RLIMIT_NOFILE | sort | uniq -c
      3 /opt/local/man/share/man/man2/dup.2
      2 /opt/local/man/share/man/man2/fcntl.2
      5 /opt/local/man/share/man/man2/getrlimit.2
      3 /opt/local/man/share/man/man2/open.2
      1 /opt/local/man/share/man/man2/pidfd_getfd.2
      1 /opt/local/man/share/man/man2/pidfd_open.2
      2 /opt/local/man/share/man/man2/poll.2
      1 /opt/local/man/share/man/man2/seccomp_unotify.2
      4 /opt/local/man/share/man/man2/select.2

Each count is one more than the number of symlinks pointing to that
page.  For example, see select.2:

$ find /opt/local/man/share/man -type l | xargs readlink | grep -c /select.2
3

man(1) found the original page plus its 3 symlinks, i.e., 4 hits.
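
A quick way to check that relation for every page at once, not just select.2, is to count the symlinks that resolve to each target.  A minimal shell sketch, assuming GNU find(1) and readlink(1) and paths without whitespace; links_per_page is a hypothetical helper name:

```shell
# Hypothetical helper: for each page under the directory $1 that has
# symlinks pointing at it, print "<number of symlinks> <canonical target>".
# Per the counts above, man -K reported each such page 1 + <number> times.
# Assumes GNU find and readlink (-f follows chains of links) and paths
# without whitespace.
links_per_page() {
	find "$1" -type l -exec readlink -f -- {} + \
	| sort \
	| uniq -c \
	| awk '{ print $1, $2 }'
}
```

Usage: `links_per_page /opt/local/man/share/man`.  Pages with no symlinks do not appear in the output, which matches the count of 1 that man -K gave them.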

The fix should be for man(1) to ignore link pages for -K: searching
the source of a link page can't find anything that searching the page
it points to hasn't already found.
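
The same idea can be sketched outside man(1): resolve every page, regular file or symlink, to its canonical path and search each distinct file only once.  This is a minimal shell sketch, not how the man-db commit implements it; search_unique is a hypothetical name, and it assumes GNU find(1), realpath(1) and xargs(1), uncompressed pages, and paths without whitespace:

```shell
# Hypothetical helper: print each page under the directory $1 whose source
# mentions the word $2, reporting a page and all symlinks to it only once.
# Assumes GNU find/realpath/xargs, uncompressed pages, no whitespace in paths.
search_unique() {
	mandir=$1
	word=$2
	# Resolve regular files and symlinks to their canonical targets,
	# drop duplicates, then grep each distinct file exactly once.
	find "$mandir" \( -type f -o -type l \) -exec realpath -e -- {} + \
	| sort -u \
	| xargs -r grep -l -- "$word"
}
```

Usage: `search_unique /opt/local/man/share/man RLIMIT_NOFILE`.  Because results are deduplicated by canonical path, each matching page is printed once, under its real name rather than a link name.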

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


* Re: man -K finds repeated entries for each symlink page
  2023-04-09 13:58 man -K finds repeated entries for each symlink page Alejandro Colomar
@ 2023-04-09 14:55 ` Colin Watson
  2023-04-09 15:20   ` Poor performance of man -K for uncompressed pages (was: man -K finds repeated entries for each symlink page) Alejandro Colomar
From: Colin Watson @ 2023-04-09 14:55 UTC
  To: Alejandro Colomar; +Cc: man-db-devel, linux-man

On Sun, Apr 09, 2023 at 03:58:28PM +0200, Alejandro Colomar wrote:
> $ man -Kaw RLIMIT_NOFILE | sort | uniq -c
>       3 /opt/local/man/share/man/man2/dup.2
>       2 /opt/local/man/share/man/man2/fcntl.2
>       5 /opt/local/man/share/man/man2/getrlimit.2
>       3 /opt/local/man/share/man/man2/open.2
>       1 /opt/local/man/share/man/man2/pidfd_getfd.2
>       1 /opt/local/man/share/man/man2/pidfd_open.2
>       2 /opt/local/man/share/man/man2/poll.2
>       1 /opt/local/man/share/man/man2/seccomp_unotify.2
>       4 /opt/local/man/share/man/man2/select.2
> 
> Those numbers coincide with 1+ the number of symlinks for each of the
> pages.  For example, see select.2:

Thanks for the report.  Fixed by this commit:

  https://gitlab.com/man-db/man-db/-/commit/7ef30573a7023eb78bf70a34edaa4e3906531993

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]


* Poor performance of man -K for uncompressed pages (was: man -K finds repeated entries for each symlink page)
  2023-04-09 14:55 ` Colin Watson
@ 2023-04-09 15:20   ` Alejandro Colomar
From: Alejandro Colomar @ 2023-04-09 15:20 UTC
  To: Colin Watson; +Cc: man-db-devel, linux-man



Hi Colin,

On 4/9/23 16:55, Colin Watson wrote:
> On Sun, Apr 09, 2023 at 03:58:28PM +0200, Alejandro Colomar wrote:
>> $ man -Kaw RLIMIT_NOFILE | sort | uniq -c
>>       3 /opt/local/man/share/man/man2/dup.2
>>       2 /opt/local/man/share/man/man2/fcntl.2
>>       5 /opt/local/man/share/man/man2/getrlimit.2
>>       3 /opt/local/man/share/man/man2/open.2
>>       1 /opt/local/man/share/man/man2/pidfd_getfd.2
>>       1 /opt/local/man/share/man/man2/pidfd_open.2
>>       2 /opt/local/man/share/man/man2/poll.2
>>       1 /opt/local/man/share/man/man2/seccomp_unotify.2
>>       4 /opt/local/man/share/man/man2/select.2
>>
>> Those numbers coincide with 1+ the number of symlinks for each of the
>> pages.  For example, see select.2:
> 
> Thanks for the report.  Fixed by this commit:
> 
>   https://gitlab.com/man-db/man-db/-/commit/7ef30573a7023eb78bf70a34edaa4e3906531993

Heh, that was fast :)

As a side effect of no longer reading the same page several times,
performance improved considerably: about 3x for bzip2 and about 2x for
gzip.

I built man from source (with -O3, so I cheated a little bit).  Here
are the results; each command prints the number of matching pages,
followed by the elapsed time in seconds (%e):


$ export MANPATH=/tmp/man/gz_/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
0.19
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.14


$ export MANPATH=/tmp/man/bz2/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
3.05
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.20


$ export MANPATH=/tmp/man/man/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
0.52
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
17
0.01


Please consider this a new bug report, about performance.  See the last
block of commands: man(1) takes about half a second (0.52 s), while my
find(1) and grep(1) pipeline is almost too fast to measure (0.01 s).  I
could understand man(1) having some overhead, but a 52x difference
suggests a serious performance problem, especially since man(1) is
faster on the gzip-compressed pages (0.19 s; see the first block).
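
For a like-for-like baseline covering all three layouts above, the per-format loops can be folded into one script that picks the decompressor from the file suffix.  A minimal shell sketch, assuming gzip(1) and bzip2(1) are installed and page names contain no newlines; page_cat and grep_pages are hypothetical names.  Like the find -type f loops above, it reads only regular files, so symlinked pages are not searched twice:

```shell
# Hypothetical helper: write the (decompressed) source of page $1 to stdout,
# choosing the tool from the file suffix.
page_cat() {
	case $1 in
	*.gz)  gzip -dc -- "$1" ;;
	*.bz2) bzip2 -dc -- "$1" ;;
	*)     cat -- "$1" ;;
	esac
}

# Hypothetical baseline: print each regular file under the directory $1
# whose source mentions the word $2.  Symlinks are skipped by -type f.
grep_pages() {
	mandir=$1
	word=$2
	find "$mandir" -type f | while IFS= read -r f; do
		if page_cat "$f" | grep -q -- "$word"; then
			printf '%s\n' "$f"
		fi
	done
}
```

Usage: `grep_pages "$MANPATH" RLIMIT_NOFILE | wc -l`.  Note that it starts at least one process per page, so, like the loops above, its timing includes process start-up cost and not just the search itself.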


Cheers,
Alex


