linux-doc.vger.kernel.org archive mirror
From: Donald Hunter <donald.hunter@gmail.com>
To: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
	 Mauro Carvalho Chehab <mchehab@kernel.org>,
	 Vegard Nossum <vegard.nossum@oracle.com>,
	 Akira Yokosawa <akiyks@gmail.com>,
	 Jani Nikula <jani.nikula@linux.intel.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] docs: drop the version constraints for sphinx and dependencies
Date: Mon, 18 Mar 2024 16:44:55 +0000
Message-ID: <m21q8732wo.fsf@gmail.com>
In-Reply-To: <20240301141800.30218-1-lukas.bulwahn@gmail.com> (Lukas Bulwahn's message of "Fri, 1 Mar 2024 15:18:00 +0100")

Lukas Bulwahn <lukas.bulwahn@gmail.com> writes:

> As discussed (see Links), there is some inertia to move to the recent
> Sphinx versions for the doc build environment.
>
> [...]
>
> Link: https://lore.kernel.org/linux-doc/874jf4m384.fsf@meer.lwn.net/
> Link: https://lore.kernel.org/linux-doc/20240226093854.47830-1-lukas.bulwahn@gmail.com/
> Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
> Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
> ---
> v1 -> v2:
>   drop jinja2 as suggested by Vegard.
>   add tags from v1 review
>
>  Documentation/doc-guide/sphinx.rst    | 11 ++++++-----
>  Documentation/sphinx/requirements.txt |  7 ++-----
>  scripts/sphinx-pre-install            | 19 +++----------------
>  3 files changed, 11 insertions(+), 26 deletions(-)

Apologies if I am a little late to the party here - I am just catching
up with the changes on docs-next.

I went to install Sphinx 2.4.4 using requirements.txt for some doc work
and hit the upstream Sphinx dependency breakage. So I pulled docs-next
with the intention of sending a patch to requirements.txt with pinned
dependencies. When I noticed that things had already moved on in
docs-next, I decided to spend some time investigating the performance
regression that has been present in Sphinx from 3.0.0 until now.

With Sphinx 2.4.4 I always get timings in this ballpark:

% time make htmldocs
...
real	4m5.417s
user	17m0.379s
sys	1m11.889s

With Sphinx 7.2.6 it's typically over 9 minutes:

% time make htmldocs
...
real	9m0.533s
user	15m38.397s
sys	1m0.907s
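
To put numbers on the two runs above, here is a minimal sketch that
converts the `time` output to seconds and compares them. parse_time is
a hypothetical helper, and reading user/real as "average busy cores" is
only a rough approximation (it ignores sys time and I/O waits):

```python
import re


def parse_time(value):
    """Convert a shell `time` duration such as '4m5.417s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the two `time make htmldocs` runs in this mail.
real_244, user_244 = parse_time("4m5.417s"), parse_time("17m0.379s")
real_726, user_726 = parse_time("9m0.533s"), parse_time("15m38.397s")

slowdown = real_726 / real_244   # wall-clock ratio, 7.2.6 vs 2.4.4
busy_244 = user_244 / real_244   # rough average busy cores, 2.4.4
busy_726 = user_726 / real_726   # rough average busy cores, 7.2.6

print(f"wall-clock slowdown: {slowdown:.2f}x")
print(f"user/real: {busy_244:.2f} (2.4.4) vs {busy_726:.2f} (7.2.6)")
```

Wall-clock time roughly doubles while total user time goes down
slightly, so one possible reading is that more of the 7.2.6 build runs
serially rather than that it does more total work.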

I collected profiling data using cProfile:

export srctree=`pwd`
export BUILDDIR=`pwd`/Documentation/output
python3 -m cProfile -o profile.dat ./sphinx_latest/bin/sphinx-build \
    -b html \
    -c ./Documentation \
    -d ./Documentation/output/.doctrees \
    -D version=6.8.0 -D release= \
    -D kerneldoc_srctree=. -D kerneldoc_bin=./scripts/kernel-doc \
    ./Documentation \
    ./Documentation/output
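
As an aside, a file written with -o can also be read non-interactively
through the standard pstats module. The sketch below profiles a small
stand-in workload (slow_lookup and workload are hypothetical, not
Sphinx code) so that it is self-contained, then produces the
equivalents of the pstats browser's `sort tottime`, `stats N` and
`callers` commands:

```python
import cProfile
import io
import os
import pstats
import tempfile


def slow_lookup(items, key):
    # Deliberately linear: stands in for an expensive hot function.
    return any(item == key for item in items)


def workload():
    items = list(range(2000))
    return sum(slow_lookup(items, k) for k in range(200))


def top_functions(profile_path, count=10):
    """Return the report that `sort tottime` + `stats N` would print."""
    out = io.StringIO()
    stats = pstats.Stats(profile_path, stream=out)
    stats.sort_stats("tottime").print_stats(count)
    return out.getvalue()


def callers_of(profile_path, pattern):
    """Return the report that `callers <pattern>` would print."""
    out = io.StringIO()
    stats = pstats.Stats(profile_path, stream=out)
    stats.sort_stats("tottime").print_callers(pattern)
    return out.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "profile.dat")
    profiler = cProfile.Profile()
    profiler.enable()
    result = workload()
    profiler.disable()
    profiler.dump_stats(path)   # same file format as `-m cProfile -o`
    report = top_functions(path, 5)
    callers = callers_of(path, "slow_lookup")

print(report)
print(callers)
```

For the real build, the path passed to top_functions and callers_of
would be the profile.dat from the command above.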

Here's some of the profiling output:

$ python3 -m pstats profile.dat
Welcome to the profile statistics browser.
profile.dat% sort tottime
profile.dat% stats 10
Fri Mar 15 17:09:39 2024    profile.dat

         3960680702 function calls (3696376639 primitive calls) in 1394.384 seconds

   Ordered by: internal time
   List reduced from 6733 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
770364892  165.102    0.000  165.102    0.000 sphinx/domains/c.py:153(__eq__)
   104124  163.968    0.002  544.788    0.005 sphinx/domains/c.py:1731(_find_named_symbols)
543888397  123.767    0.000  176.685    0.000 sphinx/domains/c.py:1679(children_recurse_anon)
     4292   74.081    0.017   74.081    0.017 {method 'poll' of 'select.poll' objects}
631233096   69.389    0.000  246.017    0.000 sphinx/domains/c.py:1746(candidates)
121406721/3359598   65.689    0.000   76.762    0.000 docutils/nodes.py:202(_fast_findall)
  3477076   64.387    0.000   65.758    0.000 sphinx/util/nodes.py:633(_copy_except__document)
544032973   52.950    0.000   52.950    0.000 sphinx/domains/c.py:156(is_anon)
79012597/3430   36.395    0.000   36.395    0.011 sphinx/domains/c.py:1656(clear_doc)
286882978   31.271    0.000   31.279    0.000 {built-in method builtins.isinstance}

profile.dat% callers c.py:153
   Ordered by: internal time
   List reduced from 6733 to 4 due to restriction <'c.py:153'>

Function                            was called by...
                                       ncalls  tottime  cumtime
sphinx/domains/c.py:153(__eq__)  <- 631153346  134.803  134.803  sphinx/domains/c.py:1731(_find_named_symbols)
                                       154878    0.041    0.041  sphinx/domains/c.py:2085(find_identifier)
                                    139056533   30.259   30.259  sphinx/domains/c.py:2116(direct_lookup)
                                          135    0.000    0.000  sphinx/util/cfamily.py:89(__eq__)

From that you can see there is a significant call amplification from
_find_named_symbols (100k calls) to __eq__ (630 million calls), plus
several other expensive functions. Looking at the code [1], you can see
why. It's doing a list walk to find matching symbols. When adding new
symbols it does an exhaustive walk to check for duplicates, so you get
worst-case performance, with ~13k symbols in a list during the doc
build.
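
To illustrate the amplification, here is a minimal sketch of the
pattern, not the Sphinx code itself. Name and add_symbol_list are
hypothetical stand-ins: every insert walks the whole list to check for
a duplicate, so n inserts of distinct names cost n*(n-1)/2 comparisons.

```python
class Name:
    """Hypothetical stand-in for a symbol identifier; counts comparisons."""

    eq_calls = 0

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        Name.eq_calls += 1
        return isinstance(other, Name) and self.text == other.text

    def __hash__(self):
        return hash(self.text)


def add_symbol_list(symbols, name):
    """Append name unless already present, checking with a full list walk."""
    for existing in symbols:
        if existing == name:
            return False
    symbols.append(name)
    return True


def count_comparisons(n):
    """Insert n distinct names and return how many __eq__ calls that took."""
    Name.eq_calls = 0
    symbols = []
    for i in range(n):
        add_symbol_list(symbols, Name(f"sym{i}"))
    return Name.eq_calls


for n in (10, 1000):
    print(n, count_comparisons(n))
```

The profile above shows the same shape: about 631 million __eq__ calls
from about 104k calls to _find_named_symbols is roughly 6,000
comparisons per call, which is in line with walking a sizeable part of
a ~13k-entry list each time.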

I have an experimental fix that uses a dict for lookups. With the fix, I
consistently get times in the sub-5-minute range:

% time make htmldocs
...
real	4m27.085s
user	10m56.985s
sys	0m56.385s
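
For readers who want the general idea, here is a minimal sketch of a
dict-backed lookup. This is not the experimental fix itself, and
SymbolTable is a hypothetical container rather than anything in Sphinx.
It keeps the ordered list for iteration and adds a name-keyed index so
that the duplicate check and lookups no longer walk the list:

```python
class SymbolTable:
    """Hypothetical container; not the Sphinx Symbol class."""

    def __init__(self):
        self._ordered = []   # preserves insertion order for iteration
        self._by_name = {}   # name -> entry, for O(1) average lookup

    def add(self, name, payload=None):
        """Add a symbol unless the name exists; return True if added."""
        if name in self._by_name:
            return False
        entry = (name, payload)
        self._ordered.append(entry)
        self._by_name[name] = entry
        return True

    def find(self, name):
        """Return the entry for name, or None if it is unknown."""
        return self._by_name.get(name)

    def __len__(self):
        return len(self._ordered)


table = SymbolTable()
table.add("foo", 1)
table.add("foo", 2)   # rejected as a duplicate without a list walk
print(table.find("foo"), len(table))
```

The cost of this approach is that the index has to be kept in sync with
the list whenever entries are removed or renamed.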

I expect there are other speedups to be found. I will clean up my Sphinx
changes and share them on a GitHub branch (as well as push them
upstream) so that others can try them out.

For some reason, if I run sphinx-build manually with -j 12 (I have a
12-core machine), I get better performance than with make htmldocs:

% sphinx-build -j 12 ...
...
real	3m56.074s
user	9m52.775s
sys	0m52.905s

I haven't had a chance to look at what makes the difference here, but
will investigate when I have time.

Cheers,
Donald.

[1] https://github.com/sphinx-doc/sphinx/blob/ff252861a7b295e8dd8085ea9f6ed85e085273fc/sphinx/domains/c/_symbol.py#L235-L283

> diff --git a/Documentation/sphinx/requirements.txt b/Documentation/sphinx/requirements.txt
> index 5d47ed443949..5017f307c8a4 100644
> --- a/Documentation/sphinx/requirements.txt
> +++ b/Documentation/sphinx/requirements.txt
> @@ -1,6 +1,3 @@
> -# jinja2>=3.1 is not compatible with Sphinx<4.0
> -jinja2<3.1
> -# alabaster>=0.7.14 is not compatible with Sphinx<=3.3
> -alabaster<0.7.14
> -Sphinx==2.4.4
> +alabaster
> +Sphinx
>  pyyaml
