From: Donald Hunter <donald.hunter@gmail.com>
To: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
Mauro Carvalho Chehab <mchehab@kernel.org>,
Vegard Nossum <vegard.nossum@oracle.com>,
Akira Yokosawa <akiyks@gmail.com>,
Jani Nikula <jani.nikula@linux.intel.com>,
Randy Dunlap <rdunlap@infradead.org>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] docs: drop the version constraints for sphinx and dependencies
Date: Mon, 18 Mar 2024 16:44:55 +0000
Message-ID: <m21q8732wo.fsf@gmail.com>
In-Reply-To: <20240301141800.30218-1-lukas.bulwahn@gmail.com> (Lukas Bulwahn's message of "Fri, 1 Mar 2024 15:18:00 +0100")
Lukas Bulwahn <lukas.bulwahn@gmail.com> writes:
> As discussed (see Links), there is some inertia to move to the recent
> Sphinx versions for the doc build environment.
>
> [...]
>
> Link: https://lore.kernel.org/linux-doc/874jf4m384.fsf@meer.lwn.net/
> Link: https://lore.kernel.org/linux-doc/20240226093854.47830-1-lukas.bulwahn@gmail.com/
> Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
> Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
> ---
> v1 -> v2:
> drop jinja2 as suggested by Vegard.
> add tags from v1 review
>
> Documentation/doc-guide/sphinx.rst | 11 ++++++-----
> Documentation/sphinx/requirements.txt | 7 ++-----
> scripts/sphinx-pre-install | 19 +++----------------
> 3 files changed, 11 insertions(+), 26 deletions(-)
Apologies if I am a little late to the party here - I am just catching
up with the changes on docs-next.
I went to install Sphinx 2.4.4 using requirements.txt for some doc work
and hit the upstream Sphinx dependency breakage. So I pulled docs-next
with the intention of sending a patch to requirements.txt with pinned
dependencies. When I noticed that things had already moved on in
docs-next, I decided to spend some time investigating the performance
regression that has been present in Sphinx from 3.0.0 until now.
With Sphinx 2.4.4 I always get timings in this ballpark:
% time make htmldocs
...
real 4m5.417s
user 17m0.379s
sys 1m11.889s
With Sphinx 7.2.6 it's typically over 9 minutes:
% time make htmldocs
...
real 9m0.533s
user 15m38.397s
sys 1m0.907s
I collected profiling data using cProfile:
export srctree=`pwd`
export BUILDDIR=`pwd`/Documentation/output
python3 -m cProfile -o profile.dat ./sphinx_latest/bin/sphinx-build \
    -b html \
    -c ./Documentation \
    -d ./Documentation/output/.doctrees \
    -D version=6.8.0 -D release= \
    -D kerneldoc_srctree=. -D kerneldoc_bin=./scripts/kernel-doc \
    ./Documentation \
    ./Documentation/output
Here's some of the profiling output:
$ python3 -m pstats profile.dat
Welcome to the profile statistics browser.
profile.dat% sort tottime
profile.dat% stats 10
Fri Mar 15 17:09:39 2024 profile.dat
3960680702 function calls (3696376639 primitive calls) in 1394.384 seconds
Ordered by: internal time
List reduced from 6733 to 10 due to restriction <10>
           ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
        770364892  165.102    0.000  165.102    0.000  sphinx/domains/c.py:153(__eq__)
           104124  163.968    0.002  544.788    0.005  sphinx/domains/c.py:1731(_find_named_symbols)
        543888397  123.767    0.000  176.685    0.000  sphinx/domains/c.py:1679(children_recurse_anon)
             4292   74.081    0.017   74.081    0.017  {method 'poll' of 'select.poll' objects}
        631233096   69.389    0.000  246.017    0.000  sphinx/domains/c.py:1746(candidates)
121406721/3359598   65.689    0.000   76.762    0.000  docutils/nodes.py:202(_fast_findall)
          3477076   64.387    0.000   65.758    0.000  sphinx/util/nodes.py:633(_copy_except__document)
        544032973   52.950    0.000   52.950    0.000  sphinx/domains/c.py:156(is_anon)
    79012597/3430   36.395    0.000   36.395    0.011  sphinx/domains/c.py:1656(clear_doc)
        286882978   31.271    0.000   31.279    0.000  {built-in method builtins.isinstance}
profile.dat% callers c.py:153
Ordered by: internal time
List reduced from 6733 to 4 due to restriction <'c.py:153'>
Function was called by...
                                        ncalls  tottime  cumtime
sphinx/domains/c.py:153(__eq__)  <-  631153346  134.803  134.803  sphinx/domains/c.py:1731(_find_named_symbols)
                                        154878    0.041    0.041  sphinx/domains/c.py:2085(find_identifier)
                                     139056533   30.259   30.259  sphinx/domains/c.py:2116(direct_lookup)
                                           135    0.000    0.000  sphinx/util/cfamily.py:89(__eq__)
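For anyone who wants to repeat this kind of query without typing into the interactive browser, the same sort, stats, and callers steps can be scripted with the standard pstats module. This is only a minimal sketch: the toy workload, the function names (count_equal, workload, profile_to_file, report) and the temporary file are assumptions made so the example is self-contained. For the real build you would pass the profile.dat written by the command above to report() and restrict the callers listing to 'c.py:153'.

```python
import cProfile
import os
import pstats
import tempfile


def count_equal(items, target):
    """Toy hot spot: a linear scan standing in for the real workload."""
    return sum(1 for item in items if item == target)


def workload():
    data = list(range(2000))
    return sum(count_equal(data, n) for n in range(200))


def profile_to_file(path):
    """Profile the toy workload and save the raw stats to `path`."""
    profiler = cProfile.Profile()
    profiler.runcall(workload)
    profiler.dump_stats(path)


def report(path, limit=10, callers_of="count_equal"):
    """Scripted equivalent of 'sort tottime', 'stats N' and 'callers X'."""
    stats = pstats.Stats(path)
    stats.sort_stats("tottime").print_stats(limit)
    stats.print_callers(callers_of)
    return stats


if __name__ == "__main__":
    dat = os.path.join(tempfile.mkdtemp(), "toy_profile.dat")
    profile_to_file(dat)
    report(dat, limit=5)
```

The restriction passed to print_callers() is matched as a regular expression against the "filename:lineno(function)" string, which is why a fragment such as 'c.py:153' is enough in the interactive session.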
From that you can see there is significant call amplification from
_find_named_symbols (~104k calls) to __eq__ (~631 million calls, roughly
6,000 comparisons per call), plus several other expensive functions.
Looking at the code [1], you can see why: it does a list walk to find
matching symbols. When adding new symbols it does an exhaustive walk to
check for duplicates, so every insertion hits the worst case, with ~13k
symbols in a list during the doc build.
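To make the amplification concrete, here is a minimal sketch of a container that checks for duplicates by walking a list on every insertion. It is not Sphinx's code: SymbolList, its methods and comparisons_for are hypothetical names, and symbols are reduced to plain strings. Adding n distinct names this way costs n(n-1)/2 comparisons in total.

```python
class SymbolList:
    """Hypothetical stand-in for a symbol table that keeps children in a list."""

    def __init__(self):
        self.children = []
        self.comparisons = 0  # counts name comparisons made during lookups

    def find(self, name):
        # Exhaustive walk: every child is compared, even after a match.
        matches = []
        for child in self.children:
            self.comparisons += 1
            if child == name:
                matches.append(child)
        return matches

    def add(self, name):
        # The duplicate check forces a full scan before each insertion.
        if not self.find(name):
            self.children.append(name)


def comparisons_for(n):
    """Insert n distinct names and return the total comparison count."""
    table = SymbolList()
    for i in range(n):
        table.add(f"sym{i}")
    return table.comparisons


if __name__ == "__main__":
    for n in (100, 1000, 2000):
        print(n, comparisons_for(n))
```

By the same arithmetic, a single list of 13,000 distinct names would need about 84 million comparisons for the insertions alone, before any later lookups.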
I have an experimental fix that uses a dict for lookups. With the fix, I
consistently get times in the sub-5-minute range:
% time make htmldocs
...
real 4m27.085s
user 10m56.985s
sys 0m56.385s
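As a rough illustration of the dict idea, and not the actual patch, the sketch below keeps a dict keyed by name alongside the ordered list, so the duplicate check becomes a hash lookup instead of a scan. IndexedSymbols and its fields are hypothetical names, and it ignores details that real Sphinx symbols have to handle, such as anonymous symbols.

```python
class IndexedSymbols:
    """Hypothetical symbol container with a name index next to the list."""

    def __init__(self):
        self.children = []  # preserves insertion order for iteration
        self.by_name = {}   # name -> list of children with that name

    def find(self, name):
        # Average O(1) lookup; returns a copy so callers cannot corrupt the index.
        return list(self.by_name.get(name, ()))

    def add(self, name, allow_duplicate=False):
        existing = self.by_name.get(name)
        if existing and not allow_duplicate:
            return existing[0]
        sym = {"name": name}
        self.children.append(sym)
        # Keep the index in step with the list on every insertion.
        self.by_name.setdefault(name, []).append(sym)
        return sym


if __name__ == "__main__":
    table = IndexedSymbols()
    for i in range(13000):
        table.add(f"sym{i}")
    print(len(table.children), len(table.find("sym42")))
```

The cost of this design is that every code path that adds, removes or renames a child has to update both structures, which is where most of the care in a real change would go.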
I expect there are other speedups to be found. I will clean up my Sphinx
changes and share them on a GitHub branch (as well as push them
upstream) so that others can try them out.
For some reason, if I run sphinx-build manually with -j 12 (I have a
12-core machine) I get better performance than make htmldocs:
% sphinx-build -j 12 ...
...
real 3m56.074s
user 9m52.775s
sys 0m52.905s
I haven't had a chance to look at what makes the difference here, but
will investigate when I have time.
Cheers,
Donald.
[1] https://github.com/sphinx-doc/sphinx/blob/ff252861a7b295e8dd8085ea9f6ed85e085273fc/sphinx/domains/c/_symbol.py#L235-L283
> diff --git a/Documentation/sphinx/requirements.txt b/Documentation/sphinx/requirements.txt
> index 5d47ed443949..5017f307c8a4 100644
> --- a/Documentation/sphinx/requirements.txt
> +++ b/Documentation/sphinx/requirements.txt
> @@ -1,6 +1,3 @@
> -# jinja2>=3.1 is not compatible with Sphinx<4.0
> -jinja2<3.1
> -# alabaster>=0.7.14 is not compatible with Sphinx<=3.3
> -alabaster<0.7.14
> -Sphinx==2.4.4
> +alabaster
> +Sphinx
> pyyaml