linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Donald Hunter <donald.hunter@gmail.com>
To: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>,
	Jonathan Corbet <corbet@lwn.net>,
	 Mauro Carvalho Chehab <mchehab@kernel.org>,
	Akira Yokosawa <akiyks@gmail.com>,
	 Jani Nikula <jani.nikula@linux.intel.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] docs: drop the version constraints for sphinx and dependencies
Date: Tue, 19 Mar 2024 17:59:19 +0000	[thread overview]
Message-ID: <CAD4GDZyABi3wjKY4SUV804OyBDBaC=Ckz5b0GZ34JmCX8S6V_g@mail.gmail.com> (raw)
In-Reply-To: <CAD4GDZzjc6=-Gzw23tRgCDE7=AxsenXqpD+qnh6gj1+MYYU2fA@mail.gmail.com>

On Mon, 18 Mar 2024 at 17:10, Donald Hunter <donald.hunter@gmail.com> wrote:
>
> On Mon, 18 Mar 2024 at 16:54, Vegard Nossum <vegard.nossum@oracle.com> wrote:
> >
> > > % time make htmldocs
> > > ...
> > > real  9m0.533s
> > > user  15m38.397s
> > > sys   1m0.907s
> >
> > Was this running 'make cleandocs' (or otherwise removing the output
> > directory) in between? Sphinx is known to be slower if you already have
>
> Yes, times were after 'make cleandocs'.
>
> > an output directory with existing-but-obsolete data, I believe this is
> > the case even when switching from one Sphinx version to another. Akira
> > also wrote about the 7.x performance:
> >
> > https://lore.kernel.org/linux-doc/6e4b66fe-dbb3-4149-ac7e-8ae333d6fc9d@gmail.com/
>
> Having looked at the Sphinx code, it doesn't surprise me that
> incremental builds can have worse performance. There's probably going
> to be some speedups to be found when we go looking for them.

Following up on this, Symbol.clear_doc(docname) does a linear walk of
symbols which impacts incremental builds. The implementation of
clear_doc() looks broken in other ways which I think would further
worsen the incremental build performance.

Incremental builds also seem to do far more work than I'd expect. A
single modified .rst file is quick to build but a handful of modified
.rst files seems to trigger a far larger rebuild. That would be worth
investigating too.

> > > I have an experimental fix that uses a dict for lookups. With the fix, I
> > > consistently get times in the sub 5 minute range:
> >
> > Fantastic!

I pushed my performance changes to GitHub if you want to try them out:

https://github.com/donaldh/sphinx/tree/c-domain-speedup

I noticed that write performance (the second phase of sphinx-build) is
quite slow and doesn't really benefit from multi processing with -j
nn. It turns out that the bulk of the write work is done in the main
process and only the eventual writing is farmed out to forked
processes. I experimented with pushing more work out to the forked
processes (diff below) and it gives a significant speedup at the cost
of breaking index generation. It might be a  viable enhancement if
indexing can be fixed thru persisting the indices from the
sub-processes and merging them in the main process.

With the below patch, this is the build time I get:

% time make htmldocs SPHINXOPTS=-j12
...
real 1m58.988s
user 9m57.817s
sys 0m49.411s

Note that I get better performance with -j12 than -jauto which auto
detects 24 cores.

diff --git a/sphinx/builders/__init__.py b/sphinx/builders/__init__.py
index 6afb5d4cc44d..6b203799390e 100644
--- a/sphinx/builders/__init__.py
+++ b/sphinx/builders/__init__.py
@@ -581,9 +581,11 @@ class Builder:
                 self.write_doc(docname, doctree)

     def _write_parallel(self, docnames: Sequence[str], nproc: int) -> None:
-        def write_process(docs: list[tuple[str, nodes.document]]) -> None:
+        def write_process(docs: list[str]) -> None:
             self.app.phase = BuildPhase.WRITING
-            for docname, doctree in docs:
+            for docname in docs:
+                doctree = self.env.get_and_resolve_doctree(docname, self)
+                self.write_doc_serialized(docname, doctree)
                 self.write_doc(docname, doctree)

         # warm up caches/compile templates using the first document
@@ -596,6 +598,7 @@ class Builder:

         tasks = ParallelTasks(nproc)
         chunks = make_chunks(docnames, nproc)
+        logger.info(f"_write_parallel: {len(chunks)} chunks")

         # create a status_iterator to step progressbar after writing a document
         # (see: ``on_chunk_done()`` function)
@@ -607,12 +610,7 @@ class Builder:

         self.app.phase = BuildPhase.RESOLVING
         for chunk in chunks:
-            arg = []
-            for docname in chunk:
-                doctree = self.env.get_and_resolve_doctree(docname, self)
-                self.write_doc_serialized(docname, doctree)
-                arg.append((docname, doctree))
-            tasks.add_task(write_process, arg, on_chunk_done)
+            tasks.add_task(write_process, chunk, on_chunk_done)

         # make sure all threads have finished
         tasks.join()

  reply	other threads:[~2024-03-19 17:59 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-01 14:18 [PATCH v2] docs: drop the version constraints for sphinx and dependencies Lukas Bulwahn
2024-03-03 15:17 ` Jonathan Corbet
2024-03-18 16:44 ` Donald Hunter
2024-03-18 16:54   ` Vegard Nossum
2024-03-18 17:10     ` Donald Hunter
2024-03-19 17:59       ` Donald Hunter [this message]
2024-03-21 16:56         ` Donald Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAD4GDZyABi3wjKY4SUV804OyBDBaC=Ckz5b0GZ34JmCX8S6V_g@mail.gmail.com' \
    --to=donald.hunter@gmail.com \
    --cc=akiyks@gmail.com \
    --cc=corbet@lwn.net \
    --cc=jani.nikula@linux.intel.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukas.bulwahn@gmail.com \
    --cc=mchehab@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=vegard.nossum@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).