ksummit.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Mauro Carvalho Chehab <mchehab@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: Jani Nikula <jani.nikula@intel.com>,
	ksummit-discuss@lists.linuxfoundation.org,
	ksummit@lists.linux.dev,
	Markus Heiser <markus.heiser@darmarit.de>
Subject: Re: [Ksummit-discuss] [TECH TOPIC] What kernel documentation could be
Date: Sat, 25 Jun 2022 10:10:29 +0100	[thread overview]
Message-ID: <20220625101029.67f14c4c@sal.lan> (raw)
In-Reply-To: <87tu891xfv.fsf@meer.lwn.net>

Em Fri, 24 Jun 2022 16:57:56 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:

> Mauro Carvalho Chehab <mchehab@kernel.org> writes:
> 
> > Em Thu, 23 Jun 2022 07:40:45 -0600
> > Jonathan Corbet <corbet@lwn.net> escreveu:  
> >> I've actually, in a spare moment or two, been doing some profiling of
> >> the kernel docs build and trying to track down the sources of the
> >> slowness.  I am thinking that nearly 700 *million* calls to the iterator
> >> for the C-domain symbol list might have something to do with it...  
> >
> > Wow, that's a lot!  
> 
> Just for the curious ...
> 
> The Sphinx C domain code makes this elaborate tree data structure out of
> all the identifier names it has picked up - including the names of
> function parameters when they are provided in prototypes for some
> reason.  This is the structure that is consulted whenever we want to
> resolve a cross-reference, which is fairly often with automarkup
> enabled.
> 
> How does it work?  The algorithm is, as far as I can tell:
> 
>  - Serialize the whole tree in a sort of breadth-first traversal
> 
>  - Make a linear pass through the *entire* list, comparing the
>    identifier name with each entry and accumulating a list of all the
>    matches found.
> 
>  - Return the first match and throw away the rest.
> 
> The result is O(n^2) behavior and, in the kernel docs build, n gets to
> be fairly large.
> 
> I went in with a crowbar and sledgehammer and replaced some of the list
> searches with a dict lookup, resulting in about a 20% speedup in the
> full htmldocs build with Sphinx 5.0.2. 

Great to have a 20% speedup here! Yet, I would expect an even better
performance improvement by replacing 700 million calls from linear search
to dict, as it would change from O(n^2) to O(1) at the average case [1],
but the speedup gain actually depends on the actual number of symbols we
have defined.

[1] According with:
	https://wiki.python.org/moin/TimeComplexity#dict
    Worse case is O(n) - when collisions are common which is unlikely.

> A couple of automarkup
> optimizations result in about a 27% speedup overall.  And I didn't break
> any cross-references.

Great!

> I think it's possible to do better, but this is a start.  I'll post the
> automarkup changes as soon as I can, but I need to verify them across
> the whole range of Sphinx versions. 

> For the Sphinx stuff, I'll need to
> turn my hatchetwork into something presentable and figure out how to
> contribute it upstream, but it seems worth the effort.

Indeed.

Regards,
Mauro

  reply	other threads:[~2022-06-25  9:10 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-17 20:57 [TECH TOPIC] What kernel documentation could be Jonathan Corbet
2022-06-17 20:57 ` [Ksummit-discuss] " Jonathan Corbet
2022-06-17 21:48 ` Laurent Pinchart
2022-06-17 21:48   ` Laurent Pinchart
2022-06-27 15:18   ` Jonathan Corbet
2022-06-18  8:24 ` Mauro Carvalho Chehab
2022-06-18  8:24   ` Mauro Carvalho Chehab
2022-06-18 11:03   ` Miguel Ojeda
2022-06-18 11:16     ` Mauro Carvalho Chehab
2022-06-18 11:16       ` Mauro Carvalho Chehab
2022-06-18 14:37       ` Miguel Ojeda
2022-06-23  9:18   ` Jani Nikula
2022-06-23  9:57     ` Mauro Carvalho Chehab
2022-06-23 10:30       ` Jani Nikula
2022-06-23 13:40       ` Jonathan Corbet
2022-06-24  7:33         ` Mauro Carvalho Chehab
2022-06-24 16:37           ` Markus Heiser
2022-06-27 15:27             ` Jonathan Corbet
2022-06-27 15:31               ` Christoph Hellwig
2022-06-28  7:43               ` Mauro Carvalho Chehab
2022-06-28  7:57                 ` Geert Uytterhoeven
2022-06-28 11:01                   ` Mauro Carvalho Chehab
2022-07-02 12:43                     ` Stephen Rothwell
2022-06-24 22:57           ` Jonathan Corbet
2022-06-25  9:10             ` Mauro Carvalho Chehab [this message]
2022-06-25 14:00               ` Jonathan Corbet
2022-06-25 18:11                 ` Linus Torvalds
2022-06-26  7:55                   ` Mauro Carvalho Chehab
2022-06-26  9:26                     ` Mauro Carvalho Chehab
2022-06-26  9:53                     ` Mauro Carvalho Chehab
2022-06-27 15:28                       ` Liam Howlett
2022-06-27 15:54                         ` Christian Brauner
2022-06-27 16:27                         ` Mark Brown
2022-06-28 10:53                           ` Mauro Carvalho Chehab
2022-06-28 16:13                         ` Luck, Tony
2022-06-27 15:34                   ` Jonathan Corbet
2022-06-27 17:07                     ` Linus Torvalds
2022-07-02 10:52                   ` Mauro Carvalho Chehab
2022-06-25 17:43 ` Steven Rostedt
2022-06-25 17:48   ` Laurent Pinchart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220625101029.67f14c4c@sal.lan \
    --to=mchehab@kernel.org \
    --cc=corbet@lwn.net \
    --cc=jani.nikula@intel.com \
    --cc=ksummit-discuss@lists.linuxfoundation.org \
    --cc=ksummit@lists.linux.dev \
    --cc=markus.heiser@darmarit.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).