* [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist
@ 2022-11-01 17:59 Edwin Török
From: Edwin Török @ 2022-11-01 17:59 UTC (permalink / raw)
  To: xen-devel
  Cc: pau.safont, Edwin Török, Christian Lindig, David Scott,
	Wei Liu, Anthony PERARD

Pau has run some performance tests by booting 1000 Mirage unikernel
test VMs and shutting them down.
We noticed on the flamegraphs that a lot of time is spent in Xenctrl
`domain_getinfolist`: 17.7% of overall time
(this needs to be multiplied by 16, because 100% Dom0 usage = 16 vCPUs).
In particular, time is spent in camlXenctrl___getlist_339, as can be
seen from this flamegraph, which also shows that the function creates
a very deep call stack:
https://cdn.jsdelivr.net/gh/edwintorok/xen@xenctrl-coverletter/docs/tmp/perf-merge-boot.svg?x=948.9&y=2213
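
To illustrate why a chunked getlist can build a very deep call stack,
here is a minimal OCaml sketch (not the actual xenctrl.ml code).
`fetch` is a hypothetical stand-in for the hypercall binding that
returns up to `n` domids >= `from` in ascending order, and a chunk size
of 2 is only an illustrative value. The naive version appends each
chunk after the recursive call returns, so it needs one stack frame per
chunk; the tail-recursive version accumulates chunks in reverse and
reverses once at the end.

```ocaml
(* Hypothetical stand-in for the hypercall: up to [n] domids >= [from],
   in ascending order, drawn from the simulated domain list [doms]. *)
let fetch doms from n =
  let rec take k = function
    | x :: rest when k > 0 -> x :: take (k - 1) rest
    | _ -> []
  in
  take n (List.filter (fun d -> d >= from) doms)

(* Not tail recursive: each chunk waits for the recursive call to
   return before appending, so stack depth grows with (domains / n). *)
let rec getlist_naive doms from n =
  let l = fetch doms from n in
  if List.length l = n then l @ getlist_naive doms (List.nth l (n - 1) + 1) n
  else l

(* Tail recursive: each chunk is pushed onto the accumulator in reverse
   order, and the whole list is reversed once at the end. *)
let getlist_tailrec doms first n =
  let rec loop acc from =
    let l = fetch doms from n in
    let acc = List.rev_append l acc in
    if List.length l = n then loop acc (List.nth l (n - 1) + 1)
    else List.rev acc
  in
  loop [] first
```

With 1000 domains and a chunk of 2, `getlist_naive` nests roughly 500
calls, while the stack depth of `getlist_tailrec` does not grow with
the number of chunks.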

After some algorithmic improvements to the code, the function now
barely shows up on a flamegraph at all, taking only 0.02%.
It is now called camlXenctrl___getlist_343, but that is just due to
the changed arguments; it is still the same function:
https://cdn.jsdelivr.net/gh/edwintorok/xen@xenctrl-coverletter/docs/tmp/perf-xen-boot-1150.svg?x=1188.0&y=1941&s=infolist

Previously it called the Xen hypercall ~500*1000 times for 1000 VMs;
now it calls it "only" 1000 times.
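
The drop in hypercalls can be checked with a small counting sketch,
again using a hypothetical `fetch` in place of the real binding. It
assumes the domain list is fetched once per VM, and the chunk sizes 2
and 1024 are illustrative values, not necessarily the ones used in the
patches.

```ocaml
(* Count how many (hypothetical) hypercalls a chunked, tail-recursive
   listing of [doms] makes for a given [chunk] size. *)
let hypercalls ~chunk doms =
  let calls = ref 0 in
  (* Hypothetical stand-in for the hypercall: up to [chunk] domids
     >= [from], in ascending order. *)
  let fetch from =
    incr calls;
    let rec take k = function
      | x :: rest when k > 0 -> x :: take (k - 1) rest
      | _ -> []
    in
    take chunk (List.filter (fun d -> d >= from) doms)
  in
  (* A full chunk means there may be more domains; a short one ends it. *)
  let rec loop acc from =
    let l = fetch from in
    let acc = List.rev_append l acc in
    if List.length l = chunk then loop acc (List.nth l (chunk - 1) + 1)
    else List.rev acc
  in
  let result = loop [] 0 in
  (result, !calls)
```

For 1000 domains, `hypercalls ~chunk:2` makes 501 calls per listing
(about 500*1000 if the list is fetched once per VM), while
`hypercalls ~chunk:1024` makes a single call. When the number of
domains is an exact multiple of the chunk size, one extra call that
returns an empty list is needed to detect the end.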

I would suggest trying to take this into 4.17, given the massive
improvement in scalability (number of VMs per Xen host).

Further improvements are possible, but they would be in xenopsd
(part of XAPI): it could avoid calling domain_getinfolist and use
domain_getinfo instead. The only reason it needs the infolist is that
it looks domains up by VM UUID rather than by domid. It could keep a
small cache of UUID->domid mappings and then call just domain_getinfo
(or read the mapping from xenstore on a cache miss). However, that
improvement may not even be needed now that this function barely
registers on a flamegraph.
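
The UUID->domid cache suggested for xenopsd could look roughly like the
OCaml sketch below. Everything in it is hypothetical: `resolve` stands
in for the slow path (e.g. reading the mapping from xenstore), and the
caller would then pass the returned domid to domain_getinfo (not shown).
Entries should be dropped when a domain is destroyed, since domids can
be reused.

```ocaml
(* Hypothetical UUID -> domid cache; [resolve] stands in for a slower
   lookup such as reading the mapping from xenstore. *)
type cache = {
  table : (string, int) Hashtbl.t;
  resolve : string -> int option;
}

let create resolve = { table = Hashtbl.create 64; resolve }

(* Return the cached domid, or resolve it once and remember it. *)
let domid_of_uuid c uuid =
  match Hashtbl.find_opt c.table uuid with
  | Some domid -> Some domid
  | None ->
    (match c.resolve uuid with
     | Some domid as r -> Hashtbl.replace c.table uuid domid; r
     | None -> None)

(* Drop an entry, e.g. when the domain is destroyed, so that a reused
   domid is never returned for the wrong VM. *)
let forget c uuid = Hashtbl.remove c.table uuid
```

Only cache misses reach the slow path; misses that resolve to nothing
are not cached, so a VM that starts later is still found.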

P.S.: the Mirage test VM is a very old PV version; at some point we'll
repeat the test with a Solo5-based PVH one.

Edwin Török (2):
  xenctrl.ml: make domain_getinfolist tail recursive
  xenctrl: use larger chunksize in domain_getinfolist

 tools/ocaml/libs/xc/xenctrl.ml | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

-- 
2.34.1





Thread overview: 7+ messages
2022-11-01 17:59 [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist Edwin Török
2022-11-01 17:59 ` [PATCH for-4.17 v1 1/2] xenctrl.ml: make domain_getinfolist tail recursive Edwin Török
2022-11-01 17:59 ` [PATCH for-4.17 v1 2/2] xenctrl: use larger chunksize in domain_getinfolist Edwin Török
2022-11-02  9:11 ` [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist Christian Lindig
2022-11-02  9:31   ` Edwin Torok
2022-11-02  9:37   ` Edwin Torok
2022-11-02 17:26 ` Further issues " Andrew Cooper
