All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist
@ 2022-11-01 17:59 Edwin Török
  2022-11-01 17:59 ` [PATCH for-4.17 v1 1/2] xenctrl.ml: make domain_getinfolist tail recursive Edwin Török
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Edwin Török @ 2022-11-01 17:59 UTC (permalink / raw)
  To: xen-devel
  Cc: pau.safont, Edwin Török, Christian Lindig, David Scott,
	Wei Liu, Anthony PERARD

Pau has performed some performance tests by booting 1000 mirage
unikernels test VMs and shutting them down.
We've noticed on the flamegraphs that a lot of time is spent in Xenctrl
`domain_getinfolist`, 17.7% of overall time
(This needs to be multiplied by 16 because Dom0 100% usage = 16 vCPUs)
In particular time is spent in camlXenctrl___getlist_339
as can be seen from this flamegraph, and it also creates a very deep
call stack:
https://cdn.jsdelivr.net/gh/edwintorok/xen@xenctrl-coverletter/docs/tmp/perf-merge-boot.svg?x=948.9&y=2213

After some algorithmic improvements to the code now the function barely
shows up at all on a flamegraph, taking only 0.02%.
The function is called camlXenctrl___getlist_343, but that is just due
to the changed arguments, still the same function:
https://cdn.jsdelivr.net/gh/edwintorok/xen@xenctrl-coverletter/docs/tmp/perf-xen-boot-1150.svg?x=1188.0&y=1941&s=infolist

It was calling the Xen hypercall ~500*1000 times for 1000 VMs, and
instead it is now calling it "only" 1000 times.

I would suggest to try to take this in 4.17 given the massive
improvement in scalability (number of VMs on a Xen host).

There are further improvements possible here, but they'll be in xenopsd
(part of XAPI) to avoid calling domain_getinfolist and just use
domain_getinfo: the only reason it needs use infolist is that it does
the lookup by VM UUID and not by domid, but it could have a small cache
of UUID->domid mappings and then call just domain_getinfo (or get the
mapping from xenstore if not in the cache), but it looks like that
improvement is not even needed if this function barely registers on a
flamegraph now.

P.S.: the mirage test VM is a very old PV version, at some point we'll
repeat the test with a Solo5 based PVH one.

Edwin Török (2):
  xenctrl.ml: make domain_getinfolist tail recursive
  xenctrl: use larger chunksize in domain_getinfolist

 tools/ocaml/libs/xc/xenctrl.ml | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH for-4.17 v1 1/2] xenctrl.ml: make domain_getinfolist tail recursive
  2022-11-01 17:59 [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist Edwin Török
@ 2022-11-01 17:59 ` Edwin Török
  2022-11-01 17:59 ` [PATCH for-4.17 v1 2/2] xenctrl: use larger chunksize in domain_getinfolist Edwin Török
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Edwin Török @ 2022-11-01 17:59 UTC (permalink / raw)
  To: xen-devel
  Cc: pau.safont, Edwin Török, Christian Lindig, David Scott,
	Wei Liu, Anthony PERARD

On a host with ~1000 VMs (the support limit for XAPI) domain_getinfolist
took O(n^2) time (n=number of domains).
This couples with xenopsd making inefficient calls to
domain_getinfolist(1 call/VM) resulted in visibly bad performance of
XAPI.

It was calling the Xen domainfolist hypercall N/2 times.
Optimize this such that it is called at most 2 times during normal use.

Implement a tail recursive `rev_concat` equivalent to `concat |> rev`,
and use it instead of calling `@` multiple times.

The added benefit is that the list of domains should now actually be in
increasing order instead of having pairs of 2 changing direction every time.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
---
 tools/ocaml/libs/xc/xenctrl.ml | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index 28ed642231..3ebedffdc7 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -226,14 +226,25 @@ external domain_shutdown: handle -> domid -> shutdown_reason -> unit
 external _domain_getinfolist: handle -> domid -> int -> domaininfo list
        = "stub_xc_domain_getinfolist"
 
+let rev_append_fold acc e = List.rev_append e acc
+
+(**
+	[rev_concat lst] is equivalent to [lst |> List.concat |> List.rev]
+	except it is tail recursive, whereas [List.concat] isn't.
+	Example:
+	rev_concat [[10;9;8];[7;6];[5]]] = [5; 6; 7; 8; 9; 10]
+*)
+let rev_concat lst = List.fold_left rev_append_fold [] lst
+
 let domain_getinfolist handle first_domain =
 	let nb = 2 in
-	let last_domid l = (List.hd l).domid + 1 in
-	let rec __getlist from =
-		let l = _domain_getinfolist handle from nb in
-		(if List.length l = nb then __getlist (last_domid l) else []) @ l
-		in
-	List.rev (__getlist first_domain)
+	let rec __getlist lst from =
+		(* _domain_getinfolist returns domains in reverse order, largest first *)
+		match _domain_getinfolist handle from nb with
+		| [] -> rev_concat lst
+		| (hd :: _) as l -> __getlist (l :: lst) (hd.domid + 1)
+	in
+	__getlist [] first_domain
 
 external domain_getinfo: handle -> domid -> domaininfo= "stub_xc_domain_getinfo"
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH for-4.17 v1 2/2] xenctrl: use larger chunksize in domain_getinfolist
  2022-11-01 17:59 [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist Edwin Török
  2022-11-01 17:59 ` [PATCH for-4.17 v1 1/2] xenctrl.ml: make domain_getinfolist tail recursive Edwin Török
@ 2022-11-01 17:59 ` Edwin Török
  2022-11-02  9:11 ` [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist Christian Lindig
  2022-11-02 17:26 ` Further issues " Andrew Cooper
  3 siblings, 0 replies; 7+ messages in thread
From: Edwin Török @ 2022-11-01 17:59 UTC (permalink / raw)
  To: xen-devel
  Cc: pau.safont, Edwin Török, Christian Lindig, David Scott,
	Wei Liu, Anthony PERARD

The support limit of XAPI is 1000, so using 1024 will very likely get
everything in one go.
Other code in Xen also uses chunk sizes of 256 or 1024, and would be
much better than 2, especially now that list construction is more
efficient.

Xenopsd should also use `domain_getinfo` instead of `domain_getinfolist`
in a lot of places where info list is used, but that is another
optimization.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
---
 tools/ocaml/libs/xc/xenctrl.ml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index 3ebedffdc7..a56539ff2c 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -237,7 +237,7 @@ let rev_append_fold acc e = List.rev_append e acc
 let rev_concat lst = List.fold_left rev_append_fold [] lst
 
 let domain_getinfolist handle first_domain =
-	let nb = 2 in
+	let nb = 1024 in
 	let rec __getlist lst from =
 		(* _domain_getinfolist returns domains in reverse order, largest first *)
 		match _domain_getinfolist handle from nb with
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist
  2022-11-01 17:59 [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist Edwin Török
  2022-11-01 17:59 ` [PATCH for-4.17 v1 1/2] xenctrl.ml: make domain_getinfolist tail recursive Edwin Török
  2022-11-01 17:59 ` [PATCH for-4.17 v1 2/2] xenctrl: use larger chunksize in domain_getinfolist Edwin Török
@ 2022-11-02  9:11 ` Christian Lindig
  2022-11-02  9:31   ` Edwin Torok
  2022-11-02  9:37   ` Edwin Torok
  2022-11-02 17:26 ` Further issues " Andrew Cooper
  3 siblings, 2 replies; 7+ messages in thread
From: Christian Lindig @ 2022-11-02  9:11 UTC (permalink / raw)
  To: Edwin Torok
  Cc: Xen-devel, Pau Ruiz Safont, David Scott, Wei Liu, Anthony Perard



> On 1 Nov 2022, at 17:59, Edwin Török <edvin.torok@citrix.com> wrote:
> 
> 
> Edwin Török (2):
>  xenctrl.ml: make domain_getinfolist tail recursive
>  xenctrl: use larger chunksize in domain_getinfolist
> 
> tools/ocaml/libs/xc/xenctrl.ml | 25 ++++++++++++++++++-------
> 1 file changed, 18 insertions(+), 7 deletions(-)

Acked-by: Christian Lindig <christian.lindig@citrix.com>


> It was calling the Xen domainfolist hypercall N/2 times.
> Optimize this such that it is called at most 2 times during normal use.
> 
> Implement a tail recursive `rev_concat` equivalent to `concat |> rev`,
> and use it instead of calling `@` multiple times.

Are there any assurances about the order in elements returned by domain_getinfolist? I understand that the change maintains the current behaviour but are we even required to maintain that order? Because otherwise we could return the reverse list and save more work.

— C



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist
  2022-11-02  9:11 ` [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist Christian Lindig
@ 2022-11-02  9:31   ` Edwin Torok
  2022-11-02  9:37   ` Edwin Torok
  1 sibling, 0 replies; 7+ messages in thread
From: Edwin Torok @ 2022-11-02  9:31 UTC (permalink / raw)
  To: Christian Lindig
  Cc: Xen-devel, Pau Ruiz Safont, David Scott, Wei Liu, Anthony Perard



> On 2 Nov 2022, at 09:11, Christian Lindig <christian.lindig@citrix.com> wrote:
> 
> 
> 
>> On 1 Nov 2022, at 17:59, Edwin Török <edvin.torok@citrix.com> wrote:
>> 
>> 
>> Edwin Török (2):
>> xenctrl.ml: make domain_getinfolist tail recursive
>> xenctrl: use larger chunksize in domain_getinfolist
>> 
>> tools/ocaml/libs/xc/xenctrl.ml | 25 ++++++++++++++++++-------
>> 1 file changed, 18 insertions(+), 7 deletions(-)
> 
> Acked-by: Christian Lindig <christian.lindig@citrix.com>
> 
> 
>> It was calling the Xen domainfolist hypercall N/2 times.
>> Optimize this such that it is called at most 2 times during normal use.
>> 
>> Implement a tail recursive `rev_concat` equivalent to `concat |> rev`,
>> and use it instead of calling `@` multiple times.
> 
> Are there any assurances about the order in elements returned by domain_getinfolist? I understand that the change maintains the current behaviour but are we even required to maintain that order? Because otherwise we could return the reverse list and save more work.


Maintaining the current (reversed) order in the C stubs is useful to be able to extract the largest domid so we know where to continue.
However we don't necessarily have to maintain the order on the higher level function available in the .mli, it just happens that `rev_concat` gives us the results in the order that we want.

Although perhaps using an array rather a list here might be better, and we could move resizing/reallocing that array into the C stub, and have just a nicer (and perhaps faster) interface in the C/OCaml API,
by producing less garbage than lists (a single Array.t allocation rather than N list elements).
(Although I don't usually like pushing complexity like that into C stubs, for now I'm happy with the performance of the code as is).

There are more issues in these C stubs in xenctrl, I'll be pouring over the code and try to send a few more patches.

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist
  2022-11-02  9:11 ` [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist Christian Lindig
  2022-11-02  9:31   ` Edwin Torok
@ 2022-11-02  9:37   ` Edwin Torok
  1 sibling, 0 replies; 7+ messages in thread
From: Edwin Torok @ 2022-11-02  9:37 UTC (permalink / raw)
  To: Christian Lindig
  Cc: Xen-devel, Pau Ruiz Safont, David Scott, Wei Liu, Anthony Perard,
	Andrew Cooper



> On 2 Nov 2022, at 09:11, Christian Lindig <christian.lindig@citrix.com> wrote:
> 
> 
> 
>> On 1 Nov 2022, at 17:59, Edwin Török <edvin.torok@citrix.com> wrote:
>> 
>> 
>> Edwin Török (2):
>> xenctrl.ml: make domain_getinfolist tail recursive
>> xenctrl: use larger chunksize in domain_getinfolist
>> 
>> tools/ocaml/libs/xc/xenctrl.ml | 25 ++++++++++++++++++-------
>> 1 file changed, 18 insertions(+), 7 deletions(-)
> 
> Acked-by: Christian Lindig <christian.lindig@citrix.com>
> 
> 
>> It was calling the Xen domainfolist hypercall N/2 times.
>> Optimize this such that it is called at most 2 times during normal use.
>> 
>> Implement a tail recursive `rev_concat` equivalent to `concat |> rev`,
>> and use it instead of calling `@` multiple times.
> 
> Are there any assurances about the order in elements returned by domain_getinfolist? I understand that the change maintains the current behaviour but are we even required to maintain that order? Because otherwise we could return the reverse list and save more work.


After some discussion with Andrew Cooper apparently the xenctrl API is broken and cannot be used safely as is, so I'll be rewriting this patch.
Domids can be assigned randomly, or toolstacks can set a custom domid, so it is not guaranteed that new domids always show up as higher numbers,
and the only safe way to use infolist is to either request 1 domid, or request them all (domids are 15-bit, so 32768).
Anything else is prone to race conditions and might give you an inconsistent snapshot.

This is probably better fixed by changing the prototype of the C function in xenctrl to not take a count to force updating all callers
(the callers in the 'xl' toolstack are just as buggy as the one in this OCaml binding, and probably better to fix the API in xenctrl than working around the race condition in each caller).

Stay tuned for more patches...

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Further issues Re: [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist
  2022-11-01 17:59 [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist Edwin Török
                   ` (2 preceding siblings ...)
  2022-11-02  9:11 ` [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist Christian Lindig
@ 2022-11-02 17:26 ` Andrew Cooper
  3 siblings, 0 replies; 7+ messages in thread
From: Andrew Cooper @ 2022-11-02 17:26 UTC (permalink / raw)
  To: Edwin Torok, xen-devel
  Cc: Pau Ruiz Safont, Christian Lindig, David Scott, Wei Liu,
	Anthony Perard, Henry Wang

Hi,

To be clear, what is presented here are clear improvements in the status
quo, and qualify for inclusion on their own merits.  And definitely
should be considered.


That said, this is a swamp with future problems, and one rather
fundamental one in Xen which I is not going to be easy to fix.

1) (simple), there are a bunch of stubs, including
stub_xc_domain_getinfo() which don't use
caml_{enter,leave}_blocking_section() when they should.

2) stub_xc_domain_getinfo() reuses xc_domain_getinfolist() meaning that
it uses XEN_SYSCTL_getdomaininfolist rather than
XEN_DOMCTL_getdomaininfo, which is a problem because...

3) While both of these hypercalls have crazy APIs leading to loads of
broken callers, at least the DOMCTL has a fastpath for when you specify
a valid domid.  The SYSCTL (and DOMCTL slowpath) is an O(N) search of
all domains starting from d0 to find the first domain with an id >= the
input request.

The DOMCTL slowpath is useless.  Not a single caller (I've ever
encountered) wants that behaviour, whereas I've needed to bugfix caller
which didn't have an "&& info.domid == domid" to have one, to get the
semantics they wanted.  Cleaning this up will be a good improvement.

4) The (adjusted) algorithm in patch 1 still loops until it gets a
result with no entries.  Meaning that it's still going to make one
hypercall too many, searching the entire domlist, to return no data.  In
principle you could optimise to stop at any hypercall which returns
fewer than the requested number of domains, except...

5) ... if you ever use more than a single hypercall, Xen has dropped and
re-acquired the domlist read lock, meaning that you can't use the result
anyway.  Causality couldn't be broken when domains were allocated
sequentially, but we have a random allocation mode now so you can
observe things out of order.

The only safe action is to ask for all 32k domains in a single go, and
use a single hypercall.

~Andrew

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-11-02 17:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-01 17:59 [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist Edwin Török
2022-11-01 17:59 ` [PATCH for-4.17 v1 1/2] xenctrl.ml: make domain_getinfolist tail recursive Edwin Török
2022-11-01 17:59 ` [PATCH for-4.17 v1 2/2] xenctrl: use larger chunksize in domain_getinfolist Edwin Török
2022-11-02  9:11 ` [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist Christian Lindig
2022-11-02  9:31   ` Edwin Torok
2022-11-02  9:37   ` Edwin Torok
2022-11-02 17:26 ` Further issues " Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.