All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-4.16] Revert "domctl: improve locking during domain destruction"
@ 2021-11-09 14:31 ` Roger Pau Monne
  2021-11-09 14:34   ` Andrew Cooper
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Roger Pau Monne @ 2021-11-09 14:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu, Hongyan Xia,
	Dmitry Isaikin, Ian Jackson

This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.

Performance analysis has shown that dropping the domctl lock during
domain destruction greatly increases the contention in the heap_lock,
thus making parallel destruction of domains slower.

The following lockperf data shows the difference between the current
code and the reverted one:

lock:  3342357(2.268295505s), block:  3263853(18.556650797s)
lock:  2788704(0.362311723s), block:   222681( 0.091152276s)

Given the current point in the release, revert the commit and
reinstate holding the domctl lock during domain destruction. Further
work should be done in order to re-add more fine grained locking to
the domain destruction path once a proper solution to avoid the
heap_lock contention is found.

Reported-by: Hongyan Xia <hongyxia@amazon.com>
Reported-by: Dmitry Isaikin <isaikin-dmitry@yandex.ru>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Ian Jackson <iwj@xenproject.org>

Since this is a revert and not new code I think the risk is lower.
There's however some risk, as the original commit was from 2017, and
hence the surrounding code has changed a bit. It's also a possibility
that some other parts of the domain destruction code now rely on this
more fine grained locking. Local tests however haven't shown issues.
---
 xen/common/domain.c | 12 ++----------
 xen/common/domctl.c |  5 +----
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 56d47dd664..093bb4403f 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -917,21 +917,13 @@ int domain_kill(struct domain *d)
     if ( d == current->domain )
         return -EINVAL;
 
-    /* Protected by d->domain_lock. */
+    /* Protected by domctl_lock. */
     switch ( d->is_dying )
     {
     case DOMDYING_alive:
-        domain_unlock(d);
         domain_pause(d);
-        domain_lock(d);
-        /*
-         * With the domain lock dropped, d->is_dying may have changed. Call
-         * ourselves recursively if so, which is safe as then we won't come
-         * back here.
-         */
-        if ( d->is_dying != DOMDYING_alive )
-            return domain_kill(d);
         d->is_dying = DOMDYING_dying;
+        spin_barrier(&d->domain_lock);
         argo_destroy(d);
         vnuma_destroy(d->vnuma);
         domain_set_outstanding_pages(d, 0);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 271862ae58..879a2adcbe 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -497,14 +497,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         break;
 
     case XEN_DOMCTL_destroydomain:
-        domctl_lock_release();
-        domain_lock(d);
         ret = domain_kill(d);
-        domain_unlock(d);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(
                 __HYPERVISOR_domctl, "h", u_domctl);
-        goto domctl_out_unlock_domonly;
+        break;
 
     case XEN_DOMCTL_setnodeaffinity:
     {
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH for-4.16] Revert "domctl: improve locking during domain destruction"
  2021-11-09 14:31 ` [PATCH for-4.16] " Roger Pau Monne
@ 2021-11-09 14:34   ` Andrew Cooper
  2021-11-09 14:42   ` Julien Grall
  2021-11-09 15:04   ` Ian Jackson
  2 siblings, 0 replies; 10+ messages in thread
From: Andrew Cooper @ 2021-11-09 14:34 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel
  Cc: Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini, Wei Liu, Hongyan Xia, Dmitry Isaikin,
	Ian Jackson

On 09/11/2021 14:31, Roger Pau Monne wrote:
> This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.
>
> Performance analysis has shown that dropping the domctl lock during
> domain destruction greatly increases the contention in the heap_lock,
> thus making parallel destruction of domains slower.
>
> The following lockperf data shows the difference between the current
> code and the reverted one:
>
> lock:  3342357(2.268295505s), block:  3263853(18.556650797s)
> lock:  2788704(0.362311723s), block:   222681( 0.091152276s)
>
> Given the current point in the release, revert the commit and
> reinstate holding the domctl lock during domain destruction. Further
> work should be done in order to re-add more fine grained locking to
> the domain destruction path once a proper solution to avoid the
> heap_lock contention is found.
>
> Reported-by: Hongyan Xia <hongyxia@amazon.com>
> Reported-by: Dmitry Isaikin <isaikin-dmitry@yandex.ru>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH for-4.16] Revert "domctl: improve locking during domain destruction"
  2021-11-09 14:31 ` [PATCH for-4.16] " Roger Pau Monne
  2021-11-09 14:34   ` Andrew Cooper
@ 2021-11-09 14:42   ` Julien Grall
  2021-11-09 14:55     ` Roger Pau Monné
  2021-11-09 15:04   ` Ian Jackson
  2 siblings, 1 reply; 10+ messages in thread
From: Julien Grall @ 2021-11-09 14:42 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel
  Cc: Andrew Cooper, George Dunlap, Jan Beulich, Stefano Stabellini,
	Wei Liu, Hongyan Xia, Dmitry Isaikin, Ian Jackson

Hi Roger,

On 09/11/2021 14:31, Roger Pau Monne wrote:
> This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.
> 
> Performance analysis has shown that dropping the domctl lock during
> domain destruction greatly increases the contention in the heap_lock,
> thus making parallel destruction of domains slower.
> 
> The following lockperf data shows the difference between the current
> code and the reverted one:
> 
> lock:  3342357(2.268295505s), block:  3263853(18.556650797s)
> lock:  2788704(0.362311723s), block:   222681( 0.091152276s)

Thanks for the numbers, this is already an improvement from the reverted.

Can you also please provide some details on the setup that was used to 
get the number? (e.g. how many guests, amount of memory...).

For instance, in the case of Amazon our setup was:

On a 144-core server with 4TiB of memory, destroying 32 guests (each
with 4 vcpus and 122GiB memory) simultaneously takes:

before the revert: 29 minutes
after the revert: 6 minutes

> 
> Given the current point in the release, revert the commit and
> reinstate holding the domctl lock during domain destruction. Further
> work should be done in order to re-add more fine grained locking to
> the domain destruction path once a proper solution to avoid the
> heap_lock contention is found.
> 
> Reported-by: Hongyan Xia <hongyxia@amazon.com>
> Reported-by: Dmitry Isaikin <isaikin-dmitry@yandex.ru>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Ian Jackson <iwj@xenproject.org>
> 
> Since this is a revert and not new code I think the risk is lower.
> There's however some risk, as the original commit was from 2017, and
> hence the surrounding code has changed a bit. It's also a possibility
> that some other parts of the domain destruction code now rely on this
> more fine grained locking. Local tests however haven't shown issues.
> ---
>   xen/common/domain.c | 12 ++----------
>   xen/common/domctl.c |  5 +----
>   2 files changed, 3 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 56d47dd664..093bb4403f 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -917,21 +917,13 @@ int domain_kill(struct domain *d)
>       if ( d == current->domain )
>           return -EINVAL;
>   
> -    /* Protected by d->domain_lock. */
> +    /* Protected by domctl_lock. */
>       switch ( d->is_dying )
>       {
>       case DOMDYING_alive:
> -        domain_unlock(d);
>           domain_pause(d);
> -        domain_lock(d);
> -        /*
> -         * With the domain lock dropped, d->is_dying may have changed. Call
> -         * ourselves recursively if so, which is safe as then we won't come
> -         * back here.
> -         */
> -        if ( d->is_dying != DOMDYING_alive )
> -            return domain_kill(d);
>           d->is_dying = DOMDYING_dying;
> +        spin_barrier(&d->domain_lock);
>           argo_destroy(d);
>           vnuma_destroy(d->vnuma);
>           domain_set_outstanding_pages(d, 0);
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 271862ae58..879a2adcbe 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -497,14 +497,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>           break;
>   
>       case XEN_DOMCTL_destroydomain:
> -        domctl_lock_release();
> -        domain_lock(d);
>           ret = domain_kill(d);
> -        domain_unlock(d);
>           if ( ret == -ERESTART )
>               ret = hypercall_create_continuation(
>                   __HYPERVISOR_domctl, "h", u_domctl);
> -        goto domctl_out_unlock_domonly;
> +        break;
>   
>       case XEN_DOMCTL_setnodeaffinity:
>       {
> 

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH for-4.16] Revert "domctl: improve locking during domain destruction"
  2021-11-09 14:42   ` Julien Grall
@ 2021-11-09 14:55     ` Roger Pau Monné
  2021-11-09 15:28       ` Julien Grall
  0 siblings, 1 reply; 10+ messages in thread
From: Roger Pau Monné @ 2021-11-09 14:55 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Stefano Stabellini, Wei Liu, Hongyan Xia, Dmitry Isaikin,
	Ian Jackson

On Tue, Nov 09, 2021 at 02:42:58PM +0000, Julien Grall wrote:
> Hi Roger,
> 
> On 09/11/2021 14:31, Roger Pau Monne wrote:
> > This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.
> > 
> > Performance analysis has shown that dropping the domctl lock during
> > domain destruction greatly increases the contention in the heap_lock,
> > thus making parallel destruction of domains slower.
> > 
> > The following lockperf data shows the difference between the current
> > code and the reverted one:
> > 
> > lock:  3342357(2.268295505s), block:  3263853(18.556650797s)
> > lock:  2788704(0.362311723s), block:   222681( 0.091152276s)
> 
> Thanks for the numbers, this is already an improvement from the reverted.
> 
> Can you also please provide some details on the setup that was used to get
> the number? (e.g. how many guests, amount of memory...).

Those are from Dmitry, and are gathered after destroying 5 guests in
parallel. Given his previous emails he seems to use 2GB HVM guests for
other tests, so I would assume that's the case for the lock profile data
also (albeit it's not said explicitly):

https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg01515.html

I'm not sure it's worth adding this explicitly, as it's not a very
complex test case. Probably any attempts to destroy a minimal amount
of guests in parallel (5?) will already show the lock contention in
the profiling.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH for-4.16] Revert "domctl: improve locking during domain destruction"
  2021-11-09 14:31 ` [PATCH for-4.16] " Roger Pau Monne
  2021-11-09 14:34   ` Andrew Cooper
  2021-11-09 14:42   ` Julien Grall
@ 2021-11-09 15:04   ` Ian Jackson
  2021-11-09 15:15     ` Roger Pau Monné
  2 siblings, 1 reply; 10+ messages in thread
From: Ian Jackson @ 2021-11-09 15:04 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu, Hongyan Xia,
	Dmitry Isaikin

Roger Pau Monne writes ("[PATCH for-4.16] Revert "domctl: improve locking during domain destruction""):
> This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.
> 
> Performance analysis has shown that dropping the domctl lock during
> domain destruction greatly increases the contention in the heap_lock,
> thus making parallel destruction of domains slower.
...
> Given the current point in the release, revert the commit and
> reinstate holding the domctl lock during domain destruction. Further
> work should be done in order to re-add more fine grained locking to
> the domain destruction path once a proper solution to avoid the
> heap_lock contention is found.
> ---
...
> Since this is a revert and not new code I think the risk is lower.
> There's however some risk, as the original commit was from 2017, and
> hence the surrounding code has changed a bit. It's also a possibility
> that some other parts of the domain destruction code now rely on this
> more fine grained locking. Local tests however haven't shown issues.

From a release management point of view I don't regard this as the
kind of "revert" that ought to get any kind of special consideration.
The tree has been like this since 2017 and Xen 4.11 and many changes
have been happened since.

So I am going to treat this as an effectively new change.

AIUI it is a proposal to improve performance, not a bugfix.  Was this
change posted (or, proposed on-list) before the Xen 4.16 Last Posting
Date (24th of September) ?  Even if it was, it would need a freeze
exception.

Ian.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH for-4.16] Revert "domctl: improve locking during domain destruction"
  2021-11-09 15:04   ` Ian Jackson
@ 2021-11-09 15:15     ` Roger Pau Monné
  2021-11-16 14:51       ` [PATCH for-4.16] Revert "domctl: improve locking during domain destruction" [and 2 more messages] Ian Jackson
  0 siblings, 1 reply; 10+ messages in thread
From: Roger Pau Monné @ 2021-11-09 15:15 UTC (permalink / raw)
  To: Ian Jackson
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu, Hongyan Xia,
	Dmitry Isaikin

On Tue, Nov 09, 2021 at 03:04:56PM +0000, Ian Jackson wrote:
> Roger Pau Monne writes ("[PATCH for-4.16] Revert "domctl: improve locking during domain destruction""):
> > This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.
> > 
> > Performance analysis has shown that dropping the domctl lock during
> > domain destruction greatly increases the contention in the heap_lock,
> > thus making parallel destruction of domains slower.
> ...
> > Given the current point in the release, revert the commit and
> > reinstate holding the domctl lock during domain destruction. Further
> > work should be done in order to re-add more fine grained locking to
> > the domain destruction path once a proper solution to avoid the
> > heap_lock contention is found.
> > ---
> ...
> > Since this is a revert and not new code I think the risk is lower.
> > There's however some risk, as the original commit was from 2017, and
> > hence the surrounding code has changed a bit. It's also a possibility
> > that some other parts of the domain destruction code now rely on this
> > more fine grained locking. Local tests however haven't shown issues.
> 
> From a release management point of view I don't regard this as the
> kind of "revert" that ought to get any kind of special consideration.
> The tree has been like this since 2017 and Xen 4.11 and many changes
> have been happened since.
> 
> So I am going to treat this as an effectively new change.
> 
> AIUI it is a proposal to improve performance, not a bugfix.  Was this
> change posted (or, proposed on-list) before the Xen 4.16 Last Posting
> Date (24th of September) ?  Even if it was, it would need a freeze
> exception.

It was posted here:

https://lore.kernel.org/xen-devel/de46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia@amazon.com/

Which was missing a spin_barrier, and in a different form here:

https://lore.kernel.org/xen-devel/2e7044de3cd8a6768a20250e61fe262f3a018724.1631790362.git.isaikin-dmitry@yandex.ru/

Thanks, Roger.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH for-4.16] Revert "domctl: improve locking during domain destruction"
  2021-11-09 14:55     ` Roger Pau Monné
@ 2021-11-09 15:28       ` Julien Grall
  0 siblings, 0 replies; 10+ messages in thread
From: Julien Grall @ 2021-11-09 15:28 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Stefano Stabellini, Wei Liu, Hongyan Xia, Dmitry Isaikin,
	Ian Jackson

Hi,

On 09/11/2021 14:55, Roger Pau Monné wrote:
> On Tue, Nov 09, 2021 at 02:42:58PM +0000, Julien Grall wrote:
>> Hi Roger,
>>
>> On 09/11/2021 14:31, Roger Pau Monne wrote:
>>> This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.
>>>
>>> Performance analysis has shown that dropping the domctl lock during
>>> domain destruction greatly increases the contention in the heap_lock,
>>> thus making parallel destruction of domains slower.
>>>
>>> The following lockperf data shows the difference between the current
>>> code and the reverted one:
>>>
>>> lock:  3342357(2.268295505s), block:  3263853(18.556650797s)
>>> lock:  2788704(0.362311723s), block:   222681( 0.091152276s)
>>
>> Thanks for the numbers, this is already an improvement from the reverted.
>>
>> Can you also please provide some details on the setup that was used to get
>> the number? (e.g. how many guests, amount of memory...).
> 
> Those are from Dmitry, and are gathered after destroying 5 guests in
> parallel. Given his previous emails he seems to use 2GB HVM guests for
> other tests, so I would assume that's the case for the lock profile data
> also (albeit it's not said explicitly):
> 
> https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg01515.html
> 
> I'm not sure it's worth adding this explicitly, as it's not a very
> complex test case. Probably any attempts to destroy a minimal amount
> of guests in parallel (5?) will already show the lock contention in
> the profiling.

In this case, I am not too concerned about not been able to reproduce 
it. However, I think it is a good practice to always post the setup 
along with the numbers.

This makes easier to understand the context of the patch and avoid 
spending time digging into the archives to find the original report.

Anyway, you already wrote everything above. So this is a matter of 
adding your first paragraph in the commit message + maybe a link to the 
original discussion(s).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH for-4.16 v2] Revert "domctl: improve locking during domain destruction"
@ 2021-11-12 12:02 Roger Pau Monne
  2021-11-09 14:31 ` [PATCH for-4.16] " Roger Pau Monne
  2021-11-15 11:51 ` [PATCH for-4.16 v2] Revert "domctl: improve locking during domain destruction" Julien Grall
  0 siblings, 2 replies; 10+ messages in thread
From: Roger Pau Monne @ 2021-11-12 12:02 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu, Hongyan Xia,
	Dmitry Isaikin, Ian Jackson

This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.

Performance analysis has shown that dropping the domctl lock during
domain destruction greatly increases the contention in the heap_lock,
thus making parallel destruction of domains slower.

The following lockperf data shows the difference between the current
code and the reverted one:

lock:  3342357(2.268295505s), block:  3263853(18.556650797s)
lock:  2788704(0.362311723s), block:   222681( 0.091152276s)

Those figures are from Dmitry Isaikin, and are gathered after
destroying 5 2GB HVM guests in parallel:

https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg01515.html

Given the current point in the release, revert the commit and
reinstate holding the domctl lock during domain destruction. Further
work should be done in order to re-add more fine grained locking to
the domain destruction path once a proper solution to avoid the
heap_lock contention is found.

Reported-by: Hongyan Xia <hongyxia@amazon.com>
Reported-by: Dmitry Isaikin <isaikin-dmitry@yandex.ru>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Cc: Ian Jackson <iwj@xenproject.org>

Since this is a revert and not new code I think the risk is lower.
There's however some risk, as the original commit was from 2017, and
hence the surrounding code has changed a bit. It's also a possibility
that some other parts of the domain destruction code now rely on this
more fine grained locking. Local tests however haven't shown issues.
---
Changes since v1:
 - Expand commit message.
---
 xen/common/domain.c | 12 ++----------
 xen/common/domctl.c |  5 +----
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 56d47dd664..093bb4403f 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -917,21 +917,13 @@ int domain_kill(struct domain *d)
     if ( d == current->domain )
         return -EINVAL;
 
-    /* Protected by d->domain_lock. */
+    /* Protected by domctl_lock. */
     switch ( d->is_dying )
     {
     case DOMDYING_alive:
-        domain_unlock(d);
         domain_pause(d);
-        domain_lock(d);
-        /*
-         * With the domain lock dropped, d->is_dying may have changed. Call
-         * ourselves recursively if so, which is safe as then we won't come
-         * back here.
-         */
-        if ( d->is_dying != DOMDYING_alive )
-            return domain_kill(d);
         d->is_dying = DOMDYING_dying;
+        spin_barrier(&d->domain_lock);
         argo_destroy(d);
         vnuma_destroy(d->vnuma);
         domain_set_outstanding_pages(d, 0);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 271862ae58..879a2adcbe 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -497,14 +497,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         break;
 
     case XEN_DOMCTL_destroydomain:
-        domctl_lock_release();
-        domain_lock(d);
         ret = domain_kill(d);
-        domain_unlock(d);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(
                 __HYPERVISOR_domctl, "h", u_domctl);
-        goto domctl_out_unlock_domonly;
+        break;
 
     case XEN_DOMCTL_setnodeaffinity:
     {
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH for-4.16 v2] Revert "domctl: improve locking during domain destruction"
  2021-11-12 12:02 [PATCH for-4.16 v2] Revert "domctl: improve locking during domain destruction" Roger Pau Monne
  2021-11-09 14:31 ` [PATCH for-4.16] " Roger Pau Monne
@ 2021-11-15 11:51 ` Julien Grall
  1 sibling, 0 replies; 10+ messages in thread
From: Julien Grall @ 2021-11-15 11:51 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel
  Cc: Andrew Cooper, George Dunlap, Jan Beulich, Stefano Stabellini,
	Wei Liu, Hongyan Xia, Dmitry Isaikin, Ian Jackson

Hi Roger,

On 12/11/2021 12:02, Roger Pau Monne wrote:
> This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.
> 
> Performance analysis has shown that dropping the domctl lock during
> domain destruction greatly increases the contention in the heap_lock,
> thus making parallel destruction of domains slower.
> 
> The following lockperf data shows the difference between the current
> code and the reverted one:
> 
> lock:  3342357(2.268295505s), block:  3263853(18.556650797s)
> lock:  2788704(0.362311723s), block:   222681( 0.091152276s)
> 
> Those figures are from Dmitry Isaikin, and are gathered after
> destroying 5 2GB HVM guests in parallel:
> 
> https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg01515.html
> 
> Given the current point in the release, revert the commit and
> reinstate holding the domctl lock during domain destruction. Further
> work should be done in order to re-add more fine grained locking to
> the domain destruction path once a proper solution to avoid the
> heap_lock contention is found.
> 
> Reported-by: Hongyan Xia <hongyxia@amazon.com>
> Reported-by: Dmitry Isaikin <isaikin-dmitry@yandex.ru>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH for-4.16] Revert "domctl: improve locking during domain destruction" [and 2 more messages]
  2021-11-09 15:15     ` Roger Pau Monné
@ 2021-11-16 14:51       ` Ian Jackson
  0 siblings, 0 replies; 10+ messages in thread
From: Ian Jackson @ 2021-11-16 14:51 UTC (permalink / raw)
  To: Roger Pau Monne, Julien Grall
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Stefano Stabellini, Wei Liu, Hongyan Xia, Dmitry Isaikin

Roger Pau Monné writes ("Re: [PATCH for-4.16] Revert "domctl: improve locking during domain destruction""):
> On Tue, Nov 09, 2021 at 03:04:56PM +0000, Ian Jackson wrote:
> > So I am going to treat this as an effectively new change.
> > 
> > AIUI it is a proposal to improve performance, not a bugfix.  Was this
> > change posted (or, proposed on-list) before the Xen 4.16 Last Posting
> > Date (24th of September) ?  Even if it was, it would need a freeze
> > exception.
> 
> It was posted here:
> 
> https://lore.kernel.org/xen-devel/de46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia@amazon.com/
> 
> Which was missing a spin_barrier, and in a different form here:
> 
> https://lore.kernel.org/xen-devel/2e7044de3cd8a6768a20250e61fe262f3a018724.1631790362.git.isaikin-dmitry@yandex.ru/

Thanks.

Julien Grall writes ("Re: [PATCH for-4.16] Revert "domctl: improve locking during domain destruction""):
> For instance, in the case of Amazon our setup was:
> 
> On a 144-core server with 4TiB of memory, destroying 32 guests (each
> with 4 vcpus and 122GiB memory) simultaneously takes:
> 
> before the revert: 29 minutes
> after the revert: 6 minutes

This is quite substantial!

> > Given the current point in the release, revert the commit and
> > reinstate holding the domctl lock during domain destruction. Further
> > work should be done in order to re-add more fine grained locking to
> > the domain destruction path once a proper solution to avoid the
> > heap_lock contention is found.
...
> > Since this is a revert and not new code I think the risk is lower.
> > There's however some risk, as the original commit was from 2017, and
> > hence the surrounding code has changed a bit. It's also a possibility
> > that some other parts of the domain destruction code now rely on this
> > more fine grained locking. Local tests however haven't shown issues.

I am finding this a difficult decision.  It appears from the threads
that a number of people have been running with this revert, which
would serve to mitigate the risk, but it's not clear to me what
version(s) of Xen they had applied it to.

Ultimately, though, I think I need to refer myself to the schedule I set:

    Friday 12th November                  Hard code freeze [*]

      Bugfixes for serious bugs (including regressions), and low-risk
      fixes only.
      (0.5 weeks)

I don't see any way that this change fits into this.  The point of a
freeze is, in part, that we stop trying to *improve* things and start
trying to *unbreak* them :-).

While the perf here is clearly poor I don't think it's actually
broken.

So, I'm afraid I'm saying "no".

Ian.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-11-16 14:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-12 12:02 [PATCH for-4.16 v2] Revert "domctl: improve locking during domain destruction" Roger Pau Monne
2021-11-09 14:31 ` [PATCH for-4.16] " Roger Pau Monne
2021-11-09 14:34   ` Andrew Cooper
2021-11-09 14:42   ` Julien Grall
2021-11-09 14:55     ` Roger Pau Monné
2021-11-09 15:28       ` Julien Grall
2021-11-09 15:04   ` Ian Jackson
2021-11-09 15:15     ` Roger Pau Monné
2021-11-16 14:51       ` [PATCH for-4.16] Revert "domctl: improve locking during domain destruction" [and 2 more messages] Ian Jackson
2021-11-15 11:51 ` [PATCH for-4.16 v2] Revert "domctl: improve locking during domain destruction" Julien Grall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.