All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1] domctl: hold domctl lock while domain is destroyed
@ 2021-09-16 11:10 Dmitry Isaikin
  2021-09-16 12:30 ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Isaikin @ 2021-09-16 11:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Dmitry Isaykin, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu

From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>

This significantly speeds up concurrent destruction of multiple domains on x86.

I identify the place taking the most time:

    do_domctl(case XEN_DOMCTL_destroydomain)
      -> domain_kill()
           -> domain_relinquish_resources()
                -> relinquish_memory(d, &d->page_list, PGT_l4_page_table)

My reference setup: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, Xen 4.14.

I use this command for test:

    for i in $(seq 1 5) ; do xl destroy test-vm-${i} & done

Without holding the lock all calls of `relinquish_memory(d, &d->page_list, PGT_l4_page_table)`
took on my setup (for HVM with 2GB of memory) about 3 seconds for each destroying domain.

With holding the lock it took only 100 ms.

Signed-off-by: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
---
 xen/common/domctl.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 12d6144d28..b9a50d3e5d 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -497,14 +497,13 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         break;
 
     case XEN_DOMCTL_destroydomain:
-        domctl_lock_release();
         domain_lock(d);
         ret = domain_kill(d);
         domain_unlock(d);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(
                 __HYPERVISOR_domctl, "h", u_domctl);
-        goto domctl_out_unlock_domonly;
+        break;
 
     case XEN_DOMCTL_setnodeaffinity:
     {
-- 
2.33.0



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1] domctl: hold domctl lock while domain is destroyed
  2021-09-16 11:10 [PATCH v1] domctl: hold domctl lock while domain is destroyed Dmitry Isaikin
@ 2021-09-16 12:30 ` Jan Beulich
  2021-09-16 13:08   ` Roger Pau Monné
  2021-09-16 17:52   ` Andrew Cooper
  0 siblings, 2 replies; 10+ messages in thread
From: Jan Beulich @ 2021-09-16 12:30 UTC (permalink / raw)
  To: Dmitry Isaikin
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Wei Liu, xen-devel

On 16.09.2021 13:10, Dmitry Isaikin wrote:
> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
> 
> This significantly speeds up concurrent destruction of multiple domains on x86.

This effectively is a simplistic revert of 228ab9992ffb ("domctl:
improve locking during domain destruction"). There it was found to
actually improve things; now you're claiming the opposite. It'll
take more justification, clearly identifying that you actually
revert an earlier change, and an explanation why then you don't
revert that change altogether. You will want to specifically also
consider the cleaning up of huge VMs, where use of the (global)
domctl lock may hamper progress of other (parallel) operations on
the system.

> I identify the place taking the most time:
> 
>     do_domctl(case XEN_DOMCTL_destroydomain)
>       -> domain_kill()
>            -> domain_relinquish_resources()
>                 -> relinquish_memory(d, &d->page_list, PGT_l4_page_table)
> 
> My reference setup: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, Xen 4.14.
> 
> I use this command for test:
> 
>     for i in $(seq 1 5) ; do xl destroy test-vm-${i} & done
> 
> Without holding the lock all calls of `relinquish_memory(d, &d->page_list, PGT_l4_page_table)`
> took on my setup (for HVM with 2GB of memory) about 3 seconds for each destroying domain.
> 
> With holding the lock it took only 100 ms.

I'm further afraid I can't make the connection. Do you have an
explanation for why there would be such a massive difference?
What would prevent progress of relinquish_memory() with the
domctl lock not held?

Jan



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1] domctl: hold domctl lock while domain is destroyed
  2021-09-16 12:30 ` Jan Beulich
@ 2021-09-16 13:08   ` Roger Pau Monné
  2021-09-16 17:52   ` Andrew Cooper
  1 sibling, 0 replies; 10+ messages in thread
From: Roger Pau Monné @ 2021-09-16 13:08 UTC (permalink / raw)
  To: Dmitry Isaikin
  Cc: Jan Beulich, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Stefano Stabellini, Wei Liu, xen-devel

On Thu, Sep 16, 2021 at 02:30:39PM +0200, Jan Beulich wrote:
> On 16.09.2021 13:10, Dmitry Isaikin wrote:
> > From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
> > 
> > This significantly speeds up concurrent destruction of multiple domains on x86.
> 
> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
> improve locking during domain destruction"). There it was found to
> actually improve things; now you're claiming the opposite. It'll
> take more justification, clearly identifying that you actually
> revert an earlier change, and an explanation why then you don't
> revert that change altogether. You will want to specifically also
> consider the cleaning up of huge VMs, where use of the (global)
> domctl lock may hamper progress of other (parallel) operations on
> the system.
> 
> > I identify the place taking the most time:
> > 
> >     do_domctl(case XEN_DOMCTL_destroydomain)
> >       -> domain_kill()
> >            -> domain_relinquish_resources()
> >                 -> relinquish_memory(d, &d->page_list, PGT_l4_page_table)
> > 
> > My reference setup: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, Xen 4.14.
> > 
> > I use this command for test:
> > 
> >     for i in $(seq 1 5) ; do xl destroy test-vm-${i} & done
> > 
> > Without holding the lock all calls of `relinquish_memory(d, &d->page_list, PGT_l4_page_table)`
> > took on my setup (for HVM with 2GB of memory) about 3 seconds for each destroying domain.
> > 
> > With holding the lock it took only 100 ms.
> 
> I'm further afraid I can't make the connection. Do you have an
> explanation for why there would be such a massive difference?
> What would prevent progress of relinquish_memory() with the
> domctl lock not held?

I would recommend to Dmitry to use lock profiling with and without
this change applied and try to spot which lock is causing the
contention as a starting point. That should be fairly easy and could
share some light.

Regards, Roger.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1] domctl: hold domctl lock while domain is destroyed
  2021-09-16 12:30 ` Jan Beulich
  2021-09-16 13:08   ` Roger Pau Monné
@ 2021-09-16 17:52   ` Andrew Cooper
  2021-09-17  6:17     ` Jan Beulich
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Cooper @ 2021-09-16 17:52 UTC (permalink / raw)
  To: Jan Beulich, Dmitry Isaikin
  Cc: George Dunlap, Ian Jackson, Julien Grall, Stefano Stabellini,
	Wei Liu, xen-devel

On 16/09/2021 13:30, Jan Beulich wrote:
> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
>>
>> This significantly speeds up concurrent destruction of multiple domains on x86.
> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
> improve locking during domain destruction"). There it was found to
> actually improve things;

Was it?  I recall that it was simply an expectation that performance
would be better...

Amazon previously identified 228ab9992ffb as a massive perf hit, too.

Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
incomplete, and it appears as if it wasn't necessarily a wise move in
hindsight.

~Andrew



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1] domctl: hold domctl lock while domain is destroyed
  2021-09-16 17:52   ` Andrew Cooper
@ 2021-09-17  6:17     ` Jan Beulich
  2021-09-17  9:27       ` Julien Grall
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2021-09-17  6:17 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Julien Grall, Stefano Stabellini,
	Wei Liu, xen-devel, Dmitry Isaikin

On 16.09.2021 19:52, Andrew Cooper wrote:
> On 16/09/2021 13:30, Jan Beulich wrote:
>> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>>> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
>>>
>>> This significantly speeds up concurrent destruction of multiple domains on x86.
>> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
>> improve locking during domain destruction"). There it was found to
>> actually improve things;
> 
> Was it?  I recall that it was simply an expectation that performance
> would be better...

My recollection is that it was, for one of our customers.

> Amazon previously identified 228ab9992ffb as a massive perf hit, too.

Interesting. I don't recall any mail to that effect.

> Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
> incomplete, and it appears as if it wasn't necessarily a wise move in
> hindsight.

Possible; I continue to think though that the present observation wants
properly understanding instead of more or less blindly undoing that
change.

Jan



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1] domctl: hold domctl lock while domain is destroyed
  2021-09-17  6:17     ` Jan Beulich
@ 2021-09-17  9:27       ` Julien Grall
  2021-09-17  9:41         ` Andrew Cooper
  0 siblings, 1 reply; 10+ messages in thread
From: Julien Grall @ 2021-09-17  9:27 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	xen-devel, Dmitry Isaikin, rjstone, raphning, Paul Durrant

Hi,

(+ some AWS folks)

On 17/09/2021 11:17, Jan Beulich wrote:
> On 16.09.2021 19:52, Andrew Cooper wrote:
>> On 16/09/2021 13:30, Jan Beulich wrote:
>>> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>>>> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
>>>>
>>>> This significantly speeds up concurrent destruction of multiple domains on x86.
>>> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
>>> improve locking during domain destruction"). There it was found to
>>> actually improve things;
>>
>> Was it?  I recall that it was simply an expectation that performance
>> would be better...
> 
> My recollection is that it was, for one of our customers.
> 
>> Amazon previously identified 228ab9992ffb as a massive perf hit, too.
> 
> Interesting. I don't recall any mail to that effect.

Here we go:

https://lore.kernel.org/xen-devel/de46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia@amazon.com/

We have been using the revert for quite a while in production and didn't 
notice any regression.

> 
>> Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
>> incomplete, and it appears as if it wasn't necessarily a wise move in
>> hindsight.
> 
> Possible; I continue to think though that the present observation wants
> properly understanding instead of more or less blindly undoing that
> change.

To be honest, I think this is the other way around. You wrote and merged 
a patch with the following justification:

"
     There is no need to hold the global domctl lock across domain_kill() -
     the domain lock is fully sufficient here, and parallel cleanup after
     multiple domains performs quite a bit better this way.
"

Clearly, the original commit message is lacking details on the exact 
setups and numbers. But we now have two stakeholders with proof that 
your patch is harmful to the setup you claim perform better with your patch.

To me this is enough justification to revert the original patch. Anyone 
against the revert, should provide clear details of why the patch should 
not be reverted.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1] domctl: hold domctl lock while domain is destroyed
  2021-09-17  9:27       ` Julien Grall
@ 2021-09-17  9:41         ` Andrew Cooper
  2021-09-17  9:47           ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Cooper @ 2021-09-17  9:41 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	xen-devel, Dmitry Isaikin, rjstone, raphning, Paul Durrant

On 17/09/2021 10:27, Julien Grall wrote:
> Hi,
> 
> (+ some AWS folks)
> 
> On 17/09/2021 11:17, Jan Beulich wrote:
>> On 16.09.2021 19:52, Andrew Cooper wrote:
>>> On 16/09/2021 13:30, Jan Beulich wrote:
>>>> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>>>>> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
>>>>>
>>>>> This significantly speeds up concurrent destruction of multiple
>>>>> domains on x86.
>>>> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
>>>> improve locking during domain destruction"). There it was found to
>>>> actually improve things;
>>>
>>> Was it?  I recall that it was simply an expectation that performance
>>> would be better...
>>
>> My recollection is that it was, for one of our customers.
>>
>>> Amazon previously identified 228ab9992ffb as a massive perf hit, too.
>>
>> Interesting. I don't recall any mail to that effect.
> 
> Here we go:
> 
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fxen-devel%2Fde46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia%40amazon.com%2F&amp;data=04%7C01%7CAndrew.Cooper3%40citrix.com%7C8cf65b3fb3324abe7cf108d979bd7171%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637674676843910175%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=si7eYIxSqsJY77sWuwsad5MzJDMzGF%2F8L0JxGrWTmtI%3D&amp;reserved=0
> 
> 
> We have been using the revert for quite a while in production and didn't
> notice any regression.
> 
>>
>>> Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
>>> incomplete, and it appears as if it wasn't necessarily a wise move in
>>> hindsight.
>>
>> Possible; I continue to think though that the present observation wants
>> properly understanding instead of more or less blindly undoing that
>> change.
> 
> To be honest, I think this is the other way around. You wrote and merged
> a patch with the following justification:
> 
> "
>     There is no need to hold the global domctl lock across domain_kill() -
>     the domain lock is fully sufficient here, and parallel cleanup after
>     multiple domains performs quite a bit better this way.
> "
> 
> Clearly, the original commit message is lacking details on the exact
> setups and numbers. But we now have two stakeholders with proof that
> your patch is harmful to the setup you claim perform better with your
> patch.
> 
> To me this is enough justification to revert the original patch. Anyone
> against the revert, should provide clear details of why the patch should
> not be reverted.

I second a revert.

I was concerned at the time that the claim was unsubstantiated, and now
there is plenty of evidence to counter the claim.

~Andrew


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1] domctl: hold domctl lock while domain is destroyed
  2021-09-17  9:41         ` Andrew Cooper
@ 2021-09-17  9:47           ` Jan Beulich
  2021-09-17 16:01             ` Julien Grall
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2021-09-17  9:47 UTC (permalink / raw)
  To: Andrew Cooper, Julien Grall
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	xen-devel, Dmitry Isaikin, rjstone, raphning, Paul Durrant

On 17.09.2021 11:41, Andrew Cooper wrote:
> On 17/09/2021 10:27, Julien Grall wrote:
>> Hi,
>>
>> (+ some AWS folks)
>>
>> On 17/09/2021 11:17, Jan Beulich wrote:
>>> On 16.09.2021 19:52, Andrew Cooper wrote:
>>>> On 16/09/2021 13:30, Jan Beulich wrote:
>>>>> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>>>>>> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
>>>>>>
>>>>>> This significantly speeds up concurrent destruction of multiple
>>>>>> domains on x86.
>>>>> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
>>>>> improve locking during domain destruction"). There it was found to
>>>>> actually improve things;
>>>>
>>>> Was it?  I recall that it was simply an expectation that performance
>>>> would be better...
>>>
>>> My recollection is that it was, for one of our customers.
>>>
>>>> Amazon previously identified 228ab9992ffb as a massive perf hit, too.
>>>
>>> Interesting. I don't recall any mail to that effect.
>>
>> Here we go:
>>
>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fxen-devel%2Fde46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia%40amazon.com%2F&amp;data=04%7C01%7CAndrew.Cooper3%40citrix.com%7C8cf65b3fb3324abe7cf108d979bd7171%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637674676843910175%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=si7eYIxSqsJY77sWuwsad5MzJDMzGF%2F8L0JxGrWTmtI%3D&amp;reserved=0
>>
>>
>> We have been using the revert for quite a while in production and didn't
>> notice any regression.
>>
>>>
>>>> Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
>>>> incomplete, and it appears as if it wasn't necessarily a wise move in
>>>> hindsight.
>>>
>>> Possible; I continue to think though that the present observation wants
>>> properly understanding instead of more or less blindly undoing that
>>> change.
>>
>> To be honest, I think this is the other way around. You wrote and merged
>> a patch with the following justification:
>>
>> "
>>     There is no need to hold the global domctl lock across domain_kill() -
>>     the domain lock is fully sufficient here, and parallel cleanup after
>>     multiple domains performs quite a bit better this way.
>> "
>>
>> Clearly, the original commit message is lacking details on the exact
>> setups and numbers. But we now have two stakeholders with proof that
>> your patch is harmful to the setup you claim perform better with your
>> patch.
>>
>> To me this is enough justification to revert the original patch. Anyone
>> against the revert, should provide clear details of why the patch should
>> not be reverted.
> 
> I second a revert.
> 
> I was concerned at the time that the claim was unsubstantiated, and now
> there is plenty of evidence to counter the claim.

Well, I won't object to a proper revert. I still think we'd better get to
the bottom of this, not the least because I thought there was agreement
that mid to long term we should get rid of global locking wherever
possible. Or are both of you saying that using a global lock here is
obviously fine? And does either of you have at least a theory to explain
the observation? I can only say that I find it puzzling.

Jan



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1] domctl: hold domctl lock while domain is destroyed
  2021-09-17  9:47           ` Jan Beulich
@ 2021-09-17 16:01             ` Julien Grall
  2021-09-20  8:19               ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Julien Grall @ 2021-09-17 16:01 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	xen-devel, Dmitry Isaikin, rjstone, raphning, Paul Durrant

Hi Jan,

On 17/09/2021 14:47, Jan Beulich wrote:
> On 17.09.2021 11:41, Andrew Cooper wrote:
>> On 17/09/2021 10:27, Julien Grall wrote:
>>> Hi,
>>>
>>> (+ some AWS folks)
>>>
>>> On 17/09/2021 11:17, Jan Beulich wrote:
>>>> On 16.09.2021 19:52, Andrew Cooper wrote:
>>>>> On 16/09/2021 13:30, Jan Beulich wrote:
>>>>>> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>>>>>>> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
>>>>>>>
>>>>>>> This significantly speeds up concurrent destruction of multiple
>>>>>>> domains on x86.
>>>>>> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
>>>>>> improve locking during domain destruction"). There it was found to
>>>>>> actually improve things;
>>>>>
>>>>> Was it?  I recall that it was simply an expectation that performance
>>>>> would be better...
>>>>
>>>> My recollection is that it was, for one of our customers.
>>>>
>>>>> Amazon previously identified 228ab9992ffb as a massive perf hit, too.
>>>>
>>>> Interesting. I don't recall any mail to that effect.
>>>
>>> Here we go:
>>>
>>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fxen-devel%2Fde46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia%40amazon.com%2F&amp;data=04%7C01%7CAndrew.Cooper3%40citrix.com%7C8cf65b3fb3324abe7cf108d979bd7171%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637674676843910175%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=si7eYIxSqsJY77sWuwsad5MzJDMzGF%2F8L0JxGrWTmtI%3D&amp;reserved=0
>>>
>>>
>>> We have been using the revert for quite a while in production and didn't
>>> notice any regression.
>>>
>>>>
>>>>> Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
>>>>> incomplete, and it appears as if it wasn't necessarily a wise move in
>>>>> hindsight.
>>>>
>>>> Possible; I continue to think though that the present observation wants
>>>> properly understanding instead of more or less blindly undoing that
>>>> change.
>>>
>>> To be honest, I think this is the other way around. You wrote and merged
>>> a patch with the following justification:
>>>
>>> "
>>>      There is no need to hold the global domctl lock across domain_kill() -
>>>      the domain lock is fully sufficient here, and parallel cleanup after
>>>      multiple domains performs quite a bit better this way.
>>> "
>>>
>>> Clearly, the original commit message is lacking details on the exact
>>> setups and numbers. But we now have two stakeholders with proof that
>>> your patch is harmful to the setup you claim perform better with your
>>> patch.
>>>
>>> To me this is enough justification to revert the original patch. Anyone
>>> against the revert, should provide clear details of why the patch should
>>> not be reverted.
>>
>> I second a revert.
>>
>> I was concerned at the time that the claim was unsubstantiated, and now
>> there is plenty of evidence to counter the claim.
> 
> Well, I won't object to a proper revert. I still think we'd better get to
> the bottom of this, not the least because I thought there was agreement
> that mid to long term we should get rid of global locking wherever
> possible. Or are both of you saying that using a global lock here is
> obviously fine? And does either of you have at least a theory to explain
> the observation? I can only say that I find it puzzling.

I will quote what Hongyan wrote back on the first report:

"
The best solution is to make the heap scalable instead of a global
lock, but that is not going to be trivial.

Of course, another solution is to keep the domctl lock dropped in
domain_kill() but have another domain_kill lock so that competing
domain_kill()s will try to take that lock and back off with hypercall
continuation. But this is kind of hacky (we introduce a lock to reduce
spinlock contention elsewhere), which is probably not a solution but a
workaround.
"

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1] domctl: hold domctl lock while domain is destroyed
  2021-09-17 16:01             ` Julien Grall
@ 2021-09-20  8:19               ` Jan Beulich
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2021-09-20  8:19 UTC (permalink / raw)
  To: Julien Grall
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	xen-devel, Dmitry Isaikin, rjstone, raphning, Paul Durrant,
	Andrew Cooper

On 17.09.2021 18:01, Julien Grall wrote:
> I will quote what Hongyan wrote back on the first report:
> 
> "
> The best solution is to make the heap scalable instead of a global
> lock, but that is not going to be trivial.
> 
> Of course, another solution is to keep the domctl lock dropped in
> domain_kill() but have another domain_kill lock so that competing
> domain_kill()s will try to take that lock and back off with hypercall
> continuation. But this is kind of hacky (we introduce a lock to reduce
> spinlock contention elsewhere), which is probably not a solution but a
> workaround.
> "

Interesting. Is contention on the heap lock merely suspected or
was that lock proven to be the problem one?

Jan



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-09-20  8:20 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-16 11:10 [PATCH v1] domctl: hold domctl lock while domain is destroyed Dmitry Isaikin
2021-09-16 12:30 ` Jan Beulich
2021-09-16 13:08   ` Roger Pau Monné
2021-09-16 17:52   ` Andrew Cooper
2021-09-17  6:17     ` Jan Beulich
2021-09-17  9:27       ` Julien Grall
2021-09-17  9:41         ` Andrew Cooper
2021-09-17  9:47           ` Jan Beulich
2021-09-17 16:01             ` Julien Grall
2021-09-20  8:19               ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.