* xen/arm: Domain not fully destroyed when using credit2
@ 2017-01-23 19:42 Julien Grall
  2017-01-24  0:16 ` Stefano Stabellini
  2017-01-24  8:20 ` Jan Beulich
  0 siblings, 2 replies; 33+ messages in thread
From: Julien Grall @ 2017-01-23 19:42 UTC (permalink / raw)
  To: Stefano Stabellini, Dario Faggioli, George Dunlap, Andrew Cooper,
	Jan Beulich, Konrad Rzeszutek Wilk, Wei Liu, Ian Jackson,
	Tim Deegan
  Cc: xen-devel

Hi all,

Before someone digs into the scheduler: I don't think this is an issue
in credit2. Rather, using it highlights a bug in another component (I
think RCU).

Whilst testing other patches today, I noticed that some of the
resources allocated to a guest were not released during its destruction.

The configuration of the test is:
	- ARM platform with 6 cores
	- staging Xen with credit2 enabled by default
	- DOM0 using 2 pinned vCPUs

The test creates a guest and then destroys it. After the test, some
resources are not released (or may only be released a long time
after).

Looking at the code, domain resources are released in 2 phases:
	- domain_destroy: called when there is no more reference on the domain 
(see put_domain)
	- complete_domain_destroy: called when the RCU is quiescent

The function domain_destroy sets up the RCU callback
(complete_domain_destroy) by calling call_rcu. call_rcu adds the
callback to the RCU list and then may send an IPI (see
force_quiescent_state) if the threshold is reached. This IPI is here to
make sure all CPUs are quiescent before calling the callbacks (e.g.
complete_domain_destroy). In my case, the threshold was not reached and
therefore no IPI was sent.
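
As a rough sketch of the threshold logic described above (a toy model,
not the code in Xen's RCU implementation; every toy_* name is made up
for illustration, and QHIMARK simply mirrors the qhimark value of 10000
quoted later in this thread):

```c
#include <assert.h>
#include <stddef.h>

#define QHIMARK 10000  /* threshold discussed in the thread (qhimark) */

/* Hypothetical, simplified stand-in for an RCU callback head. */
struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(struct toy_rcu_head *);
};

/* Hypothetical per-CPU RCU bookkeeping. */
struct toy_rcu_data {
    struct toy_rcu_head *list;  /* pending callbacks, newest first */
    long qlen;                  /* number of pending callbacks */
    int forced;                 /* times we would have sent the IPI */
};

/* Stand-in for force_quiescent_state(): only counts invocations. */
static void toy_force_quiescent_state(struct toy_rcu_data *rdp)
{
    rdp->forced++;
}

/* Queue a callback; poke the other CPUs only once the queue is long. */
static void toy_call_rcu(struct toy_rcu_data *rdp, struct toy_rcu_head *head,
                         void (*func)(struct toy_rcu_head *))
{
    assert(rdp != NULL && head != NULL);

    head->func = func;
    head->next = rdp->list;
    rdp->list = head;

    if (++rdp->qlen > QHIMARK)
        toy_force_quiescent_state(rdp);
}
```

With a single domain destruction, qlen stays far below the threshold, so
nothing in this path ever wakes the other CPUs.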

On ARM, the idle loop runs when the pCPU has no work to do. This loop
waits to receive an interrupt (see wfi) and checks whether there is some
work to do once the CPU has woken up (i.e. an interrupt was received).

The problem I encountered is the idle CPU will never receive interrupts 
(no timer, nor IPI...) and therefore never check whether the RCU has 
some work to do.

From my understanding, this is a bug in how RCU is handled (see the
comment above rcu_start_batch): it expects each CPU (no broadcast) to
check whether there is RCU work, but this relies on something else (a
timer?) firing an interrupt.
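
The stall can be pictured with a toy model (an illustration under
simplifying assumptions, not Xen code; all toy_* names are hypothetical).
Each CPU must clear its own bit before the grace period ends, and a CPU
only gets the chance to do so when it takes an interrupt:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 6  /* matches the 6-core platform in the report */

struct toy_grace_period {
    unsigned int pending_mask;  /* bit n set: CPU n has not reported yet */
};

/* Start a grace period: every CPU must report a quiescent state. */
static void toy_start_batch(struct toy_grace_period *gp)
{
    gp->pending_mask = (1u << NR_CPUS) - 1;
}

/* Only reached on a CPU that woke up, e.g. because of an interrupt. */
static void toy_check_quiescent(struct toy_grace_period *gp, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    gp->pending_mask &= ~(1u << cpu);
}

static bool toy_grace_period_done(const struct toy_grace_period *gp)
{
    return gp->pending_mask == 0;
}

/*
 * Run `ticks` rounds in which only the CPUs in irq_mask receive
 * interrupts. CPUs outside irq_mask model idle pCPUs sitting in wfi.
 * Returns true if the grace period completed.
 */
static bool toy_run(unsigned int irq_mask, int ticks)
{
    struct toy_grace_period gp;

    toy_start_batch(&gp);
    for (int t = 0; t < ticks; t++)
        for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
            if (irq_mask & (1u << cpu))
                toy_check_quiescent(&gp, cpu);

    return toy_grace_period_done(&gp);
}
```

In this model, toy_run(0x3, n) (only the two DOM0 pCPUs ever interrupted)
returns false for any n, whereas toy_run(0x3f, 1) returns true.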

Any incoming interrupt will make a pCPU check the RCU state. On ARM,
the biggest source of IPIs was credit1, or the timer if a guest vCPU was
scheduled on that pCPU. But it looks like the IPI traffic with credit2
was reduced to none (which is a really good thing :)), and no guest
timer was scheduled because no vCPU ever ran on this pCPU.

I think the bug has always been here (on both ARM and x86), but was
never detected because any incoming interrupt makes the pCPU check the
RCU state.

However, I am not sure how to resolve this issue. Any thoughts?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-23 19:42 xen/arm: Domain not fully destroyed when using credit2 Julien Grall
@ 2017-01-24  0:16 ` Stefano Stabellini
  2017-01-24 12:52   ` Julien Grall
  2017-01-24  8:20 ` Jan Beulich
  1 sibling, 1 reply; 33+ messages in thread
From: Stefano Stabellini @ 2017-01-24  0:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Ian Jackson, Tim Deegan, Jan Beulich, xen-devel

On Mon, 23 Jan 2017, Julien Grall wrote:
> Hi all,
> 
> Before someone dig into the scheduler, I don't think this is an issue in
> credit2 but the use of it highlight a bug in another component (I think RCU).
> 
> Whilst testing other patches today, I have noticed that some part of the
> resources allocated to a guest were not released during the destruction.
> 
> The configuration of the test is:
> 	- ARM platform with 6 cores
> 	- staging Xen with credit2 enabled by default
> 	- DOM0 using 2 pinned vCPUs
> 
> The test is creating a guest vCPUs and then destroyed. After the test, some
> resourced are not released (or could be released a long time
> after).
> 
> Looking at the code, domain resources are released in 2 phases:
> 	- domain_destroy: called when there is no more reference on the domain
> (see put_domain)
> 	- complete_domain_destroy: called when the RCU is quiescent
> 
> The function domain_destroy will setup the RCU callback
> (complete_domain_destroy) by calling call_rcu. call_rcu will add the callback
> into the RCU list and then will may send an IPI (see force_quiescent_state) if
> the threshold reached. This IPI is here to make sure all CPUs are quiescent
> before calling the callbacks (e.g complete_domain_destroy). In my case, the
> threshold has not reached and therefore an IPI is not sent.
> 
> On ARM, the idle will run when the pCPU has no work to do. This loop will wait
> to receive an interrupt (see wfi) and check if there is some work to do when
> the CPU has waken-up (i.e an interrupt was received).
> 
> The problem I encountered is the idle CPU will never receive interrupts (no
> timer, nor IPI...) and therefore never check whether the RCU has some work to
> do.
> 
> From my understanding, this is a bug in how RCU is handled (see comment above
> rcu_start_batch), it expects each CPU (no broadcast) to check whether there is
> RCU work. But this is relying on someone else (timer?) to fire an interrupt.
> 
> Any incoming interrupts will make a pCPU checking the RCU state. On ARM, the
> biggest source of IPI was credit1 or timer if a guest vCPU was scheduled on
> that pCPU. But it looks like the IPI traffic with credit2 was reduced to none
> (which is a really good thing :)), and no guest timer was scheduled because no
> vCPU ever run on this pCPU.
> 
> I think the bug has always been here (both ARM and x86), but never detected
> because any incoming interrupts will make the pCPU to check the RCU state.
> 
> However, I am not sure how to resolve this issue. Any thoughts?

Well done for finding the bug!

Sending an IPI on call_rcu is easy, but it would be better not to wake
up the sleeping cpus at all. If they are running the idle_loop, they
cannot be holding any rcu references for the domain which is about to be
destroyed, right?


* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-23 19:42 xen/arm: Domain not fully destroyed when using credit2 Julien Grall
  2017-01-24  0:16 ` Stefano Stabellini
@ 2017-01-24  8:20 ` Jan Beulich
  2017-01-24 10:50   ` Julien Grall
  1 sibling, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2017-01-24  8:20 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Ian Jackson, Tim Deegan, xen-devel

>>> On 23.01.17 at 20:42, <julien.grall@arm.com> wrote:
> Whilst testing other patches today, I have noticed that some part of the 
> resources allocated to a guest were not released during the destruction.
> 
> The configuration of the test is:
> 	- ARM platform with 6 cores
> 	- staging Xen with credit2 enabled by default
> 	- DOM0 using 2 pinned vCPUs
> 
> The test is creating a guest vCPUs and then destroyed. After the test, 
> some resourced are not released (or could be released a long time
> after).
> 
> Looking at the code, domain resources are released in 2 phases:
> 	- domain_destroy: called when there is no more reference on the domain 
> (see put_domain)
> 	- complete_domain_destroy: called when the RCU is quiescent
> 
> The function domain_destroy will setup the RCU callback 
> (complete_domain_destroy) by calling call_rcu. call_rcu will add the 
> callback into the RCU list and then will may send an IPI (see 
> force_quiescent_state) if the threshold reached. This IPI is here to 
> make sure all CPUs are quiescent before calling the callbacks (e.g 
> complete_domain_destroy). In my case, the threshold has not reached and 
> therefore an IPI is not sent.

But wait - isn't it the nature of RCU that it may take arbitrary time
until the actual call(s) happen(s)? If an upper limit is required by
a user of RCU, I think it would need to be that entity to arrange
for early expiry. I notice in this context that we don't even have
synchronize_rcu() in our sources.

Jan



* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24  8:20 ` Jan Beulich
@ 2017-01-24 10:50   ` Julien Grall
  2017-01-24 11:02     ` Jan Beulich
  2017-01-24 12:53     ` Dario Faggioli
  0 siblings, 2 replies; 33+ messages in thread
From: Julien Grall @ 2017-01-24 10:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Ian Jackson, Tim Deegan, xen-devel

Hi Jan,

On 24/01/2017 08:20, Jan Beulich wrote:
>>>> On 23.01.17 at 20:42, <julien.grall@arm.com> wrote:
>> Whilst testing other patches today, I have noticed that some part of the
>> resources allocated to a guest were not released during the destruction.
>>
>> The configuration of the test is:
>> 	- ARM platform with 6 cores
>> 	- staging Xen with credit2 enabled by default
>> 	- DOM0 using 2 pinned vCPUs
>>
>> The test is creating a guest vCPUs and then destroyed. After the test,
>> some resourced are not released (or could be released a long time
>> after).
>>
>> Looking at the code, domain resources are released in 2 phases:
>> 	- domain_destroy: called when there is no more reference on the domain
>> (see put_domain)
>> 	- complete_domain_destroy: called when the RCU is quiescent
>>
>> The function domain_destroy will setup the RCU callback
>> (complete_domain_destroy) by calling call_rcu. call_rcu will add the
>> callback into the RCU list and then will may send an IPI (see
>> force_quiescent_state) if the threshold reached. This IPI is here to
>> make sure all CPUs are quiescent before calling the callbacks (e.g
>> complete_domain_destroy). In my case, the threshold has not reached and
>> therefore an IPI is not sent.
>
> But wait - isn't it the nature of RCU that it may take arbitrary time
> until the actual call(s) happen(s)?

Today this arbitrary time could be infinite if an idle pCPU does not
receive an interrupt, so some of the domain's resources will never be
freed.

If I power-cycle a domain in a loop, after some time the toolstack
will fail to allocate memory because of exhausted resources: the
previous instance of the domain was not yet fully destroyed (i.e.
complete_domain_destroy was not called).

> If an upper limit is required by
> a user of RCU, I think it would need to be that entity to arrange
> for early expiry.

This happens with all users of RCU, not only domains. Looking at
the code, there are already some upper limits:
	- call_rcu will call force_quiescent_state if the number of elements
in the RCU queue is > 10000
	- the RCU has a grace period (not sure how long), but no timer to
ensure the RCU callbacks will be called

Reducing the threshold in call_rcu (see qhimark) will not help, as you
may still face memory exhaustion (see above). So I think the best
solution is to actually implement the grace period properly.

> I notice in this context that we don't even have
> synchronize_rcu() in our sources.

I don't think this is a problem here if we handle the grace period
properly.

Regards,

-- 
Julien Grall


* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24 10:50   ` Julien Grall
@ 2017-01-24 11:02     ` Jan Beulich
  2017-01-24 12:30       ` Julien Grall
  2017-01-24 12:53     ` Dario Faggioli
  1 sibling, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2017-01-24 11:02 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Ian Jackson, Tim Deegan, xen-devel

>>> On 24.01.17 at 11:50, <julien.grall@arm.com> wrote:
> On 24/01/2017 08:20, Jan Beulich wrote:
>>>>> On 23.01.17 at 20:42, <julien.grall@arm.com> wrote:
>>> Whilst testing other patches today, I have noticed that some part of the
>>> resources allocated to a guest were not released during the destruction.
>>>
>>> The configuration of the test is:
>>> 	- ARM platform with 6 cores
>>> 	- staging Xen with credit2 enabled by default
>>> 	- DOM0 using 2 pinned vCPUs
>>>
>>> The test is creating a guest vCPUs and then destroyed. After the test,
>>> some resourced are not released (or could be released a long time
>>> after).
>>>
>>> Looking at the code, domain resources are released in 2 phases:
>>> 	- domain_destroy: called when there is no more reference on the domain
>>> (see put_domain)
>>> 	- complete_domain_destroy: called when the RCU is quiescent
>>>
>>> The function domain_destroy will setup the RCU callback
>>> (complete_domain_destroy) by calling call_rcu. call_rcu will add the
>>> callback into the RCU list and then will may send an IPI (see
>>> force_quiescent_state) if the threshold reached. This IPI is here to
>>> make sure all CPUs are quiescent before calling the callbacks (e.g
>>> complete_domain_destroy). In my case, the threshold has not reached and
>>> therefore an IPI is not sent.
>>
>> But wait - isn't it the nature of RCU that it may take arbitrary time
>> until the actual call(s) happen(s)?
> 
> Today this arbitrary time could be infinite if an idle pCPU does not 
> receive an interrupt. So some part of domain resource will never be freed.
> 
> If I am power-cycling a domain in loop, after some time the toolstack 
> will fail to allocate memory because of exhausted resources. Previous 
> instance of the domain was not yet fully destroyed (e.g 
> complete_domain_destroy was not called).
> 
>> If an upper limit is required by
>> a user of RCU, I think it would need to be that entity to arrange
>> for early expiry.
> 
> This is happening with all the user and not only a domain. Looking at 
> the code, there are already some upper limit:
> 	- call_rcu will call force_quiescent_state if the number of element in 
> the RCU queue is > 10000
> 	- the RCU has a grace period (not sure how long), but no timer to 
> ensure the RCU will be called

This remark in parentheses is quite relevant here, I think: There
simply is no upper bound, aiui. This is a conceptual aspect. But
I'm in no way an RCU expert, so I may well be entirely off.

> Reducing the threshold in call_rcu (see qhimark) will not help as you 
> may still face memory exhaustion (see above). So I think the only best 
> solution is to actually implement properly the grace period.

Well, with the above in mind - what does "properly" mean here?

Jan



* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24 11:02     ` Jan Beulich
@ 2017-01-24 12:30       ` Julien Grall
  0 siblings, 0 replies; 33+ messages in thread
From: Julien Grall @ 2017-01-24 12:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Ian Jackson, Tim Deegan, xen-devel

Hi,

On 24/01/17 11:02, Jan Beulich wrote:
>>>> On 24.01.17 at 11:50, <julien.grall@arm.com> wrote:
>> On 24/01/2017 08:20, Jan Beulich wrote:
>>>>>> On 23.01.17 at 20:42, <julien.grall@arm.com> wrote:
>>>> Whilst testing other patches today, I have noticed that some part of the
>>>> resources allocated to a guest were not released during the destruction.
>>>>
>>>> The configuration of the test is:
>>>> 	- ARM platform with 6 cores
>>>> 	- staging Xen with credit2 enabled by default
>>>> 	- DOM0 using 2 pinned vCPUs
>>>>
>>>> The test is creating a guest vCPUs and then destroyed. After the test,
>>>> some resourced are not released (or could be released a long time
>>>> after).
>>>>
>>>> Looking at the code, domain resources are released in 2 phases:
>>>> 	- domain_destroy: called when there is no more reference on the domain
>>>> (see put_domain)
>>>> 	- complete_domain_destroy: called when the RCU is quiescent
>>>>
>>>> The function domain_destroy will setup the RCU callback
>>>> (complete_domain_destroy) by calling call_rcu. call_rcu will add the
>>>> callback into the RCU list and then will may send an IPI (see
>>>> force_quiescent_state) if the threshold reached. This IPI is here to
>>>> make sure all CPUs are quiescent before calling the callbacks (e.g
>>>> complete_domain_destroy). In my case, the threshold has not reached and
>>>> therefore an IPI is not sent.
>>>
>>> But wait - isn't it the nature of RCU that it may take arbitrary time
>>> until the actual call(s) happen(s)?
>>
>> Today this arbitrary time could be infinite if an idle pCPU does not
>> receive an interrupt. So some part of domain resource will never be freed.
>>
>> If I am power-cycling a domain in loop, after some time the toolstack
>> will fail to allocate memory because of exhausted resources. Previous
>> instance of the domain was not yet fully destroyed (e.g
>> complete_domain_destroy was not called).
>>
>>> If an upper limit is required by
>>> a user of RCU, I think it would need to be that entity to arrange
>>> for early expiry.
>>
>> This is happening with all the user and not only a domain. Looking at
>> the code, there are already some upper limit:
>> 	- call_rcu will call force_quiescent_state if the number of element in
>> the RCU queue is > 10000
>> 	- the RCU has a grace period (not sure how long), but no timer to
>> ensure the RCU will be called
>
> This remark in parentheses is quite relevant here, I think: There
> simply is no upper bound, aiui. This is a conceptional aspect. But
> I'm in no way an RCU expert, so I may well be entirely off.

I would be surprised if it were normal behaviour for an idle pCPU
(sitting in wfi, or the equivalent instruction on x86) to block the RCU
forever, as is the case today.

>
>> Reducing the threshold in call_rcu (see qhimark) will not help as you
>> may still face memory exhaustion (see above). So I think the only best
>> solution is to actually implement properly the grace period.
>
> Well, with the above in mind - what does "properly" mean here?

By properly, I meant that either the idle pCPU should not be taken into
account in the grace period, or we need a timer (or something else) on
the idle pCPU to check whether there is some work to do (see
rcu_pending).
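
The two options can be compared in a toy model (a sketch only, not a
proposed patch; all toy_* names and the tick-based timing are assumptions
made for illustration):

```c
#include <assert.h>

#define NR_CPUS  6
#define ALL_CPUS ((1u << NR_CPUS) - 1)

/* Option 1: do not wait for CPUs that are idle when the batch starts. */
static unsigned int toy_start_batch_skip_idle(unsigned int idle_mask)
{
    return ALL_CPUS & ~idle_mask;
}

/*
 * Simulate up to max_ticks ticks. CPUs in busy_mask report a quiescent
 * state on every tick. If idle_timer_period > 0 (option 2), all other
 * CPUs are woken by a timer every idle_timer_period ticks and report
 * too; with idle_timer_period == 0 they sleep forever.
 *
 * Returns the tick at which the grace period completed, or -1 if it
 * never did within max_ticks.
 */
static int toy_ticks_until_done(unsigned int pending, unsigned int busy_mask,
                                int idle_timer_period, int max_ticks)
{
    assert(idle_timer_period >= 0);

    for (int t = 1; t <= max_ticks; t++) {
        unsigned int reporting = busy_mask;

        if (idle_timer_period > 0 && t % idle_timer_period == 0)
            reporting = ALL_CPUS;  /* idle CPUs woken by their timer */

        pending &= ~reporting;
        if (pending == 0)
            return t;
    }

    return -1;
}
```

With pCPUs 0-1 busy and 2-5 idle, waiting for everyone without a timer
never completes, skipping the idle pCPUs completes on the first tick,
and a timer with period 10 completes at tick 10. Skipping idle pCPUs is
only safe in this model because an idle pCPU here truly does nothing.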

Cheers,

-- 
Julien Grall


* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24  0:16 ` Stefano Stabellini
@ 2017-01-24 12:52   ` Julien Grall
  0 siblings, 0 replies; 33+ messages in thread
From: Julien Grall @ 2017-01-24 12:52 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Dario Faggioli,
	Ian Jackson, Tim Deegan, Jan Beulich, xen-devel

Hi Stefano,

On 24/01/17 00:16, Stefano Stabellini wrote:
> On Mon, 23 Jan 2017, Julien Grall wrote:
>> Hi all,
>>
>> Before someone dig into the scheduler, I don't think this is an issue in
>> credit2 but the use of it highlight a bug in another component (I think RCU).
>>
>> Whilst testing other patches today, I have noticed that some part of the
>> resources allocated to a guest were not released during the destruction.
>>
>> The configuration of the test is:
>> 	- ARM platform with 6 cores
>> 	- staging Xen with credit2 enabled by default
>> 	- DOM0 using 2 pinned vCPUs
>>
>> The test is creating a guest vCPUs and then destroyed. After the test, some
>> resourced are not released (or could be released a long time
>> after).
>>
>> Looking at the code, domain resources are released in 2 phases:
>> 	- domain_destroy: called when there is no more reference on the domain
>> (see put_domain)
>> 	- complete_domain_destroy: called when the RCU is quiescent
>>
>> The function domain_destroy will setup the RCU callback
>> (complete_domain_destroy) by calling call_rcu. call_rcu will add the callback
>> into the RCU list and then will may send an IPI (see force_quiescent_state) if
>> the threshold reached. This IPI is here to make sure all CPUs are quiescent
>> before calling the callbacks (e.g complete_domain_destroy). In my case, the
>> threshold has not reached and therefore an IPI is not sent.
>>
>> On ARM, the idle will run when the pCPU has no work to do. This loop will wait
>> to receive an interrupt (see wfi) and check if there is some work to do when
>> the CPU has waken-up (i.e an interrupt was received).
>>
>> The problem I encountered is the idle CPU will never receive interrupts (no
>> timer, nor IPI...) and therefore never check whether the RCU has some work to
>> do.
>>
>> From my understanding, this is a bug in how RCU is handled (see comment above
>> rcu_start_batch), it expects each CPU (no broadcast) to check whether there is
>> RCU work. But this is relying on someone else (timer?) to fire an interrupt.
>>
>> Any incoming interrupts will make a pCPU checking the RCU state. On ARM, the
>> biggest source of IPI was credit1 or timer if a guest vCPU was scheduled on
>> that pCPU. But it looks like the IPI traffic with credit2 was reduced to none
>> (which is a really good thing :)), and no guest timer was scheduled because no
>> vCPU ever run on this pCPU.
>>
>> I think the bug has always been here (both ARM and x86), but never detected
>> because any incoming interrupts will make the pCPU to check the RCU state.
>>
>> However, I am not sure how to resolve this issue. Any thoughts?
>
> Well done for finding the bug!
>
> Sending an IPI on call_rcu is easy, but it would be better not to wake
> up the sleeping cpus at all. If they are running the idle_loop, they
> cannot be holding any rcu references for the domain which is about to be
> destroyed, right?

The problem is not only about domains but about anything using RCU.

An idle pCPU may have to process softirqs from time to time. I can't
find any reason why a softirq would be forbidden from holding an RCU
reference. So I think we have to ensure that this pCPU is really doing
nothing.

Cheers,

-- 
Julien Grall


* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24 10:50   ` Julien Grall
  2017-01-24 11:02     ` Jan Beulich
@ 2017-01-24 12:53     ` Dario Faggioli
  2017-01-24 13:04       ` Julien Grall
  1 sibling, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2017-01-24 12:53 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel



On Tue, 2017-01-24 at 10:50 +0000, Julien Grall wrote:
> On 24/01/2017 08:20, Jan Beulich wrote:
> > > > > On 23.01.17 at 20:42, <julien.grall@arm.com> wrote:
> > > The function domain_destroy will setup the RCU callback
> > > (complete_domain_destroy) by calling call_rcu. call_rcu will add
> > > the
> > > callback into the RCU list and then will may send an IPI (see
> > > force_quiescent_state) if the threshold reached. This IPI is here
> > > to
> > > make sure all CPUs are quiescent before calling the callbacks
> > > (e.g
> > > complete_domain_destroy). In my case, the threshold has not
> > > reached and
> > > therefore an IPI is not sent.
> > 
> > But wait - isn't it the nature of RCU that it may take arbitrary
> > time
> > until the actual call(s) happen(s)?
> 
> Today this arbitrary time could be infinite if an idle pCPU does not 
> receive an interrupt. So some part of domain resource will never be
> freed.
> 
> If I am power-cycling a domain in loop, after some time the
> toolstack 
> will fail to allocate memory because of exhausted resources.
> Previous 
> instance of the domain was not yet fully destroyed (e.g 
> complete_domain_destroy was not called).
> 
Do you have a script and/or some more info for letting me try to
reproduce it (e.g., you say some of the vCPUs are pinned, which ones?
etc)?

I'm a bit curious about why you're saying this is being exposed by
using Credit2. In fact:
 1) I've power-cycled quite a few domains in these last months, while 
    under Credit2, and I don't think I have encountered it on x86;
 2) I see how it may be related to Credit2 being more deterministic 
    and not trying to schedule stuff around pseudo-randomly like 
    Credit1 does... but I'd like to try investigating a bit more.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24 12:53     ` Dario Faggioli
@ 2017-01-24 13:04       ` Julien Grall
  2017-01-24 13:05         ` Julien Grall
  2017-01-24 13:19         ` Dario Faggioli
  0 siblings, 2 replies; 33+ messages in thread
From: Julien Grall @ 2017-01-24 13:04 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel

Hi Dario,

On 24/01/17 12:53, Dario Faggioli wrote:
> On Tue, 2017-01-24 at 10:50 +0000, Julien Grall wrote:
>> On 24/01/2017 08:20, Jan Beulich wrote:
>>>>>> On 23.01.17 at 20:42, <julien.grall@arm.com> wrote:
>>>> The function domain_destroy will setup the RCU callback
>>>> (complete_domain_destroy) by calling call_rcu. call_rcu will add
>>>> the
>>>> callback into the RCU list and then will may send an IPI (see
>>>> force_quiescent_state) if the threshold reached. This IPI is here
>>>> to
>>>> make sure all CPUs are quiescent before calling the callbacks
>>>> (e.g
>>>> complete_domain_destroy). In my case, the threshold has not
>>>> reached and
>>>> therefore an IPI is not sent.
>>>
>>> But wait - isn't it the nature of RCU that it may take arbitrary
>>> time
>>> until the actual call(s) happen(s)?
>>
>> Today this arbitrary time could be infinite if an idle pCPU does not
>> receive an interrupt. So some part of domain resource will never be
>> freed.
>>
>> If I am power-cycling a domain in loop, after some time the
>> toolstack
>> will fail to allocate memory because of exhausted resources.
>> Previous
>> instance of the domain was not yet fully destroyed (e.g
>> complete_domain_destroy was not called).
>>
> Do you have a script and/or some more info for letting me try to
> reproduce it (e.g., you say some otf the vCPUs are pinned, which one?
> etc)?

That was mentioned in my first e-mail :). My configuration is:
	- ARM platform with 6 cores
	- staging Xen with credit2 enabled by default
	- DOM0 using 2 pinned vCPUs
	- Guest using 2 vCPUs (not pinned)

The script is really simple:

for i in `seq 1 10`; do
	sudo xl create ~/works/guest/guest.cfg;
	sudo xl destroy guest;
done

>
> I'm a bit curious about why you're saying this is being exposed by
> using Credit2.

It is exposed by Credit2 because, compared to Credit1, there is no
interrupt traffic generated by the scheduler. On ARM with credit2 the
interrupt traffic is reduced to none for an idle pCPU.

In fact:
>  1) I've power-cycled quite a few domains in these last months, while
>     under Credit2, and I don't think I have encountered it on x86;

AFAIU, an IPI is often the only way to broadcast some instructions on
x86. So compared to ARM, you likely have higher interrupt traffic.

Also, the problem is not obvious to spot unless you look at the free 
memory (via xl info) before and after. Another solution is printing a 
message in both domain_destroy and complete_domain_destroy.

You will spot the first message directly. The latter may never be printed.

>  2) I see how it may be related to Credit2 being more deterministic
>     and not trying to schedule stuff around pseudo-randomly like
>     Credit1 does... but I'd like to try investigating a bit more.

I am able to reliably reproduce it on a Juno-r2.

Cheers,

-- 
Julien Grall


* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24 13:04       ` Julien Grall
@ 2017-01-24 13:05         ` Julien Grall
  2017-01-24 13:19         ` Dario Faggioli
  1 sibling, 0 replies; 33+ messages in thread
From: Julien Grall @ 2017-01-24 13:05 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel



On 24/01/17 13:04, Julien Grall wrote:
> Hi Dario,
>
> On 24/01/17 12:53, Dario Faggioli wrote:
>> On Tue, 2017-01-24 at 10:50 +0000, Julien Grall wrote:
>>> On 24/01/2017 08:20, Jan Beulich wrote:
>>>>>>> On 23.01.17 at 20:42, <julien.grall@arm.com> wrote:
>>>>> The function domain_destroy will setup the RCU callback
>>>>> (complete_domain_destroy) by calling call_rcu. call_rcu will add
>>>>> the
>>>>> callback into the RCU list and then will may send an IPI (see
>>>>> force_quiescent_state) if the threshold reached. This IPI is here
>>>>> to
>>>>> make sure all CPUs are quiescent before calling the callbacks
>>>>> (e.g
>>>>> complete_domain_destroy). In my case, the threshold has not
>>>>> reached and
>>>>> therefore an IPI is not sent.
>>>>
>>>> But wait - isn't it the nature of RCU that it may take arbitrary
>>>> time
>>>> until the actual call(s) happen(s)?
>>>
>>> Today this arbitrary time could be infinite if an idle pCPU does not
>>> receive an interrupt. So some part of domain resource will never be
>>> freed.
>>>
>>> If I am power-cycling a domain in loop, after some time the
>>> toolstack
>>> will fail to allocate memory because of exhausted resources.
>>> Previous
>>> instance of the domain was not yet fully destroyed (e.g
>>> complete_domain_destroy was not called).
>>>
>> Do you have a script and/or some more info for letting me try to
>> reproduce it (e.g., you say some otf the vCPUs are pinned, which one?
>> etc)?
>
> That was mentioned in my first e-mail :). My configuration is:
>     - ARM platform with 6 cores
>     - staging Xen with credit2 enabled by default
>     - DOM0 using 2 pinned vCPUs

To clarify here, DOM0 has only 2 vCPUs. Both are pinned.

>     - Guest using 2 vCPUs (not pinned)
>
> The script is really simple:
>
> for i in `seq 1 10`; do
>     sudo xl create ~/works/guest/guest.cfg;
>     sudo xl destroy guest;
> done
>
>>
>> I'm a bit curious about why you're saying this is being exposed by
>> using Credit2.
>
> It is being exposed by Credit2 because compared to Credit1 there is no
> interrupt traffic made by the scheduler. On ARM with credit2 the
> interrupt traffic is reduced to none for idle pCPUs.
>
> In fact:
>>  1) I've power-cycled quite a few domains in these last months, while
>>     under Credit2, and I don't think I have encountered it on x86;
>
> AFAIU, IPI is often the only way to broadcast some instruction on x86.
> So compared to ARM, you likely have higher interrupt traffic.
>
> Also, the problem is not obvious to spot unless you look at the free
> memory (via xl info) before and after. Another solution is printing a
> message in both domain_destroy and complete_domain_destroy.
>
> You will spot the first message directly. The latter may never be printed.
>
>>  2) I see how it may be related to Credit2 being more deterministic
>>     and not trying to schedule stuff around pseudo-randomly like
>>     Credit1 does... but I'd like to try investigating a bit more.
>
> I am able to reliably reproduce it on a Juno-r2.
>
> Cheers,
>

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24 13:04       ` Julien Grall
  2017-01-24 13:05         ` Julien Grall
@ 2017-01-24 13:19         ` Dario Faggioli
  2017-01-24 13:24           ` Julien Grall
  1 sibling, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2017-01-24 13:19 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel


On Tue, 2017-01-24 at 13:04 +0000, Julien Grall wrote:
> On 24/01/17 12:53, Dario Faggioli wrote:
> > Do you have a script and/or some more info for letting me try to
> > reproduce it (e.g., you say some of the vCPUs are pinned, which
> > one?
> > etc)?
> 
> That was mentioned in my first e-mail :). My configuration is:
> 	- ARM platform with 6 cores
> 	- staging Xen with credit2 enabled by default
> 	- DOM0 using 2 pinned vCPUs
> 	- Guest using 2 vCPUs (not pinned)
> 
Yeah, but some of the details were either missing, or not clear to
me... Sorry for bothering and thanks for re-stating this here. :-)

How are the Dom0 vCPUs pinned? Exclusively (i.e., there are 2 pCPUs on
which _only_ Dom0 and _no_ DomU can run)?

> The script is really simple:
> 
> for i in `seq 1 10`; do
> 	sudo xl create ~/works/guest/guest.cfg;
> 	sudo xl destroy guest;
> done
> 
Ok.

> > I'm a bit curious about why you're saying this is being exposed by
> > using Credit2.
> 
> It is being exposed by Credit2 because compared to Credit1 there is
> no 
> interrupt traffic made by the scheduler. 
>
So, when you say "no interrupt traffic", do you perhaps mean that
SCHEDULE_SOFTIRQ is rarely (never!) raised for idle pCPUs? Or are you
really talking about actual interrupts (either inter-processor or not)?

> On ARM with credit2 the 
> interrupt traffic is reduced to none for idle pCPU.
> 
Yes, but _iff_ we're talking about SCHEDULE_SOFTIRQ events, then for a
truly idle pCPU (e.g., if I use vcpu-pin to *forbid* every vCPU from
executing there), that's _zero_ for Credit1 too, at least on x86 (I've
just tried)!

Perhaps this is too extreme/unrealistic of an idle situation, but I'm
trying to understand the problem. :-)

> In fact:
> > 
> >  1) I've power-cycled quite a few domains in these last months,
> > while
> >     under Credit2, and I don't think I have encountered it on x86;
> 
> AFAIU, IPI is often the only way to broadcast some instruction on
> x86. 
> So compared to ARM, you likely have higher interrupt traffic.
> 
Right.

> Also, the problem is not obvious to spot unless you look at the free 
> memory (via xl info) before and after. Another solution is printing
> a 
> message in both domain_destroy and complete_domain_destroy.
> 
> You will spot the first message directly. The latter may never be
> printed.
> 
Yep, I was already instrumenting the code like this... I'll let you
know.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24 13:19         ` Dario Faggioli
@ 2017-01-24 13:24           ` Julien Grall
  2017-01-24 13:40             ` Dario Faggioli
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2017-01-24 13:24 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel



On 24/01/17 13:19, Dario Faggioli wrote:
> On Tue, 2017-01-24 at 13:04 +0000, Julien Grall wrote:
>> On 24/01/17 12:53, Dario Faggioli wrote:
>>> Do you have a script and/or some more info for letting me try to
>>> reproduce it (e.g., you say some of the vCPUs are pinned, which
>>> one?
>>> etc)?
>>
>> That was mentioned in my first e-mail :). My configuration is:
>> 	- ARM platform with 6 cores
>> 	- staging Xen with credit2 enabled by default
>> 	- DOM0 using 2 pinned vCPUs
>> 	- Guest using 2 vCPUs (not pinned)
>>
> Yeah, but some of the details were either missing, or not clear to
> me... Sorry for bothering and thanks for re-stating this here. :-)
>
> How are Dom0 vCPUs pinned, exclusively (i.e., there are 2 pCPUs on
> which _only_ Dom0 and _no_ DomU can run)?

I have dom0_vcpus_pin on the Xen command line (so I guess only
pinned?), with no further configuration for DOM0.

>
>> The script is really simple:
>>
>> for i in `seq 1 10`; do
>> 	sudo xl create ~/works/guest/guest.cfg;
>> 	sudo xl destroy guest;
>> done
>>
> Ok.
>
>>> I'm a bit curious about why you're saying this is being exposed by
>>> using Credit2.
>>
>> It is being exposed by Credit2 because compared to Credit1 there is
>> no
>> interrupt traffic made by the scheduler.
>>
> So, when you say "no interrupt traffic", do you perhaps mean that
> SCHEDULE_SOFTIRQ is rarely (never!) raised for idle pCPUs? Or are you
> really talking about actual interrupts (either inter-processor or not)?

I am talking about actual physical interrupts. The traffic is reduced to
none with credit2 on idle pCPUs.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24 13:24           ` Julien Grall
@ 2017-01-24 13:40             ` Dario Faggioli
  2017-01-24 13:49               ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2017-01-24 13:40 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel


On Tue, 2017-01-24 at 13:24 +0000, Julien Grall wrote:
> On 24/01/17 13:19, Dario Faggioli wrote:
> > How are Dom0 vCPUs pinned, exclusively (i.e., there are 2 pCPUs on
> > which _only_ Dom0 and _no_ DomU can run)?
> 
> I have dom0_vcpus_pin on the Xen command line (so I guess only
> pinned?), with no further configuration for DOM0.
> 
Ok, thanks. Yeah, that means Dom0 vCPU 0 is pinned to pCPU 0, and vCPU
1 is pinned to pCPU 1. And it's not exclusive, i.e., other domains can
run on pCPUs 0 and 1, if the scheduler decides so (because they're
free, because the scheduler decides to preempt Dom0, etc).

This is of course fine, I just wanted to make sure I was understanding
the setup.

> > So, when you say "no interrupt traffic", do you perhaps mean that
> > SCHEDULE_SOFTIRQ is rarely (never!) raised for idle pCPUs? Or are
> > you
> > really talking about actual interrupts (either inter-processor or
> > not)?
> 
> I am talking about actual physical interrupts. The traffic is reduced
> to 
> none with credit2 on idle pCPU.
> 
Ah, wow... And how --forgive my naiveness-- do you measure / check
that?

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24 13:40             ` Dario Faggioli
@ 2017-01-24 13:49               ` Julien Grall
  2017-01-24 14:16                 ` Dario Faggioli
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2017-01-24 13:49 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel

Hi,

On 24/01/17 13:40, Dario Faggioli wrote:
> On Tue, 2017-01-24 at 13:24 +0000, Julien Grall wrote:
>> On 24/01/17 13:19, Dario Faggioli wrote:
>>> How are Dom0 vCPUs pinned, exclusively (i.e., there are 2 pCPUs on
>>> which _only_ Dom0 and _no_ DomU can run)?
>>
>> I have dom0_vcpus_pin on the Xen command line (so I guess only
>> pinned?), with no further configuration for DOM0.
>>
> Ok, thanks. Yeah, that means Dom0 vCPU 0 is pinned to pCPU 0, and vCPU
> 1 is pinned to pCPU 1. And it's not exclusive, i.e., other domains can
> run on pCPUs 0 and 1, if the scheduler decides so (because they're
> free, because the scheduler decides to preempt Dom0, etc).
>
> This is of course fine, I just wanted to make sure I was understanding
> the setup.
>
>>> So, when you say "no interrupt traffic", do you perhaps mean that
>>> SCHEDULE_SOFTIRQ is rarely (never!) raised for idle pCPUs? Or are
>>> you
>>> really talking about actual interrupts (either inter-processor or
>>> not)?
>>
>> I am talking about actual physical interrupts. The traffic is reduced
>> to
>> none with credit2 on idle pCPU.
>>
> Ah, wow... And how --forgive my naiveness-- do you measure / check
> that?

I added a print in the interrupt path (gic_interrupt for ARM) to dump
the interrupt number. This needs to be restricted to CPU2 and above to
avoid being flooded:

if ( smp_processor_id() > 1 )
   printk("%s: CPU%u IRQ%u\n", __FUNCTION__, smp_processor_id(), irq);

I also added a print in the idle loop before and after the idling
instruction (wfi for ARM, pm_idle for x86 I think). You can see the CPU
go into idle mode but never come back.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24 13:49               ` Julien Grall
@ 2017-01-24 14:16                 ` Dario Faggioli
  2017-01-24 15:06                   ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2017-01-24 14:16 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel


On Tue, 2017-01-24 at 13:49 +0000, Julien Grall wrote:
> On 24/01/17 13:40, Dario Faggioli wrote:
> > Ah, wow... And how --forgive my naiveness-- do you measure / check
> > that?
> 
> I added a print in the interrupt path (gic_interrupt for ARM) to
> dump 
> the interrupt number. This needs to be restrict to CPU2 and above to 
> avoid been flooded:
> 
> if ( smp_processor_id() > 1 )
>    printk("%s: CPU%u IRQ%u\n", __FUNCTION__, smp_processor_id(),
> irq);
> 
Ok.

> I also added a print in the idle loop before and after the idling 
> instruction (wfi for ARM, pm_idle for x86 I think). You can see the
> CPU 
> to go in idle mode but never coming back.
> 
I see. Yes, this is very different on x86.

There, we have tracing (BTW, did that make it to ARM eventually?) and
there's TRC_PM_IDLE_ENTRY/EXIT, which do pretty much the same as your
printk-s.

And if I look at it, even for pCPUs that are totally idle (from the
scheduler's point of view), I do indeed see them going back and forth
to and from C3.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24 14:16                 ` Dario Faggioli
@ 2017-01-24 15:06                   ` Julien Grall
  2017-01-25 11:10                     ` Dario Faggioli
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2017-01-24 15:06 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel



On 24/01/17 14:16, Dario Faggioli wrote:
> On Tue, 2017-01-24 at 13:49 +0000, Julien Grall wrote:
>> On 24/01/17 13:40, Dario Faggioli wrote:
>>> Ah, wow... And how --forgive my naiveness-- do you measure / check
>>> that?
>>
>> I added a print in the interrupt path (gic_interrupt for ARM) to
>> dump
>> the interrupt number. This needs to be restrict to CPU2 and above to
>> avoid been flooded:
>>
>> if ( smp_processor_id() > 1 )
>>    printk("%s: CPU%u IRQ%u\n", __FUNCTION__, smp_processor_id(),
>> irq);
>>
> Ok.
>
>> I also added a print in the idle loop before and after the idling
>> instruction (wfi for ARM, pm_idle for x86 I think). You can see the
>> CPU
>> to go in idle mode but never coming back.
>>
> I see. Yes, this is very different on x86.
>
> There, we have tracing (BTW, did that made it to ARM eventually?) and
> there's TRC_PM_IDLE_ENTRY/EXIT which do pretty much the same of your
> printk-s.

There is a patch on the ML for xentrace support (see [1]) but nothing has
been upstreamed yet. We are waiting for a new version from the contributor.

>
> And if I look at it, I do see even totally idle (from the scheduler
> point of view) pCPUs, I indeed see them going back and forth from and
> to C3.

My knowledge of x86 is limited. When does a CPU decide to leave
idle mode?

In the case of ARM, the wfi instruction will put the CPU in idle mode 
until an interrupt is received.

Cheers,

[1] 
https://lists.xenproject.org/archives/html/xen-devel/2016-04/msg00464.html

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24 15:06                   ` Julien Grall
@ 2017-01-25 11:10                     ` Dario Faggioli
  2017-01-25 12:38                       ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2017-01-25 11:10 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel


On Tue, 2017-01-24 at 15:06 +0000, Julien Grall wrote:
> On 24/01/17 14:16, Dario Faggioli wrote:
> > There, we have tracing (BTW, did that made it to ARM eventually?)
> > and
> > there's TRC_PM_IDLE_ENTRY/EXIT which do pretty much the same of
> > your
> > printk-s.
> 
> There is patch on the ML for xentrace support (see [1]) but nothing
> has 
> been upstreamed yet. Waiting for a new version from the contributor.
> 
Yep, that was what I was remembering, and referring to. Thanks for the
update.

> > And if I look at it, I do see even totally idle (from the scheduler
> > point of view) pCPUs, I indeed see them going back and forth from
> > and
> > to C3.
> 
> My knowledge on x86 is limited. When does a CPU decides to leave the 
> idle mode?
> 
I'm not an expert on that part either. Jan and Andrew for sure know
best how monitor/mwait works (both in general, and in our own
implementation).

What I know (and can quickly infer from glancing at the code), is that
timers are certainly involved.

In fact, we wake up when the most imminent timer would expire (see
mwait_idle_with_hints()), and a timer set by the scheduler fully
qualifies as being the one (if it's the most imminent).

My point was that, still from a scheduling perspective, neither Credit1
nor Credit2 sets a wakeup timer for idle pCPUs.

Well, in Credit1, the master_ticker timer is never stopped (while,
e.g., the per-pCPU tick is stopped before entering deep sleep,
via sched_tick_suspend(), see commit 964fae8ac), but that's only 1
pCPU.

> In the case of ARM, the wfi instruction will put the CPU in idle
> mode 
> until an interrupt is received.
> 
Just looking up references for MWAIT, I've found this:
(http://x86.renejeschke.de/html/file_module_x86_id_215.html)

"A store to the address range armed by the MONITOR instruction, an
interrupt, an NMI or SMI, a debug exception, a machine check exception,
the BINIT# signal, the INIT# signal, or the RESET# signal will exit the
implementation-dependent-optimized state. Note that an interrupt will
cause the processor to exit only if the state was entered with
interrupts enabled."

So, yeah, an interrupt, as expected, wakes x86 up.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-25 11:10                     ` Dario Faggioli
@ 2017-01-25 12:38                       ` Julien Grall
  2017-01-25 12:40                         ` Andrew Cooper
  2017-01-25 16:00                         ` Dario Faggioli
  0 siblings, 2 replies; 33+ messages in thread
From: Julien Grall @ 2017-01-25 12:38 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel

Hi Dario,

On 25/01/17 11:10, Dario Faggioli wrote:
> On Tue, 2017-01-24 at 15:06 +0000, Julien Grall wrote:
>> On 24/01/17 14:16, Dario Faggioli wrote:
>>> There, we have tracing (BTW, did that made it to ARM eventually?)
>>> and
>>> there's TRC_PM_IDLE_ENTRY/EXIT which do pretty much the same of
>>> your
>>> printk-s.
>>
>> There is patch on the ML for xentrace support (see [1]) but nothing
>> has
>> been upstreamed yet. Waiting for a new version from the contributor.
>>
> Yep, that was I was remembering, and referring to. Thanks for the
> update.
>
>>> And if I look at it, I do see even totally idle (from the scheduler
>>> point of view) pCPUs, I indeed see them going back and forth from
>>> and
>>> to C3.
>>
>> My knowledge on x86 is limited. When does a CPU decides to leave the
>> idle mode?
>>
> I'm not an expert of that part either. Jan and Andrew for sure know
> best how monitor/mwait works (both in general, and our own
> implementation).
>
> What I know (and can quickly infer from glancing at the code), is that
> timers are certainly involved.
>
> In fact, we wake up when the most imminent timer would expire (see
> mwait_idle_with_hints()), and a timer set by the scheduler fully
> qualifies as being the one (if it's the most imminent).
>
> My point was that, still from scheduling perspective, neither Credit1
> nor Credit2 sets a wakeup timer for idle pCPUs.
>
> Well, in Credit1, the master_ticker timer is never stopped (while,
> e.g., the per-pCPU tick is stopped before entering deep sleep,
> via sched_tick_suspend(), see commit 964fae8ac), but that's only 1
> pCPU.

The function sched_tick_suspend is never called on ARM. The power saving 
in Xen ARM is still very limited and this would need to be updated in 
the future.

So I guess that's why I still see interrupts coming in on the idle pCPUs
when credit1 is used. Looking at credit2, the callback tick_suspend is
not called. Does it mean there is no per-pCPU timer?

Now, from my understanding, if we decide to call sched_tick_suspend on
ARM before idling, we will likely have the same problem with credit1
because there will be no more interrupts to wake up the pCPU.

But I don't think this is an issue in the scheduler. IMHO, the problem
is in the RCU. Indeed, a CPU in low-power mode (i.e. wfi on ARM or
pm_idle on x86 is being executed) will never get out to tell the RCU:
"I am quiet, go ahead". So the RCU will never be able to reclaim the
memory, which will result in memory exhaustion if the pCPU never
receives an interrupt (this could happen if the pCPU has never run a guest).
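
To make the failure mode concrete, here is a minimal model of the
grace-period condition described above. It is only a sketch under stated
assumptions: struct grace_period, gp_report_quiescent() and
gp_callbacks_may_run() are hypothetical names invented for illustration,
not Xen's actual RCU data structures or functions.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 6 /* matches the 6-core test platform */

/* Hypothetical model of one grace period: which CPUs have reported in. */
struct grace_period {
    bool quiescent[NR_CPUS];
};

/* A CPU can only report while it is awake; one stuck in wfi never calls this. */
void gp_report_quiescent(struct grace_period *gp, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    gp->quiescent[cpu] = true;
}

/*
 * Queued callbacks (e.g. complete_domain_destroy) may run only once
 * every CPU has reported a quiescent state.
 */
bool gp_callbacks_may_run(const struct grace_period *gp)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        if (!gp->quiescent[cpu])
            return false;
    return true;
}
```

In this model, if only CPU0 and CPU1 (the ones running DOM0) ever call
gp_report_quiescent(), gp_callbacks_may_run() stays false forever, which
is the behaviour I am seeing with CPU2 and above sitting in wfi.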

The question now is: how do we fix it?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-25 12:38                       ` Julien Grall
@ 2017-01-25 12:40                         ` Andrew Cooper
  2017-01-25 14:23                           ` Julien Grall
  2017-01-25 16:00                         ` Dario Faggioli
  1 sibling, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2017-01-25 12:40 UTC (permalink / raw)
  To: Julien Grall, Dario Faggioli, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	Ian Jackson, xen-devel

On 25/01/17 12:38, Julien Grall wrote:
> Hi Dario,
>
> On 25/01/17 11:10, Dario Faggioli wrote:
>> On Tue, 2017-01-24 at 15:06 +0000, Julien Grall wrote:
>>> On 24/01/17 14:16, Dario Faggioli wrote:
>>>> There, we have tracing (BTW, did that made it to ARM eventually?)
>>>> and
>>>> there's TRC_PM_IDLE_ENTRY/EXIT which do pretty much the same of
>>>> your
>>>> printk-s.
>>>
>>> There is patch on the ML for xentrace support (see [1]) but nothing
>>> has
>>> been upstreamed yet. Waiting for a new version from the contributor.
>>>
>> Yep, that was I was remembering, and referring to. Thanks for the
>> update.
>>
>>>> And if I look at it, I do see even totally idle (from the scheduler
>>>> point of view) pCPUs, I indeed see them going back and forth from
>>>> and
>>>> to C3.
>>>
>>> My knowledge on x86 is limited. When does a CPU decides to leave the
>>> idle mode?
>>>
>> I'm not an expert of that part either. Jan and Andrew for sure know
>> best how monitor/mwait works (both in general, and our own
>> implementation).
>>
>> What I know (and can quickly infer from glancing at the code), is that
>> timers are certainly involved.
>>
>> In fact, we wake up when the most imminent timer would expire (see
>> mwait_idle_with_hints()), and a timer set by the scheduler fully
>> qualifies as being the one (if it's the most imminent).
>>
>> My point was that, still from scheduling perspective, neither Credit1
>> nor Credit2 sets a wakeup timer for idle pCPUs.
>>
>> Well, in Credit1, the master_ticker timer is never stopped (while,
>> e.g., the per-pCPU tick is stopped before entering deep sleep,
>> via sched_tick_suspend(), see commit 964fae8ac), but that's only 1
>> pCPU.
>
> The function sched_tick_suspend is never called on ARM. The power
> saving in Xen ARM is still very limited and this would need to be
> updated in the future.
>
> So I guess that's why I still see interrupt coming on the idle pCPU
> when credit1 is used. Looking at credit2, the callback tick_suspend is
> not called. Does it mean there is no per-pCPU timer?
>
> Now, from my understanding, if we decide to call sched_tick_suspend on
> ARM before idling. We will likely have the same problem with credit1
> because there is no more interrupt to wake-up the pCPU.
>
> But I don't think this is an issue in the scheduler. IMHO, the problem
> is in the RCU. Indeed a CPU in lower power mode (i.e  wfi on ARM or
> pm_idle on x86 is been executed) will never get out to tell to the RCU
> : "I am quiet, go ahead". So the RCU will never be able to reclaim the
> memory and will result on a memory exhaustion if the pCPU never
> receive an interrupt (this could happen if pCPU has never ran a guest).

Yes.  This is a core problem, not ARM specific.

x86 is saved by the time calibration rendezvous which IPIs all cores
every 1s.

>
> The question now, is how to fix it?


This is going to involve a better understanding of how RCU is supposed
to work.

~Andrew


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-25 12:40                         ` Andrew Cooper
@ 2017-01-25 14:23                           ` Julien Grall
  0 siblings, 0 replies; 33+ messages in thread
From: Julien Grall @ 2017-01-25 14:23 UTC (permalink / raw)
  To: Andrew Cooper, Dario Faggioli, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	Ian Jackson, xen-devel



On 25/01/17 12:40, Andrew Cooper wrote:
> On 25/01/17 12:38, Julien Grall wrote:
>> Hi Dario,
>>
>> On 25/01/17 11:10, Dario Faggioli wrote:
>>> On Tue, 2017-01-24 at 15:06 +0000, Julien Grall wrote:
>>>> On 24/01/17 14:16, Dario Faggioli wrote:
>>>>> There, we have tracing (BTW, did that made it to ARM eventually?)
>>>>> and
>>>>> there's TRC_PM_IDLE_ENTRY/EXIT which do pretty much the same of
>>>>> your
>>>>> printk-s.
>>>>
>>>> There is patch on the ML for xentrace support (see [1]) but nothing
>>>> has
>>>> been upstreamed yet. Waiting for a new version from the contributor.
>>>>
>>> Yep, that was I was remembering, and referring to. Thanks for the
>>> update.
>>>
>>>>> And if I look at it, I do see even totally idle (from the scheduler
>>>>> point of view) pCPUs, I indeed see them going back and forth from
>>>>> and
>>>>> to C3.
>>>>
>>>> My knowledge on x86 is limited. When does a CPU decides to leave the
>>>> idle mode?
>>>>
>>> I'm not an expert of that part either. Jan and Andrew for sure know
>>> best how monitor/mwait works (both in general, and our own
>>> implementation).
>>>
>>> What I know (and can quickly infer from glancing at the code), is that
>>> timers are certainly involved.
>>>
>>> In fact, we wake up when the most imminent timer would expire (see
>>> mwait_idle_with_hints()), and a timer set by the scheduler fully
>>> qualifies as being the one (if it's the most imminent).
>>>
>>> My point was that, still from scheduling perspective, neither Credit1
>>> nor Credit2 sets a wakeup timer for idle pCPUs.
>>>
>>> Well, in Credit1, the master_ticker timer is never stopped (while,
>>> e.g., the per-pCPU tick is stopped before entering deep sleep,
>>> via sched_tick_suspend(), see commit 964fae8ac), but that's only 1
>>> pCPU.
>>
>> The function sched_tick_suspend is never called on ARM. The power
>> saving in Xen ARM is still very limited and this would need to be
>> updated in the future.
>>
>> So I guess that's why I still see interrupt coming on the idle pCPU
>> when credit1 is used. Looking at credit2, the callback tick_suspend is
>> not called. Does it mean there is no per-pCPU timer?
>>
>> Now, from my understanding, if we decide to call sched_tick_suspend on
>> ARM before idling. We will likely have the same problem with credit1
>> because there is no more interrupt to wake-up the pCPU.
>>
>> But I don't think this is an issue in the scheduler. IMHO, the problem
>> is in the RCU. Indeed a CPU in lower power mode (i.e  wfi on ARM or
>> pm_idle on x86 is been executed) will never get out to tell to the RCU
>> : "I am quiet, go ahead". So the RCU will never be able to reclaim the
>> memory and will result on a memory exhaustion if the pCPU never
>> receive an interrupt (this could happen if pCPU has never ran a guest).
>
> Yes.  This is a core problem, not ARM specific.
>
> x86 is saved by the time calibration rendezvous which IPIs all cores
> every 1s.
>
>>
>> The question now, is how to fix it?
>
>
> This is going to involve a better understanding of how RCU is supposed
> to work.

I think we all agree that someone needs to kick the other pCPUs to check
whether the RCU is being used.

Looking at the documentation of our RCU code ([1]), section "RCU
Implementations", it seems that the design expects a timer to
periodically kick each pCPU and check whether there is some RCU work
pending.

[1] http://lse.sourceforge.net/locking/rcupdate.html
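
One possible reading of that, as a rough sketch only: before a pCPU
idles, pick a wake-up deadline when RCU work is pending, and sleep
without a deadline otherwise. Everything here is an assumption made for
illustration (idle_wakeup_deadline(), RCU_IDLE_KICK_NS, SLEEP_UNTIL_IRQ
and the 10 ms period are all made-up names and values); none of it is
existing Xen code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RCU_IDLE_KICK_NS 10000000ULL /* assumed 10 ms period, purely illustrative */
#define SLEEP_UNTIL_IRQ  UINT64_MAX  /* no deadline: sleep until some interrupt arrives */

/*
 * Return the absolute time (in ns) at which an idle pCPU should be woken.
 * With no RCU work pending the pCPU may sleep indefinitely; otherwise it
 * is woken after one kick period so it can report a quiescent state.
 */
uint64_t idle_wakeup_deadline(uint64_t now_ns, bool rcu_work_pending)
{
    if (!rcu_work_pending)
        return SLEEP_UNTIL_IRQ;
    assert(now_ns < UINT64_MAX - RCU_IDLE_KICK_NS); /* guard against wrap-around */
    return now_ns + RCU_IDLE_KICK_NS;
}
```

The point of the conditional is that a pCPU with nothing pending keeps
its power saving, and only pCPUs that are holding up a grace period pay
for the extra wake-up.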

>
> ~Andrew
>

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-25 12:38                       ` Julien Grall
  2017-01-25 12:40                         ` Andrew Cooper
@ 2017-01-25 16:00                         ` Dario Faggioli
  2017-01-31 16:30                           ` Julien Grall
  1 sibling, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2017-01-25 16:00 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel


On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote:
> Hi Dario,
> 
Hey,

> On 25/01/17 11:10, Dario Faggioli wrote:
> > My point was that, still from scheduling perspective, neither
> > Credit1
> > nor Credit2 sets a wakeup timer for idle pCPUs.
> > 
> > Well, in Credit1, the master_ticker timer is never stopped (while,
> > e.g., the per-pCPU tick is stopped before entering deep sleep,
> > via sched_tick_suspend(), see commit 964fae8ac), but that's only 1
> > pCPU.
> 
> The function sched_tick_suspend is never called on ARM. The power
> saving 
> in Xen ARM is still very limited and this would need to be updated
> in 
> the future.
> 
> So I guess that's why I still see interrupt coming on the idle pCPU
> when 
> credit1 is used. 
>
Yes. If you don't suspend the tick before going to wfi/hlt/whatever,
there will be a timer firing --and AFAICT waking you up from the low
power state-- every 10ms (with default Credit1 timeslice), even for
idle pCPUs.

> Looking at credit2, the callback tick_suspend is not 
> called. Does it mean there is no per-pCPU timer?
> 
Exactly, we (happily) don't need that in Credit2. :-)

> Now, from my understanding, if we decide to call sched_tick_suspend
> on 
> ARM before idling. We will likely have the same problem with credit1 
> because there is no more interrupt to wake-up the pCPU.
> 
Based on what you've said so far in this thread, I tend to think that,
yes, that would be the case.

> But I don't think this is an issue in the scheduler. 
>
Agreed.

> IHMO, the problem 
> is in the RCU. Indeed a CPU in lower power mode (i.e  wfi on ARM or 
> pm_idle on x86 is been executed) will never get out to tell to the
> RCU : 
> "I am quiet, go ahead". So the RCU will never be able to reclaim the 
> memory and will result on a memory exhaustion if the pCPU never
> receive 
> an interrupt (this could happen if pCPU has never ran a guest).
> 
> The question now, is how to fix it?
> 
And a good one. I may be wrong (I certainly wasn't around at the time),
but ISTR our RCU code is imported/inspired by Linux... Looking there
again may help, but, nowadays, the Linux RCU subsystem is a Lernaean
Hydra monster, with 100 heads and sharp claws! :-O

And while it has to be like that in Linux, I don't think we need all
that complexity, and hence we can't just re-sync. :-/

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-25 16:00                         ` Dario Faggioli
@ 2017-01-31 16:30                           ` Julien Grall
  2017-01-31 22:10                             ` Stefano Stabellini
  2017-02-01 18:21                             ` Wei Liu
  0 siblings, 2 replies; 33+ messages in thread
From: Julien Grall @ 2017-01-31 16:30 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel

Hi Dario,

On 25/01/17 16:00, Dario Faggioli wrote:
> On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote:
>> On 25/01/17 11:10, Dario Faggioli wrote:
> And a good one. I may be wrong (I certainly wasn't around at the time),
> but ISTR out RCU code is imported/inspired by Linux... Looking there
> again may help, but, nowadays, Linux RCU subsystem is a Lernaean Hydra
> monster, with 100 heads and sharpen claws! :-O
>
> And, while, in there, it has to be like that, I don't think we need all
> such complexity, and hence we can't just re-sync. :-/

Yeah, even the tiny RCU code is quite complex :/. I've looked at our RCU 
code and noticed there is a link in the header to [1].

It seems to document the RCU code we use. From my understanding of the
"RCU Implementations" section, the authors expect a timer to
periodically kick each pCPU and check whether any RCU work is pending.

We could add this timer, but it would prevent an idle pCPU from staying
in low power mode for a long time. Another solution would be to send an
interrupt to each pCPU when call_rcu is called, rather than depending on
a mark. However, this would still wake up the pCPU even if it was doing
nothing.

Any better ideas?

Cheers,

[1] http://lse.sourceforge.net/locking/rcupdate.html

-- 
Julien Grall

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-31 16:30                           ` Julien Grall
@ 2017-01-31 22:10                             ` Stefano Stabellini
  2017-02-01 18:21                             ` Wei Liu
  1 sibling, 0 replies; 33+ messages in thread
From: Stefano Stabellini @ 2017-01-31 22:10 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Ian Jackson, Tim Deegan, Jan Beulich, xen-devel

On Tue, 31 Jan 2017, Julien Grall wrote:
> Hi Dario,
> 
> On 25/01/17 16:00, Dario Faggioli wrote:
> > On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote:
> > > On 25/01/17 11:10, Dario Faggioli wrote:
> > And a good one. I may be wrong (I certainly wasn't around at the time),
> > but ISTR out RCU code is imported/inspired by Linux... Looking there
> > again may help, but, nowadays, Linux RCU subsystem is a Lernaean Hydra
> > monster, with 100 heads and sharpen claws! :-O
> > 
> > And, while, in there, it has to be like that, I don't think we need all
> > such complexity, and hence we can't just re-sync. :-/
> 
> Yeah, even the tiny RCU code is quite complex :/. I've looked at our RCU code
> and noticed there is a link in the header to [1].
> 
> It seems to be a documentation about the RCU code we used. From my
> understanding of the "RCU Implementations", the authors are expecting a timer
> to kick periodically pCPU and check if there is some RCU work pending.
> 
> We could add this timer but it would prevent an idle pCPU to stay in low power
> mode for a long time. Another solution would be to send an interrupt to each
> pCPU when call_rcu is called rather depending on a mark. Although this would
> still wake-up the pCPU even it was doing nothing.
> 
> Any better ideas?

Julien, thanks for looking into this.

Instead of the RCU, could we send an interrupt to all pCPUs *not* in idle
mode? We could have a shared bitmask in memory tracking all pCPUs
currently sleeping.
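
As a rough illustration of the bitmask idea, here is a minimal C sketch.
It is not Xen code: idle_mask, cpu_mark_idle(), cpu_mark_busy() and
cpus_to_kick() are hypothetical names, and NR_CPUS is set to 6 only to
match the test platform described at the start of the thread.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS 6 /* assumption: the 6-core test platform */

/* Hypothetical shared mask: bit n is set while pCPU n is sleeping. */
static _Atomic uint64_t idle_mask;

/* Would be called by a pCPU just before it enters wfi/hlt. */
static void cpu_mark_idle(unsigned int cpu)
{
    atomic_fetch_or(&idle_mask, UINT64_C(1) << cpu);
}

/* Would be called by a pCPU right after it wakes up. */
static void cpu_mark_busy(unsigned int cpu)
{
    atomic_fetch_and(&idle_mask, ~(UINT64_C(1) << cpu));
}

/* Mask of pCPUs that are awake and would therefore get the interrupt. */
static uint64_t cpus_to_kick(void)
{
    uint64_t all = (UINT64_C(1) << NR_CPUS) - 1;

    return all & ~atomic_load(&idle_mask);
}
```

The sketch ignores the race between a pCPU marking itself idle and a
callback being queued at the same moment; a real implementation would
have to order those two steps carefully.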


> Cheers,
> 
> [1] http://lse.sourceforge.net/locking/rcupdate.html
> 
> -- 
> Julien Grall
> 

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-31 16:30                           ` Julien Grall
  2017-01-31 22:10                             ` Stefano Stabellini
@ 2017-02-01 18:21                             ` Wei Liu
  2017-02-02 11:22                               ` Jan Beulich
  1 sibling, 1 reply; 33+ messages in thread
From: Wei Liu @ 2017-02-01 18:21 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Ian Jackson, Tim Deegan, Jan Beulich, xen-devel

On Tue, Jan 31, 2017 at 04:30:50PM +0000, Julien Grall wrote:
> Hi Dario,
> 
> On 25/01/17 16:00, Dario Faggioli wrote:
> > On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote:
> > > On 25/01/17 11:10, Dario Faggioli wrote:
> > And a good one. I may be wrong (I certainly wasn't around at the time),
> > but ISTR out RCU code is imported/inspired by Linux... Looking there
> > again may help, but, nowadays, Linux RCU subsystem is a Lernaean Hydra
> > monster, with 100 heads and sharpen claws! :-O
> > 
> > And, while, in there, it has to be like that, I don't think we need all
> > such complexity, and hence we can't just re-sync. :-/
> 
> Yeah, even the tiny RCU code is quite complex :/. I've looked at our RCU
> code and noticed there is a link in the header to [1].
> 
> It seems to be a documentation about the RCU code we used. From my
> understanding of the "RCU Implementations", the authors are expecting a
> timer to kick periodically pCPU and check if there is some RCU work pending.
> 
> We could add this timer but it would prevent an idle pCPU to stay in low
> power mode for a long time. Another solution would be to send an interrupt
> to each pCPU when call_rcu is called rather depending on a mark. Although
> this would still wake-up the pCPU even it was doing nothing.
> 
> Any better ideas?
> 

Worth checking all the RCU docs in Linux (Documentation/RCU).

I think there are descriptions about idle or no-tick variants. It would
be useful to know how Linux handles this. I suspect RCU in Linux is more
capable than the one in Xen...

Wei.

> Cheers,
> 
> [1] http://lse.sourceforge.net/locking/rcupdate.html
> 
> -- 
> Julien Grall

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-02-01 18:21                             ` Wei Liu
@ 2017-02-02 11:22                               ` Jan Beulich
  2017-02-02 11:53                                 ` Wei Liu
  2017-02-02 12:01                                 ` Dario Faggioli
  0 siblings, 2 replies; 33+ messages in thread
From: Jan Beulich @ 2017-02-02 11:22 UTC (permalink / raw)
  To: Julien Grall, Wei Liu
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Dario Faggioli,
	Ian Jackson, Tim Deegan, xen-devel

>>> On 01.02.17 at 19:21, <wei.liu2@citrix.com> wrote:
> On Tue, Jan 31, 2017 at 04:30:50PM +0000, Julien Grall wrote:
>> Hi Dario,
>> 
>> On 25/01/17 16:00, Dario Faggioli wrote:
>> > On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote:
>> > > On 25/01/17 11:10, Dario Faggioli wrote:
>> > And a good one. I may be wrong (I certainly wasn't around at the time),
>> > but ISTR out RCU code is imported/inspired by Linux... Looking there
>> > again may help, but, nowadays, Linux RCU subsystem is a Lernaean Hydra
>> > monster, with 100 heads and sharpen claws! :-O
>> > 
>> > And, while, in there, it has to be like that, I don't think we need all
>> > such complexity, and hence we can't just re-sync. :-/
>> 
>> Yeah, even the tiny RCU code is quite complex :/. I've looked at our RCU
>> code and noticed there is a link in the header to [1].
>> 
>> It seems to be a documentation about the RCU code we used. From my
>> understanding of the "RCU Implementations", the authors are expecting a
>> timer to kick periodically pCPU and check if there is some RCU work pending.
>> 
>> We could add this timer but it would prevent an idle pCPU to stay in low
>> power mode for a long time. Another solution would be to send an interrupt
>> to each pCPU when call_rcu is called rather depending on a mark. Although
>> this would still wake-up the pCPU even it was doing nothing.
>> 
>> Any better ideas?
> 
> Worth checking all the RCU docs in Linux (Documentation/RCU).
> 
> I think there are descriptions about idle or no-tick variants. It would
> be useful to know how Linux handles this. I suspect RCU in Linux is more
> capable than the one in Xen...

Isn't all we need an rcu_idle_{enter,exit}() implementation (and of
course calls to them placed where needed)?

Jan


* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-02-02 11:22                               ` Jan Beulich
@ 2017-02-02 11:53                                 ` Wei Liu
  2017-02-02 12:18                                   ` Julien Grall
  2017-02-02 12:01                                 ` Dario Faggioli
  1 sibling, 1 reply; 33+ messages in thread
From: Wei Liu @ 2017-02-02 11:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Ian Jackson, Tim Deegan, Julien Grall, xen-devel

On Thu, Feb 02, 2017 at 04:22:53AM -0700, Jan Beulich wrote:
> >>> On 01.02.17 at 19:21, <wei.liu2@citrix.com> wrote:
> > On Tue, Jan 31, 2017 at 04:30:50PM +0000, Julien Grall wrote:
> >> Hi Dario,
> >> 
> >> On 25/01/17 16:00, Dario Faggioli wrote:
> >> > On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote:
> >> > > On 25/01/17 11:10, Dario Faggioli wrote:
> >> > And a good one. I may be wrong (I certainly wasn't around at the time),
> >> > but ISTR out RCU code is imported/inspired by Linux... Looking there
> >> > again may help, but, nowadays, Linux RCU subsystem is a Lernaean Hydra
> >> > monster, with 100 heads and sharpen claws! :-O
> >> > 
> >> > And, while, in there, it has to be like that, I don't think we need all
> >> > such complexity, and hence we can't just re-sync. :-/
> >> 
> >> Yeah, even the tiny RCU code is quite complex :/. I've looked at our RCU
> >> code and noticed there is a link in the header to [1].
> >> 
> >> It seems to be a documentation about the RCU code we used. From my
> >> understanding of the "RCU Implementations", the authors are expecting a
> >> timer to kick periodically pCPU and check if there is some RCU work pending.
> >> 
> >> We could add this timer but it would prevent an idle pCPU to stay in low
> >> power mode for a long time. Another solution would be to send an interrupt
> >> to each pCPU when call_rcu is called rather depending on a mark. Although
> >> this would still wake-up the pCPU even it was doing nothing.
> >> 
> >> Any better ideas?
> > 
> > Worth checking all the RCU docs in Linux (Documentation/RCU).
> > 
> > I think there are descriptions about idle or no-tick variants. It would
> > be useful to know how Linux handles this. I suspect RCU in Linux is more
> > capable than the one in Xen...
> 
> Isn't all we need an rcu_idle_{enter,exit}() implementation (and of
> course calls to them placed where needed)?
> 

I'm no RCU expert, but having checked Linux source code and the
documentation of rcu_idle_{enter,exit}, what you said makes sense to me.

Wei.

> Jan
> 

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-02-02 11:22                               ` Jan Beulich
  2017-02-02 11:53                                 ` Wei Liu
@ 2017-02-02 12:01                                 ` Dario Faggioli
  1 sibling, 0 replies; 33+ messages in thread
From: Dario Faggioli @ 2017-02-02 12:01 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall, Wei Liu
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan,
	xen-devel, Ian Jackson


On Thu, 2017-02-02 at 04:22 -0700, Jan Beulich wrote:

> On 01.02.17 at 19:21, <wei.liu2@citrix.com> wrote:

> > On Tue, Jan 31, 2017 at 04:30:50PM +0000, Julien Grall wrote:

> > > Yeah, even the tiny RCU code is quite complex :/. I've looked at 
> > > our RCU code and noticed there is a link in the header to [1].
> > > 
> > > It seems to be a documentation about the RCU code we used. From
> > > my
> > > understanding of the "RCU Implementations", the authors are
> > > expecting a
> > > timer to kick periodically pCPU and check if there is some RCU
> > > work pending.

> > Worth checking all the RCU docs in Linux (Documentation/RCU).
> > 
> > I think there are descriptions about idle or no-tick variants.
>
It surely is worth it, but bear in mind that, as said before, Linux
RCU is indeed more powerful than what we have, but also much, much
more complex than what we probably need.

And (for Julien), perhaps it's me, but I don't see any reference to,
or hint at, using a timer in the doc you linked, nor in other RCU doc
material.

As a matter of fact, I agree with Jan, i.e.,

> Isn't all we need an rcu_idle_{enter,exit}() implementation (and of
> course calls to them placed where needed)?
> 
This is what I think we're missing. And, AFAIUI, it's sort of similar
to what Stefano (I think) was saying: a CPU going idle is a step
toward the end of a grace period, because RCU read-side critical
sections can't occur on it.

As per what Julien said about softirqs (which also looks right to me),
this is how Linux handles the issue:

http://lxr.free-electrons.com/source/kernel/rcu/tree.c#L733
/**
 * rcu_idle_enter - inform RCU that current CPU is entering idle
 *
 * Enter idle mode, in other words, -leave- the mode in which RCU
 * read-side critical sections can occur.  (Though RCU read-side
 * critical sections can occur in irq handlers in idle, a possibility
 * handled by irq_enter() and irq_exit().)
 */

So we may also need rcu_irq_enter() and rcu_irq_exit().
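
To make the interaction between the idle and irq hooks concrete, here is
a minimal, single-threaded C sketch. It only borrows the function names
from Linux; the bodies, struct rcu_cpu_state and
rcu_cpu_is_idle_quiescent() are assumptions made for illustration, and
the real Linux code tracks this state quite differently.

```c
#include <stdbool.h>

#define NR_CPUS 6 /* assumption: the 6-core test platform */

/* Hypothetical per-pCPU bookkeeping. */
struct rcu_cpu_state {
    bool in_idle;          /* between rcu_idle_enter() and rcu_idle_exit() */
    unsigned int irq_nest; /* interrupt handlers currently running */
};

static struct rcu_cpu_state rcu_state[NR_CPUS];

/* The pCPU is about to sleep: no read-side critical section can start. */
static void rcu_idle_enter(unsigned int cpu) { rcu_state[cpu].in_idle = true; }

/* The pCPU leaves the idle loop and may run readers again. */
static void rcu_idle_exit(unsigned int cpu) { rcu_state[cpu].in_idle = false; }

/* An interrupt handler may contain a read-side critical section. */
static void rcu_irq_enter(unsigned int cpu) { rcu_state[cpu].irq_nest++; }

static void rcu_irq_exit(unsigned int cpu) { rcu_state[cpu].irq_nest--; }

/* True if the grace-period logic need not wait for this pCPU. */
static bool rcu_cpu_is_idle_quiescent(unsigned int cpu)
{
    return rcu_state[cpu].in_idle && rcu_state[cpu].irq_nest == 0;
}
```

The irq nesting counter is what covers the case from the comment above:
a pCPU that is idle but currently inside an interrupt handler is not
treated as quiescent until the handler returns.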

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-02-02 11:53                                 ` Wei Liu
@ 2017-02-02 12:18                                   ` Julien Grall
  2017-02-02 12:51                                     ` Dario Faggioli
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2017-02-02 12:18 UTC (permalink / raw)
  To: Wei Liu, Jan Beulich, Dario Faggioli
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan,
	xen-devel, Ian Jackson

Hi,

On 02/02/17 11:53, Wei Liu wrote:
> On Thu, Feb 02, 2017 at 04:22:53AM -0700, Jan Beulich wrote:
>>>>> On 01.02.17 at 19:21, <wei.liu2@citrix.com> wrote:
>>> On Tue, Jan 31, 2017 at 04:30:50PM +0000, Julien Grall wrote:
>>>> Hi Dario,
>>>>
>>>> On 25/01/17 16:00, Dario Faggioli wrote:
>>>>> On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote:
>>>>>> On 25/01/17 11:10, Dario Faggioli wrote:
>>>>> And a good one. I may be wrong (I certainly wasn't around at the time),
>>>>> but ISTR out RCU code is imported/inspired by Linux... Looking there
>>>>> again may help, but, nowadays, Linux RCU subsystem is a Lernaean Hydra
>>>>> monster, with 100 heads and sharpen claws! :-O
>>>>>
>>>>> And, while, in there, it has to be like that, I don't think we need all
>>>>> such complexity, and hence we can't just re-sync. :-/
>>>>
>>>> Yeah, even the tiny RCU code is quite complex :/. I've looked at our RCU
>>>> code and noticed there is a link in the header to [1].
>>>>
>>>> It seems to be a documentation about the RCU code we used. From my
>>>> understanding of the "RCU Implementations", the authors are expecting a
>>>> timer to kick periodically pCPU and check if there is some RCU work pending.
>>>>
>>>> We could add this timer but it would prevent an idle pCPU to stay in low
>>>> power mode for a long time. Another solution would be to send an interrupt
>>>> to each pCPU when call_rcu is called rather depending on a mark. Although
>>>> this would still wake-up the pCPU even it was doing nothing.
>>>>
>>>> Any better ideas?
>>>
>>> Worth checking all the RCU docs in Linux (Documentation/RCU).
>>>
>>> I think there are descriptions about idle or no-tick variants. It would
>>> be useful to know how Linux handles this. I suspect RCU in Linux is more
>>> capable than the one in Xen...
>>
>> Isn't all we need an rcu_idle_{enter,exit}() implementation (and of
>> course calls to them placed where needed)?
>>
>
> I'm no RCU expert, but having checked Linux source code and the
> documentation of rcu_idle_{enter,exit}, what you said makes sense to me.

And the doc seems to confirm that (see Documentation/RCU/rcu.txt):

"Just as with spinlocks, RCU readers are not permitted to
block, switch to user-mode execution, or enter the idle loop.
Therefore, as soon as a CPU is seen passing through any of these
three states, we know that that CPU has exited any previous RCU
read-side critical sections.  So, if we remove an item from a
linked list, and then wait until all CPUs have switched context,
executed in user mode, or executed in the idle loop, we can
safely free up that item."
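
The rule quoted above can be sketched as grace-period bookkeeping in C.
This is a toy model, not the Xen or Linux implementation: struct
grace_period, gp_start(), gp_report_quiescent() and gp_done() are
hypothetical names, and the CPU masks are plain 64-bit integers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical grace-period state. */
struct grace_period {
    uint64_t pending; /* CPUs that still have to pass a quiescent state */
};

/*
 * Start a grace period. CPUs already in the idle loop cannot be inside
 * a read-side critical section, so they are not waited for.
 */
static void gp_start(struct grace_period *gp, uint64_t online, uint64_t idle)
{
    gp->pending = online & ~idle;
}

/* A CPU reports that it passed through a quiescent state. */
static void gp_report_quiescent(struct grace_period *gp, unsigned int cpu)
{
    gp->pending &= ~(UINT64_C(1) << cpu);
}

/* Callbacks such as complete_domain_destroy may run once this is true. */
static bool gp_done(const struct grace_period *gp)
{
    return gp->pending == 0;
}
```

If gp_start() is given an empty idle mask while some CPUs sleep and never
report, gp_done() stays false, which mirrors the behaviour described in
this thread.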

Dario, are you going to look into the issue? Or shall I try to write a 
patch for it?

Cheers,

-- 
Julien Grall

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-02-02 12:18                                   ` Julien Grall
@ 2017-02-02 12:51                                     ` Dario Faggioli
  2017-02-02 13:26                                       ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2017-02-02 12:51 UTC (permalink / raw)
  To: Julien Grall, Wei Liu, Jan Beulich
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan,
	xen-devel, Ian Jackson


On Thu, 2017-02-02 at 12:18 +0000, Julien Grall wrote:
> Dario, are you going to look into the issue? Or shall I try to write
> a 
> patch for it?
> 
I'd be up for looking into this. BUT, I'm travelling this weekend, and
am probably going to be busy next week (sorry).

So, I expect to be able to do something useful only, let's say, from
Mon 13th. If that's ok, do sign me up. If you're more in a hurry, feel
free to beat me. :-)

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-02-02 12:51                                     ` Dario Faggioli
@ 2017-02-02 13:26                                       ` Julien Grall
  2017-02-02 13:32                                         ` Dario Faggioli
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2017-02-02 13:26 UTC (permalink / raw)
  To: Dario Faggioli, Wei Liu, Jan Beulich
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan,
	xen-devel, Ian Jackson

Hi Dario,

On 02/02/17 12:51, Dario Faggioli wrote:
> On Thu, 2017-02-02 at 12:18 +0000, Julien Grall wrote:
>> Dario, are you going to look into the issue? Or shall I try to write
>> a
>> patch for it?
>>
> I'd be up for looking into this. BUT, I'm travelling this weekend, and
> am probably going to be busy next week (sorry).
>
> So, I expect to be able to do something useful only, let's stay, from
> Mon 13th. If that's ok, do sign me up. If you're more in a hurry, feel
> free to beat me. :-)

I have plenty of other things to do, and will happily let you handle 
this. It is not urgent, though it would be good to have it fixed for Xen 
4.9 :).

Cheers,

-- 
Julien Grall

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-02-02 13:26                                       ` Julien Grall
@ 2017-02-02 13:32                                         ` Dario Faggioli
  2017-03-28 18:30                                           ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2017-02-02 13:32 UTC (permalink / raw)
  To: Julien Grall, Wei Liu, Jan Beulich
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan,
	xen-devel, Ian Jackson


On Thu, 2017-02-02 at 13:26 +0000, Julien Grall wrote:
> On 02/02/17 12:51, Dario Faggioli wrote:
> > So, I expect to be able to do something useful only, let's stay,
> > from
> > Mon 13th. If that's ok, do sign me up. If you're more in a hurry,
> > feel
> > free to beat me. :-)
> 
> I have plenty of others things to do, and will happily let you
> handle 
> this. It is not urgent, thought it will be good to have it fixed for
> Xen 
> 4.9 :).
> 
Ok, sign me up for it then. We absolutely want it for 4.9, I agree.
Track it in your RM emails, with my name on it, if you want.

I'll cry if I need help. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-02-02 13:32                                         ` Dario Faggioli
@ 2017-03-28 18:30                                           ` Julien Grall
  2017-03-30  7:38                                             ` Dario Faggioli
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2017-03-28 18:30 UTC (permalink / raw)
  To: Dario Faggioli, Wei Liu, Jan Beulich
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan,
	xen-devel, Ian Jackson

Hi Dario,

On 02/02/17 13:32, Dario Faggioli wrote:
> On Thu, 2017-02-02 at 13:26 +0000, Julien Grall wrote:
>> On 02/02/17 12:51, Dario Faggioli wrote:
>>> So, I expect to be able to do something useful only, let's stay,
>>> from
>>> Mon 13th. If that's ok, do sign me up. If you're more in a hurry,
>>> feel
>>> free to beat me. :-)
>>
>> I have plenty of others things to do, and will happily let you
>> handle
>> this. It is not urgent, thought it will be good to have it fixed for
>> Xen
>> 4.9 :).
>>
> Ok, sign me up for it then. We absolutely want it for 4.9, I agree.

Do you have any update on this? This would allow us to use Credit2 on 
ARM when physical processors are idle.

Cheers,

-- 
Julien Grall

* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-03-28 18:30                                           ` Julien Grall
@ 2017-03-30  7:38                                             ` Dario Faggioli
  0 siblings, 0 replies; 33+ messages in thread
From: Dario Faggioli @ 2017-03-30  7:38 UTC (permalink / raw)
  To: Julien Grall, Wei Liu, Jan Beulich
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan,
	xen-devel, Ian Jackson


On Tue, 2017-03-28 at 19:30 +0100, Julien Grall wrote:
> Hi Dario,
> 
Hey,

> On 02/02/17 13:32, Dario Faggioli wrote:
> > On Thu, 2017-02-02 at 13:26 +0000, Julien Grall wrote:
> > > 
> > Ok, sign me up for it then. We absolutely want it for 4.9, I agree.
> 
> Do you have any update on this? This would allow us to use credit 2
> on 
> ARM when physical processor are idle.
> 
Yes, sorry for the delay. I've started working on this, and I have it
half done, but then had to switch to something else. I most likely will
be able to get back to it tomorrow, and finish and send something soon.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
