* [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
@ 2015-11-13 17:10 Dario Faggioli
  2015-11-16 13:10 ` Juergen Gross
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dario Faggioli @ 2015-11-13 17:10 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross, Jan Beulich

In fact, with 2 cpupools, one (the default) Credit and
one Credit2 (with at least 1 pCPU in the latter), trying
a (e.g., ACPI S3) suspend/resume crashes like this:

(XEN) [  150.587779] ----[ Xen-4.7-unstable  x86_64  debug=y  Not tainted ]----
(XEN) [  150.587783] CPU:    6
(XEN) [  150.587786] RIP:    e008:[<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
(XEN) [  150.587796] RFLAGS: 0000000000010086   CONTEXT: hypervisor
(XEN) [  150.587801] rax: ffff83031fa3c020   rbx: ffff830322c1b4b0   rcx: 0000000000000000
(XEN) [  150.587806] rdx: ffff83031fa78000   rsi: 000000000000000a   rdi: ffff82d0802a9788
(XEN) [  150.587811] rbp: ffff83031fa7fe20   rsp: ffff83031fa7fd30   r8:  ffff83031fa80000
(XEN) [  150.587815] r9:  0000000000000006   r10: 000000000008f7f2   r11: 0000000000000006
(XEN) [  150.587819] r12: ffff8300dbdf3000   r13: ffff830322c1b4b0   r14: 0000000000000006
(XEN) [  150.587823] r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026e0
(XEN) [  150.587827] cr3: 00000000dbaa8000   cr2: 0000000000000000
(XEN) [  150.587830] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) [  150.587835] Xen stack trace from rsp=ffff83031fa7fd30:
... ... ...
(XEN) [  150.587962] Xen call trace:
(XEN) [  150.587966]    [<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
(XEN) [  150.587974]    [<ffff82d08012a98b>] schedule.c#schedule+0x128/0x635
(XEN) [  150.587979]    [<ffff82d08012dc16>] softirq.c#__do_softirq+0x82/0x8d
(XEN) [  150.587983]    [<ffff82d08012dc6e>] do_softirq+0x13/0x15
(XEN) [  150.587988]    [<ffff82d080162ddd>] domain.c#idle_loop+0x5b/0x6b
(XEN) [  151.272182]
(XEN) [  151.274174] ****************************************
(XEN) [  151.279624] Panic on CPU 6:
(XEN) [  151.282915] Xen BUG at sched_credit.c:655
(XEN) [  151.287415] ****************************************

During suspend, the pCPUs are not removed from their
pools with the standard procedure (which would involve
schedule_cpu_switch()). During resume, they:
 1) are assigned to the default cpupool (CPU_UP_PREPARE
    phase);
 2) are moved to the pool they were in before suspend,
    via schedule_cpu_switch() (CPU_ONLINE phase)

During resume, scheduling (even if just the idle loop)
can happen right after the CPU_STARTING phase (before
CPU_ONLINE), i.e., before the pCPU is put back in its
pool. In this case, it is the default pool's scheduler
that is invoked (Credit1, in the example above). But the
Credit2 specific vCPU data is not freed during suspend,
and no Credit1 specific vCPU data is allocated during
resume.

Therefore, Credit1 schedules on pCPUs whose idle vCPU's
sched_priv points to Credit2 vCPU data, and we crash.
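
To make the failure mode concrete, here is a minimal, self-contained C
model of the scenario (not Xen code: credit1_vdata, credit2_vdata,
toy_idle_vcpu and credit1_schedule() are hypothetical stand-ins for the
real per-scheduler vCPU data and for csched_schedule()'s use of
sched_priv):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-scheduler vCPU data; the two layouts differ. */
struct credit2_vdata { int owner; /* == 2 */ int rqd_index; };
struct credit1_vdata { int owner; /* == 1 */ int credits;   };

/* Opaque pointer, playing the role of vcpu->sched_priv. */
struct toy_idle_vcpu { void *sched_priv; };

/* What a Credit1-style scheduler effectively does when it is invoked. */
static void credit1_schedule(struct toy_idle_vcpu *v)
{
    /* Blind cast: nothing records which scheduler allocated sched_priv. */
    struct credit1_vdata *svc = v->sched_priv;

    if ( svc->owner != 1 )
    {
        /* Analogue of the BUG that fires at sched_credit.c:655. */
        fprintf(stderr, "BUG: Credit1 handed Credit2's vCPU data\n");
        exit(1);
    }
    printf("scheduling with %d credits\n", svc->credits);
}

int main(void)
{
    struct toy_idle_vcpu idle;
    struct credit2_vdata *c2 = calloc(1, sizeof(*c2));

    if ( c2 == NULL )
        return 1;

    /* Before suspend: the pCPU sat in a Credit2 pool. */
    c2->owner = 2;
    idle.sched_priv = c2;

    /*
     * Resume without the fix: sched_priv is neither freed at teardown
     * nor re-allocated for the default (Credit1) scheduler, yet Credit1
     * runs right after CPU_STARTING and misreads Credit2's data.
     */
    credit1_schedule(&idle);

    free(idle.sched_priv);
    return 0;
}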

Fix things by properly deallocating scheduler specific
data of the pCPU's pool scheduler during pCPU teardown,
and re-allocating them --always for &ops-- during pCPU
bringup.

This also fixes another (latent) bug: it prevents
schedule_cpu_switch() from using Credit1's free_vdata()
to deallocate data that was allocated by Credit2's
alloc_vdata(). This is not easy to trigger, but only
because the other bug described above manifests first
and crashes the host.
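
The alloc/free pairing at stake can be sketched with made-up names
(toy_sched_ops, credit1_like, credit2_like are illustrative, not the
real Xen interfaces): per-vCPU data should be released by the
free_vdata() of the scheduler whose alloc_vdata() created it, and
always allocating for &ops at bringup is what guarantees this holds by
the time schedule_cpu_switch() runs.

#include <stdlib.h>

struct toy_sched_ops {
    void *(*alloc_vdata)(void);
    void  (*free_vdata)(void *priv);
};

static void *toy_alloc(void) { return calloc(1, 64); }
static void  toy_free(void *priv) { free(priv); }

static const struct toy_sched_ops credit1_like = { toy_alloc, toy_free };
static const struct toy_sched_ops credit2_like = { toy_alloc, toy_free };

int main(void)
{
    /* Pre-patch hazard: data allocated by one scheduler... */
    void *priv = credit2_like.alloc_vdata();

    /*
     * ...but freed through another scheduler's hook during a pool
     * switch.  Harmless in this toy (both hooks end up in free()), but
     * in Xen the two schedulers use different structures and
     * bookkeeping, so the pairing must not be violated.
     */
    credit1_like.free_vdata(priv);

    /*
     * With the patch, teardown frees via the owning scheduler and
     * bringup re-allocates via &ops, so allocator and deallocator
     * always match.
     */
    return 0;
}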

The downside of this patch is that it adds one more
allocation on the resume path, which is not ideal. Still,
there is no better way of fixing the described bugs at
the moment. Removing (ideally all) allocations happening
during resume should continue to be pursued in the long
run.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
Changes from v1:
 * reversed the deallocation order in cpu_schedule_down(), so that
   allocations and deallocations actually happen in reverse ordering;
 * moved the allocation of the private data into the else branch of
    the check that allocates the whole idle vcpu, and added an
    ASSERT() that such data is not already allocated in that case;
 * improved both the in-code comment and the changelog.
---
 xen/common/schedule.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 20f5f56..1c05184 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1378,6 +1378,27 @@ static int cpu_schedule_up(unsigned int cpu)
 
     if ( idle_vcpu[cpu] == NULL )
         alloc_vcpu(idle_vcpu[0]->domain, cpu, cpu);
+    else
+    {
+        struct vcpu *idle = idle_vcpu[cpu];
+
+        /*
+         * During (ACPI?) suspend the idle vCPU for this pCPU is not freed,
+         * while its scheduler specific data (what sched_priv points to)
+         * is. Also, at this stage of the resume path, we attach the pCPU
+         * to the default scheduler, no matter in what cpupool it was before
+         * suspend. To avoid inconsistency, let's allocate default scheduler
+         * data for the idle vCPU here. If the pCPU was in a different pool
+         * with a different scheduler, it is schedule_cpu_switch(), invoked
+         * later, that will set things up as appropriate.
+         */
+        ASSERT(idle->sched_priv == NULL);
+
+        idle->sched_priv = SCHED_OP(&ops, alloc_vdata, idle,
+                                    idle->domain->sched_priv);
+        if ( idle->sched_priv == NULL )
+            return -ENOMEM;
+    }
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
@@ -1395,6 +1416,10 @@ static void cpu_schedule_down(unsigned int cpu)
 
     if ( sd->sched_priv != NULL )
         SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
+    SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
+
+    idle_vcpu[cpu]->sched_priv = NULL;
+    sd->sched_priv = NULL;
 
     kill_timer(&sd->s_timer);
 }


* Re: [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
  2015-11-13 17:10 [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers Dario Faggioli
@ 2015-11-16 13:10 ` Juergen Gross
  2015-11-24 15:32 ` George Dunlap
  2015-12-10 15:13 ` George Dunlap
  2 siblings, 0 replies; 7+ messages in thread
From: Juergen Gross @ 2015-11-16 13:10 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Jan Beulich

On 13/11/15 18:10, Dario Faggioli wrote:
> In fact, with 2 cpupools, one (the default) Credit and
> one Credit2 (with at least 1 pCPU in the latter), trying
> a (e.g., ACPI S3) suspend/resume crashes like this:
> 
> (XEN) [  150.587779] ----[ Xen-4.7-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) [  150.587783] CPU:    6
> (XEN) [  150.587786] RIP:    e008:[<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
> (XEN) [  150.587796] RFLAGS: 0000000000010086   CONTEXT: hypervisor
> (XEN) [  150.587801] rax: ffff83031fa3c020   rbx: ffff830322c1b4b0   rcx: 0000000000000000
> (XEN) [  150.587806] rdx: ffff83031fa78000   rsi: 000000000000000a   rdi: ffff82d0802a9788
> (XEN) [  150.587811] rbp: ffff83031fa7fe20   rsp: ffff83031fa7fd30   r8:  ffff83031fa80000
> (XEN) [  150.587815] r9:  0000000000000006   r10: 000000000008f7f2   r11: 0000000000000006
> (XEN) [  150.587819] r12: ffff8300dbdf3000   r13: ffff830322c1b4b0   r14: 0000000000000006
> (XEN) [  150.587823] r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026e0
> (XEN) [  150.587827] cr3: 00000000dbaa8000   cr2: 0000000000000000
> (XEN) [  150.587830] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) [  150.587835] Xen stack trace from rsp=ffff83031fa7fd30:
> ... ... ...
> (XEN) [  150.587962] Xen call trace:
> (XEN) [  150.587966]    [<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
> (XEN) [  150.587974]    [<ffff82d08012a98b>] schedule.c#schedule+0x128/0x635
> (XEN) [  150.587979]    [<ffff82d08012dc16>] softirq.c#__do_softirq+0x82/0x8d
> (XEN) [  150.587983]    [<ffff82d08012dc6e>] do_softirq+0x13/0x15
> (XEN) [  150.587988]    [<ffff82d080162ddd>] domain.c#idle_loop+0x5b/0x6b
> (XEN) [  151.272182]
> (XEN) [  151.274174] ****************************************
> (XEN) [  151.279624] Panic on CPU 6:
> (XEN) [  151.282915] Xen BUG at sched_credit.c:655
> (XEN) [  151.287415] ****************************************
> 
> During suspend, the pCPUs are not removed from their
> pools with the standard procedure (which would involve
> schedule_cpu_switch()). During resume, they:
>  1) are assigned to the default cpupool (CPU_UP_PREPARE
>     phase);
>  2) are moved to the pool they were in before suspend,
>     via schedule_cpu_switch() (CPU_ONLINE phase)
> 
> During resume, scheduling (even if just the idle loop)
> can happen right after the CPU_STARTING phase (before
> CPU_ONLINE), i.e., before the pCPU is put back in its
> pool. In this case, it is the default pool's scheduler
> that is invoked (Credit1, in the example above). But the
> Credit2 specific vCPU data is not freed during suspend,
> and no Credit1 specific vCPU data is allocated during
> resume.
> 
> Therefore, Credit1 schedules on pCPUs whose idle vCPU's
> sched_priv points to Credit2 vCPU data, and we crash.
> 
> Fix things by properly deallocating scheduler specific
> data of the pCPU's pool scheduler during pCPU teardown,
> and re-allocating them --always for &ops-- during pCPU
> bringup.
> 
> This also fixes another (latent) bug: it prevents
> schedule_cpu_switch() from using Credit1's free_vdata()
> to deallocate data that was allocated by Credit2's
> alloc_vdata(). This is not easy to trigger, but only
> because the other bug described above manifests first
> and crashes the host.
> 
> The downside of this patch is that it adds one more
> allocation on the resume path, which is not ideal. Still,
> there is no better way of fixing the described bugs at
> the moment. Removing (ideally all) allocations happening
> during resume should continue to be pursued in the long
> run.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen


* Re: [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
  2015-11-13 17:10 [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers Dario Faggioli
  2015-11-16 13:10 ` Juergen Gross
@ 2015-11-24 15:32 ` George Dunlap
  2015-11-24 17:14   ` Dario Faggioli
  2015-12-10 15:13 ` George Dunlap
  2 siblings, 1 reply; 7+ messages in thread
From: George Dunlap @ 2015-11-24 15:32 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Juergen Gross, Jan Beulich

On 13/11/15 17:10, Dario Faggioli wrote:
> In fact, with 2 cpupools, one (the default) Credit and
> one Credit2 (with at least 1 pCPU in the latter), trying
> a (e.g., ACPI S3) suspend/resume crashes like this:
> 
> (XEN) [  150.587779] ----[ Xen-4.7-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) [  150.587783] CPU:    6
> (XEN) [  150.587786] RIP:    e008:[<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
> (XEN) [  150.587796] RFLAGS: 0000000000010086   CONTEXT: hypervisor
> (XEN) [  150.587801] rax: ffff83031fa3c020   rbx: ffff830322c1b4b0   rcx: 0000000000000000
> (XEN) [  150.587806] rdx: ffff83031fa78000   rsi: 000000000000000a   rdi: ffff82d0802a9788
> (XEN) [  150.587811] rbp: ffff83031fa7fe20   rsp: ffff83031fa7fd30   r8:  ffff83031fa80000
> (XEN) [  150.587815] r9:  0000000000000006   r10: 000000000008f7f2   r11: 0000000000000006
> (XEN) [  150.587819] r12: ffff8300dbdf3000   r13: ffff830322c1b4b0   r14: 0000000000000006
> (XEN) [  150.587823] r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026e0
> (XEN) [  150.587827] cr3: 00000000dbaa8000   cr2: 0000000000000000
> (XEN) [  150.587830] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) [  150.587835] Xen stack trace from rsp=ffff83031fa7fd30:
> ... ... ...
> (XEN) [  150.587962] Xen call trace:
> (XEN) [  150.587966]    [<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
> (XEN) [  150.587974]    [<ffff82d08012a98b>] schedule.c#schedule+0x128/0x635
> (XEN) [  150.587979]    [<ffff82d08012dc16>] softirq.c#__do_softirq+0x82/0x8d
> (XEN) [  150.587983]    [<ffff82d08012dc6e>] do_softirq+0x13/0x15
> (XEN) [  150.587988]    [<ffff82d080162ddd>] domain.c#idle_loop+0x5b/0x6b
> (XEN) [  151.272182]
> (XEN) [  151.274174] ****************************************
> (XEN) [  151.279624] Panic on CPU 6:
> (XEN) [  151.282915] Xen BUG at sched_credit.c:655
> (XEN) [  151.287415] ****************************************
> 
> During suspend, the pCPUs are not removed from their
> pools with the standard procedure (which would involve
> schedule_cpu_switch()). During resume, they:
>  1) are assigned to the default cpupool (CPU_UP_PREPARE
>     phase);
>  2) are moved to the pool they were in before suspend,
>     via schedule_cpu_switch() (CPU_ONLINE phase)
> 
> During resume, scheduling (even if just the idle loop)
> can happen right after the CPU_STARTING phase (before
> CPU_ONLINE), i.e., before the pCPU is put back in its
> pool.

So why are we restoring scheduling stuff during CPU_STARTING, but only
putting cpus back in their pools at CPU_ONLINE?

At some point I think I knew the answer to this, but it's worth
revisiting it.

 -George


* Re: [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
  2015-11-24 15:32 ` George Dunlap
@ 2015-11-24 17:14   ` Dario Faggioli
  2015-12-07 12:21     ` George Dunlap
  0 siblings, 1 reply; 7+ messages in thread
From: Dario Faggioli @ 2015-11-24 17:14 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap, Juergen Gross, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 2253 bytes --]

On Tue, 2015-11-24 at 15:32 +0000, George Dunlap wrote:
> On 13/11/15 17:10, Dario Faggioli wrote:
> > 
> > During suspend, the pCPUs are not removed from their
> > pools with the standard procedure (which would involve
> > schedule_cpu_switch()). During resume, they:
> >  1) are assigned to the default cpupool (CPU_UP_PREPARE
> >     phase);
> >  2) are moved to the pool they were in before suspend,
> >     via schedule_cpu_switch() (CPU_ONLINE phase)
> > 
> > During resume, scheduling (even if just the idle loop)
> > can happen right after the CPU_STARTING phase (before
> > CPU_ONLINE), i.e., before the pCPU is put back in its
> > pool.
> 
> So why are we restoring scheduling stuff during CPU_STARTING, but
> only
> putting cpus back in their pools at CPU_ONLINE?
> 
Indeed. Much worse: we open the CPU for scheduling before it's back in
its pool (which is what this bug is all about!). This never made much
sense to me.

> At some point I think I knew the answer to this, but it's worth
> revisiting it.
> 
So, I once had a look, and tried shuffling things around, in a way in
which the order made more sense to me, but it does not work 'out of the
box'.

The issues have, AFAICR, to do with the fact that memory allocations
(for the per-pCPU scheduling data) need IRQs enabled (which means
CPU_UP_PREPARE, rather than CPU_STARTING, is what we want) on the
scheduling side, and with other data that needs to be ready and
initialized in order to set up cpupools (e.g., per_cpu(cpupool, cpu)).

As said, I don't recall all the details, sorry. I recall thinking that
a solution would involve putting the CPU in the pool earlier, but that
in turn calls for other work (e.g., tweaking the priorities of the
callbacks for avoiding races).
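
For illustration only, those constraints can be written down as a tiny
self-contained table; the phase names echo Xen's notifier actions, but
the attributes below just restate what is said in this thread
(allocations want IRQs enabled, and scheduling can already happen once
CPU_STARTING has run), not an authoritative description of the hotplug
path:

#include <stdbool.h>
#include <stdio.h>

enum phase { UP_PREPARE, STARTING, ONLINE };

struct phase_info {
    const char *name;
    bool irqs_enabled;  /* on the CPU doing the work for this phase */
    bool may_schedule;  /* can the new pCPU already run its idle loop? */
};

static const struct phase_info phases[] = {
    [UP_PREPARE] = { "CPU_UP_PREPARE", true,  false }, /* allocate here */
    [STARTING]   = { "CPU_STARTING",   false, true  }, /* window opens  */
    [ONLINE]     = { "CPU_ONLINE",     true,  true  }, /* pool move     */
};

int main(void)
{
    /*
     * The gap: between CPU_STARTING and CPU_ONLINE the pCPU can already
     * schedule, but it is still in the default pool.  Any reordering
     * that closes the gap must keep allocations in a phase where IRQs
     * are enabled, which is what makes "put the CPU in its pool
     * earlier" non-trivial.
     */
    for ( unsigned int i = 0; i < sizeof(phases) / sizeof(phases[0]); i++ )
        printf("%-15s irqs_enabled=%d may_schedule=%d\n",
               phases[i].name, phases[i].irqs_enabled, phases[i].may_schedule);

    return 0;
}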

It's on my list of things to do, but not with super high priority. Are
you saying that we should drop this patch, and do the callback
reordering/refactoring first?

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
  2015-11-24 17:14   ` Dario Faggioli
@ 2015-12-07 12:21     ` George Dunlap
  2015-12-10  8:42       ` Dario Faggioli
  0 siblings, 1 reply; 7+ messages in thread
From: George Dunlap @ 2015-12-07 12:21 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Juergen Gross, Jan Beulich

On 24/11/15 17:14, Dario Faggioli wrote:
> On Tue, 2015-11-24 at 15:32 +0000, George Dunlap wrote:
>> On 13/11/15 17:10, Dario Faggioli wrote:
>>>  
>>> During suspend, the pCPUs are not removed from their
>>> pools with the standard procedure (which would involve
>>> schedule_cpu_switch()). During resume, they:
>>>  1) are assigned to the default cpupool (CPU_UP_PREPARE
>>>     phase);
>>>  2) are moved to the pool they were in before suspend,
>>>     via schedule_cpu_switch() (CPU_ONLINE phase)
>>>
>>> During resume, scheduling (even if just the idle loop)
>>> can happen right after the CPU_STARTING phase (before
>>> CPU_ONLINE), i.e., before the pCPU is put back in its
>>> pool.
>>
>> So why are we restoring scheduling stuff during CPU_STARTING, but
>> only
>> putting cpus back in their pools at CPU_ONLINE?
>>
> Indeed. Much worse: we open the CPU for scheduling before it's back in
> its pool (which is what this bug is all about!). This never made much
> sense to me.
> 
>> At some point I think I knew the answer to this, but it's worth
>> revisiting it.
>>
> So, I once had a look, and tried shuffling things around, in a way in
> which the order made more sense to me, but it does not work 'out of the
> box'.
> 
> The issues have, AFAICR, to do with the fact that memory allocations
> (for the per-pCPU scheduling data) need IRQs enabled (which means
> CPU_UP_PREPARE, rather than CPU_STARTING, is what we want) on the
> scheduling side, and with other data that needs to be ready and
> initialized in order to set up cpupools (e.g., per_cpu(cpupool, cpu)).
> 
> As said, I don't recall all the details, sorry. I recall thinking that
> a solution would involve putting the CPU in the pool earlier, but that
> in turn calls for other work (e.g., tweaking the priorities of the
> callbacks for avoiding races).
> 
> It's on my list of things to do, but not with super high priority. Are
> you saying that we should drop this patch, and do the callback
> reordering/refactoring first?

Sorry, meant to respond to this -- no, I don't think refactoring is a
prerequisite.  Let me give it another look-over today.

 -George


* Re: [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
  2015-12-07 12:21     ` George Dunlap
@ 2015-12-10  8:42       ` Dario Faggioli
  0 siblings, 0 replies; 7+ messages in thread
From: Dario Faggioli @ 2015-12-10  8:42 UTC (permalink / raw)
  To: George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1187 bytes --]

On Mon, 2015-12-07 at 12:21 +0000, George Dunlap wrote:
> On 24/11/15 17:14, Dario Faggioli wrote:
> > On Tue, 2015-11-24 at 15:32 +0000, George Dunlap wrote:
> > > 
> > As said, I don't recall all the details, sorry. I recall thinking
> > that
> > a solution would involve putting the CPU in the pool earlier, but
> > that
> > in turn calls for other work (e.g., tweaking the priorities of the
> > callbacks for avoiding races).
> > 
> > It's on my list of things to do, but not with super high priority.
> > Are
> > you saying that we should drop this patch, and do the callback
> > reordering/refactoring first?
> 
> Sorry, meant to respond to this -- no, I don't think refactoring is a
> prerequisite.  
>
Ah, okay. :-)

> Let me give it another look-over today.
> 
That would be much appreciated. Not a super huge deal, but I've got a
few patches stacked on top of this...

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
  2015-11-13 17:10 [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers Dario Faggioli
  2015-11-16 13:10 ` Juergen Gross
  2015-11-24 15:32 ` George Dunlap
@ 2015-12-10 15:13 ` George Dunlap
  2 siblings, 0 replies; 7+ messages in thread
From: George Dunlap @ 2015-12-10 15:13 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Juergen Gross, xen-devel, Jan Beulich

On Fri, Nov 13, 2015 at 5:10 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> In fact, with 2 cpupools, one (the default) Credit and
> one Credit2 (with at least 1 pCPU in the latter), trying
> a (e.g., ACPI S3) suspend/resume crashes like this:
>
> (XEN) [  150.587779] ----[ Xen-4.7-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) [  150.587783] CPU:    6
> (XEN) [  150.587786] RIP:    e008:[<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
> (XEN) [  150.587796] RFLAGS: 0000000000010086   CONTEXT: hypervisor
> (XEN) [  150.587801] rax: ffff83031fa3c020   rbx: ffff830322c1b4b0   rcx: 0000000000000000
> (XEN) [  150.587806] rdx: ffff83031fa78000   rsi: 000000000000000a   rdi: ffff82d0802a9788
> (XEN) [  150.587811] rbp: ffff83031fa7fe20   rsp: ffff83031fa7fd30   r8:  ffff83031fa80000
> (XEN) [  150.587815] r9:  0000000000000006   r10: 000000000008f7f2   r11: 0000000000000006
> (XEN) [  150.587819] r12: ffff8300dbdf3000   r13: ffff830322c1b4b0   r14: 0000000000000006
> (XEN) [  150.587823] r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026e0
> (XEN) [  150.587827] cr3: 00000000dbaa8000   cr2: 0000000000000000
> (XEN) [  150.587830] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) [  150.587835] Xen stack trace from rsp=ffff83031fa7fd30:
> ... ... ...
> (XEN) [  150.587962] Xen call trace:
> (XEN) [  150.587966]    [<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
> (XEN) [  150.587974]    [<ffff82d08012a98b>] schedule.c#schedule+0x128/0x635
> (XEN) [  150.587979]    [<ffff82d08012dc16>] softirq.c#__do_softirq+0x82/0x8d
> (XEN) [  150.587983]    [<ffff82d08012dc6e>] do_softirq+0x13/0x15
> (XEN) [  150.587988]    [<ffff82d080162ddd>] domain.c#idle_loop+0x5b/0x6b
> (XEN) [  151.272182]
> (XEN) [  151.274174] ****************************************
> (XEN) [  151.279624] Panic on CPU 6:
> (XEN) [  151.282915] Xen BUG at sched_credit.c:655
> (XEN) [  151.287415] ****************************************
>
> During suspend, the pCPUs are not removed from their
> pools with the standard procedure (which would involve
> schedule_cpu_switch()). During resume, they:
>  1) are assigned to the default cpupool (CPU_UP_PREPARE
>     phase);
>  2) are moved to the pool they were in before suspend,
>     via schedule_cpu_switch() (CPU_ONLINE phase)
>
> During resume, scheduling (even if just the idle loop)
> can happen right after the CPU_STARTING phase (before
> CPU_ONLINE), i.e., before the pCPU is put back in its
> pool. In this case, it is the default pool's scheduler
> that is invoked (Credit1, in the example above). But the
> Credit2 specific vCPU data is not freed during suspend,
> and no Credit1 specific vCPU data is allocated during
> resume.
>
> Therefore, Credit1 schedules on pCPUs whose idle vCPU's
> sched_priv points to Credit2 vCPU data, and we crash.
>
> Fix things by properly deallocating scheduler specific
> data of the pCPU's pool scheduler during pCPU teardown,
> and re-allocating them --always for &ops-- during pCPU
> bringup.
>
> This also fixes another (latent) bug: it prevents
> schedule_cpu_switch() from using Credit1's free_vdata()
> to deallocate data that was allocated by Credit2's
> alloc_vdata(). This is not easy to trigger, but only
> because the other bug described above manifests first
> and crashes the host.
>
> The downside of this patch is that it adds one more
> allocation on the resume path, which is not ideal. Still,
> there is no better way of fixing the described bugs at
> the moment. Removing (ideally all) allocations happening
> during resume should continue to be pursued in the long
> run.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

Sorry for the delay.

