xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
@ 2016-04-26 14:25 Julien Grall
  2016-04-26 17:49 ` Dario Faggioli
  0 siblings, 1 reply; 11+ messages in thread
From: Julien Grall @ 2016-04-26 14:25 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap; +Cc: Varun.Swara, Xen Devel, Steve Capper

Hi Dario,

A couple of people have been reported Xen crash on the ARM64
Foundation Model [1] with recent unstable.

The crash seems to happen when Xen fails to bring up secondary CPUs
(see stack trace below).

From my understanding, csched_free_pdata is trying to kill the
timer spc->ticker. However the status of this timer is
TIMER_STATUS_invalid.

This is because csched_init_pdata has set a deadline for the
timer (set_timer) and the softirq to schedule the timer has
not yet happen (indeed Xen is still in early boot).

I am not sure how to fix this issue. How will you recommend
to fix it?

For your information the bisector fingered the following
commit:

commit 64269d936584c29de951c6613bf618640832b9a6
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date:   Fri Apr 8 00:03:03 2016 +0200

    sched: implement .init_pdata in Credit, Credit2 and RTDS
    
    In fact, if a scheduler needs per-pCPU information,
    that needs to be initialized appropriately. So, we take
    the code that is performing initializations from (right
    now) .alloc_pdata, and use it for .init_pdata, leaving
    only actualy allocations in the former, if any (which
    is the case in RTDS and Credit1).
    
    On the other hand, in Credit2, since we don't really
    need any per-pCPU data allocation, everything that was
    being done in .alloc_pdata, is now done in .init_pdata.
    And the fact that now .alloc_pdata can be left undefined,
    allows us to just get rid of it.
    
    Still for Credit2, the fact that .init_pdata is called
    during CPU_STARTING (rather than CPU_UP_PREPARE) kills
    the need for the scheduler to setup a similar callback
    itself, simplifying the code.
    
    And thanks to such simplification, it is now also ok to
    move some of the logic meant at double checking that a
    cpu was (or was not) initialized, into ASSERTS (rather
    than an if() and a BUG_ON).
    
    Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
    Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
    Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

(XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
(XEN) ----[ Xen-4.7-unstable  arm64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) PC:     000000000022e304 timer.c#active_timer+0x8/0x24
(XEN) LR:     000000000022f624
(XEN) SP:     00000000002bfcd0
(XEN) CPSR:   600002c9 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000000000  X1: 0000000000000003  X2: 0000000000000000
(XEN)      X3: 0000000000120012  X4: 0000000000000010  X5: 0000000000000080
(XEN)      X6: 0000000000000004  X7: 0000000000000001  X8: 00000000fffffffd
(XEN)      X9: 000000000000000a X10: 00000000002bfb08 X11: 0000000000000033
(XEN)     X12: 0000000000000001 X13: 0000000000263dc0 X14: 0000000000000020
(XEN)     X15: 0000000000000000 X16: 0000000000000000 X17: 0000000000000000
(XEN)     X18: 0000000000000000 X19: 000080017ffcad68 X20: 000000000030bb00
(XEN)     X21: 00000000000002c0 X22: 0000000000000000 X23: 000000000030bb00
(XEN)     X24: 0000000000306000 X25: 000080017ffcad90 X26: 000000000000ffff
(XEN)     X27: 0000000900000000 X28: 0000000000000002  FP: 00000000002bfcd0
(XEN) 
(XEN)   VTCR_EL2: 80000000
(XEN)  VTTBR_EL2: 0000000000000000
(XEN) 
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000000038643f
(XEN)  TTBR0_EL2: 00000000feef6000
(XEN) 
(XEN)    ESR_EL2: f2000001
(XEN)  HPFAR_EL2: 0000000000000000
(XEN)    FAR_EL2: 0000000000000000
(XEN) 
(XEN) Xen stack trace from sp=00000000002bfcd0:
(XEN)    00000000002bfd20 00000000002208c0 000080017ffca240 0000000000000003
(XEN)    000080017ffcad50 0000000000000240 000080017ffca2a8 0000000000000003
(XEN)    00000000002bfe18 0000000080000000 00000000002bfd60 0000000000227a18
(XEN)    0000000000000003 00000000002807b0 000080017ff9c0c0 0000000000008002
(XEN)    0000000000008000 0000000000000003 00000000002bfd90 0000000000219944
(XEN)    000000000027e318 000000000027e310 000000000027e008 0000000000008002
(XEN)    00000000002bfde0 00000000002015fc 00000000fffffffd 0000000000000000
(XEN)    0000000000000003 0000000000000003 000000000027adb8 0000000000000004
(XEN)    00000000002a80a8 0000000080000000 00000000002bfe20 000000000028f7c0
(XEN)    0000000000000003 000000000027ea00 0000000000306450 0000000000280ab0
(XEN)    0000000900000000 000000000027e000 0000000000000000 00000000810021d8
(XEN)    0000000081002000 0000000080e02000 0000000088000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000001 0000000000000000
(XEN)    0000000000000000 0000000080002174 0000000000000000 0000000088000000
(XEN)    0000000000002106 00000000ff000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<000000000022e304>] timer.c#active_timer+0x8/0x24 (PC)
(XEN)    [<000000000022f624>] kill_timer+0x108/0x2e0 (LR)
(XEN)    [<00000000002208c0>] sched_credit.c#csched_free_pdata+0xd8/0x114
(XEN)    [<0000000000227a18>] schedule.c#cpu_schedule_callback+0xc0/0x12c
(XEN)    [<0000000000219944>] notifier_call_chain+0x78/0x9c
(XEN)    [<00000000002015fc>] cpu_up+0x104/0x130
(XEN)    [<000000000028f7c0>] start_xen+0xaf8/0xce0
(XEN)    [<00000000810021d8>] 00000000810021d8
(XEN) 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...

Regards,

[1] http://www.arm.com/products/tools/models/fast-models/foundation-model.php

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
  2016-04-26 14:25 xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279 Julien Grall
@ 2016-04-26 17:49 ` Dario Faggioli
  2016-04-26 18:05   ` Julien Grall
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Dario Faggioli @ 2016-04-26 17:49 UTC (permalink / raw)
  To: Julien Grall, George Dunlap; +Cc: Varun.Swara, Xen Devel, Steve Capper


[-- Attachment #1.1.1: Type: text/plain, Size: 10679 bytes --]

On Tue, 2016-04-26 at 15:25 +0100, Julien Grall wrote:
> Hi Dario,
> 
Hi,

> A couple of people have been reported Xen crash on the ARM64
> Foundation Model [1] with recent unstable.
> 
Ok, thanks for reporting.

> The crash seems to happen when Xen fails to bring up secondary CPUs
> (see stack trace below).
> 
Ah... I see.

> From my understanding, csched_free_pdata is trying to kill the
> timer spc->ticker. However the status of this timer is
> TIMER_STATUS_invalid.
> 
> This is because csched_init_pdata has set a deadline for the
> timer (set_timer) and the softirq to schedule the timer has
> not yet happen (indeed Xen is still in early boot).
> 
Yes, this is sort of what happens (only slightly different, see the
changelog of the atached patch patch).


> I am not sure how to fix this issue. How will you recommend
> to fix it?
> 
Yeah, well, doing it cleanly includes a slight change in the scheduler
hooks API, IMO... and we indeed should do it cleanly :-))

George, what do you think?

Honestly, this is similar to what I was thinking to do already (I mean,
having an deinit_pdata hook, "symmetric" with the init_pdata one), when
working on that series, because I do think it's cleaner... then, I
abandoned the idea, as it looked to not be necessary... But apparently
it may actually be! :-)

Let me know, and I'll resubmit the patch properly (together with
another bugfix I have in my queue).

Dario
---
commit eca4d65fb67a71c0f6563aafbfdd68e566c53c32
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date:   Tue Apr 26 17:42:36 2016 +0200

    xen: sched: fix killing an uninitialized timer in free_pdata.
    
    commit 64269d9365 "sched: implement .init_pdata in Credit,
    Credit2 and RTDS" helped fixing Credit2 runqueues, and
    the races we had in switching scheduler for pCPUs, but
    introduced another issue. In fact, if CPU bringup fails
    during __cpu_up() (and, more precisely, after CPU_UP_PREPARE,
    but before CPU_STARTING) the CPU_UP_CANCELED notifier
    would be executed, which calls the free_pdata hook.
    
    Such hook does two things: (1) undo the initialization
    done inside the init_pdata hook; (2) free the memory
    allocated by the alloc_pdata hook.
    
    However, in the failure path just described, it is possible
    that only alloc_pdata has really been called, which is
    potentially and issue (depending on what actually happens
    inside the implementation of free_pdata).
    
    In fact, for Credit1 (the only scheduler that actually
    allocates per-pCPU data), this result in calling kill_timer()
    on a timer that had not yet been initialized, which causes
    the following:
    
    (XEN) Xen call trace:
    (XEN)    [<000000000022e304>] timer.c#active_timer+0x8/0x24 (PC)
    (XEN)    [<000000000022f624>] kill_timer+0x108/0x2e0 (LR)
    (XEN)    [<00000000002208c0>] sched_credit.c#csched_free_pdata+0xd8/0x114
    (XEN)    [<0000000000227a18>] schedule.c#cpu_schedule_callback+0xc0/0x12c
    (XEN)    [<0000000000219944>] notifier_call_chain+0x78/0x9c
    (XEN)    [<00000000002015fc>] cpu_up+0x104/0x130
    (XEN)    [<000000000028f7c0>] start_xen+0xaf8/0xce0
    (XEN)    [<00000000810021d8>] 00000000810021d8
    (XEN)
    (XEN)
    (XEN) ****************************************
    (XEN) Panic on CPU 0:
    (XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
    (XEN) ****************************************
    
    Solve this by making the scheduler hooks API symmetric again,
    i.e., by adding an deinit_pdata hook and making it responsible
    of undoing what init_pdata did, rather than asking to free_pdata
    to do everything.
    
    This is cleaner and, in the case at hand, makes it possible to
    only call free_pdata, which is the right thing to do, as only
    allocation and no initialization was performed.
    
    Reported-by: Julien Grall <julien.grall@arm.com>
    Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
    ---
    Cc: George Dunlap <george.dunlap@citrix.com>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Varun.Swara@arm.com
    Cc: Steve Capper <Steve.Capper@arm.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index bc36837..0a6a1b4 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -482,15 +482,25 @@ static inline void __runq_tickle(struct csched_vcpu *new)
 }
 
 static void
-csched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
+csched_free_pdata(const struct scheduler *ops, void *pcpu)
 {
-    struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_pcpu *spc = pcpu;
-    unsigned long flags;
 
     if ( spc == NULL )
         return;
 
+    xfree(spc);
+}
+
+static void
+csched_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
+{
+    struct csched_private *prv = CSCHED_PRIV(ops);
+    struct csched_pcpu *spc = pcpu;
+    unsigned long flags;
+
+    ASSERT(spc != NULL);
+
     spin_lock_irqsave(&prv->lock, flags);
 
     prv->credit -= prv->credits_per_tslice;
@@ -507,8 +517,6 @@ csched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
         kill_timer(&prv->master_ticker);
 
     spin_unlock_irqrestore(&prv->lock, flags);
-
-    xfree(spc);
 }
 
 static void *
@@ -2091,6 +2099,7 @@ static const struct scheduler sched_credit_def = {
     .free_vdata     = csched_free_vdata,
     .alloc_pdata    = csched_alloc_pdata,
     .init_pdata     = csched_init_pdata,
+    .deinit_pdata   = csched_deinit_pdata,
     .free_pdata     = csched_free_pdata,
     .switch_sched   = csched_switch_sched,
     .alloc_domdata  = csched_alloc_domdata,
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 46b9279..f4c37b4 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2261,13 +2261,15 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 }
 
 static void
-csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
+csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 {
     unsigned long flags;
     struct csched2_private *prv = CSCHED2_PRIV(ops);
     struct csched2_runqueue_data *rqd;
     int rqi;
 
+    ASSERT(pcpu == NULL);
+
     spin_lock_irqsave(&prv->lock, flags);
 
     ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
@@ -2387,7 +2389,7 @@ static const struct scheduler sched_credit2_def = {
     .alloc_vdata    = csched2_alloc_vdata,
     .free_vdata     = csched2_free_vdata,
     .init_pdata     = csched2_init_pdata,
-    .free_pdata     = csched2_free_pdata,
+    .deinit_pdata   = csched2_deinit_pdata,
     .switch_sched   = csched2_switch_sched,
     .alloc_domdata  = csched2_alloc_domdata,
     .free_domdata   = csched2_free_domdata,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5546999..1a64521 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1529,7 +1529,7 @@ static void cpu_schedule_down(unsigned int cpu)
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     struct scheduler *sched = per_cpu(scheduler, cpu);
 
-    SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
+    SCHED_OP(sched, free_pdata, sd->sched_priv);
     SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
 
     idle_vcpu[cpu]->sched_priv = NULL;
@@ -1554,8 +1554,10 @@ static int cpu_schedule_callback(
     case CPU_UP_PREPARE:
         rc = cpu_schedule_up(cpu);
         break;
-    case CPU_UP_CANCELED:
     case CPU_DEAD:
+        SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
+        /* Fallthrough */
+    case CPU_UP_CANCELED:
         cpu_schedule_down(cpu);
         break;
     default:
@@ -1684,7 +1686,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
     if ( vpriv == NULL )
     {
-        SCHED_OP(new_ops, free_pdata, ppriv, cpu);
+        SCHED_OP(new_ops, free_pdata, ppriv);
         return -ENOMEM;
     }
 
@@ -1714,7 +1716,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     SCHED_OP(new_ops, tick_resume, cpu);
 
     SCHED_OP(old_ops, free_vdata, vpriv_old);
-    SCHED_OP(old_ops, free_pdata, ppriv_old, cpu);
+    SCHED_OP(old_ops, deinit_pdata, ppriv_old, cpu);
+    SCHED_OP(old_ops, free_pdata, ppriv_old);
 
  out:
     per_cpu(cpupool, cpu) = c;
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 1db7c8d..240f66c 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -135,9 +135,10 @@ struct scheduler {
     void         (*free_vdata)     (const struct scheduler *, void *);
     void *       (*alloc_vdata)    (const struct scheduler *, struct vcpu *,
                                     void *);
-    void         (*free_pdata)     (const struct scheduler *, void *, int);
+    void         (*free_pdata)     (const struct scheduler *, void *);
     void *       (*alloc_pdata)    (const struct scheduler *, int);
     void         (*init_pdata)     (const struct scheduler *, void *, int);
+    void         (*deinit_pdata)   (const struct scheduler *, void *, int);
     void         (*free_domdata)   (const struct scheduler *, void *);
     void *       (*alloc_domdata)  (const struct scheduler *, struct domain *);
 
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.1.2: sched-for-julien.patch --]
[-- Type: text/x-patch, Size: 7862 bytes --]

commit eca4d65fb67a71c0f6563aafbfdd68e566c53c32
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date:   Tue Apr 26 17:42:36 2016 +0200

    xen: sched: fix killing an uninitialized timer in free_pdata.
    
    commit 64269d9365 "sched: implement .init_pdata in Credit,
    Credit2 and RTDS" helped fixing Credit2 runqueues, and
    the races we had in switching scheduler for pCPUs, but
    introduced another issue. In fact, if CPU bringup fails
    during __cpu_up() (and, more precisely, after CPU_UP_PREPARE,
    but before CPU_STARTING) the CPU_UP_CANCELED notifier
    would be executed, which calls the free_pdata hook.
    
    Such hook does two things: (1) undo the initialization
    done inside the init_pdata hook; (2) free the memory
    allocated by the alloc_pdata hook.
    
    However, in the failure path just described, it is possible
    that only alloc_pdata has really been called, which is
    potentially and issue (depending on what actually happens
    inside the implementation of free_pdata).
    
    In fact, for Credit1 (the only scheduler that actually
    allocates per-pCPU data), this result in calling kill_timer()
    on a timer that had not yet been initialized, which causes
    the following:
    
    (XEN) Xen call trace:
    (XEN)    [<000000000022e304>] timer.c#active_timer+0x8/0x24 (PC)
    (XEN)    [<000000000022f624>] kill_timer+0x108/0x2e0 (LR)
    (XEN)    [<00000000002208c0>] sched_credit.c#csched_free_pdata+0xd8/0x114
    (XEN)    [<0000000000227a18>] schedule.c#cpu_schedule_callback+0xc0/0x12c
    (XEN)    [<0000000000219944>] notifier_call_chain+0x78/0x9c
    (XEN)    [<00000000002015fc>] cpu_up+0x104/0x130
    (XEN)    [<000000000028f7c0>] start_xen+0xaf8/0xce0
    (XEN)    [<00000000810021d8>] 00000000810021d8
    (XEN)
    (XEN)
    (XEN) ****************************************
    (XEN) Panic on CPU 0:
    (XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
    (XEN) ****************************************
    
    Solve this by making the scheduler hooks API symmetric again,
    i.e., by adding an deinit_pdata hook and making it responsible
    of undoing what init_pdata did, rather than asking to free_pdata
    to do everything.
    
    This is cleaner and, in the case at hand, makes it possible to
    only call free_pdata, which is the right thing to do, as only
    allocation and no initialization was performed.
    
    Reported-by: Julien Grall <julien.grall@arm.com>
    Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
    ---
    Cc: George Dunlap <george.dunlap@citrix.com>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Varun.Swara@arm.com
    Cc: Steve Capper <Steve.Capper@arm.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index bc36837..0a6a1b4 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -482,15 +482,25 @@ static inline void __runq_tickle(struct csched_vcpu *new)
 }
 
 static void
-csched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
+csched_free_pdata(const struct scheduler *ops, void *pcpu)
 {
-    struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_pcpu *spc = pcpu;
-    unsigned long flags;
 
     if ( spc == NULL )
         return;
 
+    xfree(spc);
+}
+
+static void
+csched_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
+{
+    struct csched_private *prv = CSCHED_PRIV(ops);
+    struct csched_pcpu *spc = pcpu;
+    unsigned long flags;
+
+    ASSERT(spc != NULL);
+
     spin_lock_irqsave(&prv->lock, flags);
 
     prv->credit -= prv->credits_per_tslice;
@@ -507,8 +517,6 @@ csched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
         kill_timer(&prv->master_ticker);
 
     spin_unlock_irqrestore(&prv->lock, flags);
-
-    xfree(spc);
 }
 
 static void *
@@ -2091,6 +2099,7 @@ static const struct scheduler sched_credit_def = {
     .free_vdata     = csched_free_vdata,
     .alloc_pdata    = csched_alloc_pdata,
     .init_pdata     = csched_init_pdata,
+    .deinit_pdata   = csched_deinit_pdata,
     .free_pdata     = csched_free_pdata,
     .switch_sched   = csched_switch_sched,
     .alloc_domdata  = csched_alloc_domdata,
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 46b9279..f4c37b4 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2261,13 +2261,15 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 }
 
 static void
-csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
+csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 {
     unsigned long flags;
     struct csched2_private *prv = CSCHED2_PRIV(ops);
     struct csched2_runqueue_data *rqd;
     int rqi;
 
+    ASSERT(pcpu == NULL);
+
     spin_lock_irqsave(&prv->lock, flags);
 
     ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
@@ -2387,7 +2389,7 @@ static const struct scheduler sched_credit2_def = {
     .alloc_vdata    = csched2_alloc_vdata,
     .free_vdata     = csched2_free_vdata,
     .init_pdata     = csched2_init_pdata,
-    .free_pdata     = csched2_free_pdata,
+    .deinit_pdata   = csched2_deinit_pdata,
     .switch_sched   = csched2_switch_sched,
     .alloc_domdata  = csched2_alloc_domdata,
     .free_domdata   = csched2_free_domdata,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5546999..1a64521 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1529,7 +1529,7 @@ static void cpu_schedule_down(unsigned int cpu)
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     struct scheduler *sched = per_cpu(scheduler, cpu);
 
-    SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
+    SCHED_OP(sched, free_pdata, sd->sched_priv);
     SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
 
     idle_vcpu[cpu]->sched_priv = NULL;
@@ -1554,8 +1554,10 @@ static int cpu_schedule_callback(
     case CPU_UP_PREPARE:
         rc = cpu_schedule_up(cpu);
         break;
-    case CPU_UP_CANCELED:
     case CPU_DEAD:
+        SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
+        /* Fallthrough */
+    case CPU_UP_CANCELED:
         cpu_schedule_down(cpu);
         break;
     default:
@@ -1684,7 +1686,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
     if ( vpriv == NULL )
     {
-        SCHED_OP(new_ops, free_pdata, ppriv, cpu);
+        SCHED_OP(new_ops, free_pdata, ppriv);
         return -ENOMEM;
     }
 
@@ -1714,7 +1716,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     SCHED_OP(new_ops, tick_resume, cpu);
 
     SCHED_OP(old_ops, free_vdata, vpriv_old);
-    SCHED_OP(old_ops, free_pdata, ppriv_old, cpu);
+    SCHED_OP(old_ops, deinit_pdata, ppriv_old, cpu);
+    SCHED_OP(old_ops, free_pdata, ppriv_old);
 
  out:
     per_cpu(cpupool, cpu) = c;
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 1db7c8d..240f66c 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -135,9 +135,10 @@ struct scheduler {
     void         (*free_vdata)     (const struct scheduler *, void *);
     void *       (*alloc_vdata)    (const struct scheduler *, struct vcpu *,
                                     void *);
-    void         (*free_pdata)     (const struct scheduler *, void *, int);
+    void         (*free_pdata)     (const struct scheduler *, void *);
     void *       (*alloc_pdata)    (const struct scheduler *, int);
     void         (*init_pdata)     (const struct scheduler *, void *, int);
+    void         (*deinit_pdata)   (const struct scheduler *, void *, int);
     void         (*free_domdata)   (const struct scheduler *, void *);
     void *       (*alloc_domdata)  (const struct scheduler *, struct domain *);
 

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
  2016-04-26 17:49 ` Dario Faggioli
@ 2016-04-26 18:05   ` Julien Grall
  2016-04-27 13:43   ` George Dunlap
  2016-05-03 13:03   ` Julien Grall
  2 siblings, 0 replies; 11+ messages in thread
From: Julien Grall @ 2016-04-26 18:05 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap; +Cc: Varun.Swara, Xen Devel, Steve Capper



On 26/04/16 18:49, Dario Faggioli wrote:
> On Tue, 2016-04-26 at 15:25 +0100, Julien Grall wrote:
> Hi,

Hi Dario,

[...]

> commit eca4d65fb67a71c0f6563aafbfdd68e566c53c32
> Author: Dario Faggioli <dario.faggioli@citrix.com>
> Date:   Tue Apr 26 17:42:36 2016 +0200
>
>      xen: sched: fix killing an uninitialized timer in free_pdata.
>
>      commit 64269d9365 "sched: implement .init_pdata in Credit,
>      Credit2 and RTDS" helped fixing Credit2 runqueues, and
>      the races we had in switching scheduler for pCPUs, but
>      introduced another issue. In fact, if CPU bringup fails
>      during __cpu_up() (and, more precisely, after CPU_UP_PREPARE,
>      but before CPU_STARTING) the CPU_UP_CANCELED notifier
>      would be executed, which calls the free_pdata hook.
>
>      Such hook does two things: (1) undo the initialization
>      done inside the init_pdata hook; (2) free the memory
>      allocated by the alloc_pdata hook.
>
>      However, in the failure path just described, it is possible
>      that only alloc_pdata has really been called, which is
>      potentially and issue (depending on what actually happens
>      inside the implementation of free_pdata).
>
>      In fact, for Credit1 (the only scheduler that actually
>      allocates per-pCPU data), this result in calling kill_timer()
>      on a timer that had not yet been initialized, which causes
>      the following:
>
>      (XEN) Xen call trace:
>      (XEN)    [<000000000022e304>] timer.c#active_timer+0x8/0x24 (PC)
>      (XEN)    [<000000000022f624>] kill_timer+0x108/0x2e0 (LR)
>      (XEN)    [<00000000002208c0>] sched_credit.c#csched_free_pdata+0xd8/0x114
>      (XEN)    [<0000000000227a18>] schedule.c#cpu_schedule_callback+0xc0/0x12c
>      (XEN)    [<0000000000219944>] notifier_call_chain+0x78/0x9c
>      (XEN)    [<00000000002015fc>] cpu_up+0x104/0x130
>      (XEN)    [<000000000028f7c0>] start_xen+0xaf8/0xce0
>      (XEN)    [<00000000810021d8>] 00000000810021d8
>      (XEN)
>      (XEN)
>      (XEN) ****************************************
>      (XEN) Panic on CPU 0:
>      (XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
>      (XEN) ****************************************
>
>      Solve this by making the scheduler hooks API symmetric again,
>      i.e., by adding an deinit_pdata hook and making it responsible
>      of undoing what init_pdata did, rather than asking to free_pdata
>      to do everything.
>
>      This is cleaner and, in the case at hand, makes it possible to
>      only call free_pdata, which is the right thing to do, as only
>      allocation and no initialization was performed.
>
>      Reported-by: Julien Grall <julien.grall@arm.com>
>      Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>      ---
>      Cc: George Dunlap <george.dunlap@citrix.com>
>      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>      Cc: Varun.Swara@arm.com
>      Cc: Steve Capper <Steve.Capper@arm.com>

Thank you for the patch. It fixes the bug when secondary CPUs fail to 
come online on the Foundation Model:

Tested-by: Julien Grall <julien.grall@arm.com>

Regards,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
  2016-04-26 17:49 ` Dario Faggioli
  2016-04-26 18:05   ` Julien Grall
@ 2016-04-27 13:43   ` George Dunlap
  2016-04-27 14:05     ` Dario Faggioli
  2016-05-03 13:03   ` Julien Grall
  2 siblings, 1 reply; 11+ messages in thread
From: George Dunlap @ 2016-04-27 13:43 UTC (permalink / raw)
  To: Dario Faggioli, Julien Grall; +Cc: Varun.Swara, Xen Devel, Steve Capper

On 26/04/16 18:49, Dario Faggioli wrote:
> On Tue, 2016-04-26 at 15:25 +0100, Julien Grall wrote:
>> Hi Dario,
>>
> Hi,
> 
>> A couple of people have been reported Xen crash on the ARM64
>> Foundation Model [1] with recent unstable.
>>
> Ok, thanks for reporting.
> 
>> The crash seems to happen when Xen fails to bring up secondary CPUs
>> (see stack trace below).
>>
> Ah... I see.
> 
>> From my understanding, csched_free_pdata is trying to kill the
>> timer spc->ticker. However the status of this timer is
>> TIMER_STATUS_invalid.
>>
>> This is because csched_init_pdata has set a deadline for the
>> timer (set_timer) and the softirq to schedule the timer has
>> not yet happen (indeed Xen is still in early boot).
>>
> Yes, this is sort of what happens (only slightly different, see the
> changelog of the atached patch patch).
> 
> 
>> I am not sure how to fix this issue. How will you recommend
>> to fix it?
>>
> Yeah, well, doing it cleanly includes a slight change in the scheduler
> hooks API, IMO... and we indeed should do it cleanly :-))
> 
> George, what do you think?
> 
> Honestly, this is similar to what I was thinking to do already (I mean,
> having an deinit_pdata hook, "symmetric" with the init_pdata one), when
> working on that series, because I do think it's cleaner... then, I
> abandoned the idea, as it looked to not be necessary... But apparently
> it may actually be! :-)
> 
> Let me know, and I'll resubmit the patch properly (together with
> another bugfix I have in my queue).

Yeah, assuming the description in your changeset is accurate, this seems
like the right approach.

The main thing to add here I think is that we need to document what
different circumstances under which the various functions may be called
-- for instance, in credit1 free_pdata(), it seems to expect that spc
may == null at some point.  Future schedulers need to know the
circumstances under which this might happen so they can DTRT.

It might be nice at some point to have the alloc / free / init / deinit
functions in credit1 ordered in a rational way so that they could be
understood by glancing at them, rather than having to jump around, but
that's probably a nice-to-have clean-up for another time. :-)

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
  2016-04-27 13:43   ` George Dunlap
@ 2016-04-27 14:05     ` Dario Faggioli
  2016-04-27 14:29       ` George Dunlap
  0 siblings, 1 reply; 11+ messages in thread
From: Dario Faggioli @ 2016-04-27 14:05 UTC (permalink / raw)
  To: George Dunlap, Julien Grall; +Cc: Varun.Swara, Xen Devel, Steve Capper


[-- Attachment #1.1: Type: text/plain, Size: 1889 bytes --]

On Wed, 2016-04-27 at 14:43 +0100, George Dunlap wrote:
> On 26/04/16 18:49, Dario Faggioli wrote:
> > 
> > Let me know, and I'll resubmit the patch properly (together with
> > another bugfix I have in my queue).
> Yeah, assuming the description in your changeset is accurate, this
> seems
> like the right approach.
> 
Ok, thanks for having a look, I'll submit a proper series.

> The main thing to add here I think is that we need to document what
> different circumstances under which the various functions may be
> called
> -- for instance, in credit1 free_pdata(), it seems to expect that spc
> may == null at some point.  Future schedulers need to know the
> circumstances under which this might happen so they can DTRT.
> 
I saw that too (many times). And in fact, I'm not sure whether that can
actually happen or not, but I certainly can look at this.

And if by "document what different circumstances under which the
various functions may be called" you mean adding comments to that
effect somewhere, I'm up for that (I just need to figure out where it
would be best to put such comments).

> It might be nice at some point to have the alloc / free / init /
> deinit
> functions in credit1 ordered in a rational way so that they could be
> understood by glancing at them, rather than having to jump around,
> but
> that's probably a nice-to-have clean-up for another time. :-)
> 
When you say "ordered" you mean the order in which they appear in the
source file? If yes, I agree, but no, I'm not doing that right now (but
I can queue this for when 4.8 opens).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
  2016-04-27 14:05     ` Dario Faggioli
@ 2016-04-27 14:29       ` George Dunlap
  0 siblings, 0 replies; 11+ messages in thread
From: George Dunlap @ 2016-04-27 14:29 UTC (permalink / raw)
  To: Dario Faggioli, Julien Grall; +Cc: Varun.Swara, Xen Devel, Steve Capper

On 27/04/16 15:05, Dario Faggioli wrote:
> On Wed, 2016-04-27 at 14:43 +0100, George Dunlap wrote:
>> On 26/04/16 18:49, Dario Faggioli wrote:
>>>  
>>> Let me know, and I'll resubmit the patch properly (together with
>>> another bugfix I have in my queue).
>> Yeah, assuming the description in your changeset is accurate, this
>> seems
>> like the right approach.
>>
> Ok, thanks for having a look, I'll submit a proper series.
> 
>> The main thing to add here I think is that we need to document what
>> different circumstances under which the various functions may be
>> called
>> -- for instance, in credit1 free_pdata(), it seems to expect that spc
>> may == null at some point.  Future schedulers need to know the
>> circumstances under which this might happen so they can DTRT.
>>
> I saw that too (many times). And in fact, I'm not sure whether that can
> actually happen or not, but I certainly can look at this.
> 
> And if by "document what different circumstances under which the
> various functions may be called" you mean adding comments to that
> effect somewhere, I'm up for that (I just need to figure out where it
> would be best to put such comments).

Yes, comments is what I meant.  At very least it should be documented
that alloc/free might be called without init/deinit netsed inside.  If
you're also not sure how free_pdata() could be called with spc == null,
then I think we can probably punt an investigation into that until later
as well.


>> It might be nice at some point to have the alloc / free / init /
>> deinit
>> functions in credit1 ordered in a rational way so that they could be
>> understood by glancing at them, rather than having to jump around,
>> but
>> that's probably a nice-to-have clean-up for another time. :-)
>>
> When you say "ordered" you mean the order in which they appear in the
> source file? If yes, I agree, but no, I'm not doing that right now (but
> I can queue this for when 4.8 opens).

Sorry, yes I meant ordered in the source file.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
  2016-04-26 17:49 ` Dario Faggioli
  2016-04-26 18:05   ` Julien Grall
  2016-04-27 13:43   ` George Dunlap
@ 2016-05-03 13:03   ` Julien Grall
  2016-05-03 13:20     ` George Dunlap
  2 siblings, 1 reply; 11+ messages in thread
From: Julien Grall @ 2016-05-03 13:03 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap; +Cc: Varun.Swara, Xen Devel, Steve Capper

Hi Dario and George,

What is the status of this patch? It would be nice to have it for Xen 
4.7 to avoid unwanted crash when secondary CPUs fails to come online.

Regards,

On 26/04/16 18:49, Dario Faggioli wrote:
> On Tue, 2016-04-26 at 15:25 +0100, Julien Grall wrote:
>> Hi Dario,
>>
> Hi,
>
>> A couple of people have been reported Xen crash on the ARM64
>> Foundation Model [1] with recent unstable.
>>
> Ok, thanks for reporting.
>
>> The crash seems to happen when Xen fails to bring up secondary CPUs
>> (see stack trace below).
>>
> Ah... I see.
>
>>  From my understanding, csched_free_pdata is trying to kill the
>> timer spc->ticker. However the status of this timer is
>> TIMER_STATUS_invalid.
>>
>> This is because csched_init_pdata has set a deadline for the
>> timer (set_timer) and the softirq to schedule the timer has
>> not yet happen (indeed Xen is still in early boot).
>>
> Yes, this is sort of what happens (only slightly different, see the
> changelog of the atached patch patch).
>
>
>> I am not sure how to fix this issue. How will you recommend
>> to fix it?
>>
> Yeah, well, doing it cleanly includes a slight change in the scheduler
> hooks API, IMO... and we indeed should do it cleanly :-))
>
> George, what do you think?
>
> Honestly, this is similar to what I was thinking to do already (I mean,
> having an deinit_pdata hook, "symmetric" with the init_pdata one), when
> working on that series, because I do think it's cleaner... then, I
> abandoned the idea, as it looked to not be necessary... But apparently
> it may actually be! :-)
>
> Let me know, and I'll resubmit the patch properly (together with
> another bugfix I have in my queue).
>
> Dario
> ---
> commit eca4d65fb67a71c0f6563aafbfdd68e566c53c32
> Author: Dario Faggioli <dario.faggioli@citrix.com>
> Date:   Tue Apr 26 17:42:36 2016 +0200
>
>      xen: sched: fix killing an uninitialized timer in free_pdata.
>
>      commit 64269d9365 "sched: implement .init_pdata in Credit,
>      Credit2 and RTDS" helped fixing Credit2 runqueues, and
>      the races we had in switching scheduler for pCPUs, but
>      introduced another issue. In fact, if CPU bringup fails
>      during __cpu_up() (and, more precisely, after CPU_UP_PREPARE,
>      but before CPU_STARTING) the CPU_UP_CANCELED notifier
>      would be executed, which calls the free_pdata hook.
>
>      Such hook does two things: (1) undo the initialization
>      done inside the init_pdata hook; (2) free the memory
>      allocated by the alloc_pdata hook.
>
>      However, in the failure path just described, it is possible
>      that only alloc_pdata has really been called, which is
>      potentially and issue (depending on what actually happens
>      inside the implementation of free_pdata).
>
>      In fact, for Credit1 (the only scheduler that actually
>      allocates per-pCPU data), this result in calling kill_timer()
>      on a timer that had not yet been initialized, which causes
>      the following:
>
>      (XEN) Xen call trace:
>      (XEN)    [<000000000022e304>] timer.c#active_timer+0x8/0x24 (PC)
>      (XEN)    [<000000000022f624>] kill_timer+0x108/0x2e0 (LR)
>      (XEN)    [<00000000002208c0>] sched_credit.c#csched_free_pdata+0xd8/0x114
>      (XEN)    [<0000000000227a18>] schedule.c#cpu_schedule_callback+0xc0/0x12c
>      (XEN)    [<0000000000219944>] notifier_call_chain+0x78/0x9c
>      (XEN)    [<00000000002015fc>] cpu_up+0x104/0x130
>      (XEN)    [<000000000028f7c0>] start_xen+0xaf8/0xce0
>      (XEN)    [<00000000810021d8>] 00000000810021d8
>      (XEN)
>      (XEN)
>      (XEN) ****************************************
>      (XEN) Panic on CPU 0:
>      (XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
>      (XEN) ****************************************
>
>      Solve this by making the scheduler hooks API symmetric again,
>      i.e., by adding an deinit_pdata hook and making it responsible
>      of undoing what init_pdata did, rather than asking to free_pdata
>      to do everything.
>
>      This is cleaner and, in the case at hand, makes it possible to
>      only call free_pdata, which is the right thing to do, as only
>      allocation and no initialization was performed.
>
>      Reported-by: Julien Grall <julien.grall@arm.com>
>      Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>      ---
>      Cc: George Dunlap <george.dunlap@citrix.com>
>      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>      Cc: Varun.Swara@arm.com
>      Cc: Steve Capper <Steve.Capper@arm.com>
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index bc36837..0a6a1b4 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -482,15 +482,25 @@ static inline void __runq_tickle(struct csched_vcpu *new)
>   }
>
>   static void
> -csched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> +csched_free_pdata(const struct scheduler *ops, void *pcpu)
>   {
> -    struct csched_private *prv = CSCHED_PRIV(ops);
>       struct csched_pcpu *spc = pcpu;
> -    unsigned long flags;
>
>       if ( spc == NULL )
>           return;
>
> +    xfree(spc);
> +}
> +
> +static void
> +csched_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> +{
> +    struct csched_private *prv = CSCHED_PRIV(ops);
> +    struct csched_pcpu *spc = pcpu;
> +    unsigned long flags;
> +
> +    ASSERT(spc != NULL);
> +
>       spin_lock_irqsave(&prv->lock, flags);
>
>       prv->credit -= prv->credits_per_tslice;
> @@ -507,8 +517,6 @@ csched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>           kill_timer(&prv->master_ticker);
>
>       spin_unlock_irqrestore(&prv->lock, flags);
> -
> -    xfree(spc);
>   }
>
>   static void *
> @@ -2091,6 +2099,7 @@ static const struct scheduler sched_credit_def = {
>       .free_vdata     = csched_free_vdata,
>       .alloc_pdata    = csched_alloc_pdata,
>       .init_pdata     = csched_init_pdata,
> +    .deinit_pdata   = csched_deinit_pdata,
>       .free_pdata     = csched_free_pdata,
>       .switch_sched   = csched_switch_sched,
>       .alloc_domdata  = csched_alloc_domdata,
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 46b9279..f4c37b4 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -2261,13 +2261,15 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
>   }
>
>   static void
> -csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> +csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>   {
>       unsigned long flags;
>       struct csched2_private *prv = CSCHED2_PRIV(ops);
>       struct csched2_runqueue_data *rqd;
>       int rqi;
>
> +    ASSERT(pcpu == NULL);
> +
>       spin_lock_irqsave(&prv->lock, flags);
>
>       ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
> @@ -2387,7 +2389,7 @@ static const struct scheduler sched_credit2_def = {
>       .alloc_vdata    = csched2_alloc_vdata,
>       .free_vdata     = csched2_free_vdata,
>       .init_pdata     = csched2_init_pdata,
> -    .free_pdata     = csched2_free_pdata,
> +    .deinit_pdata   = csched2_deinit_pdata,
>       .switch_sched   = csched2_switch_sched,
>       .alloc_domdata  = csched2_alloc_domdata,
>       .free_domdata   = csched2_free_domdata,
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 5546999..1a64521 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1529,7 +1529,7 @@ static void cpu_schedule_down(unsigned int cpu)
>       struct schedule_data *sd = &per_cpu(schedule_data, cpu);
>       struct scheduler *sched = per_cpu(scheduler, cpu);
>
> -    SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
> +    SCHED_OP(sched, free_pdata, sd->sched_priv);
>       SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
>
>       idle_vcpu[cpu]->sched_priv = NULL;
> @@ -1554,8 +1554,10 @@ static int cpu_schedule_callback(
>       case CPU_UP_PREPARE:
>           rc = cpu_schedule_up(cpu);
>           break;
> -    case CPU_UP_CANCELED:
>       case CPU_DEAD:
> +        SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
> +        /* Fallthrough */
> +    case CPU_UP_CANCELED:
>           cpu_schedule_down(cpu);
>           break;
>       default:
> @@ -1684,7 +1686,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
>       vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
>       if ( vpriv == NULL )
>       {
> -        SCHED_OP(new_ops, free_pdata, ppriv, cpu);
> +        SCHED_OP(new_ops, free_pdata, ppriv);
>           return -ENOMEM;
>       }
>
> @@ -1714,7 +1716,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
>       SCHED_OP(new_ops, tick_resume, cpu);
>
>       SCHED_OP(old_ops, free_vdata, vpriv_old);
> -    SCHED_OP(old_ops, free_pdata, ppriv_old, cpu);
> +    SCHED_OP(old_ops, deinit_pdata, ppriv_old, cpu);
> +    SCHED_OP(old_ops, free_pdata, ppriv_old);
>
>    out:
>       per_cpu(cpupool, cpu) = c;
> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> index 1db7c8d..240f66c 100644
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -135,9 +135,10 @@ struct scheduler {
>       void         (*free_vdata)     (const struct scheduler *, void *);
>       void *       (*alloc_vdata)    (const struct scheduler *, struct vcpu *,
>                                       void *);
> -    void         (*free_pdata)     (const struct scheduler *, void *, int);
> +    void         (*free_pdata)     (const struct scheduler *, void *);
>       void *       (*alloc_pdata)    (const struct scheduler *, int);
>       void         (*init_pdata)     (const struct scheduler *, void *, int);
> +    void         (*deinit_pdata)   (const struct scheduler *, void *, int);
>       void         (*free_domdata)   (const struct scheduler *, void *);
>       void *       (*alloc_domdata)  (const struct scheduler *, struct domain *);
>
>

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
  2016-05-03 13:03   ` Julien Grall
@ 2016-05-03 13:20     ` George Dunlap
  2016-05-03 13:22       ` Julien Grall
  2016-05-03 13:23       ` Wei Liu
  0 siblings, 2 replies; 11+ messages in thread
From: George Dunlap @ 2016-05-03 13:20 UTC (permalink / raw)
  To: Julien Grall
  Cc: Varun.Swara, Dario Faggioli, Steve Capper, Wei Liu, Xen Devel

On Tue, May 3, 2016 at 2:03 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hi Dario and George,
>
> What is the status of this patch? It would be nice to have it for Xen 4.7 to
> avoid unwanted crash when secondary CPUs fails to come online.

Wei, can you put this on your release blockers list?  (Julien, I take
it this should be a blocker, right?)

 -George

>
> Regards,
>
> On 26/04/16 18:49, Dario Faggioli wrote:
>>
>> On Tue, 2016-04-26 at 15:25 +0100, Julien Grall wrote:
>>>
>>> Hi Dario,
>>>
>> Hi,
>>
>>> A couple of people have been reported Xen crash on the ARM64
>>> Foundation Model [1] with recent unstable.
>>>
>> Ok, thanks for reporting.
>>
>>> The crash seems to happen when Xen fails to bring up secondary CPUs
>>> (see stack trace below).
>>>
>> Ah... I see.
>>
>>>  From my understanding, csched_free_pdata is trying to kill the
>>> timer spc->ticker. However the status of this timer is
>>> TIMER_STATUS_invalid.
>>>
>>> This is because csched_init_pdata has set a deadline for the
>>> timer (set_timer) and the softirq to schedule the timer has
>>> not yet happen (indeed Xen is still in early boot).
>>>
>> Yes, this is sort of what happens (only slightly different, see the
>> changelog of the atached patch patch).
>>
>>
>>> I am not sure how to fix this issue. How will you recommend
>>> to fix it?
>>>
>> Yeah, well, doing it cleanly includes a slight change in the scheduler
>> hooks API, IMO... and we indeed should do it cleanly :-))
>>
>> George, what do you think?
>>
>> Honestly, this is similar to what I was thinking to do already (I mean,
>> having an deinit_pdata hook, "symmetric" with the init_pdata one), when
>> working on that series, because I do think it's cleaner... then, I
>> abandoned the idea, as it looked to not be necessary... But apparently
>> it may actually be! :-)
>>
>> Let me know, and I'll resubmit the patch properly (together with
>> another bugfix I have in my queue).
>>
>> Dario
>> ---
>> commit eca4d65fb67a71c0f6563aafbfdd68e566c53c32
>> Author: Dario Faggioli <dario.faggioli@citrix.com>
>> Date:   Tue Apr 26 17:42:36 2016 +0200
>>
>>      xen: sched: fix killing an uninitialized timer in free_pdata.
>>
>>      commit 64269d9365 "sched: implement .init_pdata in Credit,
>>      Credit2 and RTDS" helped fixing Credit2 runqueues, and
>>      the races we had in switching scheduler for pCPUs, but
>>      introduced another issue. In fact, if CPU bringup fails
>>      during __cpu_up() (and, more precisely, after CPU_UP_PREPARE,
>>      but before CPU_STARTING) the CPU_UP_CANCELED notifier
>>      would be executed, which calls the free_pdata hook.
>>
>>      Such hook does two things: (1) undo the initialization
>>      done inside the init_pdata hook; (2) free the memory
>>      allocated by the alloc_pdata hook.
>>
>>      However, in the failure path just described, it is possible
>>      that only alloc_pdata has really been called, which is
>>      potentially and issue (depending on what actually happens
>>      inside the implementation of free_pdata).
>>
>>      In fact, for Credit1 (the only scheduler that actually
>>      allocates per-pCPU data), this result in calling kill_timer()
>>      on a timer that had not yet been initialized, which causes
>>      the following:
>>
>>      (XEN) Xen call trace:
>>      (XEN)    [<000000000022e304>] timer.c#active_timer+0x8/0x24 (PC)
>>      (XEN)    [<000000000022f624>] kill_timer+0x108/0x2e0 (LR)
>>      (XEN)    [<00000000002208c0>]
>> sched_credit.c#csched_free_pdata+0xd8/0x114
>>      (XEN)    [<0000000000227a18>]
>> schedule.c#cpu_schedule_callback+0xc0/0x12c
>>      (XEN)    [<0000000000219944>] notifier_call_chain+0x78/0x9c
>>      (XEN)    [<00000000002015fc>] cpu_up+0x104/0x130
>>      (XEN)    [<000000000028f7c0>] start_xen+0xaf8/0xce0
>>      (XEN)    [<00000000810021d8>] 00000000810021d8
>>      (XEN)
>>      (XEN)
>>      (XEN) ****************************************
>>      (XEN) Panic on CPU 0:
>>      (XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at
>> timer.c:279
>>      (XEN) ****************************************
>>
>>      Solve this by making the scheduler hooks API symmetric again,
>>      i.e., by adding an deinit_pdata hook and making it responsible
>>      of undoing what init_pdata did, rather than asking to free_pdata
>>      to do everything.
>>
>>      This is cleaner and, in the case at hand, makes it possible to
>>      only call free_pdata, which is the right thing to do, as only
>>      allocation and no initialization was performed.
>>
>>      Reported-by: Julien Grall <julien.grall@arm.com>
>>      Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>>      ---
>>      Cc: George Dunlap <george.dunlap@citrix.com>
>>      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>      Cc: Varun.Swara@arm.com
>>      Cc: Steve Capper <Steve.Capper@arm.com>
>>
>> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
>> index bc36837..0a6a1b4 100644
>> --- a/xen/common/sched_credit.c
>> +++ b/xen/common/sched_credit.c
>> @@ -482,15 +482,25 @@ static inline void __runq_tickle(struct csched_vcpu
>> *new)
>>   }
>>
>>   static void
>> -csched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> +csched_free_pdata(const struct scheduler *ops, void *pcpu)
>>   {
>> -    struct csched_private *prv = CSCHED_PRIV(ops);
>>       struct csched_pcpu *spc = pcpu;
>> -    unsigned long flags;
>>
>>       if ( spc == NULL )
>>           return;
>>
>> +    xfree(spc);
>> +}
>> +
>> +static void
>> +csched_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> +{
>> +    struct csched_private *prv = CSCHED_PRIV(ops);
>> +    struct csched_pcpu *spc = pcpu;
>> +    unsigned long flags;
>> +
>> +    ASSERT(spc != NULL);
>> +
>>       spin_lock_irqsave(&prv->lock, flags);
>>
>>       prv->credit -= prv->credits_per_tslice;
>> @@ -507,8 +517,6 @@ csched_free_pdata(const struct scheduler *ops, void
>> *pcpu, int cpu)
>>           kill_timer(&prv->master_ticker);
>>
>>       spin_unlock_irqrestore(&prv->lock, flags);
>> -
>> -    xfree(spc);
>>   }
>>
>>   static void *
>> @@ -2091,6 +2099,7 @@ static const struct scheduler sched_credit_def = {
>>       .free_vdata     = csched_free_vdata,
>>       .alloc_pdata    = csched_alloc_pdata,
>>       .init_pdata     = csched_init_pdata,
>> +    .deinit_pdata   = csched_deinit_pdata,
>>       .free_pdata     = csched_free_pdata,
>>       .switch_sched   = csched_switch_sched,
>>       .alloc_domdata  = csched_alloc_domdata,
>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>> index 46b9279..f4c37b4 100644
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -2261,13 +2261,15 @@ csched2_switch_sched(struct scheduler *new_ops,
>> unsigned int cpu,
>>   }
>>
>>   static void
>> -csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> +csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>>   {
>>       unsigned long flags;
>>       struct csched2_private *prv = CSCHED2_PRIV(ops);
>>       struct csched2_runqueue_data *rqd;
>>       int rqi;
>>
>> +    ASSERT(pcpu == NULL);
>> +
>>       spin_lock_irqsave(&prv->lock, flags);
>>
>>       ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
>> @@ -2387,7 +2389,7 @@ static const struct scheduler sched_credit2_def = {
>>       .alloc_vdata    = csched2_alloc_vdata,
>>       .free_vdata     = csched2_free_vdata,
>>       .init_pdata     = csched2_init_pdata,
>> -    .free_pdata     = csched2_free_pdata,
>> +    .deinit_pdata   = csched2_deinit_pdata,
>>       .switch_sched   = csched2_switch_sched,
>>       .alloc_domdata  = csched2_alloc_domdata,
>>       .free_domdata   = csched2_free_domdata,
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index 5546999..1a64521 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -1529,7 +1529,7 @@ static void cpu_schedule_down(unsigned int cpu)
>>       struct schedule_data *sd = &per_cpu(schedule_data, cpu);
>>       struct scheduler *sched = per_cpu(scheduler, cpu);
>>
>> -    SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
>> +    SCHED_OP(sched, free_pdata, sd->sched_priv);
>>       SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
>>
>>       idle_vcpu[cpu]->sched_priv = NULL;
>> @@ -1554,8 +1554,10 @@ static int cpu_schedule_callback(
>>       case CPU_UP_PREPARE:
>>           rc = cpu_schedule_up(cpu);
>>           break;
>> -    case CPU_UP_CANCELED:
>>       case CPU_DEAD:
>> +        SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
>> +        /* Fallthrough */
>> +    case CPU_UP_CANCELED:
>>           cpu_schedule_down(cpu);
>>           break;
>>       default:
>> @@ -1684,7 +1686,7 @@ int schedule_cpu_switch(unsigned int cpu, struct
>> cpupool *c)
>>       vpriv = SCHED_OP(new_ops, alloc_vdata, idle,
>> idle->domain->sched_priv);
>>       if ( vpriv == NULL )
>>       {
>> -        SCHED_OP(new_ops, free_pdata, ppriv, cpu);
>> +        SCHED_OP(new_ops, free_pdata, ppriv);
>>           return -ENOMEM;
>>       }
>>
>> @@ -1714,7 +1716,8 @@ int schedule_cpu_switch(unsigned int cpu, struct
>> cpupool *c)
>>       SCHED_OP(new_ops, tick_resume, cpu);
>>
>>       SCHED_OP(old_ops, free_vdata, vpriv_old);
>> -    SCHED_OP(old_ops, free_pdata, ppriv_old, cpu);
>> +    SCHED_OP(old_ops, deinit_pdata, ppriv_old, cpu);
>> +    SCHED_OP(old_ops, free_pdata, ppriv_old);
>>
>>    out:
>>       per_cpu(cpupool, cpu) = c;
>> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
>> index 1db7c8d..240f66c 100644
>> --- a/xen/include/xen/sched-if.h
>> +++ b/xen/include/xen/sched-if.h
>> @@ -135,9 +135,10 @@ struct scheduler {
>>       void         (*free_vdata)     (const struct scheduler *, void *);
>>       void *       (*alloc_vdata)    (const struct scheduler *, struct
>> vcpu *,
>>                                       void *);
>> -    void         (*free_pdata)     (const struct scheduler *, void *,
>> int);
>> +    void         (*free_pdata)     (const struct scheduler *, void *);
>>       void *       (*alloc_pdata)    (const struct scheduler *, int);
>>       void         (*init_pdata)     (const struct scheduler *, void *,
>> int);
>> +    void         (*deinit_pdata)   (const struct scheduler *, void *,
>> int);
>>       void         (*free_domdata)   (const struct scheduler *, void *);
>>       void *       (*alloc_domdata)  (const struct scheduler *, struct
>> domain *);
>>
>>
>
> --
> Julien Grall
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
  2016-05-03 13:20     ` George Dunlap
@ 2016-05-03 13:22       ` Julien Grall
  2016-05-03 13:23       ` Wei Liu
  1 sibling, 0 replies; 11+ messages in thread
From: Julien Grall @ 2016-05-03 13:22 UTC (permalink / raw)
  To: George Dunlap
  Cc: Varun.Swara, Dario Faggioli, Steve Capper, Wei Liu, Xen Devel

Hi George,

On 03/05/16 14:20, George Dunlap wrote:
> On Tue, May 3, 2016 at 2:03 PM, Julien Grall <julien.grall@arm.com> wrote:
>> Hi Dario and George,
>>
>> What is the status of this patch? It would be nice to have it for Xen 4.7 to
>> avoid unwanted crash when secondary CPUs fails to come online.
>
> Wei, can you put this on your release blockers list?  (Julien, I take
> it this should be a blocker, right?)

Correct, this is a regression compare to Xen 4.6.

Regards,

>   -George
>
>>
>> Regards,
>>
>> On 26/04/16 18:49, Dario Faggioli wrote:
>>>
>>> On Tue, 2016-04-26 at 15:25 +0100, Julien Grall wrote:
>>>>
>>>> Hi Dario,
>>>>
>>> Hi,
>>>
>>>> A couple of people have been reported Xen crash on the ARM64
>>>> Foundation Model [1] with recent unstable.
>>>>
>>> Ok, thanks for reporting.
>>>
>>>> The crash seems to happen when Xen fails to bring up secondary CPUs
>>>> (see stack trace below).
>>>>
>>> Ah... I see.
>>>
>>>>   From my understanding, csched_free_pdata is trying to kill the
>>>> timer spc->ticker. However the status of this timer is
>>>> TIMER_STATUS_invalid.
>>>>
>>>> This is because csched_init_pdata has set a deadline for the
>>>> timer (set_timer) and the softirq to schedule the timer has
>>>> not yet happen (indeed Xen is still in early boot).
>>>>
>>> Yes, this is sort of what happens (only slightly different, see the
>>> changelog of the atached patch patch).
>>>
>>>
>>>> I am not sure how to fix this issue. How will you recommend
>>>> to fix it?
>>>>
>>> Yeah, well, doing it cleanly includes a slight change in the scheduler
>>> hooks API, IMO... and we indeed should do it cleanly :-))
>>>
>>> George, what do you think?
>>>
>>> Honestly, this is similar to what I was thinking to do already (I mean,
>>> having an deinit_pdata hook, "symmetric" with the init_pdata one), when
>>> working on that series, because I do think it's cleaner... then, I
>>> abandoned the idea, as it looked to not be necessary... But apparently
>>> it may actually be! :-)
>>>
>>> Let me know, and I'll resubmit the patch properly (together with
>>> another bugfix I have in my queue).
>>>
>>> Dario
>>> ---
>>> commit eca4d65fb67a71c0f6563aafbfdd68e566c53c32
>>> Author: Dario Faggioli <dario.faggioli@citrix.com>
>>> Date:   Tue Apr 26 17:42:36 2016 +0200
>>>
>>>       xen: sched: fix killing an uninitialized timer in free_pdata.
>>>
>>>       commit 64269d9365 "sched: implement .init_pdata in Credit,
>>>       Credit2 and RTDS" helped fixing Credit2 runqueues, and
>>>       the races we had in switching scheduler for pCPUs, but
>>>       introduced another issue. In fact, if CPU bringup fails
>>>       during __cpu_up() (and, more precisely, after CPU_UP_PREPARE,
>>>       but before CPU_STARTING) the CPU_UP_CANCELED notifier
>>>       would be executed, which calls the free_pdata hook.
>>>
>>>       Such hook does two things: (1) undo the initialization
>>>       done inside the init_pdata hook; (2) free the memory
>>>       allocated by the alloc_pdata hook.
>>>
>>>       However, in the failure path just described, it is possible
>>>       that only alloc_pdata has really been called, which is
>>>       potentially and issue (depending on what actually happens
>>>       inside the implementation of free_pdata).
>>>
>>>       In fact, for Credit1 (the only scheduler that actually
>>>       allocates per-pCPU data), this result in calling kill_timer()
>>>       on a timer that had not yet been initialized, which causes
>>>       the following:
>>>
>>>       (XEN) Xen call trace:
>>>       (XEN)    [<000000000022e304>] timer.c#active_timer+0x8/0x24 (PC)
>>>       (XEN)    [<000000000022f624>] kill_timer+0x108/0x2e0 (LR)
>>>       (XEN)    [<00000000002208c0>]
>>> sched_credit.c#csched_free_pdata+0xd8/0x114
>>>       (XEN)    [<0000000000227a18>]
>>> schedule.c#cpu_schedule_callback+0xc0/0x12c
>>>       (XEN)    [<0000000000219944>] notifier_call_chain+0x78/0x9c
>>>       (XEN)    [<00000000002015fc>] cpu_up+0x104/0x130
>>>       (XEN)    [<000000000028f7c0>] start_xen+0xaf8/0xce0
>>>       (XEN)    [<00000000810021d8>] 00000000810021d8
>>>       (XEN)
>>>       (XEN)
>>>       (XEN) ****************************************
>>>       (XEN) Panic on CPU 0:
>>>       (XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at
>>> timer.c:279
>>>       (XEN) ****************************************
>>>
>>>       Solve this by making the scheduler hooks API symmetric again,
>>>       i.e., by adding an deinit_pdata hook and making it responsible
>>>       of undoing what init_pdata did, rather than asking to free_pdata
>>>       to do everything.
>>>
>>>       This is cleaner and, in the case at hand, makes it possible to
>>>       only call free_pdata, which is the right thing to do, as only
>>>       allocation and no initialization was performed.
>>>
>>>       Reported-by: Julien Grall <julien.grall@arm.com>
>>>       Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>>>       ---
>>>       Cc: George Dunlap <george.dunlap@citrix.com>
>>>       Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>       Cc: Varun.Swara@arm.com
>>>       Cc: Steve Capper <Steve.Capper@arm.com>
>>>
>>> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
>>> index bc36837..0a6a1b4 100644
>>> --- a/xen/common/sched_credit.c
>>> +++ b/xen/common/sched_credit.c
>>> @@ -482,15 +482,25 @@ static inline void __runq_tickle(struct csched_vcpu
>>> *new)
>>>    }
>>>
>>>    static void
>>> -csched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>>> +csched_free_pdata(const struct scheduler *ops, void *pcpu)
>>>    {
>>> -    struct csched_private *prv = CSCHED_PRIV(ops);
>>>        struct csched_pcpu *spc = pcpu;
>>> -    unsigned long flags;
>>>
>>>        if ( spc == NULL )
>>>            return;
>>>
>>> +    xfree(spc);
>>> +}
>>> +
>>> +static void
>>> +csched_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>>> +{
>>> +    struct csched_private *prv = CSCHED_PRIV(ops);
>>> +    struct csched_pcpu *spc = pcpu;
>>> +    unsigned long flags;
>>> +
>>> +    ASSERT(spc != NULL);
>>> +
>>>        spin_lock_irqsave(&prv->lock, flags);
>>>
>>>        prv->credit -= prv->credits_per_tslice;
>>> @@ -507,8 +517,6 @@ csched_free_pdata(const struct scheduler *ops, void
>>> *pcpu, int cpu)
>>>            kill_timer(&prv->master_ticker);
>>>
>>>        spin_unlock_irqrestore(&prv->lock, flags);
>>> -
>>> -    xfree(spc);
>>>    }
>>>
>>>    static void *
>>> @@ -2091,6 +2099,7 @@ static const struct scheduler sched_credit_def = {
>>>        .free_vdata     = csched_free_vdata,
>>>        .alloc_pdata    = csched_alloc_pdata,
>>>        .init_pdata     = csched_init_pdata,
>>> +    .deinit_pdata   = csched_deinit_pdata,
>>>        .free_pdata     = csched_free_pdata,
>>>        .switch_sched   = csched_switch_sched,
>>>        .alloc_domdata  = csched_alloc_domdata,
>>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>>> index 46b9279..f4c37b4 100644
>>> --- a/xen/common/sched_credit2.c
>>> +++ b/xen/common/sched_credit2.c
>>> @@ -2261,13 +2261,15 @@ csched2_switch_sched(struct scheduler *new_ops,
>>> unsigned int cpu,
>>>    }
>>>
>>>    static void
>>> -csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>>> +csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>>>    {
>>>        unsigned long flags;
>>>        struct csched2_private *prv = CSCHED2_PRIV(ops);
>>>        struct csched2_runqueue_data *rqd;
>>>        int rqi;
>>>
>>> +    ASSERT(pcpu == NULL);
>>> +
>>>        spin_lock_irqsave(&prv->lock, flags);
>>>
>>>        ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
>>> @@ -2387,7 +2389,7 @@ static const struct scheduler sched_credit2_def = {
>>>        .alloc_vdata    = csched2_alloc_vdata,
>>>        .free_vdata     = csched2_free_vdata,
>>>        .init_pdata     = csched2_init_pdata,
>>> -    .free_pdata     = csched2_free_pdata,
>>> +    .deinit_pdata   = csched2_deinit_pdata,
>>>        .switch_sched   = csched2_switch_sched,
>>>        .alloc_domdata  = csched2_alloc_domdata,
>>>        .free_domdata   = csched2_free_domdata,
>>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>>> index 5546999..1a64521 100644
>>> --- a/xen/common/schedule.c
>>> +++ b/xen/common/schedule.c
>>> @@ -1529,7 +1529,7 @@ static void cpu_schedule_down(unsigned int cpu)
>>>        struct schedule_data *sd = &per_cpu(schedule_data, cpu);
>>>        struct scheduler *sched = per_cpu(scheduler, cpu);
>>>
>>> -    SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
>>> +    SCHED_OP(sched, free_pdata, sd->sched_priv);
>>>        SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
>>>
>>>        idle_vcpu[cpu]->sched_priv = NULL;
>>> @@ -1554,8 +1554,10 @@ static int cpu_schedule_callback(
>>>        case CPU_UP_PREPARE:
>>>            rc = cpu_schedule_up(cpu);
>>>            break;
>>> -    case CPU_UP_CANCELED:
>>>        case CPU_DEAD:
>>> +        SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
>>> +        /* Fallthrough */
>>> +    case CPU_UP_CANCELED:
>>>            cpu_schedule_down(cpu);
>>>            break;
>>>        default:
>>> @@ -1684,7 +1686,7 @@ int schedule_cpu_switch(unsigned int cpu, struct
>>> cpupool *c)
>>>        vpriv = SCHED_OP(new_ops, alloc_vdata, idle,
>>> idle->domain->sched_priv);
>>>        if ( vpriv == NULL )
>>>        {
>>> -        SCHED_OP(new_ops, free_pdata, ppriv, cpu);
>>> +        SCHED_OP(new_ops, free_pdata, ppriv);
>>>            return -ENOMEM;
>>>        }
>>>
>>> @@ -1714,7 +1716,8 @@ int schedule_cpu_switch(unsigned int cpu, struct
>>> cpupool *c)
>>>        SCHED_OP(new_ops, tick_resume, cpu);
>>>
>>>        SCHED_OP(old_ops, free_vdata, vpriv_old);
>>> -    SCHED_OP(old_ops, free_pdata, ppriv_old, cpu);
>>> +    SCHED_OP(old_ops, deinit_pdata, ppriv_old, cpu);
>>> +    SCHED_OP(old_ops, free_pdata, ppriv_old);
>>>
>>>     out:
>>>        per_cpu(cpupool, cpu) = c;
>>> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
>>> index 1db7c8d..240f66c 100644
>>> --- a/xen/include/xen/sched-if.h
>>> +++ b/xen/include/xen/sched-if.h
>>> @@ -135,9 +135,10 @@ struct scheduler {
>>>        void         (*free_vdata)     (const struct scheduler *, void *);
>>>        void *       (*alloc_vdata)    (const struct scheduler *, struct
>>> vcpu *,
>>>                                        void *);
>>> -    void         (*free_pdata)     (const struct scheduler *, void *,
>>> int);
>>> +    void         (*free_pdata)     (const struct scheduler *, void *);
>>>        void *       (*alloc_pdata)    (const struct scheduler *, int);
>>>        void         (*init_pdata)     (const struct scheduler *, void *,
>>> int);
>>> +    void         (*deinit_pdata)   (const struct scheduler *, void *,
>>> int);
>>>        void         (*free_domdata)   (const struct scheduler *, void *);
>>>        void *       (*alloc_domdata)  (const struct scheduler *, struct
>>> domain *);
>>>
>>>
>>
>> --
>> Julien Grall
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
  2016-05-03 13:20     ` George Dunlap
  2016-05-03 13:22       ` Julien Grall
@ 2016-05-03 13:23       ` Wei Liu
  2016-05-03 21:52         ` Dario Faggioli
  1 sibling, 1 reply; 11+ messages in thread
From: Wei Liu @ 2016-05-03 13:23 UTC (permalink / raw)
  To: George Dunlap
  Cc: Wei Liu, Steve Capper, Dario Faggioli, Xen Devel, Julien Grall,
	Varun.Swara

On Tue, May 03, 2016 at 02:20:05PM +0100, George Dunlap wrote:
> On Tue, May 3, 2016 at 2:03 PM, Julien Grall <julien.grall@arm.com> wrote:
> > Hi Dario and George,
> >
> > What is the status of this patch? It would be nice to have it for Xen 4.7 to
> > avoid unwanted crash when secondary CPUs fails to come online.
> 
> Wei, can you put this on your release blockers list?  (Julien, I take
> it this should be a blocker, right?)
> 

It's already on my list.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
  2016-05-03 13:23       ` Wei Liu
@ 2016-05-03 21:52         ` Dario Faggioli
  0 siblings, 0 replies; 11+ messages in thread
From: Dario Faggioli @ 2016-05-03 21:52 UTC (permalink / raw)
  To: Wei Liu, George Dunlap; +Cc: Varun.Swara, Julien Grall, Steve Capper, Xen Devel

On Tue, 2016-05-03 at 14:23 +0100, Wei Liu wrote:
> On Tue, May 03, 2016 at 02:20:05PM +0100, George Dunlap wrote:
> > 
> > On Tue, May 3, 2016 at 2:03 PM, Julien Grall <julien.grall@arm.com>
> > wrote:
> > > 
> > > Hi Dario and George,
> > > 
> > > What is the status of this patch? It would be nice to have it for
> > > Xen 4.7 to
> > > avoid unwanted crash when secondary CPUs fails to come online.
> > Wei, can you put this on your release blockers list?  (Julien, I
> > take
> > it this should be a blocker, right?)
> > 
> It's already on my list.
> 
Here it comes, as a part of a small series:
 http://lists.xen.org/archives/html/xen-devel/2016-05/msg00243.html

Sorry it took a bit but:
 1. I found a few other issues, an wanted to group them in a series;
 2. I tried to do what George asked, so document and put ASSERT()-s to
    make things clear and understandable.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-05-03 21:52 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-26 14:25 xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279 Julien Grall
2016-04-26 17:49 ` Dario Faggioli
2016-04-26 18:05   ` Julien Grall
2016-04-27 13:43   ` George Dunlap
2016-04-27 14:05     ` Dario Faggioli
2016-04-27 14:29       ` George Dunlap
2016-05-03 13:03   ` Julien Grall
2016-05-03 13:20     ` George Dunlap
2016-05-03 13:22       ` Julien Grall
2016-05-03 13:23       ` Wei Liu
2016-05-03 21:52         ` Dario Faggioli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).