* [PATCH v2 0/2] xen: credit1: fix a race when picking initial pCPU for a vCPU
@ 2016-08-18 10:00 Dario Faggioli
  2016-08-18 10:00 ` [PATCH v2 1/2] " Dario Faggioli
  2016-08-18 10:00 ` [PATCH v2 2/2] xen: credit1: no need to check for is_idle_vcpu() in csched_vcpu_acct() Dario Faggioli
  0 siblings, 2 replies; 6+ messages in thread
From: Dario Faggioli @ 2016-08-18 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich

Hey,

This is v2 of this patch:
https://lists.xen.org/archives/html/xen-devel/2016-08/msg01679.html
It has become a small series, as in the process I noticed room for a small
improvement to the code, and added a patch for that (that's patch 2).

Also, while adding the ASSERT() George asked for here
https://lists.xen.org/archives/html/xen-devel/2016-08/msg01694.html , I
discovered that things were a bit worse than they looked at the beginning
(I described that finding here:
https://lists.xen.org/archives/html/xen-devel/2016-08/msg01752.html).

Patch 1 of this series is my idea of a fix for the reported issue: one that
does not explode in any other way, and is also respectful of the code of
csched_tick(). In fact, while the most obvious solution would be to just
always take the appropriate lock, doing that in csched_tick() is particularly
tricky (because of Credit1's lock nesting rule), and would result in a dance of
taking and releasing locks which I don't think is what we want.

I've tested this by repeatedly creating and destroying a VM, on both an idle
and a loaded system, as that seems to be what makes the system crash in both
Andrew's XenServer testing, and OSSTest flight 100518 (it's failing at the
guest-start/win.repeat step).

If people (George, mainly, I guess) agree this is the proper fix, I think this
(I mean, patch 1) needs to be backported to all the branches to which we
backported 9f358ddd69463 ("xen: Have schedulers revise initial placement").

Regards,
Dario
---
Dario Faggioli (2):
      xen: credit1: fix a race when picking initial pCPU for a vCPU
      xen: credit1: no need to check for is_idle_vcpu() in csched_vcpu_acct()

 xen/common/sched_credit.c |   45 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 10 deletions(-)
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH v2 1/2] xen: credit1: fix a race when picking initial pCPU for a vCPU
  2016-08-18 10:00 [PATCH v2 0/2] xen: credit1: fix a race when picking initial pCPU for a vCPU Dario Faggioli
@ 2016-08-18 10:00 ` Dario Faggioli
  2016-08-19 12:23   ` George Dunlap
  2016-08-18 10:00 ` [PATCH v2 2/2] xen: credit1: no need to check for is_idle_vcpu() in csched_vcpu_acct() Dario Faggioli
  1 sibling, 1 reply; 6+ messages in thread
From: Dario Faggioli @ 2016-08-18 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich

In the Credit1 hunk of 9f358ddd69463 ("xen: Have
schedulers revise initial placement") csched_cpu_pick()
is called without taking the runqueue lock of the
(temporary) pCPU that the vCPU has been assigned to
(e.g., in XEN_DOMCTL_max_vcpus).

However, although 'hidden' in the IS_RUNQ_IDLE() macro,
that function does access the runq (for doing load
balancing calculations). Two scenarios are possible:
 1) we are on cpu X, and IS_RUNQ_IDLE() peeks at cpu's
    X own runq;
 2) we are on cpu X, but IS_RUNQ_IDLE() peeks at some
    other cpu's runq.

Scenario 2) absolutely requires that the appropriate
runq lock is taken. Scenario 1) works even without
taking the cpu's own runq lock, and this is important
for the case when _csched_cpu_pick() is called from
csched_vcpu_acct(), which in turn is called by
csched_tick().

Races have been observed and reported (by both XenServer's
own testing and OSSTest [1]), in the form of
IS_RUNQ_IDLE() falling over LIST_POISON, because we're
not currently holding the proper lock in
csched_vcpu_insert(), which is where scenario 2) occurs.

Since this is all very tricky, in addition to fixing
things, add both an ASSERT() and a comment in
IS_RUNQ_IDLE() (which also becomes a static inline
function instead of a macro).

[1] https://lists.xen.org/archives/html/xen-devel/2016-08/msg02144.html

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
---
Changes from v1:
 - macro IS_RUNQ_IDLE() to static inline is_runq_idle(), as suggested
   during review;
 - add an ASSERT() and a comment, as suggested during review;
 - take into account what's described in the changelog as "scenario 1)",
   which wasn't being considered in v1.
---
 xen/common/sched_credit.c |   38 +++++++++++++++++++++++++++++++++-----
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 220ff0d..daace81 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -84,9 +84,6 @@
 #define CSCHED_VCPU(_vcpu)  ((struct csched_vcpu *) (_vcpu)->sched_priv)
 #define CSCHED_DOM(_dom)    ((struct csched_dom *) (_dom)->sched_priv)
 #define RUNQ(_cpu)          (&(CSCHED_PCPU(_cpu)->runq))
-/* Is the first element of _cpu's runq its idle vcpu? */
-#define IS_RUNQ_IDLE(_cpu)  (list_empty(RUNQ(_cpu)) || \
-                             is_idle_vcpu(__runq_elem(RUNQ(_cpu)->next)->vcpu))
 
 
 /*
@@ -248,6 +245,33 @@ __runq_elem(struct list_head *elem)
     return list_entry(elem, struct csched_vcpu, runq_elem);
 }
 
+/* Is the first element of cpu's runq (if any) cpu's idle vcpu? */
+static inline bool_t is_runq_idle(unsigned int cpu)
+{
+    /*
+     * If we are on cpu, and we are peeking at our own runq while cpu itself
+     * is not idle, that's fine even if we don't hold the runq lock. In fact,
+     * the fact that there is a (non idle!) vcpu running means that at least
+     * the idle vcpu is in the runq. And since only cpu itself (via work
+     * stealing) can add stuff to the runq, and no other cpu will ever steal
+     * our idle vcpu, that makes the runq manipulations done below safe, even
+     * without locks.
+     *
+     * On the other hand, if we're peeking at another cpu's runq, we must
+     * hold the proper runq lock.
+     *
+     * As a matter of fact, the former scenario describes what happens when
+     * _csched_cpu_pick() (which then calls us) is called from csched_tick();
+     * the latter one describes what actually happens when it is called from
+     * csched_vcpu_insert().
+     */
+    ASSERT((!is_idle_vcpu(curr_on_cpu(cpu)) && cpu == smp_processor_id()) ||
+           spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+
+    return list_empty(RUNQ(cpu)) ||
+           is_idle_vcpu(__runq_elem(RUNQ(cpu)->next)->vcpu);
+}
+
 static inline void
 __runq_insert(struct csched_vcpu *svc)
 {
@@ -771,7 +795,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * runnable vcpu on cpu, we add cpu to the idlers.
          */
         cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
-        if ( vc->processor == cpu && IS_RUNQ_IDLE(cpu) )
+        if ( vc->processor == cpu && is_runq_idle(cpu) )
             __cpumask_set_cpu(cpu, &idlers);
         cpumask_and(&cpus, &cpus, &idlers);
 
@@ -998,9 +1022,13 @@ csched_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    /* This is safe because vc isn't yet being scheduled */
+    /* csched_cpu_pick() looks in vc->processor's runq, so we need the lock. */
+    lock = vcpu_schedule_lock_irq(vc);
+
     vc->processor = csched_cpu_pick(ops, vc);
 
+    spin_unlock_irq(lock);
+
     lock = vcpu_schedule_lock_irq(vc);
 
     if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )




* [PATCH v2 2/2] xen: credit1: no need to check for is_idle_vcpu() in csched_vcpu_acct()
  2016-08-18 10:00 [PATCH v2 0/2] xen: credit1: fix a race when picking initial pCPU for a vCPU Dario Faggioli
  2016-08-18 10:00 ` [PATCH v2 1/2] " Dario Faggioli
@ 2016-08-18 10:00 ` Dario Faggioli
  2016-08-19 12:40   ` George Dunlap
  1 sibling, 1 reply; 6+ messages in thread
From: Dario Faggioli @ 2016-08-18 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap

In fact, csched_vcpu_acct() is already called by
csched_tick() _only_ on non-idle vcpus.

There's even an ASSERT() already in place at the top
of it which, by checking that svc->sdom is not NULL,
guarantees that the function is not being called on
the idle domain.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
 xen/common/sched_credit.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index daace81..71a8b1d 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -966,11 +966,8 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
                  svc->vcpu->vcpu_id);
     }
 
-    /*
-     * Update credits
-     */
-    if ( !is_idle_vcpu(svc->vcpu) )
-        burn_credits(svc, NOW());
+    /* Update credits. */
+    burn_credits(svc, NOW());
 
     /*
      * Put this VCPU and domain back on the active list if it was




* Re: [PATCH v2 1/2] xen: credit1: fix a race when picking initial pCPU for a vCPU
  2016-08-18 10:00 ` [PATCH v2 1/2] " Dario Faggioli
@ 2016-08-19 12:23   ` George Dunlap
  2016-08-19 14:02     ` Dario Faggioli
  0 siblings, 1 reply; 6+ messages in thread
From: George Dunlap @ 2016-08-19 12:23 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich

On 18/08/16 11:00, Dario Faggioli wrote:
> In the Credit1 hunk of 9f358ddd69463 ("xen: Have
> schedulers revise initial placement") csched_cpu_pick()
> is called without taking the runqueue lock of the
> (temporary) pCPU that the vCPU has been assigned to
> (e.g., in XEN_DOMCTL_max_vcpus).
> 
> However, although 'hidden' in the IS_RUNQ_IDLE() macro,
> that function does access the runq (for doing load
> balancing calculations). Two scenarios are possible:
>  1) we are on cpu X, and IS_RUNQ_IDLE() peeks at cpu's
>     X own runq;
>  2) we are on cpu X, but IS_RUNQ_IDLE() peeks at some
>     other cpu's runq.
> 
> Scenario 2) absolutely requires that the appropriate
> runq lock is taken. Scenario 1) works even without
> taking the cpu's own runq lock, and this is important
> for the case when _csched_cpu_pick() is called from
> csched_vcpu_acct(), which in turn is called by
> csched_tick().


> 
> Races have been observed and reported (by both XenServer's
> own testing and OSSTest [1]), in the form of
> IS_RUNQ_IDLE() falling over LIST_POISON, because we're
> not currently holding the proper lock in
> csched_vcpu_insert(), which is where scenario 2) occurs.
> 
> Since this is all very tricky, in addition to fixing
> things, add both an ASSERT() and a comment in
> IS_RUNQ_IDLE() (which also becomes a static inline
> function instead of a macro).
> 
> [1] https://lists.xen.org/archives/html/xen-devel/2016-08/msg02144.html
> 
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> ---
> Changes from v1:
>  - macro IS_RUNQ_IDLE() to static inline is_runq_idle(), as suggested
>    during review;
>  - add an ASSERT() and a comment, as suggested during review;
>  - take into account what's described in the changelog as "scenario 1)",
>    which wasn't being considered in v1.
> ---
>  xen/common/sched_credit.c |   38 +++++++++++++++++++++++++++++++++-----
>  1 file changed, 33 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index 220ff0d..daace81 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -84,9 +84,6 @@
>  #define CSCHED_VCPU(_vcpu)  ((struct csched_vcpu *) (_vcpu)->sched_priv)
>  #define CSCHED_DOM(_dom)    ((struct csched_dom *) (_dom)->sched_priv)
>  #define RUNQ(_cpu)          (&(CSCHED_PCPU(_cpu)->runq))
> -/* Is the first element of _cpu's runq its idle vcpu? */
> -#define IS_RUNQ_IDLE(_cpu)  (list_empty(RUNQ(_cpu)) || \
> -                             is_idle_vcpu(__runq_elem(RUNQ(_cpu)->next)->vcpu))
>  
>  
>  /*
> @@ -248,6 +245,33 @@ __runq_elem(struct list_head *elem)
>      return list_entry(elem, struct csched_vcpu, runq_elem);
>  }
>  
> +/* Is the first element of cpu's runq (if any) cpu's idle vcpu? */
> +static inline bool_t is_runq_idle(unsigned int cpu)
> +{
> +    /*
> +     * If we are on cpu, and we are peeking at our own runq while cpu itself
> +     * is not idle, that's fine even if we don't hold the runq lock. In fact,
> +     * the fact that there is a (non idle!) vcpu running means that at least
> +     * the idle vcpu is in the runq. And since only cpu itself (via work
> +     * stealing) can add stuff to the runq, and no other cpu will ever steal
> > +     * our idle vcpu, that makes the runq manipulations done below safe, even
> +     * without locks.

Thanks for investigating this and figuring out why the lockless access
hasn't caused a problem before.  But relying on this behavior going
forward doesn't really seem like a great idea if we can avoid it.

We can't grab the pcpu scheduler lock in csched_tick(), or in the whole
of csched_vcpu_acct() because we grab the private lock in
__csched_vcpu_acct_start() (and that violates the locking order).  But
is there a reason we can't grab the pcpu lock just around the call to
_csched_cpu_pick?

If not, we would then need to put a comment in the runq struct listing
the restrictions on access: namely, that nothing can be inserted from
other pcpus; and ideally a wrapper for all list modification operations
to ASSERT() that we're on the right pcpu.

 -George




* Re: [PATCH v2 2/2] xen: credit1: no need to check for is_idle_vcpu() in csched_vcpu_acct()
  2016-08-18 10:00 ` [PATCH v2 2/2] xen: credit1: no need to check for is_idle_vcpu() in csched_vcpu_acct() Dario Faggioli
@ 2016-08-19 12:40   ` George Dunlap
  0 siblings, 0 replies; 6+ messages in thread
From: George Dunlap @ 2016-08-19 12:40 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On 18/08/16 11:00, Dario Faggioli wrote:
> In fact, csched_vcpu_acct() is called by csched_tick()
> _only_ on non idle vcpus already.
> 
> There's even an ASSERT() already in place at the top
> of it which, by checking that svc->sdom is not NULL,
> guarantees that the function is not being called on
> the idle domain.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>




* Re: [PATCH v2 1/2] xen: credit1: fix a race when picking initial pCPU for a vCPU
  2016-08-19 12:23   ` George Dunlap
@ 2016-08-19 14:02     ` Dario Faggioli
  0 siblings, 0 replies; 6+ messages in thread
From: Dario Faggioli @ 2016-08-19 14:02 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich



On Fri, 2016-08-19 at 13:23 +0100, George Dunlap wrote:
> On 18/08/16 11:00, Dario Faggioli wrote:
> > @@ -248,6 +245,33 @@ __runq_elem(struct list_head *elem)
> >      return list_entry(elem, struct csched_vcpu, runq_elem);
> >  }
> >  
> > +/* Is the first element of cpu's runq (if any) cpu's idle vcpu? */
> > +static inline bool_t is_runq_idle(unsigned int cpu)
> > +{
> > +    /*
> > +     * If we are on cpu, and we are peeking at our own runq while cpu itself
> > +     * is not idle, that's fine even if we don't hold the runq lock. In fact,
> > +     * the fact that there is a (non idle!) vcpu running means that at least
> > +     * the idle vcpu is in the runq. And since only cpu itself (via work
> > +     * stealing) can add stuff to the runq, and no other cpu will ever steal
> > +     * our idle vcpu, that makes the runq manipulations done below safe, even
> > +     * without locks.
> Thanks for investigating this and figuring out why the lockless access
> hasn't caused a problem before.  But relying on this behavior going
> forward doesn't really seem like a great idea if we can avoid it.
> 
I totally agree.

> We can't grab the pcpu scheduler lock in csched_tick(), or in the whole
> of csched_vcpu_acct() because we grab the private lock in
> __csched_vcpu_acct_start() (and that violates the locking order).  But
> is there a reason we can't grab the pcpu lock just around the call to
> _csched_cpu_pick?
> 
The first version of this patch, here in my stgit patchqueue, looked
exactly like that. ISTR I even tested it, and it works.

Then I thought that, since in this case it's all about making an
ASSERT() happy, it may be a good thing to avoid introducing more
contention. Also, I see your point on robustness/reliability. My view
is that locking on this path (if not in Credit1 in general) is already
so bad that I don't think it's possible to make it any worse (and
hence I wasn't feeling guilty about going the way I did). :-)

*BUT* I don't have a too strong opinion, and if you prefer 'take lock'
approach, I'm fine with that.

I'll send v3.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




