All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] credit: fix race condition in csched_vcpu flags
@ 2013-02-25  8:20 Igor Pavlikevich
  2013-02-26 14:19 ` George Dunlap
  0 siblings, 1 reply; 3+ messages in thread
From: Igor Pavlikevich @ 2013-02-25  8:20 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap

When vcpu stops yielding, it's possible to overwrite CSCHED_FLAG_VCPU_PARKED flag after clearing CSCHED_FLAG_VCPU_YIELD.

 332.970218393 -------x d1v1 runstate_continue d1v1 running->running
]332.970248371 -------x d1v1   28005(2:8:5) 2 [ 1 1 ]                  yielding, flags=0x0002
]332.970251968 -------x d1v1                                           clearing 0x0002, flags=0
 332.970252175 -------x d1v1 runstate_continue d1v1 running->running
]332.970282574 -------x d1v1   28005(2:8:5) 2 [ 1 1 ]                  yielding, flags=0x0002
]332.970283068 x------| d32767v0                                       entering csched_acct(), d1v1 flags=0x0002
]332.970285763 x------| d32767v0   28003(2:8:3) 2 [ 1 1 ]              calling vcpu_pause_nosync() for d1v1
]332.970286647 -------x d1v1                                           clearing 0x0002, flags=0
]332.970286771 x------| d32767v0                                       flags |= CSCHED_FLAG_VCPU_PARKED, flags=0x0003
]332.970286990 -------x d1v1   2800e(2:8:e) 2 [ 1 1cacfca ]
]332.970287122 -------x d1v1   2800f(2:8:f) 3 [ 7fff 1cacfca ffffffff ]
]332.970287242 -------x d1v1   2800a(2:8:a) 4 [ 1 1 7fff 7 ]           exiting schedule(), flags=0
 332.970287417 -------x d1v1 runstate_change d1v1 running->offline
 332.970287629 -------x d?v? runstate_change d32767v7 runnable->running
]332.970288397 x------- d32767v0                                       exiting csched_acct(), flags=0x0003
]332.995349690 x------- d32767v0                                       entering csched_acct(), d1v1 flags=0x0000, flag lost.
]332.995352170 x------- d32767v0   28003(2:8:3) 2 [ 1 1 ]              calling vcpu_pause_nosync() for d1v1 again

>From this moment d1v1 has extra +1 in pause_count, so it goes offline right after credit>prv->credits_per_tslice forever.

Signed-off-by: Igor Pavlikevich <ipavlikevich@gmail.com>

diff -r 66f563be41d9 -r 4d2fb69d362b xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Tue Feb 19 10:49:53 2013 +0100
+++ b/xen/common/sched_credit.c	Mon Feb 25 07:54:02 2013 +0000
@@ -1155,7 +1155,9 @@ csched_acct(void* dummy)
                 {
                     SCHED_STAT_CRANK(vcpu_park);
                     vcpu_pause_nosync(svc->vcpu);
+                    vcpu_schedule_lock_irq(svc->vcpu);
                     svc->flags |= CSCHED_FLAG_VCPU_PARKED;
+                    vcpu_schedule_unlock_irq(svc->vcpu);
                 }
 
                 /* Lower bound on credits */

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] credit: fix race condition in csched_vcpu flags
  2013-02-25  8:20 [PATCH] credit: fix race condition in csched_vcpu flags Igor Pavlikevich
@ 2013-02-26 14:19 ` George Dunlap
  2013-02-26 18:34   ` Igor Pavlikevich
  0 siblings, 1 reply; 3+ messages in thread
From: George Dunlap @ 2013-02-26 14:19 UTC (permalink / raw)
  To: Igor Pavlikevich; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1960 bytes --]

On Mon, Feb 25, 2013 at 8:20 AM, Igor Pavlikevich <ipavlikevich@gmail.com>wrote:

> When vcpu stops yielding, it's possible to overwrite
> CSCHED_FLAG_VCPU_PARKED flag after clearing CSCHED_FLAG_VCPU_YIELD.
>
>  332.970218393 -------x d1v1 runstate_continue d1v1 running->running
> ]332.970248371 -------x d1v1   28005(2:8:5) 2 [ 1 1 ]
>  yielding, flags=0x0002
> ]332.970251968 -------x d1v1
> clearing 0x0002, flags=0
>  332.970252175 -------x d1v1 runstate_continue d1v1 running->running
> ]332.970282574 -------x d1v1   28005(2:8:5) 2 [ 1 1 ]
>  yielding, flags=0x0002
> ]332.970283068 x------| d32767v0
> entering csched_acct(), d1v1 flags=0x0002
> ]332.970285763 x------| d32767v0   28003(2:8:3) 2 [ 1 1 ]
>  calling vcpu_pause_nosync() for d1v1
> ]332.970286647 -------x d1v1
> clearing 0x0002, flags=0
> ]332.970286771 x------| d32767v0
> flags |= CSCHED_FLAG_VCPU_PARKED, flags=0x0003
> ]332.970286990 -------x d1v1   2800e(2:8:e) 2 [ 1 1cacfca ]
> ]332.970287122 -------x d1v1   2800f(2:8:f) 3 [ 7fff 1cacfca ffffffff ]
> ]332.970287242 -------x d1v1   2800a(2:8:a) 4 [ 1 1 7fff 7 ]
> exiting schedule(), flags=0
>  332.970287417 -------x d1v1 runstate_change d1v1 running->offline
>  332.970287629 -------x d?v? runstate_change d32767v7 runnable->running
> ]332.970288397 x------- d32767v0
> exiting csched_acct(), flags=0x0003
> ]332.995349690 x------- d32767v0
> entering csched_acct(), d1v1 flags=0x0000, flag lost.
> ]332.995352170 x------- d32767v0   28003(2:8:3) 2 [ 1 1 ]
>  calling vcpu_pause_nosync() for d1v1 again
>
> From this moment d1v1 has extra +1 in pause_count, so it goes offline
> right after credit>prv->credits_per_tslice forever.
>

Good catch -- I guess we don't really test the scheduler "cap"
functionality enough to catch this one.

Unfortunately I think the real problem is that the ->flags field should be
uniformly using the atomic bit operations.

Can you try the attached patch and see if it works for you?

 -George

[-- Attachment #1.2: Type: text/html, Size: 2725 bytes --]

[-- Attachment #2: credit-atomic-flags.diff --]
[-- Type: application/octet-stream, Size: 4405 bytes --]

commit c72abbcf71f12ae5e9c5dda4a85bce23dbf4868f
Author: George Dunlap <george.dunlap@eu.citrix.com>
Date:   Tue Feb 26 12:44:48 2013 +0000

    credit1: Use atomic bit operations for the flags structure
    
    The flags structure is not protected by locks (or more precisely,
    it is protected using an inconsistent set of locks); we therefore need
    to make sure that all accesses are atomic-safe.  This is particulary
    important in the case of the PARKED flag, which if clobbered while
    changing the YIELD bit will leave a vcpu wedged in an offline state.
    
    Using the atomic bitops also requires us to change the size of the "flags"
    element.
    
    Spotted-by: Igor Pavlikevich <ipavlikevich@gmail.com>
    Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index df2d076..ecdbd76 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -46,8 +46,8 @@
 /*
  * Flags
  */
-#define CSCHED_FLAG_VCPU_PARKED    0x0001  /* VCPU over capped credits */
-#define CSCHED_FLAG_VCPU_YIELD     0x0002  /* VCPU yielding */
+#define CSCHED_FLAG_VCPU_PARKED    0x0  /* VCPU over capped credits */
+#define CSCHED_FLAG_VCPU_YIELD     0x1  /* VCPU yielding */
 
 
 /*
@@ -137,7 +137,7 @@ struct csched_vcpu {
     struct vcpu *vcpu;
     atomic_t credit;
     s_time_t start_time;   /* When we were scheduled (used for credit) */
-    uint16_t flags;
+    unsigned flags;
     int16_t pri;
 #ifdef CSCHED_STATS
     struct {
@@ -220,7 +220,7 @@ __runq_insert(unsigned int cpu, struct csched_vcpu *svc)
     /* If the vcpu yielded, try to put it behind one lower-priority
      * runnable vcpu if we can.  The next runq_sort will bring it forward
      * within 30ms if the queue too long. */
-    if ( svc->flags & CSCHED_FLAG_VCPU_YIELD
+    if ( test_bit(CSCHED_FLAG_VCPU_YIELD, &svc->flags)
          && __runq_elem(iter)->pri > CSCHED_PRI_IDLE )
     {
         iter=iter->next;
@@ -811,7 +811,7 @@ csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
      * those.
      */
     if ( svc->pri == CSCHED_PRI_TS_UNDER &&
-         !(svc->flags & CSCHED_FLAG_VCPU_PARKED) )
+         !test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
     {
         svc->pri = CSCHED_PRI_TS_BOOST;
     }
@@ -824,10 +824,10 @@ csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
 static void
 csched_vcpu_yield(const struct scheduler *ops, struct vcpu *vc)
 {
-    struct csched_vcpu * const sv = CSCHED_VCPU(vc);
+    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
 
     /* Let the scheduler know that this vcpu is trying to yield */
-    sv->flags |= CSCHED_FLAG_VCPU_YIELD;
+    set_bit(CSCHED_FLAG_VCPU_YIELD, &svc->flags);
 }
 
 static int
@@ -1151,11 +1151,10 @@ csched_acct(void* dummy)
                 /* Park running VCPUs of capped-out domains */
                 if ( sdom->cap != 0U &&
                      credit < -credit_cap &&
-                     !(svc->flags & CSCHED_FLAG_VCPU_PARKED) )
+                     !test_and_set_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
                 {
                     SCHED_STAT_CRANK(vcpu_park);
                     vcpu_pause_nosync(svc->vcpu);
-                    svc->flags |= CSCHED_FLAG_VCPU_PARKED;
                 }
 
                 /* Lower bound on credits */
@@ -1171,7 +1170,7 @@ csched_acct(void* dummy)
                 svc->pri = CSCHED_PRI_TS_UNDER;
 
                 /* Unpark any capped domains whose credits go positive */
-                if ( svc->flags & CSCHED_FLAG_VCPU_PARKED)
+                if ( test_and_clear_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
                 {
                     /*
                      * It's important to unset the flag AFTER the unpause()
@@ -1180,7 +1179,6 @@ csched_acct(void* dummy)
                      */
                     SCHED_STAT_CRANK(vcpu_unpark);
                     vcpu_unpause(svc->vcpu);
-                    svc->flags &= ~CSCHED_FLAG_VCPU_PARKED;
                 }
 
                 /* Upper bound on credits means VCPU stops earning */
@@ -1442,8 +1440,7 @@ csched_schedule(
     /*
      * Clear YIELD flag before scheduling out
      */
-    if ( scurr->flags & CSCHED_FLAG_VCPU_YIELD )
-        scurr->flags &= ~(CSCHED_FLAG_VCPU_YIELD);
+    clear_bit(CSCHED_FLAG_VCPU_YIELD, &scurr->flags);
 
     /*
      * SMP Load balance:

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] credit: fix race condition in csched_vcpu flags
  2013-02-26 14:19 ` George Dunlap
@ 2013-02-26 18:34   ` Igor Pavlikevich
  0 siblings, 0 replies; 3+ messages in thread
From: Igor Pavlikevich @ 2013-02-26 18:34 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2129 bytes --]

On Tue, Feb 26, 2013 at 6:19 PM, George Dunlap
<George.Dunlap@eu.citrix.com>wrote:

> On Mon, Feb 25, 2013 at 8:20 AM, Igor Pavlikevich <ipavlikevich@gmail.com>wrote:
>
>> When vcpu stops yielding, it's possible to overwrite
>> CSCHED_FLAG_VCPU_PARKED flag after clearing CSCHED_FLAG_VCPU_YIELD.
>>
>>  332.970218393 -------x d1v1 runstate_continue d1v1 running->running
>> ]332.970248371 -------x d1v1   28005(2:8:5) 2 [ 1 1 ]
>>  yielding, flags=0x0002
>> ]332.970251968 -------x d1v1
>> clearing 0x0002, flags=0
>>  332.970252175 -------x d1v1 runstate_continue d1v1 running->running
>> ]332.970282574 -------x d1v1   28005(2:8:5) 2 [ 1 1 ]
>>  yielding, flags=0x0002
>> ]332.970283068 x------| d32767v0
>> entering csched_acct(), d1v1 flags=0x0002
>> ]332.970285763 x------| d32767v0   28003(2:8:3) 2 [ 1 1 ]
>>  calling vcpu_pause_nosync() for d1v1
>> ]332.970286647 -------x d1v1
>> clearing 0x0002, flags=0
>> ]332.970286771 x------| d32767v0
>> flags |= CSCHED_FLAG_VCPU_PARKED, flags=0x0003
>> ]332.970286990 -------x d1v1   2800e(2:8:e) 2 [ 1 1cacfca ]
>> ]332.970287122 -------x d1v1   2800f(2:8:f) 3 [ 7fff 1cacfca ffffffff ]
>> ]332.970287242 -------x d1v1   2800a(2:8:a) 4 [ 1 1 7fff 7 ]
>> exiting schedule(), flags=0
>>  332.970287417 -------x d1v1 runstate_change d1v1 running->offline
>>  332.970287629 -------x d?v? runstate_change d32767v7 runnable->running
>> ]332.970288397 x------- d32767v0
>> exiting csched_acct(), flags=0x0003
>> ]332.995349690 x------- d32767v0
>> entering csched_acct(), d1v1 flags=0x0000, flag lost.
>> ]332.995352170 x------- d32767v0   28003(2:8:3) 2 [ 1 1 ]
>>  calling vcpu_pause_nosync() for d1v1 again
>>
>> From this moment d1v1 has extra +1 in pause_count, so it goes offline
>> right after credit>prv->credits_per_tslice forever.
>>
>
> Good catch -- I guess we don't really test the scheduler "cap"
> functionality enough to catch this one.
>
> Unfortunately I think the real problem is that the ->flags field should be
> uniformly using the atomic bit operations.
>
> Can you try the attached patch and see if it works for you?
>
>
Thanks for help, your patch works well.

[-- Attachment #1.2: Type: text/html, Size: 3244 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2013-02-26 18:34 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-25  8:20 [PATCH] credit: fix race condition in csched_vcpu flags Igor Pavlikevich
2013-02-26 14:19 ` George Dunlap
2013-02-26 18:34   ` Igor Pavlikevich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.