* [PATCH] sched: fix scheduler_disable() with core scheduling
From: Sergey Dyasli @ 2020-04-09  9:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Sergey Dyasli, George Dunlap, Jan Beulich, Dario Faggioli

In core-scheduling mode, Xen might crash when entering the ACPI S5
state. This happens in sched_slave() during the is_idle_unit(next)
check, because next->vcpu_list is stale and points to already freed
memory.

This situation occurs shortly after scheduler_disable() is called if
some CPU is still inside the sched_slave() softirq. The current logic
simply returns prev->next_task from sched_wait_rendezvous_in(), which
causes the described crash because next_task->vcpu_list has become
invalid.

Fix the crash by returning NULL from sched_wait_rendezvous_in() when
scheduler_disable() has been called.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
CC: Juergen Gross <jgross@suse.com>
CC: Dario Faggioli <dfaggioli@suse.com>
CC: George Dunlap <george.dunlap@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/common/sched/core.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 626861a3fe..d4a6489929 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2484,19 +2484,15 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
 
         *lock = pcpu_schedule_lock_irq(cpu);
 
-        if ( unlikely(!scheduler_active) )
-        {
-            ASSERT(is_idle_unit(prev));
-            atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
-            prev->rendezvous_in_cnt = 0;
-        }
-
         /*
          * Check for scheduling resource switched. This happens when we are
          * moved away from our cpupool and cpus are subject of the idle
          * scheduler now.
+         *
+         * This is also a bail out case when scheduler_disable() has been
+         * called.
          */
-        if ( unlikely(sr != get_sched_res(cpu)) )
+        if ( unlikely(sr != get_sched_res(cpu) || !scheduler_active) )
         {
             ASSERT(is_idle_unit(prev));
             atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
-- 
2.17.1
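
The merged condition in the hunk above can be modelled in a few lines of plain C. This is a minimal sketch, not Xen code: `struct unit_model`, `struct res_model`, `get_sched_res_model()` and `rendezvous_in_model()` are hypothetical stand-ins, the atomic counter is a plain `int`, and locking and RCU are left out. It assumes, as the commit message states, that the bail-out branch ends by returning NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; Xen's struct sched_unit/sched_resource are far larger. */
struct res_model { int id; };

struct unit_model {
    bool is_idle;
    int rendezvous_in_cnt;
    int rendezvous_out_cnt;          /* an atomic_t in Xen, a plain int here */
    struct unit_model *next_task;
};

/* Modelled global state: the scheduler flag and this CPU's current resource. */
static bool scheduler_active = true;
static struct res_model *cpu_res;

static struct res_model *get_sched_res_model(void)
{
    return cpu_res;
}

/*
 * Patched bail-out: a resource switch and scheduler_disable() now take the
 * same branch, which resets the rendezvous counters and returns NULL rather
 * than prev->next_task (whose vCPU list may already have been freed).
 */
static struct unit_model *rendezvous_in_model(struct unit_model *prev,
                                              struct res_model *sr)
{
    if ( sr != get_sched_res_model() || !scheduler_active )
    {
        assert(prev->is_idle);
        prev->next_task->rendezvous_out_cnt = 0;
        prev->rendezvous_in_cnt = 0;
        return NULL;
    }

    return prev->next_task;
}
```

Folding `!scheduler_active` into the existing resource-switch branch gives the disabled-scheduler case the same early return, instead of resetting the counters and then carrying on to return prev->next_task as the removed block did.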

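The fix also relies on the caller stopping as soon as it sees NULL, before anything such as an is_idle_unit()-style check dereferences next. A minimal sketch of that caller-side contract, assuming a hypothetical `struct unit_stub` and `slave_step_model()` in place of struct sched_unit and sched_slave(); it is not the real softirq handler:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sched_unit; only the idle flag is kept. */
struct unit_stub {
    bool is_idle;
};

/* What one modelled pass through a sched_slave()-like handler ends up doing. */
enum slave_outcome {
    SLAVE_BAILED_OUT,  /* rendezvous returned NULL: leave the softirq */
    SLAVE_RUN_IDLE,    /* next is an idle unit */
    SLAVE_RUN_GUEST,   /* next is a guest unit */
};

/*
 * Check NULL first: after scheduler_disable(), the unit that would have been
 * returned may reference freed memory, so it must not be inspected at all.
 */
static enum slave_outcome slave_step_model(const struct unit_stub *next)
{
    if ( next == NULL )
        return SLAVE_BAILED_OUT;

    return next->is_idle ? SLAVE_RUN_IDLE : SLAVE_RUN_GUEST;
}
```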

* Re: [PATCH] sched: fix scheduler_disable() with core scheduling
From: Jürgen Groß @ 2020-04-09 12:50 UTC (permalink / raw)
  To: Sergey Dyasli, xen-devel; +Cc: George Dunlap, Jan Beulich, Dario Faggioli

On 09.04.20 11:41, Sergey Dyasli wrote:
> In core-scheduling mode, Xen might crash when entering the ACPI S5
> state. This happens in sched_slave() during the is_idle_unit(next)
> check, because next->vcpu_list is stale and points to already freed
> memory.
> 
> This situation occurs shortly after scheduler_disable() is called if
> some CPU is still inside the sched_slave() softirq. The current logic
> simply returns prev->next_task from sched_wait_rendezvous_in(), which
> causes the described crash because next_task->vcpu_list has become
> invalid.
> 
> Fix the crash by returning NULL from sched_wait_rendezvous_in() when
> scheduler_disable() has been called.
> 
> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>

Good catch!

Have you seen any further problems (e.g. with cpu on/offlining) with
this patch applied?

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

* Re: [PATCH] sched: fix scheduler_disable() with core scheduling
From: Sergey Dyasli @ 2020-04-14 12:37 UTC (permalink / raw)
  To: Jürgen Groß, xen-devel
  Cc: Igor Druzhinin, Sergey Dyasli, George Dunlap, Jan Beulich,
	Dario Faggioli

(CC Igor)

On 09/04/2020 13:50, Jürgen Groß wrote:
> On 09.04.20 11:41, Sergey Dyasli wrote:
>> In core-scheduling mode, Xen might crash when entering the ACPI S5
>> state. This happens in sched_slave() during the is_idle_unit(next)
>> check, because next->vcpu_list is stale and points to already freed
>> memory.
>>
>> This situation occurs shortly after scheduler_disable() is called if
>> some CPU is still inside the sched_slave() softirq. The current logic
>> simply returns prev->next_task from sched_wait_rendezvous_in(), which
>> causes the described crash because next_task->vcpu_list has become
>> invalid.
>>
>> Fix the crash by returning NULL from sched_wait_rendezvous_in() when
>> scheduler_disable() has been called.
>>
>> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
> 
> Good catch!
> 
> Have you seen any further problems (e.g. with cpu on/offlining) with
> this patch applied?

This patch shouldn't affect CPU on/offlining, AFAICS. Igor was the one
testing CPU on/offlining, and I think he came to the conclusion that it's
broken even without core scheduling enabled.

> Reviewed-by: Juergen Gross <jgross@suse.com>

Thanks!

--
Sergey

* Re: [PATCH] sched: fix scheduler_disable() with core scheduling
From: Dario Faggioli @ 2020-04-16 16:10 UTC (permalink / raw)
  To: Jürgen Groß, Sergey Dyasli, xen-devel
  Cc: George Dunlap, Jan Beulich

On Thu, 2020-04-09 at 14:50 +0200, Jürgen Groß wrote:
> On 09.04.20 11:41, Sergey Dyasli wrote:
> > In core-scheduling mode, Xen might crash when entering the ACPI S5
> > state. This happens in sched_slave() during the is_idle_unit(next)
> > check, because next->vcpu_list is stale and points to already freed
> > memory.
> > 
> > This situation occurs shortly after scheduler_disable() is called
> > if some CPU is still inside the sched_slave() softirq. The current
> > logic simply returns prev->next_task from sched_wait_rendezvous_in(),
> > which causes the described crash because next_task->vcpu_list has
> > become invalid.
> > 
> > Fix the crash by returning NULL from sched_wait_rendezvous_in() when
> > scheduler_disable() has been called.
> > 
> > Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
> 
> Reviewed-by: Juergen Gross <jgross@suse.com>
> 
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]