* [PATCH] sched: fix scheduler_disable() with core scheduling
From: Sergey Dyasli @ 2020-04-09 9:41 UTC
To: xen-devel
Cc: Juergen Gross, Sergey Dyasli, George Dunlap, Jan Beulich, Dario Faggioli
In core-scheduling mode, Xen might crash when entering ACPI S5 state.
This happens in sched_slave() during the is_idle_unit(next) check because
next->vcpu_list is stale and points to already freed memory.

This situation arises shortly after scheduler_disable() is called if
some CPU is still inside the sched_slave() softirq. The current logic simply
returns prev->next_task from sched_wait_rendezvous_in(), which causes
the described crash because next_task->vcpu_list has become invalid.

Fix the crash by returning NULL from sched_wait_rendezvous_in()
when scheduler_disable() has been called.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
CC: Juergen Gross <jgross@suse.com>
CC: Dario Faggioli <dfaggioli@suse.com>
CC: George Dunlap <george.dunlap@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
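
For readers following the thread, the bail-out described in the commit
message can be sketched in isolation. This is a toy, single-threaded
model, not the Xen code: every toy_ name is hypothetical, the counters
are plain ints rather than atomics, and the locking, RCU handling and
wait loop of the real sched_wait_rendezvous_in() are left out. It only
shows that once the scheduler is disabled the function returns NULL
instead of handing out prev->next_task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct sched_unit. */
struct toy_unit {
    struct toy_unit *next_task;   /* unit chosen to run next */
    int rendezvous_in_cnt;        /* CPUs still expected to rendezvous in */
    int rendezvous_out_cnt;       /* CPUs still expected to rendezvous out */
};

/* Stand-in for the scheduler_active flag cleared by scheduler_disable(). */
static bool toy_scheduler_active = true;

/*
 * Return the unit to switch to, or NULL when the caller must bail out:
 * either the scheduling resource changed (sr != current_sr) or the
 * scheduler was disabled in the meantime. Both cases share one path,
 * so a possibly stale next_task is never returned.
 */
static struct toy_unit *toy_wait_rendezvous_in(struct toy_unit *prev,
                                               const void *sr,
                                               const void *current_sr)
{
    assert(prev != NULL && prev->next_task != NULL);

    if ( sr != current_sr || !toy_scheduler_active )
    {
        /* Reset the rendezvous state before bailing out. */
        prev->next_task->rendezvous_out_cnt = 0;
        prev->rendezvous_in_cnt = 0;
        return NULL;
    }

    return prev->next_task;
}
```

A caller in this model would treat a NULL result as "nothing to switch
to" and return without touching the next unit, which is the point of
the change.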
---
xen/common/sched/core.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 626861a3fe..d4a6489929 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2484,19 +2484,15 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
*lock = pcpu_schedule_lock_irq(cpu);
- if ( unlikely(!scheduler_active) )
- {
- ASSERT(is_idle_unit(prev));
- atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
- prev->rendezvous_in_cnt = 0;
- }
-
/*
* Check for scheduling resource switched. This happens when we are
* moved away from our cpupool and cpus are subject of the idle
* scheduler now.
+ *
+ * This is also a bail out case when scheduler_disable() has been
+ * called.
*/
- if ( unlikely(sr != get_sched_res(cpu)) )
+ if ( unlikely(sr != get_sched_res(cpu) || !scheduler_active) )
{
ASSERT(is_idle_unit(prev));
atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
--
2.17.1
* Re: [PATCH] sched: fix scheduler_disable() with core scheduling
From: Jürgen Groß @ 2020-04-09 12:50 UTC
To: Sergey Dyasli, xen-devel; +Cc: George Dunlap, Jan Beulich, Dario Faggioli
On 09.04.20 11:41, Sergey Dyasli wrote:
> In core-scheduling mode, Xen might crash when entering ACPI S5 state.
> This happens in sched_slave() during the is_idle_unit(next) check because
> next->vcpu_list is stale and points to already freed memory.
>
> This situation arises shortly after scheduler_disable() is called if
> some CPU is still inside the sched_slave() softirq. The current logic simply
> returns prev->next_task from sched_wait_rendezvous_in(), which causes
> the described crash because next_task->vcpu_list has become invalid.
>
> Fix the crash by returning NULL from sched_wait_rendezvous_in()
> when scheduler_disable() has been called.
>
> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Good catch!
Have you seen any further problems (e.g. with cpu on/offlining) with
this patch applied?
Reviewed-by: Juergen Gross <jgross@suse.com>
Juergen
* Re: [PATCH] sched: fix scheduler_disable() with core scheduling
From: Sergey Dyasli @ 2020-04-14 12:37 UTC
To: Jürgen Groß, xen-devel
Cc: Igor Druzhinin, Sergey Dyasli, George Dunlap, Jan Beulich,
Dario Faggioli
(CC Igor)
On 09/04/2020 13:50, Jürgen Groß wrote:
> On 09.04.20 11:41, Sergey Dyasli wrote:
>> In core-scheduling mode, Xen might crash when entering ACPI S5 state.
>> This happens in sched_slave() during the is_idle_unit(next) check because
>> next->vcpu_list is stale and points to already freed memory.
>>
>> This situation arises shortly after scheduler_disable() is called if
>> some CPU is still inside the sched_slave() softirq. The current logic simply
>> returns prev->next_task from sched_wait_rendezvous_in(), which causes
>> the described crash because next_task->vcpu_list has become invalid.
>>
>> Fix the crash by returning NULL from sched_wait_rendezvous_in()
>> when scheduler_disable() has been called.
>>
>> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
>
> Good catch!
>
> Have you seen any further problems (e.g. with cpu on/offlining) with
> this patch applied?
This patch shouldn't affect cpu on/offlining AFAICS. Igor was the one testing
cpu on/offlining, and I think he came to the conclusion that it's broken even
without core-scheduling enabled.
> Reviewed-by: Juergen Gross <jgross@suse.com>
Thanks!
--
Sergey
* Re: [PATCH] sched: fix scheduler_disable() with core scheduling
From: Dario Faggioli @ 2020-04-16 16:10 UTC
To: Jürgen Groß, Sergey Dyasli, xen-devel
Cc: George Dunlap, Jan Beulich
On Thu, 2020-04-09 at 14:50 +0200, Jürgen Groß wrote:
> On 09.04.20 11:41, Sergey Dyasli wrote:
> > In core-scheduling mode, Xen might crash when entering ACPI S5 state.
> > This happens in sched_slave() during the is_idle_unit(next) check because
> > next->vcpu_list is stale and points to already freed memory.
> >
> > This situation arises shortly after scheduler_disable() is called if
> > some CPU is still inside the sched_slave() softirq. The current logic simply
> > returns prev->next_task from sched_wait_rendezvous_in(), which causes
> > the described crash because next_task->vcpu_list has become invalid.
> >
> > Fix the crash by returning NULL from sched_wait_rendezvous_in()
> > when scheduler_disable() has been called.
> >
> > Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
>
> Reviewed-by: Juergen Gross <jgross@suse.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Thanks and Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)