* [PATCH] sched/fair: Fix dereference NULL sched domain during select_idle_sibling
@ 2016-10-08 10:24 Wanpeng Li
2016-10-08 17:06 ` Peter Zijlstra
0 siblings, 1 reply; 3+ messages in thread
From: Wanpeng Li @ 2016-10-08 10:24 UTC (permalink / raw)
To: linux-kernel
Cc: Wanpeng Li, Ingo Molnar, Mike Galbraith, Peter Zijlstra, Thomas Gleixner
From: Wanpeng Li <wanpeng.li@hotmail.com>
Commit:
10e2f1acd01 ("sched/core: Rewrite and improve select_idle_siblings()")
... improved select_idle_sibling() but also triggered a regression:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffffb10cd332>] select_idle_sibling+0x1c2/0x4f0
PGD 0
Oops: 0000 [#1] SMP
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.8.0+ #16
RIP: 0010:[<ffffffffb10cd332>] [<ffffffffb10cd332>] select_idle_sibling+0x1c2/0x4f0
Call Trace:
<IRQ>
select_task_rq_fair+0x749/0x930
? select_task_rq_fair+0xb4/0x930
? __lock_is_held+0x54/0x70
try_to_wake_up+0x19a/0x5b0
default_wake_function+0x12/0x20
autoremove_wake_function+0x12/0x40
__wake_up_common+0x55/0x90
__wake_up+0x39/0x50
wake_up_klogd_work_func+0x40/0x60
irq_work_run_list+0x57/0x80
irq_work_run+0x2c/0x30
smp_irq_work_interrupt+0x2e/0x40
irq_work_interrupt+0x96/0xa0
<EOI>
? _raw_spin_unlock_irqrestore+0x45/0x80
try_to_wake_up+0x4a/0x5b0
wake_up_state+0x10/0x20
__kthread_unpark+0x67/0x70
kthread_unpark+0x22/0x30
cpuhp_online_idle+0x3e/0x70
cpu_startup_entry+0x6a/0x450
start_secondary+0x154/0x180
This can be reproduced by running the ftrace test case of kselftest, the
test case will hot-unplug the cpu and the cpu will attach to the NULL
sched-domain during scheduler teardown.
The step 2 for the rewrite select_idle_siblings():
| Step 2) tracks the average cost of the scan and compares this to the
| average idle time guestimate for the CPU doing the wakeup.
If the cpu which doing the wakeup is the going hot-unplug cpu, then NULL
sched domain will be dereferenced to acquire the average cost of the scan.
This patch fix it by failing the search of an idle CPU in the LLC process
if this sched domain is NULL.
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
kernel/sched/fair.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 543b2f2..03a6620 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5472,19 +5472,29 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
*/
static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
{
- struct sched_domain *this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
+ struct sched_domain *this_sd;
u64 avg_idle = this_rq()->avg_idle;
- u64 avg_cost = this_sd->avg_scan_cost;
+ u64 avg_cost;
u64 time, cost;
s64 delta;
int cpu, wrap;
+ rcu_read_lock();
+ this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
+ if (!this_sd) {
+ cpu = -1;
+ goto unlock;
+ }
+ avg_cost = this_sd->avg_scan_cost;
+
/*
* Due to large variance we need a large fuzz factor; hackbench in
* particularly is sensitive here.
*/
- if ((avg_idle / 512) < avg_cost)
- return -1;
+ if ((avg_idle / 512) < avg_cost) {
+ cpu = -1;
+ goto unlock;
+ }
time = local_clock();
@@ -5500,6 +5510,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
delta = (s64)(time - cost) / 8;
this_sd->avg_scan_cost += delta;
+unlock:
+ rcu_read_unlock();
return cpu;
}
--
1.9.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] sched/fair: Fix dereference NULL sched domain during select_idle_sibling
2016-10-08 10:24 [PATCH] sched/fair: Fix dereference NULL sched domain during select_idle_sibling Wanpeng Li
@ 2016-10-08 17:06 ` Peter Zijlstra
2016-10-08 23:39 ` Wanpeng Li
0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2016-10-08 17:06 UTC (permalink / raw)
To: Wanpeng Li
Cc: linux-kernel, Wanpeng Li, Ingo Molnar, Mike Galbraith, Thomas Gleixner
On Sat, Oct 08, 2016 at 06:24:38PM +0800, Wanpeng Li wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 543b2f2..03a6620 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5472,19 +5472,29 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
> */
> static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
> {
> - struct sched_domain *this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
So select_idle_cpu() <- select_idle_sibling() is called from two places,
both which already hold rcu_read_lock() afaict.
This would've insta-triggered a rcu-lockdep splat otherwise I think.
That is, selsect_task_rq_fair() has rcu_read_lock() taken when calling
this, and task_numa_compare() does too.
> + struct sched_domain *this_sd;
> u64 avg_idle = this_rq()->avg_idle;
> - u64 avg_cost = this_sd->avg_scan_cost;
> + u64 avg_cost;
> u64 time, cost;
> s64 delta;
> int cpu, wrap;
>
> + rcu_read_lock();
> + this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
> + if (!this_sd) {
> + cpu = -1;
> + goto unlock;
> + }
Yes, this is the part that was missing. We need to test this_sd after
the lookup.
Thanks for looking at this!
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] sched/fair: Fix dereference NULL sched domain during select_idle_sibling
2016-10-08 17:06 ` Peter Zijlstra
@ 2016-10-08 23:39 ` Wanpeng Li
0 siblings, 0 replies; 3+ messages in thread
From: Wanpeng Li @ 2016-10-08 23:39 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Wanpeng Li, Ingo Molnar, Mike Galbraith, Thomas Gleixner
2016-10-09 1:06 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Sat, Oct 08, 2016 at 06:24:38PM +0800, Wanpeng Li wrote:
>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 543b2f2..03a6620 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5472,19 +5472,29 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
>> */
>> static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
>> {
>> - struct sched_domain *this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
>
> So select_idle_cpu() <- select_idle_sibling() is called from two places,
> both which already hold rcu_read_lock() afaict.
Agreed.
>
> This would've insta-triggered a rcu-lockdep splat otherwise I think.
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_LOCK_ALLOC=y
So it is interesting why not a rcu-lockdep splat. :)
>
> That is, selsect_task_rq_fair() has rcu_read_lock() taken when calling
> this, and task_numa_compare() does too.
>
>> + struct sched_domain *this_sd;
>> u64 avg_idle = this_rq()->avg_idle;
>> - u64 avg_cost = this_sd->avg_scan_cost;
>> + u64 avg_cost;
>> u64 time, cost;
>> s64 delta;
>> int cpu, wrap;
>>
>> + rcu_read_lock();
>> + this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
>> + if (!this_sd) {
>> + cpu = -1;
>> + goto unlock;
>> + }
>
> Yes, this is the part that was missing. We need to test this_sd after
> the lookup.
>
> Thanks for looking at this!
NP, I will send out v2 soon. :)
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2016-10-08 23:39 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-08 10:24 [PATCH] sched/fair: Fix dereference NULL sched domain during select_idle_sibling Wanpeng Li
2016-10-08 17:06 ` Peter Zijlstra
2016-10-08 23:39 ` Wanpeng Li
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).