From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758019AbcJHRG4 (ORCPT ); Sat, 8 Oct 2016 13:06:56 -0400 Received: from merlin.infradead.org ([205.233.59.134]:50840 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752411AbcJHRGu (ORCPT ); Sat, 8 Oct 2016 13:06:50 -0400 Date: Sat, 8 Oct 2016 19:06:38 +0200 From: Peter Zijlstra To: Wanpeng Li Cc: linux-kernel@vger.kernel.org, Wanpeng Li , Ingo Molnar , Mike Galbraith , Thomas Gleixner Subject: Re: [PATCH] sched/fair: Fix dereference NULL sched domain during select_idle_sibling Message-ID: <20161008170638.GK3568@worktop.programming.kicks-ass.net> References: <1475922278-3306-1-git-send-email-wanpeng.li@hotmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1475922278-3306-1-git-send-email-wanpeng.li@hotmail.com> User-Agent: Mutt/1.5.22.1 (2013-10-16) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Oct 08, 2016 at 06:24:38PM +0800, Wanpeng Li wrote: > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 543b2f2..03a6620 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -5472,19 +5472,29 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd > */ > static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target) > { > - struct sched_domain *this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc)); So select_idle_cpu() <- select_idle_sibling() is called from two places, both which already hold rcu_read_lock() afaict. This would've insta-triggered a rcu-lockdep splat otherwise I think. That is, selsect_task_rq_fair() has rcu_read_lock() taken when calling this, and task_numa_compare() does too. > + struct sched_domain *this_sd; > u64 avg_idle = this_rq()->avg_idle; > - u64 avg_cost = this_sd->avg_scan_cost; > + u64 avg_cost; > u64 time, cost; > s64 delta; > int cpu, wrap; > > + rcu_read_lock(); > + this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc)); > + if (!this_sd) { > + cpu = -1; > + goto unlock; > + } Yes, this is the part that was missing. We need to test this_sd after the lookup. Thanks for looking at this!