From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-5.3 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 93CA6C43331
	for ; Tue, 24 Mar 2020 03:01:33 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 6F82220663
	for ; Tue, 24 Mar 2020 03:01:33 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1727233AbgCXDBc (ORCPT );
	Mon, 23 Mar 2020 23:01:32 -0400
Received: from mga03.intel.com ([134.134.136.65]:29293 "EHLO mga03.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1726874AbgCXDBc (ORCPT );
	Mon, 23 Mar 2020 23:01:32 -0400
IronPort-SDR: 3pof18D67SLKFPhv4enRlEMhMBCLhd6mnED7GNg24aAwEKo2LGHW8FFCyFEDHf5yOJo97JRoot
 618mj6dBZaDA==
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
	by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	23 Mar 2020 20:01:31 -0700
IronPort-SDR: 3ZHWJJDdK2iADXbBU88hlNLtqHpcCrwsbu88M3z835dbwTBbJmyE90mG9EqWvCfUktWd4Sd5OE
 Lm+7+1K2be1w==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.72,298,1580803200";
	d="scan'208";a="357288459"
Received: from cli6-desk1.ccr.corp.intel.com (HELO [10.239.161.118])
	([10.239.161.118]) by fmsmga001.fm.intel.com with ESMTP;
	23 Mar 2020 20:01:28 -0700
Subject: Re: [PATCH] sched: Use RCU-sched in core-scheduling balancing logic
To: Joel Fernandes
Cc: paulmck@kernel.org, linux-kernel@vger.kernel.org, vpillai,
	Aaron Lu, Aubrey Li, peterz@infradead.org, Ben Segall,
	Dietmar Eggemann, Ingo Molnar, Juri Lelli, Mel Gorman, Steven
	Rostedt, Vincent Guittot
References: <20200313232918.62303-1-joel@joelfernandes.org>
 <20200314003004.GI3199@paulmck-ThinkPad-P72>
 <20200323152126.GA141027@google.com>
From: "Li, Aubrey"
Message-ID: <6d933ce2-75e3-6469-4bb0-08ce9df29139@linux.intel.com>
Date: Tue, 24 Mar 2020 11:01:27 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101
 Thunderbird/68.3.0
MIME-Version: 1.0
In-Reply-To: <20200323152126.GA141027@google.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2020/3/23 23:21, Joel Fernandes wrote:
> On Mon, Mar 23, 2020 at 02:58:18PM +0800, Li, Aubrey wrote:
>> On 2020/3/14 8:30, Paul E. McKenney wrote:
>>> On Fri, Mar 13, 2020 at 07:29:18PM -0400, Joel Fernandes (Google) wrote:
>>>> rcu_read_unlock() can incur an infrequent deadlock in
>>>> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
>>>>
>>>> This fixes the following spinlock recursion observed when testing the
>>>> core scheduling patches on PREEMPT=y kernel on ChromeOS:
>>>>
>>>> [ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [kworker/0:10:965]
>>>>
>>>
>>> The original could indeed deadlock, and this would avoid that deadlock.
>>> (The commit to solve this deadlock is sadly not yet in mainline.)
>>>
>>> Acked-by: Paul E. McKenney
>>
>> I saw this in dmesg with this patch, is it expected?
>>
>> [ 117.000905] =============================
>> [ 117.000907] WARNING: suspicious RCU usage
>> [ 117.000911] 5.5.7+ #160 Not tainted
>> [ 117.000913] -----------------------------
>> [ 117.000916] kernel/sched/core.c:4747 suspicious rcu_dereference_check() usage!
>> [ 117.000918]
>> other info that might help us debug this:
>
> Sigh, this is because for_each_domain() expects rcu_read_lock(). From an RCU
> PoV, the code is correct (warning doesn't cause any issue).
>
> To silence warning, we could replace the rcu_read_lock_sched() in my patch with:
> 	preempt_disable();
> 	rcu_read_lock();
>
> and replace the unlock with:
>
> 	rcu_read_unlock();
> 	preempt_enable();
>
> That should both take care of both the warning and the scheduler-related
> deadlock. Thoughts?
>

How about this?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a01df3e..7ff694e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4743,7 +4743,6 @@ static void sched_core_balance(struct rq *rq)
 	int cpu = cpu_of(rq);
 
 	rcu_read_lock();
-	raw_spin_unlock_irq(rq_lockp(rq));
 	for_each_domain(cpu, sd) {
 		if (!(sd->flags & SD_LOAD_BALANCE))
 			break;
@@ -4754,7 +4753,6 @@ static void sched_core_balance(struct rq *rq)
 		if (steal_cookie_task(cpu, sd))
 			break;
 	}
-	raw_spin_lock_irq(rq_lockp(rq));
 	rcu_read_unlock();
 }
 

Thanks,
-Aubrey