From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7BFB4C2D0C5 for ; Wed, 11 Dec 2019 08:59:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3D5D5205C9 for ; Wed, 11 Dec 2019 08:59:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=nxp.com header.i=@nxp.com header.b="efJl5Byf" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727253AbfLKI7z (ORCPT ); Wed, 11 Dec 2019 03:59:55 -0500 Received: from mail-eopbgr60053.outbound.protection.outlook.com ([40.107.6.53]:29450 "EHLO EUR04-DB3-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725973AbfLKI7z (ORCPT ); Wed, 11 Dec 2019 03:59:55 -0500 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Q2hcdJO8G9Ju9o78KJLoxfOJQcBUpMmplX+TzbRMixMpZAlCkcR9Cn6/FkkPLAXjMWtqmLb41ZfAi1xktCDkxXQX1gyRWTGQy3/u/u9+OkG5pUOv3CNJqd+WcUkQwtIuT7vijCA1XhQrzJc9pLCKiHWaNW44WJx7HoYgUtPOstu5E5QNoYEPwtDy0pw3PhtZ6ComdcTGTbzVOaQgMwJXJu2jkqIKlrjmstylggJv8HAh/2vTdUc5JvqFEpzn+ztXu8MafSL39DFVKF5czpcD2r0Av8NgdoL8aIZqoJzht/K8NutTdbIBTKy/CTa9rZO/wvOW706NuLOGU2EPrnYuQA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=01GuNmmQQZnuFVzIR/0Btt9cZnTxzBKpEOspwvfJOjQ=; b=FI/fxT/YKc9ixpw2Zho3M8WvaxzHKdSA6T6qXhwY866XlOU+HkNECJ4CRE22Vv3hWk3MLrG1jKup6dhwlww09KHrqWOXaeYxpdh+rYdwIX6lWNb1Y6c6Ckm5EBLlROdPSUYY1kctXCKzyBS/0eXiKobDqSNFqiGqHiQvgAJ838AHGkjNPpGpzn7LjUXFQsFhK5PON43A/5sRU6OmqrW8m+vW3it9nv4El/mKxFotRS2ddQxa+djR0x8RLqXaTwAbMd4DRPHsZiA2sB4pOj5QvAD/8Eke/z/RUJ2AIY6IvcFy+Rrbz3bT6jn67auyqVSFH/l+byhSYbovhTfHZIBpDQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=nxp.com; dmarc=pass action=none header.from=nxp.com; dkim=pass header.d=nxp.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nxp.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=01GuNmmQQZnuFVzIR/0Btt9cZnTxzBKpEOspwvfJOjQ=; b=efJl5ByfjO5YNXo5q/IZaJPTGWNPvRveVi8ZXbBjoq/f95QHzZ1ITKhR9O191mFuy1RgZLKpgfFkA9WkU3fTUKqXSusPODu/Ou/o1ZfQitgEMugdXgzKNM0eg8bvRXN3uJoT68uP2e+keRThmBFVCIt9t+z33/rDDD2pDYx6Flc= Received: from AM0PR04MB4481.eurprd04.prod.outlook.com (52.135.147.15) by AM0PR04MB6546.eurprd04.prod.outlook.com (20.179.253.211) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2516.13; Wed, 11 Dec 2019 08:59:45 +0000 Received: from AM0PR04MB4481.eurprd04.prod.outlook.com ([fe80::505:87e7:6b49:3d29]) by AM0PR04MB4481.eurprd04.prod.outlook.com ([fe80::505:87e7:6b49:3d29%7]) with mapi id 15.20.2516.018; Wed, 11 Dec 2019 08:59:45 +0000 From: Peng Fan To: "Rafael J. Wysocki" , Anson Huang CC: "Rafael J. Wysocki" , Viresh Kumar , Jacky Bai , "linux-pm@vger.kernel.org" , Vincent Guittot , Peter Zijlstra , Paul McKenney Subject: RE: About CPU hot-plug stress test failed in cpufreq driver Thread-Topic: About CPU hot-plug stress test failed in cpufreq driver Thread-Index: AdWgM3EuFQeyF1SHRka5sBhDxaFShQAG6J6AAAAKxPAAAq52AAAAFLaAACYwDoAACh8qAAAASXeAAJxYEgAAxubCAAAEUtUgASMyDZAABHGSAAAFLgggAAVyLYAAvd+WYAAA3omQAAEGDYAAAmWbRgAAcGGAACPMEXAAAqbAgAACrQEAAAAfUXAAAG2mAAAAIzIAAABNT4AAAAalAAADxrOAAACJ1gAALbeWIA== Date: Wed, 11 Dec 2019 08:59:45 +0000 Message-ID: References: <5310126.hg2rr5Fjtk@kreacher> <7233060.oySJ2cjCuV@kreacher> In-Reply-To: <7233060.oySJ2cjCuV@kreacher> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=peng.fan@nxp.com; x-originating-ip: [119.31.174.71] x-ms-publictraffictype: Email x-ms-office365-filtering-ht: Tenant x-ms-office365-filtering-correlation-id: ae0b08f9-7d0d-4c85-7ce1-08d77e187816 x-ms-traffictypediagnostic: AM0PR04MB6546:|AM0PR04MB6546: x-ms-exchange-transport-forked: True x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:4303; x-forefront-prvs: 024847EE92 x-forefront-antispam-report: SFV:NSPM;SFS:(10009020)(4636009)(396003)(136003)(366004)(39860400002)(376002)(346002)(199004)(52314003)(189003)(13464003)(71200400001)(81166006)(8676002)(66946007)(81156014)(478600001)(5660300002)(45080400002)(44832011)(9686003)(8936002)(55016002)(30864003)(110136005)(966005)(33656002)(2906002)(186003)(66476007)(6636002)(6506007)(26005)(52536014)(66556008)(54906003)(316002)(86362001)(4326008)(7696005)(53546011)(76116006)(64756008)(66446008);DIR:OUT;SFP:1101;SCL:1;SRVR:AM0PR04MB6546;H:AM0PR04MB4481.eurprd04.prod.outlook.com;FPR:;SPF:None;LANG:en;PTR:InfoNoRecords;A:1;MX:1; received-spf: None (protection.outlook.com: nxp.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: XQ5hOD5rNSOXFJukCRDsgNUggnTQSiBe/tNUUPvdAUYlzdKx63f/Di2cpHBAVc+0FUadnsLvloN2JPSIVamh3R4WRhuiAUQBv16pdLX4RsRPTHPiWv8IxtXelvL4/dtyN558yigfS2Bh723gR5doBTUzLqjxoEHp8GebVzfkUl3V62WTSjFs4iC//r2wQoy1ZPKQ/bP/4+nMSLuzHDbocyJ9j40HO/nwu2pZYjEPKFbmTnB9FfnL2y3ypzFXlbpCxbLD4RZXfqm5lin7Yzy7CFOglOBg5svesXLBH4zL77Se94wP7GsOBUnnkMLhMIVM2aOnr8Q3A17AMQg7WMzhpdAaTe/Nhxq4DX3dsBAPqtrgVxcSz3nUcL6tioEqd2qINcxU6NieslXbN85pY28JnvAd1GruK+Z0TCqfTGQEau0/drVtghUkJycNJ+zHv+VWsyYpowFDrdFaJDX990tVm3SbrjFJH9WoKkjndFWh6X/glteDFeUVa9l3qYWOzc/hlp+I7hs1UCgaujfTUzzCRA== Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: nxp.com X-MS-Exchange-CrossTenant-Network-Message-Id: ae0b08f9-7d0d-4c85-7ce1-08d77e187816 X-MS-Exchange-CrossTenant-originalarrivaltime: 11 Dec 2019 08:59:45.3740 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 686ea1d3-bc2b-4c6f-a92c-d99c5c301635 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: kb+y8i1Q3cpA0D91X17Ved+ybRFmHc9nAGwqkKlKVC1FCWUN0vy7EbrgMrlK29AeHv2SCsqS0imK5xY0+xOAoA== X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM0PR04MB6546 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver >=20 > On Tuesday, December 10, 2019 11:39:25 AM CET Rafael J. Wysocki wrote: > > On Tuesday, December 10, 2019 9:51:43 AM CET Anson Huang wrote: > > > > > > > -----Original Message----- > > > > From: Rafael J. Wysocki > > > > Sent: Tuesday, December 10, 2019 4:51 PM > > > > To: Anson Huang > > > > Cc: Rafael J. Wysocki ; Viresh Kumar > > > > ; Peng Fan ; Jacky Bai > > > > ; linux-pm@vger.kernel.org; Vincent Guittot > > > > ; Peter Zijlstra > > > > ; Paul McKenney > > > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq > > > > driver > > > > > > > > On Tuesday, December 10, 2019 9:45:09 AM CET Anson Huang wrote: > > > > > > > > > > > -----Original Message----- > > > > > > From: Rafael J. Wysocki > > > > > > Sent: Tuesday, December 10, 2019 4:38 PM > > > > > > To: Anson Huang > > > > > > Cc: Rafael J. Wysocki ; Viresh Kumar > > > > > > ; Peng Fan ; Rafael > J. > > > > > > Wysocki ; Jacky Bai ; > > > > > > linux- pm@vger.kernel.org; Vincent Guittot > > > > > > ; Peter Zijlstra > > > > > > ; Paul McKenney > > > > > > > > > > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq > > > > > > driver > > > > > > > > > > > > On Tue, Dec 10, 2019 at 9:29 AM Anson Huang > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Rafael J. Wysocki > > > > > > > > Sent: Tuesday, December 10, 2019 4:22 PM > > > > > > > > To: Viresh Kumar > > > > > > > > Cc: Peng Fan ; Rafael J. Wysocki > > > > > > > > ; Anson Huang ; > > > > > > > > Rafael > > > > J. > > > > > > > > Wysocki ; Jacky Bai ; > > > > > > > > linux- pm@vger.kernel.org; Vincent Guittot > > > > > > > > ; Peter Zijlstra > > > > > > > > ; Paul McKenney > > > > > > > > > > > > > > > > Subject: Re: About CPU hot-plug stress test failed in > > > > > > > > cpufreq driver > > > > > > > > > > > > > > > > On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > +few more guys > > > > > > > > > > > > > > > > > > On 10-12-19, 05:53, Peng Fan wrote: > > > > > > > > > > But per > > > > > > > > > > https://eur01.safelinks.protection.outlook.com/?url=3Dh= t > > > > > > > > > > tps%3A > > > > > > > > > > %2F% > > > > > > > > > > 2Fel > > > > > > > > > > ixir.bootlin.com%2Flinux%2Fv5.5- > > > > > > > > rc1%2Fsource%2Fkernel%2Fsched%2Fsche > > > > > > > > > > > > > > > > > > > > > > > > > > > > > d.h%23L2293&data=3D02%7C01%7Canson.huang%40nxp.com%7C6f4490 > 0 > > > > > > > > be3404 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > e7d355708d77d4a16fa%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0 > % > > > > > > > > 7C637 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 115629475456329&sdata=3DXXhwvuTOBb3TLmerwkr1zKbaWNA8xA%2Bl > > > > > > > > W%2Faw31 > > > > > > > > > > 0AYcM%3D&reserved=3D0 > > > > > > > > > > cpu_of(rq) and smp_processor_id() is possible to not > > > > > > > > > > the same, > > > > > > > > > > > > > > > > > > > > When cpu_of(rq) is not equal to smp_processor_id(), > > > > > > > > > > dbs_update_util_handler will use irq_work_queue to > > > > > > > > > > smp_processor_id(), not cpu_of(rq). Is this expected? > > > > > > > > > > Or should the irq_work be queued to cpu_of(rq)? > > > > > > > > > > > > > > > > > > Okay, sorry for the long weekend where I couldn't get > > > > > > > > > time to reply at > > > > > > all. > > > > > > > > > > > > > > > > No worries. :-) > > > > > > > > > > > > > > > > > First of all, lets try to understand dvfs_possible_from_a= ny_cpu. > > > > > > > > > > > > > > > > > > Who can update the frequency of a CPU ? For many > > > > > > > > > architectures/platforms the eventual code that writes to > > > > > > > > > some register to change the frequency should only run on > > > > > > > > > the local CPU, as these registers are per-cpu registers > > > > > > > > > and not something shared > > > > > > between CPUs. > > > > > > > > > > > > > > > > > > But for the ARM architecture, we have a PLL and then > > > > > > > > > some more registers to play with the clk provided to the > > > > > > > > > CPU blocks and these registers (which are updated as a > > > > > > > > > result of > > > > > > > > > clk_set_rate()) are part of a > > > > > > > > block outside of the CPU blocks. > > > > > > > > > And so any CPU (even if it is not part of the same > > > > > > > > > cpufreq > > > > > > > > > policy) can update it. Setting this flag allows that and > > > > > > > > > eventually we may end up updating the frequency sooner, > > > > > > > > > instead of later (which may be less effective). That was > > > > > > > > > the idea of > > > > the remote-wakeup series. > > > > > > > > > This stuff is absolutely correct and so cpufreq-dt does > > > > > > > > > it for > > > > everyone. > > > > > > > > > > > > > > > > > > This also means that the normal work and irq-work both > > > > > > > > > can run on any CPU for your platform and it should be oka= y to > do that. > > > > > > > > > > > > > > > > And it the failing case all of the CPUs in the system are > > > > > > > > in the same policy anyway, so dvfs_possible_from_any_cpu is= a > red herring. > > > > > > > > > > > > > > > > > Now, we have necessary measures in place to make sure > > > > > > > > > that after stopping and before starting a governor, the > > > > > > > > > scheduler hooks to save the cpufreq governor pointer and > > > > > > > > > updates to > > > > > > > > > policy->cpus are made properly, to make sure that we > > > > > > > > > policy->never > > > > > > > > > ever schedule a work or irq-work on a CPU which is offlin= e. > > > > > > > > > Now it looks like this isn't working as expected and we > > > > > > > > > need to find > > > > what exactly is broken here. > > > > > > > > > > > > > > > > > > And yes, I did the testing on Hikey 620, an octa-core > > > > > > > > > ARM platform which has a single cpufreq policy which has > > > > > > > > > all the 8 CPUs. And yes, I am using cpufreq-dt only and > > > > > > > > > I wasn't able to reproduce the problem with mainline kern= el as > I explained earlier. > > > > > > > > > > > > > > > > > > The problem is somewhere between the scheduler's > > > > > > > > > governor hook > > > > > > > > running > > > > > > > > > or queuing work on a CPU which is in the middle of > > > > > > > > > getting offline/online and there is some race around > > > > > > > > > that. The problem hence may not be related to just > > > > > > > > > cpufreq, but a wider variety of > > > > clients. > > > > > > > > > > > > > > > > The problem is that a CPU is running a governor hook which > > > > > > > > it shouldn't be running at all. > > > > > > > > > > > > > > > > The observation that dvfs_possible_from_any_cpu makes a > > > > > > > > difference only means that the governor hook is running on > > > > > > > > a CPU that is not present in the > > > > > > > > policy->cpus mask. On the platform(s) in question this > > > > > > > > policy->cannot happen as > > > > > > > > long as RCU works as expected. > > > > > > > > > > > > > > If I understand correctly, the governor hook ONLY be clear > > > > > > > on the CPU being offline and after governor stopped, but the > > > > > > > CPU being offline could still run into below function to > > > > > > > help other CPU update the util, and it ONLY checks the > > > > > > > cpu_of(rq)'s governor hook which is valid as that > > > > > > CPU is online. > > > > > > > > > > > > > > So the question is how to avoid the CPU being offline and > > > > > > > already finish the governor stop flow be scheduled to help > > > > > > > other CPU update the > > > > > > util. > > > > > > > > > > > > > > static inline void cpufreq_update_util(struct rq *rq, > > > > > > > unsigned int > > > > > > > flags) { > > > > > > > struct update_util_data *data; > > > > > > > > > > > > > > data =3D > > > > > > rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data, > > > > > > > > cpu_of(rq))); > > > > > > > if (data) > > > > > > > data->func(data, rq_clock(rq), flags); } > > > > > > > > > > > > OK, so that's where the problem is, good catch! > > > > > > > > > > > > So what happens is that a CPU going offline runs some > > > > > > scheduler code that invokes cpufreq_update_util(). > > > > > > Incidentally, it is not the cpu_of(rq), but that CPU is still > > > > > > online, so the callback is invoked and then policy->cpus test > > > > > > is bypassed because of > > > > dvfs_possible_from_any_cpu. > > > > > > > > > > If this is the issue, add another check here for the current > > > > > CPU's governor > > > > hook? > > > > > Or any other better place to make sure the CPU being offline NOT > > > > > to be > > > > queued to irq work? > > > > > > > > Generally, yes. > > > > > > > > Something like the patch below should help if I'm not mistaken: > > > > > > > > --- > > > > include/linux/cpufreq.h | 8 +++++--- > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > Index: linux-pm/include/linux/cpufreq.h > > > > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > =3D=3D > > > > =3D=3D=3D > > > > --- linux-pm.orig/include/linux/cpufreq.h > > > > +++ linux-pm/include/linux/cpufreq.h > > > > @@ -599,11 +599,13 @@ static inline bool cpufreq_this_cpu_can_ { > > > > /* > > > > * Allow remote callbacks if: > > > > - * - dvfs_possible_from_any_cpu flag is set > > > > * - the local and remote CPUs share cpufreq policy > > > > + * - dvfs_possible_from_any_cpu flag is set and the CPU running > the > > > > + * code is not going offline. > > > > */ > > > > - return policy->dvfs_possible_from_any_cpu || > > > > - cpumask_test_cpu(smp_processor_id(), policy->cpus); > > > > + return cpumask_test_cpu(smp_processor_id(), policy->cpus) || > > > > + (policy->dvfs_possible_from_any_cpu && > > > > + !cpumask_test_cpu(smp_processor_id(), policy- > > > > >related_cpus)); > > > > } > > > > > > I will start a stress test of this patch, thanks! > > > > OK, thanks! > > > > Another patch to test is appended and it should be more robust. > > > > Instead of doing the related_cpus cpumask check in the previous patch > > (which only covered CPUs that belog to the target policy) it checks if > > the update_util hook is set for the local CPU (if it is not, that CPU > > is not expected to run the uodate_util code). >=20 > One more thing. >=20 > Both of the previous patches would not fix the schedutil governor in whic= h > cpufreq_this_cpu_can_update() only is called in the fast_switch case and = that > is not when irq_works are used. >=20 > So please discard the patch I have just posted and here is an updated pat= ch > that covers schedutil too, so please test this one instead. >=20 > --- > include/linux/cpufreq.h | 11 ----------- > include/linux/sched/cpufreq.h | 3 +++ > kernel/sched/cpufreq.c | 18 ++++++++++++++++++ > kernel/sched/cpufreq_schedutil.c | 8 +++----- > 4 files changed, 24 insertions(+), 16 deletions(-) >=20 > Index: linux-pm/include/linux/cpufreq.h > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > =3D=3D=3D=3D=3D > --- linux-pm.orig/include/linux/cpufreq.h > +++ linux-pm/include/linux/cpufreq.h > @@ -595,17 +595,6 @@ struct governor_attr { > size_t count); > }; >=20 > -static inline bool cpufreq_this_cpu_can_update(struct cpufreq_policy *po= licy) > -{ > - /* > - * Allow remote callbacks if: > - * - dvfs_possible_from_any_cpu flag is set > - * - the local and remote CPUs share cpufreq policy > - */ > - return policy->dvfs_possible_from_any_cpu || > - cpumask_test_cpu(smp_processor_id(), policy->cpus); > -} > - >=20 > /************************************************************* > ******** > * FREQUENCY TABLE HELPERS > * >=20 > ************************************************************** > *******/ > Index: linux-pm/kernel/sched/cpufreq.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > =3D=3D=3D=3D=3D > --- linux-pm.orig/kernel/sched/cpufreq.c > +++ linux-pm/kernel/sched/cpufreq.c > @@ -5,6 +5,8 @@ > * Copyright (C) 2016, Intel Corporation > * Author: Rafael J. Wysocki > */ > +#include > + > #include "sched.h" >=20 > DEFINE_PER_CPU(struct update_util_data __rcu *, > cpufreq_update_util_data); @@ -57,3 +59,19 @@ void > cpufreq_remove_update_util_hook(int > rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), NULL); } > EXPORT_SYMBOL_GPL(cpufreq_remove_update_util_hook); > + > +/** > + * cpufreq_this_cpu_can_update - Check if cpufreq policy can be updated. > + * @policy: cpufreq policy to check. > + * > + * Return 'true' if: > + * - the local and remote CPUs share @policy, > + * - dvfs_possible_from_any_cpu is set in @policy and the local CPU is n= ot > going > + * offline (in which it is not expected to run cpufreq updates any mor= e). > + */ > +bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy) { > + return cpumask_test_cpu(smp_processor_id(), policy->cpus) || > + (policy->dvfs_possible_from_any_cpu && > + > rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data))); > +} > Index: linux-pm/include/linux/sched/cpufreq.h > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > =3D=3D=3D=3D=3D > --- linux-pm.orig/include/linux/sched/cpufreq.h > +++ linux-pm/include/linux/sched/cpufreq.h > @@ -12,6 +12,8 @@ > #define SCHED_CPUFREQ_MIGRATION (1U << 1) >=20 > #ifdef CONFIG_CPU_FREQ > +struct cpufreq_policy; > + > struct update_util_data { > void (*func)(struct update_util_data *data, u64 time, unsigned in= t > flags); }; @@ -20,6 +22,7 @@ void cpufreq_add_update_util_hook(int cp > void (*func)(struct update_util_data *data, u64 > time, > unsigned int flags)); > void cpufreq_remove_update_util_hook(int cpu); > +bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy); >=20 > static inline unsigned long map_util_freq(unsigned long util, > unsigned long freq, unsigned long cap) > Index: linux-pm/kernel/sched/cpufreq_schedutil.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > =3D=3D=3D=3D=3D > --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c > +++ linux-pm/kernel/sched/cpufreq_schedutil.c > @@ -82,12 +82,10 @@ static bool sugov_should_update_freq(str > * by the hardware, as calculating the frequency is pointless if > * we cannot in fact act on it. > * > - * For the slow switching platforms, the kthread is always scheduled on > - * the right set of CPUs and any CPU can find the next frequency and > - * schedule the kthread. > + * This is needed on the slow switching platforms too to prevent CPUs > + * going offline from leaving stale IRQ work items behind. > */ > - if (sg_policy->policy->fast_switch_enabled && > - !cpufreq_this_cpu_can_update(sg_policy->policy)) > + if (!cpufreq_this_cpu_can_update(sg_policy->policy)) > return false; >=20 > if (unlikely(sg_policy->limits_changed)) { So we will not queue irq_work on the offlining CPU in your patch. When we met issue on CPU3, it is CPU3 has irq_work pending, but the SGI IRQ_WORK interrupt is not handled because irq is always disabled, see stack in idle irq disabled state. [ 227.344678] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.4.0-03554-gbb115= 9fa5556-dirty #95 [ 227.344682] Hardware name: Freescale i.MX8QXP MEK (DT) [ 227.344686] Call trace: [ 227.344701] dump_backtrace+0x0/0x140 [ 227.344708] show_stack+0x14/0x20 [ 227.344717] dump_stack+0xb4/0xf8 [ 227.344730] dbs_update_util_handler+0x150/0x180 [ 227.344739] update_load_avg+0x38c/0x3c8 [ 227.344746] enqueue_task_fair+0xcc/0x3a0 [ 227.344756] activate_task+0x5c/0xa0 [ 227.344766] ttwu_do_activate.isra.0+0x4c/0x70 [ 227.344776] try_to_wake_up+0x2d8/0x410 [ 227.344786] wake_up_process+0x14/0x20 [ 227.344794] swake_up_locked.part.0+0x18/0x38 [ 227.344801] swake_up_one+0x30/0x48 [ 227.344808] rcu_gp_kthread_wake+0x5c/0x80 [ 227.344815] rcu_report_qs_rsp+0x40/0x50 [ 227.344825] rcu_report_qs_rnp+0x120/0x148 [ 227.344832] rcu_report_dead+0x120/0x130 [ 227.344841] cpuhp_report_idle_dead+0x3c/0x80 [ 227.344847] do_idle+0x198/0x280 [ 227.344856] cpu_startup_entry+0x24/0x40 [ 227.344865] secondary_start_kernel+0x154/0x190 [ 227.344905] CPU3: shutdown [ 227.444015] psci: CPU3 killed. I also met CPU1 offlining have irq_work queued, but CPU1 not trigger issue, because SGI IRQ_WORK interrupt is handled. There are multiple path to run into dbs_update_util_handler irq_work_queue, the path might will enable interrupt, might not. Seem do_idle is the only path that will trigger cpu_die for HOTPLUG ARM/ARM64. So do we need to use idle as flag to queue irq_work or not? In this way, we could still inject irq work on offlining/offline cpu, until it runs into idle to cpu_die. Thanks, Peng. >=20 >=20