linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Anson Huang <anson.huang@nxp.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Peng Fan <peng.fan@nxp.com>, Jacky Bai <ping.bai@nxp.com>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>
Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
Date: Tue, 10 Dec 2019 11:54:50 +0100	[thread overview]
Message-ID: <7233060.oySJ2cjCuV@kreacher> (raw)
In-Reply-To: <5310126.hg2rr5Fjtk@kreacher>

On Tuesday, December 10, 2019 11:39:25 AM CET Rafael J. Wysocki wrote:
> On Tuesday, December 10, 2019 9:51:43 AM CET Anson Huang wrote:
> > 
> > > -----Original Message-----
> > > From: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > Sent: Tuesday, December 10, 2019 4:51 PM
> > > To: Anson Huang <anson.huang@nxp.com>
> > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Jacky Bai
> > > <ping.bai@nxp.com>; linux-pm@vger.kernel.org; Vincent Guittot
> > > <vincent.guittot@linaro.org>; Peter Zijlstra <peterz@infradead.org>; Paul
> > > McKenney <paulmck@linux.vnet.ibm.com>
> > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > > 
> > > On Tuesday, December 10, 2019 9:45:09 AM CET Anson Huang wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > Sent: Tuesday, December 10, 2019 4:38 PM
> > > > > To: Anson Huang <anson.huang@nxp.com>
> > > > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > > > > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Rafael J.
> > > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>; linux-
> > > > > pm@vger.kernel.org; Vincent Guittot <vincent.guittot@linaro.org>;
> > > > > Peter Zijlstra <peterz@infradead.org>; Paul McKenney
> > > > > <paulmck@linux.vnet.ibm.com>
> > > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > > > >
> > > > > On Tue, Dec 10, 2019 at 9:29 AM Anson Huang <anson.huang@nxp.com>
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > > > Sent: Tuesday, December 10, 2019 4:22 PM
> > > > > > > To: Viresh Kumar <viresh.kumar@linaro.org>
> > > > > > > Cc: Peng Fan <peng.fan@nxp.com>; Rafael J. Wysocki
> > > > > > > <rafael@kernel.org>; Anson Huang <anson.huang@nxp.com>; Rafael
> > > J.
> > > > > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>;
> > > > > > > linux- pm@vger.kernel.org; Vincent Guittot
> > > > > > > <vincent.guittot@linaro.org>; Peter Zijlstra
> > > > > > > <peterz@infradead.org>; Paul McKenney
> > > > > > > <paulmck@linux.vnet.ibm.com>
> > > > > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq
> > > > > > > driver
> > > > > > >
> > > > > > > On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar
> > > > > > > <viresh.kumar@linaro.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > +few more guys
> > > > > > > >
> > > > > > > > On 10-12-19, 05:53, Peng Fan wrote:
> > > > > > > > > But per
> > > > > > > > > https://eur01.safelinks.protection.outlook.com/?url=https%3A
> > > > > > > > > %2F%
> > > > > > > > > 2Fel
> > > > > > > > > ixir.bootlin.com%2Flinux%2Fv5.5-
> > > > > > > rc1%2Fsource%2Fkernel%2Fsched%2Fsche
> > > > > > > > >
> > > > > > >
> > > > >
> > > d.h%23L2293&amp;data=02%7C01%7Canson.huang%40nxp.com%7C6f44900
> > > > > > > be3404
> > > > > > > > >
> > > > > > >
> > > > >
> > > e7d355708d77d4a16fa%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%
> > > > > > > 7C637
> > > > > > > > >
> > > > > > >
> > > > >
> > > 115629475456329&amp;sdata=XXhwvuTOBb3TLmerwkr1zKbaWNA8xA%2Bl
> > > > > > > W%2Faw31
> > > > > > > > > 0AYcM%3D&amp;reserved=0
> > > > > > > > > cpu_of(rq) and smp_processor_id() is possible to not the
> > > > > > > > > same,
> > > > > > > > >
> > > > > > > > > When cpu_of(rq) is not equal to smp_processor_id(),
> > > > > > > > > dbs_update_util_handler will use irq_work_queue to
> > > > > > > > > smp_processor_id(), not cpu_of(rq). Is this expected?
> > > > > > > > > Or should the irq_work be queued to cpu_of(rq)?
> > > > > > > >
> > > > > > > > Okay, sorry for the long weekend where I couldn't get time to
> > > > > > > > reply at
> > > > > all.
> > > > > > >
> > > > > > > No worries. :-)
> > > > > > >
> > > > > > > > First of all, lets try to understand dvfs_possible_from_any_cpu.
> > > > > > > >
> > > > > > > > Who can update the frequency of a CPU ? For many
> > > > > > > > architectures/platforms the eventual code that writes to some
> > > > > > > > register to change the frequency should only run on the local
> > > > > > > > CPU, as these registers are per-cpu registers and not
> > > > > > > > something shared
> > > > > between CPUs.
> > > > > > > >
> > > > > > > > But for the ARM architecture, we have a PLL and then some more
> > > > > > > > registers to play with the clk provided to the CPU blocks and
> > > > > > > > these registers (which are updated as a result of
> > > > > > > > clk_set_rate()) are part of a
> > > > > > > block outside of the CPU blocks.
> > > > > > > > And so any CPU (even if it is not part of the same cpufreq
> > > > > > > > policy) can update it. Setting this flag allows that and
> > > > > > > > eventually we may end up updating the frequency sooner,
> > > > > > > > instead of later (which may be less effective). That was the idea of
> > > the remote-wakeup series.
> > > > > > > > This stuff is absolutely correct and so cpufreq-dt does it for
> > > everyone.
> > > > > > > >
> > > > > > > > This also means that the normal work and irq-work both can run
> > > > > > > > on any CPU for your platform and it should be okay to do that.
> > > > > > >
> > > > > > > And it the failing case all of the CPUs in the system are in the
> > > > > > > same policy anyway, so dvfs_possible_from_any_cpu is a red herring.
> > > > > > >
> > > > > > > > Now, we have necessary measures in place to make sure that
> > > > > > > > after stopping and before starting a governor, the scheduler
> > > > > > > > hooks to save the cpufreq governor pointer and updates to
> > > > > > > > policy->cpus are made properly, to make sure that we never
> > > > > > > > ever schedule a work or irq-work on a CPU which is offline.
> > > > > > > > Now it looks like this isn't working as expected and we need to find
> > > what exactly is broken here.
> > > > > > > >
> > > > > > > > And yes, I did the testing on Hikey 620, an octa-core ARM
> > > > > > > > platform which has a single cpufreq policy which has all the 8
> > > > > > > > CPUs. And yes, I am using cpufreq-dt only and I wasn't able to
> > > > > > > > reproduce the problem with mainline kernel as I explained earlier.
> > > > > > > >
> > > > > > > > The problem is somewhere between the scheduler's governor hook
> > > > > > > running
> > > > > > > > or queuing work on a CPU which is in the middle of getting
> > > > > > > > offline/online and there is some race around that. The problem
> > > > > > > > hence may not be related to just cpufreq, but a wider variety of
> > > clients.
> > > > > > >
> > > > > > > The problem is that a CPU is running a governor hook which it
> > > > > > > shouldn't be running at all.
> > > > > > >
> > > > > > > The observation that dvfs_possible_from_any_cpu makes a
> > > > > > > difference only means that the governor hook is running on a CPU
> > > > > > > that is not present in the
> > > > > > > policy->cpus mask.  On the platform(s) in question this cannot
> > > > > > > policy->happen as
> > > > > > > long as RCU works as expected.
> > > > > >
> > > > > > If I understand correctly, the governor hook ONLY be clear on the
> > > > > > CPU being offline and after governor stopped, but the CPU being
> > > > > > offline could still run into below function to help other CPU
> > > > > > update the util, and it ONLY checks the cpu_of(rq)'s governor hook
> > > > > > which is valid as that
> > > > > CPU is online.
> > > > > >
> > > > > > So the question is how to avoid the CPU being offline and already
> > > > > > finish the governor stop flow be scheduled to help other CPU
> > > > > > update the
> > > > > util.
> > > > > >
> > > > > >  static inline void cpufreq_update_util(struct rq *rq, unsigned
> > > > > > int
> > > > > > flags)  {
> > > > > >          struct update_util_data *data;
> > > > > >
> > > > > >          data =
> > > > > rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
> > > > > >                                                    cpu_of(rq)));
> > > > > >          if (data)
> > > > > >                  data->func(data, rq_clock(rq), flags);  }
> > > > >
> > > > > OK, so that's where the problem is, good catch!
> > > > >
> > > > > So what happens is that a CPU going offline runs some scheduler code
> > > > > that invokes cpufreq_update_util().  Incidentally, it is not the
> > > > > cpu_of(rq), but that CPU is still online, so the callback is invoked
> > > > > and then policy->cpus test is bypassed because of
> > > dvfs_possible_from_any_cpu.
> > > >
> > > > If this is the issue, add another check here for the current CPU's governor
> > > hook?
> > > > Or any other better place to make sure the CPU being offline NOT to be
> > > queued to irq work?
> > > 
> > > Generally, yes.
> > > 
> > > Something like the patch below should help if I'm not mistaken:
> > > 
> > > ---
> > >  include/linux/cpufreq.h |    8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > 
> > > Index: linux-pm/include/linux/cpufreq.h
> > > ================================================================
> > > ===
> > > --- linux-pm.orig/include/linux/cpufreq.h
> > > +++ linux-pm/include/linux/cpufreq.h
> > > @@ -599,11 +599,13 @@ static inline bool cpufreq_this_cpu_can_  {
> > >  	/*
> > >  	 * Allow remote callbacks if:
> > > -	 * - dvfs_possible_from_any_cpu flag is set
> > >  	 * - the local and remote CPUs share cpufreq policy
> > > +	 * - dvfs_possible_from_any_cpu flag is set and the CPU running the
> > > +	 *   code is not going offline.
> > >  	 */
> > > -	return policy->dvfs_possible_from_any_cpu ||
> > > -		cpumask_test_cpu(smp_processor_id(), policy->cpus);
> > > +	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> > > +		(policy->dvfs_possible_from_any_cpu &&
> > > +		 !cpumask_test_cpu(smp_processor_id(), policy-
> > > >related_cpus));
> > >  }
> > 
> > I will start a stress test of this patch, thanks!
> 
> OK, thanks!
> 
> Another patch to test is appended and it should be more robust.
> 
> Instead of doing the related_cpus cpumask check in the previous patch (which
> only covered CPUs that belog to the target policy) it checks if the update_util
> hook is set for the local CPU (if it is not, that CPU is not expected to run
> the uodate_util code).

One more thing.

Both of the previous patches would not fix the schedutil governor in which
cpufreq_this_cpu_can_update() only is called in the fast_switch case and
that is not when irq_works are used.

So please discard the patch I have just posted and here is an updated patch
that covers schedutil too, so please test this one instead.

---
 include/linux/cpufreq.h          |   11 -----------
 include/linux/sched/cpufreq.h    |    3 +++
 kernel/sched/cpufreq.c           |   18 ++++++++++++++++++
 kernel/sched/cpufreq_schedutil.c |    8 +++-----
 4 files changed, 24 insertions(+), 16 deletions(-)

Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -595,17 +595,6 @@ struct governor_attr {
 			 size_t count);
 };
 
-static inline bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
-{
-	/*
-	 * Allow remote callbacks if:
-	 * - dvfs_possible_from_any_cpu flag is set
-	 * - the local and remote CPUs share cpufreq policy
-	 */
-	return policy->dvfs_possible_from_any_cpu ||
-		cpumask_test_cpu(smp_processor_id(), policy->cpus);
-}
-
 /*********************************************************************
  *                     FREQUENCY TABLE HELPERS                       *
  *********************************************************************/
Index: linux-pm/kernel/sched/cpufreq.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq.c
+++ linux-pm/kernel/sched/cpufreq.c
@@ -5,6 +5,8 @@
  * Copyright (C) 2016, Intel Corporation
  * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  */
+#include <linux/cpufreq.h>
+
 #include "sched.h"
 
 DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);
@@ -57,3 +59,19 @@ void cpufreq_remove_update_util_hook(int
 	rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), NULL);
 }
 EXPORT_SYMBOL_GPL(cpufreq_remove_update_util_hook);
+
+/**
+ * cpufreq_this_cpu_can_update - Check if cpufreq policy can be updated.
+ * @policy: cpufreq policy to check.
+ *
+ * Return 'true' if:
+ * - the local and remote CPUs share @policy,
+ * - dvfs_possible_from_any_cpu is set in @policy and the local CPU is not going
+ *   offline (in which it is not expected to run cpufreq updates any more).
+ */
+bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
+{
+	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
+		(policy->dvfs_possible_from_any_cpu &&
+		 rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));
+}
Index: linux-pm/include/linux/sched/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/sched/cpufreq.h
+++ linux-pm/include/linux/sched/cpufreq.h
@@ -12,6 +12,8 @@
 #define SCHED_CPUFREQ_MIGRATION	(1U << 1)
 
 #ifdef CONFIG_CPU_FREQ
+struct cpufreq_policy;
+
 struct update_util_data {
        void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
 };
@@ -20,6 +22,7 @@ void cpufreq_add_update_util_hook(int cp
                        void (*func)(struct update_util_data *data, u64 time,
 				    unsigned int flags));
 void cpufreq_remove_update_util_hook(int cpu);
+bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy);
 
 static inline unsigned long map_util_freq(unsigned long util,
 					unsigned long freq, unsigned long cap)
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -82,12 +82,10 @@ static bool sugov_should_update_freq(str
 	 * by the hardware, as calculating the frequency is pointless if
 	 * we cannot in fact act on it.
 	 *
-	 * For the slow switching platforms, the kthread is always scheduled on
-	 * the right set of CPUs and any CPU can find the next frequency and
-	 * schedule the kthread.
+	 * This is needed on the slow switching platforms too to prevent CPUs
+	 * going offline from leaving stale IRQ work items behind.
 	 */
-	if (sg_policy->policy->fast_switch_enabled &&
-	    !cpufreq_this_cpu_can_update(sg_policy->policy))
+	if (!cpufreq_this_cpu_can_update(sg_policy->policy))
 		return false;
 
 	if (unlikely(sg_policy->limits_changed)) {




  reply	other threads:[~2019-12-10 10:54 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <DB3PR0402MB391626A8ECFDC182C6EDCF8DF54E0@DB3PR0402MB3916.eurprd04.prod.outlook.com>
2019-11-21  9:35 ` About CPU hot-plug stress test failed in cpufreq driver Viresh Kumar
2019-11-21 10:13   ` Anson Huang
2019-11-21 10:53     ` Rafael J. Wysocki
2019-11-21 10:56       ` Rafael J. Wysocki
2019-11-22  5:15         ` Anson Huang
2019-11-22  9:59           ` Rafael J. Wysocki
2019-11-25  6:05             ` Anson Huang
2019-11-25  9:43               ` Anson Huang
2019-11-26  6:18                 ` Viresh Kumar
2019-11-26  8:22                   ` Anson Huang
2019-11-26  8:25                     ` Viresh Kumar
2019-11-25 12:44               ` Rafael J. Wysocki
2019-11-26  8:57                 ` Rafael J. Wysocki
2019-11-29 11:39                 ` Rafael J. Wysocki
2019-11-29 13:44                   ` Anson Huang
2019-12-05  8:53                     ` Anson Huang
2019-12-05 10:48                       ` Rafael J. Wysocki
2019-12-05 13:18                         ` Anson Huang
2019-12-05 15:52                           ` Rafael J. Wysocki
2019-12-09 10:31                             ` Peng Fan
2019-12-09 10:37                             ` Anson Huang
2019-12-09 10:56                               ` Anson Huang
2019-12-09 11:23                                 ` Rafael J. Wysocki
2019-12-09 12:32                                   ` Anson Huang
2019-12-09 12:44                                     ` Rafael J. Wysocki
2019-12-09 14:18                                       ` Anson Huang
2019-12-10  5:39                                         ` Anson Huang
2019-12-10  5:53                                       ` Peng Fan
2019-12-10  7:05                                         ` Viresh Kumar
2019-12-10  8:22                                           ` Rafael J. Wysocki
2019-12-10  8:29                                             ` Anson Huang
2019-12-10  8:36                                               ` Viresh Kumar
2019-12-10  8:37                                                 ` Peng Fan
2019-12-10  8:37                                               ` Rafael J. Wysocki
2019-12-10  8:43                                                 ` Peng Fan
2019-12-10  8:45                                                 ` Anson Huang
2019-12-10  8:50                                                   ` Rafael J. Wysocki
2019-12-10  8:51                                                     ` Anson Huang
2019-12-10 10:39                                                       ` Rafael J. Wysocki
2019-12-10 10:54                                                         ` Rafael J. Wysocki [this message]
2019-12-11  5:08                                                           ` Anson Huang
2019-12-11  8:59                                                           ` Peng Fan
2019-12-11  9:36                                                             ` Rafael J. Wysocki
2019-12-11  9:43                                                               ` Peng Fan
2019-12-11  9:52                                                                 ` Rafael J. Wysocki
2019-12-11 10:11                                                                   ` Peng Fan
2019-12-10 10:54                                                         ` Viresh Kumar
2019-12-10 11:07                                                           ` Rafael J. Wysocki
2019-12-10  8:57                                                     ` Viresh Kumar
2019-12-10 11:03                                                       ` Rafael J. Wysocki
2019-12-10  9:04                                                     ` Rafael J. Wysocki
2019-12-10  8:31                                             ` Viresh Kumar
2019-12-10  8:12                                         ` Rafael J. Wysocki
2019-12-05 11:00                       ` Viresh Kumar
2019-12-05 11:10                         ` Rafael J. Wysocki
2019-12-05 11:17                           ` Viresh Kumar
2019-11-21 10:37   ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7233060.oySJ2cjCuV@kreacher \
    --to=rjw@rjwysocki.net \
    --cc=anson.huang@nxp.com \
    --cc=linux-pm@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peng.fan@nxp.com \
    --cc=peterz@infradead.org \
    --cc=ping.bai@nxp.com \
    --cc=rafael@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).