linux-pm.vger.kernel.org archive mirror
From: Anson Huang <anson.huang@nxp.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>,
	Jacky Bai <ping.bai@nxp.com>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
Subject: RE: About CPU hot-plug stress test failed in cpufreq driver
Date: Fri, 22 Nov 2019 05:15:42 +0000	[thread overview]
Message-ID: <DB3PR0402MB3916BDC24BDA1053B7ADBDCFF5490@DB3PR0402MB3916.eurprd04.prod.outlook.com> (raw)
In-Reply-To: <CAJZ5v0j4z9tEDCGKRc7dHqTiJ1Fq3So=ELfvR6H25UkRmKeBvg@mail.gmail.com>

Hi Rafael,
	In theory, yes: a CPU going offline should run its irq work list so that nothing is left pending on it. In practice that is NOT what happens; both the ondemand and the schedutil governor reproduce this issue under the CPU hotplug stress test.
	I added an "int cpu" field to the irq_work structure to record which CPU each work item is queued on. When the issue occurs, the irq_work is still pending on CPU #3, which is already offline; that is why the hang happens, but I don't yet know how the work ends up stranded there...

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index b11fcdf..f8da06f9 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -25,6 +25,7 @@ struct irq_work {
        unsigned long flags;
        struct llist_node llnode;
        void (*func)(struct irq_work *);
+       int cpu;
 };

 static inline
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index d42acaf..2e893d5 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -10,6 +10,7 @@
 #include <linux/kernel.h>
 #include <linux/export.h>
 #include <linux/irq_work.h>
+#include <linux/jiffies.h>
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
 #include <linux/irqflags.h>
@@ -78,6 +79,7 @@ bool irq_work_queue(struct irq_work *work)
        if (!irq_work_claim(work))
                return false;

+       work->cpu = smp_processor_id();
        /* Queue the entry and raise the IPI if needed. */
        preempt_disable();
        __irq_work_queue_local(work);
@@ -105,6 +107,7 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
        /* Only queue if not already pending */
        if (!irq_work_claim(work))
                return false;
+       work->cpu = cpu;

        preempt_disable();
        if (cpu != smp_processor_id()) {
@@ -161,6 +164,7 @@ static void irq_work_run_list(struct llist_head *list)
                 */
                flags = work->flags & ~IRQ_WORK_PENDING;
                xchg(&work->flags, flags);
+               work->cpu = -1;

                work->func(work);
                /*
@@ -197,9 +201,13 @@ void irq_work_tick(void)
  */
 void irq_work_sync(struct irq_work *work)
 {
+       unsigned long timeout = jiffies + msecs_to_jiffies(500);
        lockdep_assert_irqs_enabled();

-       while (work->flags & IRQ_WORK_BUSY)
+       while (work->flags & IRQ_WORK_BUSY) {
+               if (time_after(jiffies, timeout))
+                       pr_warn("irq_work_sync 500ms timeout, work cpu %d\n", work->cpu);
                cpu_relax();
+       }
 }
 EXPORT_SYMBOL_GPL(irq_work_sync);
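
For what it's worth, my mental model of why the busy-wait can spin forever: irq_work_claim() marks the work PENDING and BUSY, and only irq_work_run_list() on the CPU whose per-CPU list holds the work ever clears those flags. So if the work sits on the list of a CPU that has gone offline, nothing will ever clear BUSY and irq_work_sync() spins indefinitely. A minimal userspace model of that state machine (just an illustration under my assumptions about the v5.4 flag semantics, not kernel code):

/*
 * Userspace model of the irq_work flag lifecycle (flag names and
 * clearing order assumed from v5.4; the llist, IPI and memory-ordering
 * details are deliberately omitted).
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define IRQ_WORK_PENDING	(1UL << 0)
#define IRQ_WORK_BUSY		(1UL << 1)
#define IRQ_WORK_CLAIMED	(IRQ_WORK_PENDING | IRQ_WORK_BUSY)

struct irq_work_model {
	atomic_ulong flags;
};

/* irq_work_queue(): claim the work; fails if it is already pending. */
static bool model_claim(struct irq_work_model *work)
{
	unsigned long old = 0;

	return atomic_compare_exchange_strong(&work->flags, &old,
					      IRQ_WORK_CLAIMED);
}

/* irq_work_run_list(): only ever runs on the CPU that owns the list. */
static void model_run(struct irq_work_model *work)
{
	atomic_fetch_and(&work->flags, ~IRQ_WORK_PENDING); /* may re-queue */
	/* work->func(work) would run here */
	atomic_fetch_and(&work->flags, ~IRQ_WORK_BUSY);    /* fully done */
}

/* irq_work_sync(): spin until BUSY clears. */
static void model_sync(struct irq_work_model *work)
{
	while (atomic_load(&work->flags) & IRQ_WORK_BUSY)
		; /* if model_run() never executes, this never returns */
}

int main(void)
{
	struct irq_work_model work = { .flags = 0 };

	model_claim(&work); /* queued on, say, CPU3's list */
	model_run(&work);   /* skip this (CPU3 offline) and sync() hangs */
	model_sync(&work);
	printf("done, flags=%#lx\n", (unsigned long)atomic_load(&work.flags));
	return 0;
}

This matches the log below: once CPU3 is offline with a work still claimed, the flags never change and the 500ms warning fires repeatedly.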


LOG:
[  312.638229] Detected VIPT I-cache on CPU1
[  312.638267] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[  312.638326] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
[  312.673205] Detected VIPT I-cache on CPU2
[  312.673243] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
[  312.673303] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
[  312.722140] Detected VIPT I-cache on CPU3
[  312.722182] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
[  312.722249] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
CPUHotplug: 4877 times remaining
[  313.854051] CPU1: shutdown
[  313.856778] psci: CPU1 killed.
[  313.894008] CPU2: shutdown
[  313.896764] psci: CPU2 killed.
[  313.934015] CPU3: shutdown
[  313.936736] psci: CPU3 killed.
[  314.970878] Detected VIPT I-cache on CPU1
[  314.970921] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[  314.970987] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
[  315.009201] Detected VIPT I-cache on CPU2
[  315.009239] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
[  315.009300] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
[  315.058155] Detected VIPT I-cache on CPU3
[  315.058199] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
[  315.058266] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
CPUHotplug: 4876 times remaining
[  316.182053] CPU1: shutdown
[  316.184776] psci: CPU1 killed.
[  316.222002] CPU2: shutdown
[  316.224729] psci: CPU2 killed.
[  316.262011] CPU3: shutdown
[  316.264734] psci: CPU3 killed.
[  317.298143] Detected VIPT I-cache on CPU1
[  317.298187] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[  317.298253] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
[  317.833414] irq_work_sync 500ms timeout, work cpu 3
[  317.838318] irq_work_sync 500ms timeout, work cpu 3
[  317.843225] irq_work_sync 500ms timeout, work cpu 3
[  317.848130] irq_work_sync 500ms timeout, work cpu 3
[  317.853030] irq_work_sync 500ms timeout, work cpu 3
[  317.857932] irq_work_sync 500ms timeout, work cpu 3
[  317.862840] irq_work_sync 500ms timeout, work cpu 3



> On Thu, Nov 21, 2019 at 11:53 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Nov 21, 2019 at 11:13 AM Anson Huang <anson.huang@nxp.com> wrote:
> > >
> > > Thanks Viresh for your quick response.
> > > The cpufreq info output is below. Some additional details: our
> > > internal tree is based on v5.4-rc7, the CPU hotplug path contains no
> > > i.MX platform-specific code, and so far we have reproduced the issue
> > > on i.MX8QXP, i.MX8QM and i.MX8MN. With cpufreq disabled, the issue
> > > does not occur.
> > > I have also reproduced the issue on v5.4-rc7 itself and will
> > > continue debugging; I will let you know if I find anything new.
> > >
> > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq
> > > > driver
> > > >
> > > > +Rafael and PM list.
> > > >
> > > > Please provide the output of the following for your platform while
> > > > I have a look at your problem.
> > > >
> > > > grep . /sys/devices/system/cpu/cpufreq/*/*
> > >
> > > root@imx8qxpmek:~# grep . /sys/devices/system/cpu/cpufreq/*/*
> > > /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load:0
> > > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy:0
> > > /sys/devices/system/cpu/cpufreq/ondemand/powersave_bias:0
> > > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor:1
> > > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate:10000
> > > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold:95
> > > /sys/devices/system/cpu/cpufreq/policy0/affected_cpus:0 1 2 3
> >
> > All CPUs are in one policy; CPU0 is the policy CPU and it never goes
> > offline AFAICS.
> >
> > > /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq:900000
> > > /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_max_freq:1200000
> > > /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_min_freq:900000
> > > /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_transition_latency:150000
> > > /sys/devices/system/cpu/cpufreq/policy0/related_cpus:0 1 2 3
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies:900000 1200000
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_available_governors:ondemand userspace performance schedutil
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq:900000
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_driver:cpufreq-dt
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor:ondemand
> >
> > Hm.  That shouldn't really make a difference, but I'm wondering if you
> > can reproduce this with the schedutil governor?
> >
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq:1200000
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq:900000
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed:<unsupported>
> > > grep: /sys/devices/system/cpu/cpufreq/policy0/stats: Is a directory
> > >
> > >
> > > CPUHotplug: 4524 times remaining
> > > [ 5954.441803] CPU1: shutdown
> > > [ 5954.444529] psci: CPU1 killed.
> > > [ 5954.481739] CPU2: shutdown
> > > [ 5954.484484] psci: CPU2 killed.
> > > [ 5954.530509] CPU3: shutdown
> > > [ 5954.533270] psci: CPU3 killed.
> > > [ 5955.561978] Detected VIPT I-cache on CPU1
> > > [ 5955.562015] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
> > > [ 5955.562073] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
> > > [ 5955.596921] Detected VIPT I-cache on CPU2
> > > [ 5955.596959] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
> > > [ 5955.597018] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
> > > [ 5955.645878] Detected VIPT I-cache on CPU3
> > > [ 5955.645921] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
> > > [ 5955.645986] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
> > > CPUHotplug: 4523 times remaining
> > > [ 5956.769790] CPU1: shutdown
> > > [ 5956.772518] psci: CPU1 killed.
> > > [ 5956.809752] CPU2: shutdown
> > > [ 5956.812480] psci: CPU2 killed.
> > > [ 5956.849769] CPU3: shutdown
> > > [ 5956.852494] psci: CPU3 killed.
> > > [ 5957.882045] Detected VIPT I-cache on CPU1
> > > [ 5957.882089] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
> > > [ 5957.882153] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
> > >
> > >
> > > It keeps looping here; the system does not fully hang and the debug
> > > console still responds. With JTAG attached, I can see that CPU1 is
> > > busy-waiting for the irq_work to become free.
> >
> > Well, cpufreq_offline() calls cpufreq_stop_governor() too, so there
> > shouldn't be any pending irq_works coming from cpufreq on the offline
> > CPUs after that.
> >
> > Hence, if an irq_work is pending at the cpufreq_online() time, it must
> > be on CPU0 (which is always online).
> 
> Let me rephrase this: If an irq_work is pending at the
> cpufreq_online() time, it must be on an online CPU, which is CPU0 if all of the
> other CPUs are offline.
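
For reference, my understanding of the stop path Rafael refers to: cpufreq_offline() calls cpufreq_stop_governor(), and the governor's ->stop() callback is supposed to synchronize against any irq_work still in flight before the policy loses the CPU. Roughly, for schedutil (shortened from my reading of the v5.4 sources, so treat this as a paraphrase rather than a quote):

static void sugov_stop(struct cpufreq_policy *policy)
{
	struct sugov_policy *sg_policy = policy->governor_data;
	unsigned int cpu;

	/* Stop the scheduler hooks from queuing any new irq_work... */
	for_each_cpu(cpu, policy->cpus)
		cpufreq_remove_update_util_hook(cpu);

	synchronize_rcu();

	if (!policy->fast_switch_enabled) {
		/* ...then wait out any irq_work already queued. */
		irq_work_sync(&sg_policy->irq_work);
		kthread_cancel_work_sync(&sg_policy->work);
	}
}

Given that ordering, a work item stranded on an already-offline CPU should not be possible, which is exactly what makes the "work cpu 3" result above so puzzling.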

Thread overview: 57+ messages
     [not found] <DB3PR0402MB391626A8ECFDC182C6EDCF8DF54E0@DB3PR0402MB3916.eurprd04.prod.outlook.com>
2019-11-21  9:35 ` About CPU hot-plug stress test failed in cpufreq driver Viresh Kumar
2019-11-21 10:13   ` Anson Huang
2019-11-21 10:53     ` Rafael J. Wysocki
2019-11-21 10:56       ` Rafael J. Wysocki
2019-11-22  5:15         ` Anson Huang [this message]
2019-11-22  9:59           ` Rafael J. Wysocki
2019-11-25  6:05             ` Anson Huang
2019-11-25  9:43               ` Anson Huang
2019-11-26  6:18                 ` Viresh Kumar
2019-11-26  8:22                   ` Anson Huang
2019-11-26  8:25                     ` Viresh Kumar
2019-11-25 12:44               ` Rafael J. Wysocki
2019-11-26  8:57                 ` Rafael J. Wysocki
2019-11-29 11:39                 ` Rafael J. Wysocki
2019-11-29 13:44                   ` Anson Huang
2019-12-05  8:53                     ` Anson Huang
2019-12-05 10:48                       ` Rafael J. Wysocki
2019-12-05 13:18                         ` Anson Huang
2019-12-05 15:52                           ` Rafael J. Wysocki
2019-12-09 10:31                             ` Peng Fan
2019-12-09 10:37                             ` Anson Huang
2019-12-09 10:56                               ` Anson Huang
2019-12-09 11:23                                 ` Rafael J. Wysocki
2019-12-09 12:32                                   ` Anson Huang
2019-12-09 12:44                                     ` Rafael J. Wysocki
2019-12-09 14:18                                       ` Anson Huang
2019-12-10  5:39                                         ` Anson Huang
2019-12-10  5:53                                       ` Peng Fan
2019-12-10  7:05                                         ` Viresh Kumar
2019-12-10  8:22                                           ` Rafael J. Wysocki
2019-12-10  8:29                                             ` Anson Huang
2019-12-10  8:36                                               ` Viresh Kumar
2019-12-10  8:37                                                 ` Peng Fan
2019-12-10  8:37                                               ` Rafael J. Wysocki
2019-12-10  8:43                                                 ` Peng Fan
2019-12-10  8:45                                                 ` Anson Huang
2019-12-10  8:50                                                   ` Rafael J. Wysocki
2019-12-10  8:51                                                     ` Anson Huang
2019-12-10 10:39                                                       ` Rafael J. Wysocki
2019-12-10 10:54                                                         ` Rafael J. Wysocki
2019-12-11  5:08                                                           ` Anson Huang
2019-12-11  8:59                                                           ` Peng Fan
2019-12-11  9:36                                                             ` Rafael J. Wysocki
2019-12-11  9:43                                                               ` Peng Fan
2019-12-11  9:52                                                                 ` Rafael J. Wysocki
2019-12-11 10:11                                                                   ` Peng Fan
2019-12-10 10:54                                                         ` Viresh Kumar
2019-12-10 11:07                                                           ` Rafael J. Wysocki
2019-12-10  8:57                                                     ` Viresh Kumar
2019-12-10 11:03                                                       ` Rafael J. Wysocki
2019-12-10  9:04                                                     ` Rafael J. Wysocki
2019-12-10  8:31                                             ` Viresh Kumar
2019-12-10  8:12                                         ` Rafael J. Wysocki
2019-12-05 11:00                       ` Viresh Kumar
2019-12-05 11:10                         ` Rafael J. Wysocki
2019-12-05 11:17                           ` Viresh Kumar
2019-11-21 10:37   ` Rafael J. Wysocki
