All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: "Elliott, Robert (Servers)" <elliott@hpe.com>,
	"paulmck@kernel.org" <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <quic_neeraju@quicinc.com>,
	Josh Triplett <josh@joshtriplett.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	"rcu@vger.kernel.org" <rcu@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 4/4] rcu: Add RCU stall diagnosis information
Date: Tue, 8 Nov 2022 14:41:28 +0800	[thread overview]
Message-ID: <e6f96a66-59e6-5889-a0f0-6451d8469c24@huawei.com> (raw)
In-Reply-To: <MW5PR84MB1842F64C2019065948048B92AB3F9@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM>



On 2022/11/8 13:53, Elliott, Robert (Servers) wrote:
> 
> 
>> -----Original Message-----
>> From: Leizhen (ThunderTown) <thunder.leizhen@huawei.com>
>> Sent: Monday, November 7, 2022 9:07 PM
>> Subject: Re: [PATCH v4 4/4] rcu: Add RCU stall diagnosis information
>>
>> On 2022/11/8 5:57, Elliott, Robert (Servers) wrote:
>>> I created a 22 second stall, which triggered two self-detected stall
>>> messages. The second one covers 18 seconds (and reports 17444 ms
>>> of system cputime), but still reports the half_timeout of 2.5 s on
>>
>> Because I have not updated rsrp->gp_seq in print_cpu_stat_info().
>>
>> Please add rsrp->gp_seq-- at the bottom of print_cpu_stat_info()
>> and try again.
>>
>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
>> index 2e560a70d88fd87..6f6c95d599e6436 100644
>> --- a/kernel/rcu/tree_stall.h
>> +++ b/kernel/rcu/tree_stall.h
>> @@ -455,6 +455,7 @@ static void print_cpu_stat_info(int cpu)
>>                 div_u64(cpustat[CPUTIME_SOFTIRQ] - rsrp->cputime_softirq,
>> NSEC_PER_MSEC),
>>                 div_u64(cpustat[CPUTIME_SYSTEM] - rsrp->cputime_system,
>> NSEC_PER_MSEC),
>>                 jiffies64_to_msecs(half_timeout));
>> +       rsrp->gp_seq--;
>>  }
> 
> 
> That causes a lot more expedited stall messages to be printed (52 of them).

There doesn't seem to be a causal relationship. I haven't figured it out yet.

> [   80.709572] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 3 jiffies s: 481 root: 0x2/.
> [   80.743724] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 311 jiffies s: 481 root: 0x2/.
> [   81.028028] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 620 jiffies s: 481 root: 0x2/.
> [   81.313255] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 929 jiffies s: 481 root: 0x2/.
> [   81.598483] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 1238 jiffies s: 481 root: 0x2/.
> [   81.883710] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 1547 jiffies s: 481 root: 0x2/.
> ...
> [   95.077093] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 15840 jiffies s: 481 root: 0x2/.
> [   95.362327] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 22-.... } 16149 jiffies s: 481 root: 0x2/.
> [   95.647549] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {
> 
> 
> Of the three self-detected prints, the second and third system cputimes 
> of 8809 and 8655 are still greater than 1500.

Maybe we should add another member to record jiffies.

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b33165dba1ea745..6d9c876ea5c85a6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -880,6 +880,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
                        rsrp->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
                        rsrp->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);
                        rsrp->nr_csw = nr_context_switches_cpu(rdp->cpu);
+                       rsrp->jiffies = jiffies;
                        rsrp->gp_seq = rdp->gp_seq;
                }
        }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index fa159a951ded42e..cbe4f09252dce20 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -172,6 +172,7 @@ struct rcu_snap_record {
        unsigned long   nr_hardirqs;    /* Accumulated number of hard irqs */
        unsigned int    nr_softirqs;    /* Accumulated number of soft irqs */
        unsigned long long nr_csw;      /* Accumulated number of task switches */
+       unsigned long   jiffies;        /* jiffies at the sampling time */
 };

 /* Per-CPU data for read-copy update. */
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 2e560a70d88fd87..7b6afb9c7b96dbe 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -431,7 +431,6 @@ static bool rcu_is_rcuc_kthread_starving(struct rcu_data *rdp, unsigned long *jp
 static void print_cpu_stat_info(int cpu)
 {
        u64 *cpustat;
-       unsigned long half_timeout;
        struct rcu_snap_record *rsrp;
        struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);

@@ -443,7 +442,6 @@ static void print_cpu_stat_info(int cpu)
                return;

        cpustat = kcpustat_cpu(cpu).cpustat;
-       half_timeout = rcu_jiffies_till_stall_check() / 2;

        pr_err("         hardirqs   softirqs   csw/system\n");
        pr_err(" number: %8ld %10d %12lld\n",
@@ -454,7 +452,7 @@ static void print_cpu_stat_info(int cpu)
                div_u64(cpustat[CPUTIME_IRQ] - rsrp->cputime_irq, NSEC_PER_MSEC),
                div_u64(cpustat[CPUTIME_SOFTIRQ] - rsrp->cputime_softirq, NSEC_PER_MSEC),
                div_u64(cpustat[CPUTIME_SYSTEM] - rsrp->cputime_system, NSEC_PER_MSEC),
-               jiffies64_to_msecs(half_timeout));
+               jiffies64_to_msecs(jiffies - rsrp->jiffies));
 }

 /*


> 
> [   73.774167] tcrypt: rcu testing - kernel_fpu_disable for rude 22 s
> [   76.540071] rcu: INFO: rcu_preempt self-detected stall on CPU 22
> [   76.546118] rcu:     22-....: (2994 ticks this GP) idle=716c/1/0x4000000000000000 softirq=2095/2095 fqs=751
> [   76.555661] rcu:     CPU 22          hardirqs   softirqs          csw     system  cond_resched
> [   76.564067] rcu:     CPU 22   count:        0          4             0                        0
> [   76.572560] rcu:     CPU 22 cputime:        3          0                     1493               ==> 1500 (ms)
> [   76.582277] rcu:     CPU 22 current: in_kernel_fpu_begin=1 this_cpu_preemptible=0
> [   76.589546] rcu:     (self-detected stall on CPU 22 t=3054 jiffies g=13897 q=274 ncpus=56)
> ...
> [   85.021211] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {
> [   85.158741] rcu: INFO: rcu_preempt self-detected stall on CPU 22
> [   85.159351]  22-....
> [   85.163204] rcu:     22-....: (11839 ticks this GP) idle=716c/1/0x4000000000000000 softirq=2095/2095 fqs=2959
> [   85.163207] rcu:     CPU 22          hardirqs   softirqs          csw     system  cond_resched
> [   85.163208] rcu:     CPU 22   count:        0         43             0                        0
> [   85.163210] rcu:     CPU 22 cputime:      491          0                     8809               ==> 1500 (ms)
> [   85.166980]  } 4831 jiffies s: 481 root: 0x2/.
> [   85.170745] rcu:     CPU 22 current: in_kernel_fpu_begin=1 this_cpu_preemptible=0
> [   85.170747] rcu:     (self-detected stall on CPU 22 t=12351 jiffies g=13897 q=2542 ncpus=56)
> ...
> [   93.899261] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {
> [   93.920482] rcu: INFO: rcu_preempt self-detected stall on CPU 22
> [   93.933812]  22-....
> [   93.939063] rcu:     22-....: (20543 ticks this GP) idle=716c/1/0x4000000000000000 softirq=2095/2095 fqs=5216
> [   93.939066] rcu:     CPU 22          hardirqs   softirqs          csw     system  cond_resched
> [   93.939067] rcu:     CPU 22   count:        0         59             0                        0
> [   93.946245]  } 14342 jiffies s: 481 root: 0x2/.
> [   93.953415] rcu:     CPU 22 cputime:      777          0                     8655               ==> 1500 (ms)
> [   93.953418] rcu:     CPU 22 current: in_kernel_fpu_begin=1 this_cpu_preemptible=0
> [   93.953419] rcu:     (self-detected stall on CPU 22 t=21865 jiffies g=13897 q=6284 ncpus=56)
> [   93.953421] CPU: 22 PID: 2604 Comm: modprobe Not tainted 6.0.0+ #13
> [   93.960598] rcu: blocking rcu_node structures (internal RCU debug):
> 
> (I added "CPU %d" to all the self-detected lines to try to help the interleaving)
> 
> 
> 

-- 
Regards,
  Zhen Lei

  reply	other threads:[~2022-11-08  6:41 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-04  2:12 [PATCH v4 0/4] rcu: Add RCU stall diagnosis information Zhen Lei
2022-11-04  2:12 ` [PATCH v4 1/4] genirq: Fix the return type of kstat_cpu_irqs_sum() Zhen Lei
2022-11-04  2:12 ` [PATCH v4 2/4] sched: Add helper kstat_cpu_softirqs_sum() Zhen Lei
2022-11-04  2:12 ` [PATCH v4 3/4] sched: Add helper nr_context_switches_cpu() Zhen Lei
2022-11-04  2:12 ` [PATCH v4 4/4] rcu: Add RCU stall diagnosis information Zhen Lei
2022-11-04  3:12   ` Leizhen (ThunderTown)
2022-11-04 13:43     ` Paul E. McKenney
2022-11-05  1:58   ` Elliott, Robert (Servers)
2022-11-05  7:03     ` Leizhen (ThunderTown)
2022-11-05 20:32       ` Paul E. McKenney
2022-11-07  3:20         ` Leizhen (ThunderTown)
2022-11-05 20:06     ` Paul E. McKenney
2022-11-07  2:10       ` Elliott, Robert (Servers)
2022-11-07 11:49         ` Leizhen (ThunderTown)
2022-11-07 20:38           ` Elliott, Robert (Servers)
2022-11-07 21:57             ` Elliott, Robert (Servers)
2022-11-08  3:06               ` Leizhen (ThunderTown)
2022-11-08  5:53                 ` Elliott, Robert (Servers)
2022-11-08  6:41                   ` Leizhen (ThunderTown) [this message]
2022-11-08 19:29                     ` Elliott, Robert (Servers)
2022-11-08  3:18             ` Leizhen (ThunderTown)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e6f96a66-59e6-5889-a0f0-6451d8469c24@huawei.com \
    --to=thunder.leizhen@huawei.com \
    --cc=elliott@hpe.com \
    --cc=frederic@kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=paulmck@kernel.org \
    --cc=quic_neeraju@quicinc.com \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.