From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>,
	Neeraj Upadhyay <quic_neeraju@quicinc.com>,
	Josh Triplett <josh@joshtriplett.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>, <rcu@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, Robert Elliott <elliott@hpe.com>
Subject: Re: [PATCH v6 2/2] doc: Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information
Date: Thu, 10 Nov 2022 10:54:13 +0800	[thread overview]
Message-ID: <54d3262f-3f5f-12af-e965-d56166724bcc@huawei.com> (raw)
In-Reply-To: <20221109150834.GA127536@lothringen>



On 2022/11/9 23:08, Frederic Weisbecker wrote:
> On Wed, Nov 09, 2022 at 05:37:38PM +0800, Zhen Lei wrote:
>> This commit documents how to quickly determine the bug causing a given
>> RCU CPU stall fault warning based on the output information provided
>> by CONFIG_RCU_CPU_STALL_CPUTIME=y.
>>
>> [ paulmck: Apply wordsmithing. ]
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>> ---
>>  Documentation/RCU/stallwarn.rst | 88 +++++++++++++++++++++++++++++++++
>>  1 file changed, 88 insertions(+)
>>
>> diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
>> index dfa4db8c0931eaf..5e24e849290a286 100644
>> --- a/Documentation/RCU/stallwarn.rst
>> +++ b/Documentation/RCU/stallwarn.rst
>> @@ -390,3 +390,91 @@ for example, "P3421".
>>  
>>  It is entirely possible to see stall warnings from normal and from
>>  expedited grace periods at about the same time during the same run.
>> +
>> +RCU_CPU_STALL_CPUTIME
>> +=====================
>> +
>> +In kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y or booted with
>> +rcupdate.rcu_cpu_stall_cputime=1, the following additional information
>> +is supplied with each RCU CPU stall warning::
>> +
>> +rcu:          hardirqs   softirqs   csw/system
>> +rcu:  number:      624         45            0
>> +rcu: cputime:       69          1         2425   ==> 2500(ms)
>> +
>> +These statistics are collected during the sampling period. The values
>> +in row "number:" are the number of hard interrupts, number of soft
>> +interrupts, and number of context switches on the stalled CPU. The
>> +first three values in row "cputime:" indicate the CPU time in
>> +milliseconds consumed by hard interrupts, soft interrupts, and tasks
>> +on the stalled CPU.
> 
> Is that since the boot or since the last snapshot?

Since the last snapshot. See the diagram below:

+The sampling period is shown as follows:
+|<------------first timeout---------->|<-----second timeout----->|
+|<--half timeout-->|<--half timeout-->|                          |
+|                  |<--first period-->|                          |
+|                  |<-----------second sampling period---------->|
+|                  |                  |                          |
+|          sampling time point    1st-stall                  2nd-stall
                    |
                    |
                    Take the snapshot at this time
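To make "since the last snapshot" concrete, here is a minimal user-space sketch. It assumes the counters are read once at the sampling time point and again when the stall is reported. The struct `stall_counters` and the functions `stall_delta()` and `format_stall()` are hypothetical names for illustration, not the kernel's implementation. The column widths are chosen to reproduce the example output in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical per-CPU counters mirroring the columns of the stall
 * report. Struct and field names are illustrative, not kernel code.
 */
struct stall_counters {
	unsigned long nr_hardirqs;     /* hard interrupts taken */
	unsigned long nr_softirqs;     /* soft interrupts handled */
	unsigned long nr_csw;          /* context switches */
	unsigned long long irq_ms;     /* CPU time in hardirq context */
	unsigned long long softirq_ms; /* CPU time in softirq context */
	unsigned long long system_ms;  /* system CPU time of tasks */
	unsigned long long stamp_ms;   /* time at which counters were read */
};

/* Activity since the snapshot; stamp_ms becomes the measurement interval. */
static struct stall_counters stall_delta(const struct stall_counters *snap,
					 const struct stall_counters *now)
{
	assert(now->stamp_ms >= snap->stamp_ms); /* snapshot comes first */
	struct stall_counters d = {
		.nr_hardirqs = now->nr_hardirqs - snap->nr_hardirqs,
		.nr_softirqs = now->nr_softirqs - snap->nr_softirqs,
		.nr_csw      = now->nr_csw - snap->nr_csw,
		.irq_ms      = now->irq_ms - snap->irq_ms,
		.softirq_ms  = now->softirq_ms - snap->softirq_ms,
		.system_ms   = now->system_ms - snap->system_ms,
		.stamp_ms    = now->stamp_ms - snap->stamp_ms,
	};
	return d;
}

/* Print the delta using the same column layout as the stall warning. */
static int format_stall(char *buf, size_t len, const struct stall_counters *d)
{
	return snprintf(buf, len,
			"rcu:          hardirqs   softirqs   csw/system\n"
			"rcu:  number:%9lu%11lu%13lu\n"
			"rcu: cputime:%9llu%11llu%13llu   ==> %llu(ms)\n",
			d->nr_hardirqs, d->nr_softirqs, d->nr_csw,
			d->irq_ms, d->softirq_ms, d->system_ms, d->stamp_ms);
}
```

For example, a snapshot stamped at 2500 ms (half of an assumed 5000 ms timeout) and a reading at the first stall (5000 ms) give a 2500 ms interval. Because the snapshot is not retaken, a second stall is measured from the same sampling time point, so its interval covers half of the first timeout plus the second timeout.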

> 
>> The last number is the measurement interval, again
>> +in milliseconds.  Because user-mode tasks normally do not cause RCU CPU
>> +stalls, the tasks involved are typically kernel tasks, which is why only
>> +system CPU time is considered.
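As a worked illustration of reading the "cputime:" row against the interval: in the example output, 69 + 1 + 2425 = 2495 ms of the 2500 ms interval is covered by the three columns, and the system column alone is 97% of the interval. The following minimal sketch assumes truncated whole-number percentages are precise enough. The helpers `pct_of_interval()` and `uncovered_ms()` are hypothetical names, not kernel functions.

```c
#include <assert.h>

/* Hypothetical helper: share of the interval, in truncated whole percent. */
static unsigned int pct_of_interval(unsigned long long cputime_ms,
				    unsigned long long interval_ms)
{
	assert(interval_ms > 0);
	return (unsigned int)(cputime_ms * 100 / interval_ms);
}

/* Hypothetical helper: milliseconds of the interval not covered by the
 * three "cputime:" columns (clamped at zero). */
static unsigned long long uncovered_ms(unsigned long long irq_ms,
				       unsigned long long softirq_ms,
				       unsigned long long system_ms,
				       unsigned long long interval_ms)
{
	unsigned long long used = irq_ms + softirq_ms + system_ms;

	return used >= interval_ms ? 0 : interval_ms - used;
}
```

With the example values, `pct_of_interval(2425, 2500)` is 97 and `uncovered_ms(69, 1, 2425, 2500)` is 5. In other words, almost all of the interval was spent in task (system) context rather than in interrupt handlers.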
>> +
>> +The sampling period is shown as follows::
>> +
>> +  |<------------first timeout---------->|<-----second timeout----->|
>> +  |<--half timeout-->|<--half timeout-->|                          |
>> +  |                  |<--first period-->|                          |
>> +  |                  |<-----------second sampling period---------->|
>> +  |                  |                  |                          |
>> +  |          sampling time point    1st-stall                  2nd-stall
>> +
>> +The following describes four typical scenarios:
>> +
>> +1. A CPU looping with interrupts disabled. ::
>> +
>> +   rcu:          hardirqs   softirqs   csw/system
>> +   rcu:  number:        0          0            0
>> +   rcu: cputime:        0          0            0   ==> 2500(ms)
>> +
>> +   Because interrupts have been disabled throughout the measurement
>> +   interval, there are no interrupts and no context switches.
>> +   Furthermore, because CPU time consumption was measured using interrupt
>> +   handlers, the system CPU consumption is misleadingly measured as zero.
>> +   This scenario will normally also have "(0 ticks this GP)" printed on
>> +   this CPU's summary line.
> 
> Right, unless you're running with CONFIG_NO_HZ_FULL=y and the target CPU
> is in the nohz_full= set. In that case you should see a delta in stime,
> because the cputime is then measured with the CPU clock.
> 
> Thanks.
> .
> 
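Scenario 1 and the nohz_full caveat above can be combined into a rough triage check. This is a minimal sketch under two assumptions: the inputs are the per-interval deltas printed in the stall warning, and the caller already knows whether the stalled CPU is in the nohz_full= set. The function `looks_like_irqs_disabled()` and its parameters are hypothetical, not a kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical triage check for "CPU looping with interrupts disabled".
 * Inputs are the per-interval deltas from the stall warning.
 */
static bool looks_like_irqs_disabled(unsigned long nr_hardirqs,
				     unsigned long nr_softirqs,
				     unsigned long nr_csw,
				     unsigned long long irq_ms,
				     unsigned long long softirq_ms,
				     unsigned long long system_ms,
				     bool nohz_full_cpu)
{
	/* Any interrupt or context switch means the CPU was not stuck
	 * with interrupts disabled for the entire interval. */
	if (nr_hardirqs || nr_softirqs || nr_csw || irq_ms || softirq_ms)
		return false;

	/*
	 * Without nohz_full, system time is measured from interrupt
	 * handlers, so it should also read zero. On a nohz_full CPU it
	 * is measured with the CPU clock and may show a nonzero delta.
	 */
	return nohz_full_cpu || system_ms == 0;
}
```

A check like this only narrows the search. Per the patch text, the CPU's summary line will normally also show "(0 ticks this GP)" in this scenario.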

-- 
Regards,
  Zhen Lei
