From: "Elliott, Robert (Servers)" <elliott@hpe.com>
To: Zhen Lei <thunder.leizhen@huawei.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	"Neeraj Upadhyay" <quic_neeraju@quicinc.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	"rcu@vger.kernel.org" <rcu@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH v4 4/4] rcu: Add RCU stall diagnosis information
Date: Sat, 5 Nov 2022 01:58:28 +0000
Message-ID: <MW5PR84MB18425CBD1259317004F7771AAB3A9@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM>
In-Reply-To: <20221104021224.102-5-thunder.leizhen@huawei.com>


> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -95,6 +95,16 @@ config RCU_EXP_CPU_STALL_TIMEOUT
>  	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
>  	  seconds to milliseconds.
> 
> +config RCU_CPU_STALL_CPUTIME

Since the information might change and grow over time, consider
calling it "ADDITIONAL_INFO" rather than just "CPUTIME".

> +	bool "Provide additional rcu stall debug information"
> +	depends on RCU_STALL_COMMON
> +	default n
> +	help
> +	  Statistics during the period from RCU_CPU_STALL_TIMEOUT/2 to
> +	  RCU_CPU_STALL_TIMEOUT, such as the number of (hard interrupts, soft
> +	  interrupts, task switches) and the cputime of (hard interrupts, soft
> +	  interrupts, kerenl tasks) are added to the rcu stall report.

s/kerenl/kernel/

Those parenthetical words are not grammatically correct, and
nobody will remember to update Kconfig if they change the code.

Try making it a little less precise:
"Collect statistics during the RCU stall timeout period
(e.g., counts and CPU time consumed by hard interrupts, soft
interrupts, task switches, and kernel tasks) and include
them in the RCU stall console error message."
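
Putting the rename and the looser help text together, the entry
might end up looking roughly like the fragment below. This is only
a sketch: the symbol name RCU_CPU_STALL_ADDITIONAL_INFO is an
assumption based on the naming suggestion above, not something in
the patch.

```kconfig
# Sketch only: the symbol name is hypothetical, not from the patch.
config RCU_CPU_STALL_ADDITIONAL_INFO
	bool "Provide additional RCU stall debug information"
	depends on RCU_STALL_COMMON
	default n
	help
	  Collect statistics during the RCU stall timeout period
	  (e.g., counts and CPU time consumed by hard interrupts, soft
	  interrupts, task switches, and kernel tasks) and include
	  them in the RCU stall console error message.
```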

...
> +
> +	pr_err("         hardirqs   softirqs   csw/system\n");
> +	pr_err(" number: %8ld %10d %12lld\n",

Everything after a %d is a "number". These are more specifically counts.

> +		kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs,
> +		kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs,
> +		nr_context_switches_cpu(cpu) - rsrp->nr_csw);

Two columns are plural and one is not. 

Within a few lines, this is referred to as "context_switches", "csw",
and "task switches." One term would be better. vmstat uses "cs",
pidstat uses "cswch", and /proc/<pid>/status uses "ctxt_switches,"
so there's not one ideal term for that column. Matching one
of those might be better than using another unique name.

Since no other variants are shown, the "/system" suffix is a bit
confusing. 


> +	pr_err("cputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
> +		div_u64(cpustat[CPUTIME_IRQ] - rsrp->cputime_irq, NSEC_PER_MSEC),
> +		div_u64(cpustat[CPUTIME_SOFTIRQ] - rsrp->cputime_softirq, NSEC_PER_MSEC),
> +		div_u64(cpustat[CPUTIME_SYSTEM] - rsrp->cputime_system, NSEC_PER_MSEC),
> +		jiffies64_to_msecs(half_timeout));

There is no column header for that fourth number. "half_timeout"
might be good, dropping the "==>".

The "(ms)" at the end is awkward. Try moving it to the left
as "cputime (ms): "
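
To make the formatting comments above concrete, here is a rough
userspace sketch of how the report could look with them applied.
snprintf() stands in for pr_err(), and the struct, its field names,
the "count:" label, and the "cs" column name (borrowed from vmstat)
are assumptions for illustration only, not the patch's format.

```c
#include <stdio.h>

/*
 * Userspace sketch only: this struct and its field names are
 * hypothetical and do not come from the patch.
 */
struct stall_sample {
	long hardirqs;              /* count of hard interrupts */
	long softirqs;              /* count of soft interrupts */
	long long cs;               /* count of context switches */
	long long irq_ms;           /* CPU time in hard interrupts, ms */
	long long softirq_ms;       /* CPU time in soft interrupts, ms */
	long long system_ms;        /* CPU time in kernel tasks, ms */
	long long half_timeout_ms;  /* length of the sampling window, ms */
};

/*
 * Format a header line plus two data lines into buf.
 * Every number has a column header, the unit sits in the row label,
 * and there is no "==>" separator.
 * Returns the snprintf() result.
 */
static int format_stall_report(char *buf, size_t len,
			       const struct stall_sample *s)
{
	return snprintf(buf, len,
		"%15s%8s %10s %12s %14s\n"
		"%13s  %8ld %10ld %12lld\n"
		"%13s  %8lld %10lld %12lld %14lld\n",
		"", "hardirqs", "softirqs", "cs", "half_timeout",
		"count:", s->hardirqs, s->softirqs, s->cs,
		"cputime (ms):", s->irq_ms, s->softirq_ms, s->system_ms,
		s->half_timeout_ms);
}
```

Using field widths for the header strings as well as the values
keeps the labels lined up with their columns if a width changes
later.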

> +int rcu_cpu_stall_cputime __read_mostly = IS_ENABLED(CONFIG_RCU_CPU_STALL_CPUTIME);

Since this is both a config option and a module parameter, adding
some more instrumentation overhead might be worthwhile to cover
other likely causes of rcu stalls.

For example, if enabled, have these functions (if available
on the architecture) maintain a per-CPU running count of
their invocations, which also cause the CPU to be unavailable
for rcu: 
- kernel_fpu_begin() calls - FPU/SIMD context preservation,
  which also calls preempt_disable()
- preempt_disable() calls - scheduler context switches disabled
- local_irq_save() calls - interrupts disabled
- cond_resched() calls - lack of these is a problem

For kernel_fpu_begin and preempt_disable, knowing if it is
currently blocked for those reasons is probably the most
helpful.
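
A minimal userspace sketch of that bookkeeping, to show the shape
of the idea. Everything here is hypothetical: the sketch_* wrappers,
the struct, and the fixed-size array are stand-ins, and a kernel
version would presumably use per-CPU variables and hook the real
functions rather than wrap them.

```c
#include <stdbool.h>

#define SKETCH_NR_CPUS 4	/* hypothetical fixed CPU count for the sketch */

/* Hypothetical per-CPU instrumentation record. */
struct stall_instr {
	unsigned long fpu_begin_calls;
	unsigned long preempt_disable_calls;
	unsigned long irq_save_calls;
	unsigned long cond_resched_calls;
	bool in_fpu;		/* inside a kernel_fpu_begin() section now */
	int preempt_depth;	/* > 0 while preemption is disabled */
};

static struct stall_instr stall_instr[SKETCH_NR_CPUS];

static void sketch_preempt_disable(int cpu)
{
	stall_instr[cpu].preempt_disable_calls++;
	stall_instr[cpu].preempt_depth++;
}

static void sketch_preempt_enable(int cpu)
{
	stall_instr[cpu].preempt_depth--;
}

/* Models kernel_fpu_begin(), which also disables preemption. */
static void sketch_kernel_fpu_begin(int cpu)
{
	stall_instr[cpu].fpu_begin_calls++;
	stall_instr[cpu].in_fpu = true;
	sketch_preempt_disable(cpu);
}

static void sketch_kernel_fpu_end(int cpu)
{
	sketch_preempt_enable(cpu);
	stall_instr[cpu].in_fpu = false;
}

static void sketch_local_irq_save(int cpu)
{
	stall_instr[cpu].irq_save_calls++;
}

static void sketch_cond_resched(int cpu)
{
	stall_instr[cpu].cond_resched_calls++;
}

/* "Is this CPU blocked right now?", the state a stall report would print. */
static bool sketch_blocked_now(int cpu)
{
	return stall_instr[cpu].in_fpu || stall_instr[cpu].preempt_depth > 0;
}
```

The running counts can be snapshotted at half_timeout and
subtracted at report time like the other statistics, while
in_fpu and preempt_depth give the current state directly.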

