linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/6] rcu: Add RCU stall diagnosis information
@ 2022-11-11 13:07 Zhen Lei
  2022-11-11 13:07 ` [PATCH v7 1/6] genirq: Fix the return type of kstat_cpu_irqs_sum() Zhen Lei
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Zhen Lei @ 2022-11-11 13:07 UTC (permalink / raw)
  To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel
  Cc: Zhen Lei, Robert Elliott

v6 --> v7:
1. Use kcpustat_field() to obtain the cputime.
2. Make the output start with "\t" to match other related prints.
3. Aligns the output of the last line of RCU stall.

v5 --> v6:
1. When there are more than two continuous RCU stallings, correctly handle the
   value of the second and subsequent sampling periods. Update comments and
   document.
   Thanks to Elliott, Robert for the test.
2. Change "rcu stall" to "RCU stall".

v4 --> v5:
1. Resolve a git am conflict. No code change.

v3 --> v4:
1. Rename rcu_cpu_stall_deep_debug to rcu_cpu_stall_cputime.

v2 --> v3:
1. Fix the return type of kstat_cpu_irqs_sum()
2. Add Kconfig option CONFIG_RCU_CPU_STALL_DEEP_DEBUG and boot parameter
   rcupdate.rcu_cpu_stall_deep_debug.
3. Add comments and normalize local variable name


v1 --> v2:
1. Fixed a bug in the code. If the rcu stall is detected by another CPU,
   kcpustat_this_cpu cannot be used.
@@ -451,7 +451,7 @@ static void print_cpu_stat_info(int cpu)
        if (r->gp_seq != rdp->gp_seq)
                return;

-       cpustat = kcpustat_this_cpu->cpustat;
+       cpustat = kcpustat_cpu(cpu).cpustat;
2. Move the start point of statistics from rcu_stall_kick_kthreads() to
   rcu_implicit_dynticks_qs(), removing the dependency on irq_work.

v1:
In some extreme cases, such as the I/O pressure test, the CPU usage may
be 100%, causing RCU stall. In this case, the printed information about
current is not useful. Displays the number and usage of hard interrupts,
soft interrupts, and context switches that are generated within half of
the CPU stall timeout, can help us make a general judgment. In other
cases, we can preliminarily determine whether an infinite loop occurs
when local_irq, local_bh or preempt is disabled.

Zhen Lei (6):
  genirq: Fix the return type of kstat_cpu_irqs_sum()
  sched: Add helper kstat_cpu_softirqs_sum()
  sched: Add helper nr_context_switches_cpu()
  rcu: Add RCU stall diagnosis information
  doc: Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information
  rcu: Align the output of RCU stall

 Documentation/RCU/stallwarn.rst               | 88 +++++++++++++++++++
 .../admin-guide/kernel-parameters.txt         |  6 ++
 include/linux/kernel_stat.h                   | 14 ++-
 kernel/rcu/Kconfig.debug                      | 11 +++
 kernel/rcu/rcu.h                              |  1 +
 kernel/rcu/tree.c                             | 18 ++++
 kernel/rcu/tree.h                             | 19 ++++
 kernel/rcu/tree_stall.h                       | 35 +++++++-
 kernel/rcu/update.c                           |  2 +
 kernel/sched/core.c                           |  5 ++
 10 files changed, 196 insertions(+), 3 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v7 1/6] genirq: Fix the return type of kstat_cpu_irqs_sum()
  2022-11-11 13:07 [PATCH v7 0/6] rcu: Add RCU stall diagnosis information Zhen Lei
@ 2022-11-11 13:07 ` Zhen Lei
  2022-11-11 13:07 ` [PATCH v7 2/6] sched: Add helper kstat_cpu_softirqs_sum() Zhen Lei
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Zhen Lei @ 2022-11-11 13:07 UTC (permalink / raw)
  To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel
  Cc: Zhen Lei, Robert Elliott

The type of member irqs_sum is unsigned long, so kstat_cpu_irqs_sum()
should also return unsigned long. Otherwise, the value maybe truncated.

Fixes: f2c66cd8eedd ("/proc/stat: scalability of irq num per cpu")
Reported-by: Elliott, Robert (Servers) <elliott@hpe.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 include/linux/kernel_stat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index ddb5a358fd829f4..90e2fdc17d79ff8 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -75,7 +75,7 @@ extern unsigned int kstat_irqs_usr(unsigned int irq);
 /*
  * Number of interrupts per cpu, since bootup
  */
-static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
+static inline unsigned long kstat_cpu_irqs_sum(unsigned int cpu)
 {
 	return kstat_cpu(cpu).irqs_sum;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7 2/6] sched: Add helper kstat_cpu_softirqs_sum()
  2022-11-11 13:07 [PATCH v7 0/6] rcu: Add RCU stall diagnosis information Zhen Lei
  2022-11-11 13:07 ` [PATCH v7 1/6] genirq: Fix the return type of kstat_cpu_irqs_sum() Zhen Lei
@ 2022-11-11 13:07 ` Zhen Lei
  2022-11-14 11:42   ` Frederic Weisbecker
  2022-11-11 13:07 ` [PATCH v7 3/6] sched: Add helper nr_context_switches_cpu() Zhen Lei
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Zhen Lei @ 2022-11-11 13:07 UTC (permalink / raw)
  To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel
  Cc: Zhen Lei, Robert Elliott

Similar to kstat_cpu_irqs_sum(), it counts the sum of all software
interrupts on a specified CPU.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 include/linux/kernel_stat.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 90e2fdc17d79ff8..898076e173a928a 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -67,6 +67,17 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
        return kstat_cpu(cpu).softirqs[irq];
 }
 
+static inline unsigned int kstat_cpu_softirqs_sum(int cpu)
+{
+	int i;
+	unsigned int sum = 0;
+
+	for (i = 0; i < NR_SOFTIRQS; i++)
+		sum += kstat_softirqs_cpu(i, cpu);
+
+	return sum;
+}
+
 /*
  * Number of interrupts per specific IRQ source, since bootup
  */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7 3/6] sched: Add helper nr_context_switches_cpu()
  2022-11-11 13:07 [PATCH v7 0/6] rcu: Add RCU stall diagnosis information Zhen Lei
  2022-11-11 13:07 ` [PATCH v7 1/6] genirq: Fix the return type of kstat_cpu_irqs_sum() Zhen Lei
  2022-11-11 13:07 ` [PATCH v7 2/6] sched: Add helper kstat_cpu_softirqs_sum() Zhen Lei
@ 2022-11-11 13:07 ` Zhen Lei
  2022-11-11 13:07 ` [PATCH v7 4/6] rcu: Add RCU stall diagnosis information Zhen Lei
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Zhen Lei @ 2022-11-11 13:07 UTC (permalink / raw)
  To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel
  Cc: Zhen Lei, Robert Elliott

Returns the number of context switches on the specified CPU, which can be
used to diagnose rcu stall.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 include/linux/kernel_stat.h | 1 +
 kernel/sched/core.c         | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 898076e173a928a..9935f7ecbfb9e31 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -52,6 +52,7 @@ DECLARE_PER_CPU(struct kernel_cpustat, kernel_cpustat);
 #define kstat_cpu(cpu) per_cpu(kstat, cpu)
 #define kcpustat_cpu(cpu) per_cpu(kernel_cpustat, cpu)
 
+extern unsigned long long nr_context_switches_cpu(int cpu);
 extern unsigned long long nr_context_switches(void);
 
 extern unsigned int kstat_irqs_cpu(unsigned int irq, int cpu);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5800b0623ff3068..a0d19f67f2046d9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5229,6 +5229,11 @@ bool single_task_running(void)
 }
 EXPORT_SYMBOL(single_task_running);
 
+unsigned long long nr_context_switches_cpu(int cpu)
+{
+	return cpu_rq(cpu)->nr_switches;
+}
+
 unsigned long long nr_context_switches(void)
 {
 	int i;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7 4/6] rcu: Add RCU stall diagnosis information
  2022-11-11 13:07 [PATCH v7 0/6] rcu: Add RCU stall diagnosis information Zhen Lei
                   ` (2 preceding siblings ...)
  2022-11-11 13:07 ` [PATCH v7 3/6] sched: Add helper nr_context_switches_cpu() Zhen Lei
@ 2022-11-11 13:07 ` Zhen Lei
  2022-11-14 11:24   ` Frederic Weisbecker
  2022-11-16 22:39   ` Frederic Weisbecker
  2022-11-11 13:07 ` [PATCH v7 5/6] doc: Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information Zhen Lei
  2022-11-11 13:07 ` [PATCH v7 6/6] rcu: Align the output of RCU stall Zhen Lei
  5 siblings, 2 replies; 23+ messages in thread
From: Zhen Lei @ 2022-11-11 13:07 UTC (permalink / raw)
  To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel
  Cc: Zhen Lei, Robert Elliott

Because RCU CPU stall warnings are driven from the scheduling-clock
interrupt handler, a workload consisting of a very large number of
short-duration hardware interrupts can result in misleading stall-warning
messages.  On systems supporting only a single level of interrupts,
that is, where interrupts handlers cannot be interrupted, this can
produce misleading diagnostics.  The stack traces will show the
innocent-bystander interrupted task, not the interrupts that are
at the very least exacerbating the stall.

This situation can be improved by displaying the number of interrupts
and the CPU time that they have consumed.  Diagnosing other types
of stalls can be eased by also providing the count of softirqs and
the CPU time that they consumed as well as the number of context
switches and the task-level CPU time consumed.

Consider the following output given this change:

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:     0-....: (1250 ticks this GP) <omitted>
rcu:          hardirqs   softirqs   csw/system
rcu:  number:      624         45            0
rcu: cputime:       69          1         2425   ==> 2500(ms)

This output shows that the number of hard and soft interrupts is small,
there are no context switches, and the system takes up a lot of time. This
indicates that the current task is looping with preemption disabled.

The impact on system performance is negligible because snapshot is
recorded only once for all continuous RCU stalls.

This added debugging information is suppressed by default and can be
enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or
by booting with rcupdate.rcu_cpu_stall_cputime=1.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |  6 ++++
 kernel/rcu/Kconfig.debug                      | 11 +++++++
 kernel/rcu/rcu.h                              |  1 +
 kernel/rcu/tree.c                             | 18 +++++++++++
 kernel/rcu/tree.h                             | 19 ++++++++++++
 kernel/rcu/tree_stall.h                       | 31 +++++++++++++++++++
 kernel/rcu/update.c                           |  2 ++
 7 files changed, 88 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 811b2e6d4672685..ee7d9d962591c5d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5084,6 +5084,12 @@
 			rcupdate.rcu_cpu_stall_timeout to be used (after
 			conversion from seconds to milliseconds).
 
+	rcupdate.rcu_cpu_stall_cputime= [KNL]
+			Provide statistics on the cputime and count of
+			interrupts and tasks during the sampling period. For
+			multiple continuous RCU stalls, all sampling periods
+			begin at half of the first RCU stall timeout.
+
 	rcupdate.rcu_expedited= [KNL]
 			Use expedited grace-period primitives, for
 			example, synchronize_rcu_expedited() instead
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 1b0c41d490f0588..025566a9ba44667 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -95,6 +95,17 @@ config RCU_EXP_CPU_STALL_TIMEOUT
 	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
 	  seconds to milliseconds.
 
+config RCU_CPU_STALL_CPUTIME
+	bool "Provide additional RCU stall debug information"
+	depends on RCU_STALL_COMMON
+	default n
+	help
+	  Collect statistics during the sampling period, such as the number of
+	  (hard interrupts, soft interrupts, task switches) and the cputime of
+	  (hard interrupts, soft interrupts, kernel tasks) are added to the
+	  RCU stall report. For multiple continuous RCU stalls, all sampling
+	  periods begin at half of the first RCU stall timeout.
+
 config RCU_TRACE
 	bool "Enable tracing for RCU"
 	depends on DEBUG_KERNEL
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 96122f203187f39..4844dec36bddb48 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -231,6 +231,7 @@ extern int rcu_cpu_stall_ftrace_dump;
 extern int rcu_cpu_stall_suppress;
 extern int rcu_cpu_stall_timeout;
 extern int rcu_exp_cpu_stall_timeout;
+extern int rcu_cpu_stall_cputime;
 int rcu_jiffies_till_stall_check(void);
 int rcu_exp_jiffies_till_stall_check(void);
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ed93ddb8203d42c..3921aacfd421ba9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -866,6 +866,24 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 			rdp->rcu_iw_gp_seq = rnp->gp_seq;
 			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
 		}
+
+		if (rcu_cpu_stall_cputime && rdp->snap_record.gp_seq != rdp->gp_seq) {
+			int cpu = rdp->cpu;
+			struct rcu_snap_record *rsrp;
+			struct kernel_cpustat *kcsp;
+
+			kcsp = &kcpustat_cpu(cpu);
+
+			rsrp = &rdp->snap_record;
+			rsrp->cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
+			rsrp->cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
+			rsrp->cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
+			rsrp->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
+			rsrp->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);
+			rsrp->nr_csw = nr_context_switches_cpu(rdp->cpu);
+			rsrp->jiffies = jiffies;
+			rsrp->gp_seq = rdp->gp_seq;
+		}
 	}
 
 	return 0;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index fcb5d696eb1700d..192536916f9a607 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -158,6 +158,23 @@ union rcu_noqs {
 	u16 s; /* Set of bits, aggregate OR here. */
 };
 
+/*
+ * Record the snapshot of the core stats at half of the first RCU stall timeout.
+ * The member gp_seq is used to ensure that all members are updated only once
+ * during the sampling period. The snapshot is taken only if this gp_seq is not
+ * equal to rdp->gp_seq.
+ */
+struct rcu_snap_record {
+	unsigned long	gp_seq;		/* Track rdp->gp_seq counter */
+	u64		cputime_irq;	/* Accumulated cputime of hard irqs */
+	u64		cputime_softirq;/* Accumulated cputime of soft irqs */
+	u64		cputime_system; /* Accumulated cputime of kernel tasks */
+	unsigned long	nr_hardirqs;	/* Accumulated number of hard irqs */
+	unsigned int	nr_softirqs;	/* Accumulated number of soft irqs */
+	unsigned long long nr_csw;	/* Accumulated number of task switches */
+	unsigned long   jiffies;	/* Track jiffies value */
+};
+
 /* Per-CPU data for read-copy update. */
 struct rcu_data {
 	/* 1) quiescent-state and grace-period handling : */
@@ -262,6 +279,8 @@ struct rcu_data {
 	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
 	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
 	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
+	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
+					    /* the first RCU stall timeout */
 
 	long lazy_len;			/* Length of buffered lazy callbacks. */
 	int cpu;
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 5653560573e22d6..a91e844872e59d2 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -428,6 +428,35 @@ static bool rcu_is_rcuc_kthread_starving(struct rcu_data *rdp, unsigned long *jp
 	return j > 2 * HZ;
 }
 
+static void print_cpu_stat_info(int cpu)
+{
+	struct rcu_snap_record rsr, *rsrp;
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+	struct kernel_cpustat *kcsp = &kcpustat_cpu(cpu);
+
+	if (!rcu_cpu_stall_cputime)
+		return;
+
+	rsrp = &rdp->snap_record;
+	if (rsrp->gp_seq != rdp->gp_seq)
+		return;
+
+	rsr.cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
+	rsr.cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
+	rsr.cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
+
+	pr_err("\t         hardirqs   softirqs   csw/system\n");
+	pr_err("\t number: %8ld %10d %12lld\n",
+		kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs,
+		kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs,
+		nr_context_switches_cpu(cpu) - rsrp->nr_csw);
+	pr_err("\tcputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
+		div_u64(rsr.cputime_irq - rsrp->cputime_irq, NSEC_PER_MSEC),
+		div_u64(rsr.cputime_softirq - rsrp->cputime_softirq, NSEC_PER_MSEC),
+		div_u64(rsr.cputime_system - rsrp->cputime_system, NSEC_PER_MSEC),
+		jiffies64_to_msecs(jiffies - rsrp->jiffies));
+}
+
 /*
  * Print out diagnostic information for the specified stalled CPU.
  *
@@ -484,6 +513,8 @@ static void print_cpu_stall_info(int cpu)
 	       data_race(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
 	       rcuc_starved ? buf : "",
 	       falsepositive ? " (false positive?)" : "");
+
+	print_cpu_stat_info(cpu);
 }
 
 /* Complain about starvation of grace-period kthread.  */
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index a05e23648c6b99f..76f9848a21cd5be 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -508,6 +508,8 @@ int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT;
 module_param(rcu_cpu_stall_timeout, int, 0644);
 int rcu_exp_cpu_stall_timeout __read_mostly = CONFIG_RCU_EXP_CPU_STALL_TIMEOUT;
 module_param(rcu_exp_cpu_stall_timeout, int, 0644);
+int rcu_cpu_stall_cputime __read_mostly = IS_ENABLED(CONFIG_RCU_CPU_STALL_CPUTIME);
+module_param(rcu_cpu_stall_cputime, int, 0644);
 #endif /* #ifdef CONFIG_RCU_STALL_COMMON */
 
 // Suppress boot-time RCU CPU stall warnings and rcutorture writer stall
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7 5/6] doc: Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information
  2022-11-11 13:07 [PATCH v7 0/6] rcu: Add RCU stall diagnosis information Zhen Lei
                   ` (3 preceding siblings ...)
  2022-11-11 13:07 ` [PATCH v7 4/6] rcu: Add RCU stall diagnosis information Zhen Lei
@ 2022-11-11 13:07 ` Zhen Lei
  2022-11-16 22:55   ` Frederic Weisbecker
  2022-11-11 13:07 ` [PATCH v7 6/6] rcu: Align the output of RCU stall Zhen Lei
  5 siblings, 1 reply; 23+ messages in thread
From: Zhen Lei @ 2022-11-11 13:07 UTC (permalink / raw)
  To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel
  Cc: Zhen Lei, Robert Elliott

This commit doucments how to quickly determine the bug causing a given
RCU CPU stall fault warning based on the output information provided
by CONFIG_RCU_CPU_STALL_CPUTIME=y.

[ paulmck: Apply wordsmithing. ]

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/RCU/stallwarn.rst | 88 +++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index dfa4db8c0931eaf..5e24e849290a286 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -390,3 +390,91 @@ for example, "P3421".
 
 It is entirely possible to see stall warnings from normal and from
 expedited grace periods at about the same time during the same run.
+
+RCU_CPU_STALL_CPUTIME
+=====================
+
+In kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y or booted with
+rcupdate.rcu_cpu_stall_cputime=1, the following additional information
+is supplied with each RCU CPU stall warning::
+
+rcu:          hardirqs   softirqs   csw/system
+rcu:  number:      624         45            0
+rcu: cputime:       69          1         2425   ==> 2500(ms)
+
+These statistics are collected during the sampling period. The values
+in row "number:" are the number of hard interrupts, number of soft
+interrupts, and number of context switches on the stalled CPU. The
+first three values in row "cputime:" indicate the CPU time in
+milliseconds consumed by hard interrupts, soft interrupts, and tasks
+on the stalled CPU.  The last number is the measurement interval, again
+in milliseconds.  Because user-mode tasks normally do not cause RCU CPU
+stalls, these tasks are typically kernel tasks, which is why only the
+system CPU time are considered.
+
+The sampling period is shown as follows:
+|<------------first timeout---------->|<-----second timeout----->|
+|<--half timeout-->|<--half timeout-->|                          |
+|                  |<--first period-->|                          |
+|                  |<-----------second sampling period---------->|
+|                  |                  |                          |
+|          sampling time point    1st-stall                  2nd-stall
+
+
+The following describes four typical scenarios:
+
+1. A CPU looping with interrupts disabled.::
+
+   rcu:          hardirqs   softirqs   csw/system
+   rcu:  number:        0          0            0
+   rcu: cputime:        0          0            0   ==> 2500(ms)
+
+   Because interrupts have been disabled throughout the measurement
+   interval, there are no interrupts and no context switches.
+   Furthermore, because CPU time consumption was measured using interrupt
+   handlers, the system CPU consumption is misleadingly measured as zero.
+   This scenario will normally also have "(0 ticks this GP)" printed on
+   this CPU's summary line.
+
+2. A CPU looping with bottom halves disabled.
+
+   This is similar to the previous example, but with non-zero number of
+   and CPU time consumed by hard interrupts, along with non-zero CPU
+   time consumed by in-kernel execution.::
+
+   rcu:          hardirqs   softirqs   csw/system
+   rcu:  number:      624          0            0
+   rcu: cputime:       49          0         2446   ==> 2500(ms)
+
+   The fact that there are zero softirqs gives a hint that these were
+   disabled, perhaps via local_bh_disable().  It is of course possible
+   that there were no softirqs, perhaps because all events that would
+   result in softirq execution are confined to other CPUs.  In this case,
+   the diagnosis should continue as shown in the next example.
+
+3. A CPU looping with preemption disabled.
+
+   Here, only the number of context switches is zero.::
+
+   rcu:          hardirqs   softirqs   csw/system
+   rcu:  number:      624         45            0
+   rcu: cputime:       69          1         2425   ==> 2500(ms)
+
+   This situation hints that the stalled CPU was looping with preemption
+   disabled.
+
+4. No looping, but massive hard and soft interrupts.::
+
+   rcu:          hardirqs   softirqs   csw/system
+   rcu:  number:       xx         xx            0
+   rcu: cputime:       xx         xx            0   ==> 2500(ms)
+
+   Here, the number and CPU time of hard interrupts are all non-zero,
+   but the number of context switches and the in-kernel CPU time consumed
+   are zero. The number and cputime of soft interrupts will usually be
+   non-zero, but could be zero, for example, if the CPU was spinning
+   within a single hard interrupt handler.
+
+   If this type of RCU CPU stall warning can be reproduced, you can
+   narrow it down by looking at /proc/interrupts or by writing code to
+   trace each interrupt, for example, by referring to show_interrupts().
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7 6/6] rcu: Align the output of RCU stall
  2022-11-11 13:07 [PATCH v7 0/6] rcu: Add RCU stall diagnosis information Zhen Lei
                   ` (4 preceding siblings ...)
  2022-11-11 13:07 ` [PATCH v7 5/6] doc: Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information Zhen Lei
@ 2022-11-11 13:07 ` Zhen Lei
  5 siblings, 0 replies; 23+ messages in thread
From: Zhen Lei @ 2022-11-11 13:07 UTC (permalink / raw)
  To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel
  Cc: Zhen Lei, Robert Elliott

The time stamps is added to the output when CONFIG_PRINTK_TIME=y, which
will cause the alignment function to fail. So replace pr_cont() with
pr_err(), which also decouples the printing of subfunctions.

Before:
[   37.567343] rcu: INFO: rcu_preempt self-detected stall on CPU
[   37.567839] rcu:     0-....: (1500 ticks this GP) idle=***
[   37.568270]  (t=1501 jiffies g=4717 q=28 ncpus=4)
[   37.568668] CPU: 0 PID: 313 Comm: test0 Not tainted 6.1.0-rc4 #8

After:
[   36.762074] rcu: INFO: rcu_preempt self-detected stall on CPU
[   36.762543] rcu:     0-....: (1499 ticks this GP) idle=***
[   36.763003] rcu:     (t=1500 jiffies g=5097 q=27 ncpus=4)
[   36.763522] CPU: 0 PID: 313 Comm: test0 Not tainted 6.1.0-rc4 #9

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 kernel/rcu/tree_stall.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index a91e844872e59d2..9dc76ba28d372e9 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -619,7 +619,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 
 	for_each_possible_cpu(cpu)
 		totqlen += rcu_get_n_cbs_cpu(cpu);
-	pr_cont("\t(detected by %d, t=%ld jiffies, g=%ld, q=%lu ncpus=%d)\n",
+	pr_err("\t(detected by %d, t=%ld jiffies, g=%ld, q=%lu ncpus=%d)\n",
 	       smp_processor_id(), (long)(jiffies - gps),
 	       (long)rcu_seq_current(&rcu_state.gp_seq), totqlen, rcu_state.n_online_cpus);
 	if (ndetected) {
@@ -680,7 +680,7 @@ static void print_cpu_stall(unsigned long gps)
 	raw_spin_unlock_irqrestore_rcu_node(rdp->mynode, flags);
 	for_each_possible_cpu(cpu)
 		totqlen += rcu_get_n_cbs_cpu(cpu);
-	pr_cont("\t(t=%lu jiffies g=%ld q=%lu ncpus=%d)\n",
+	pr_err("\t(t=%lu jiffies g=%ld q=%lu ncpus=%d)\n",
 		jiffies - gps,
 		(long)rcu_seq_current(&rcu_state.gp_seq), totqlen, rcu_state.n_online_cpus);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 4/6] rcu: Add RCU stall diagnosis information
  2022-11-11 13:07 ` [PATCH v7 4/6] rcu: Add RCU stall diagnosis information Zhen Lei
@ 2022-11-14 11:24   ` Frederic Weisbecker
  2022-11-14 12:32     ` Leizhen (ThunderTown)
  2022-11-16 22:39   ` Frederic Weisbecker
  1 sibling, 1 reply; 23+ messages in thread
From: Frederic Weisbecker @ 2022-11-14 11:24 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott

On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
> Because RCU CPU stall warnings are driven from the scheduling-clock
> interrupt handler, a workload consisting of a very large number of
> short-duration hardware interrupts can result in misleading stall-warning
> messages.  On systems supporting only a single level of interrupts,
> that is, where interrupts handlers cannot be interrupted, this can
> produce misleading diagnostics.  The stack traces will show the
> innocent-bystander interrupted task, not the interrupts that are
> at the very least exacerbating the stall.
> 
> This situation can be improved by displaying the number of interrupts
> and the CPU time that they have consumed.  Diagnosing other types
> of stalls can be eased by also providing the count of softirqs and
> the CPU time that they consumed as well as the number of context
> switches and the task-level CPU time consumed.
> 
> Consider the following output given this change:
> 
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu:     0-....: (1250 ticks this GP) <omitted>
> rcu:          hardirqs   softirqs   csw/system
> rcu:  number:      624         45            0
> rcu: cputime:       69          1         2425   ==> 2500(ms)
> 
> This output shows that the number of hard and soft interrupts is small,
> there are no context switches, and the system takes up a lot of time. This
> indicates that the current task is looping with preemption disabled.
> 
> The impact on system performance is negligible because snapshot is
> recorded only once for all continuous RCU stalls.
> 
> This added debugging information is suppressed by default and can be
> enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or
> by booting with rcupdate.rcu_cpu_stall_cputime=1.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  .../admin-guide/kernel-parameters.txt         |  6 ++++
>  kernel/rcu/Kconfig.debug                      | 11 +++++++
>  kernel/rcu/rcu.h                              |  1 +
>  kernel/rcu/tree.c                             | 18 +++++++++++
>  kernel/rcu/tree.h                             | 19 ++++++++++++
>  kernel/rcu/tree_stall.h                       | 31 +++++++++++++++++++
>  kernel/rcu/update.c                           |  2 ++
>  7 files changed, 88 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 811b2e6d4672685..ee7d9d962591c5d 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -5084,6 +5084,12 @@
>  			rcupdate.rcu_cpu_stall_timeout to be used (after
>  			conversion from seconds to milliseconds).
>  
> +	rcupdate.rcu_cpu_stall_cputime= [KNL]
> +			Provide statistics on the cputime and count of
> +			interrupts and tasks during the sampling period. For
> +			multiple continuous RCU stalls, all sampling periods
> +			begin at half of the first RCU stall timeout.
> +
>  	rcupdate.rcu_expedited= [KNL]
>  			Use expedited grace-period primitives, for
>  			example, synchronize_rcu_expedited() instead
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 1b0c41d490f0588..025566a9ba44667 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -95,6 +95,17 @@ config RCU_EXP_CPU_STALL_TIMEOUT
>  	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
>  	  seconds to milliseconds.
>  
> +config RCU_CPU_STALL_CPUTIME
> +	bool "Provide additional RCU stall debug information"
> +	depends on RCU_STALL_COMMON
> +	default n
> +	help
> +	  Collect statistics during the sampling period, such as the number of
> +	  (hard interrupts, soft interrupts, task switches) and the cputime of
> +	  (hard interrupts, soft interrupts, kernel tasks) are added to the
> +	  RCU stall report. For multiple continuous RCU stalls, all sampling
> +	  periods begin at half of the first RCU stall timeout.
> +
>  config RCU_TRACE
>  	bool "Enable tracing for RCU"
>  	depends on DEBUG_KERNEL
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 96122f203187f39..4844dec36bddb48 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -231,6 +231,7 @@ extern int rcu_cpu_stall_ftrace_dump;
>  extern int rcu_cpu_stall_suppress;
>  extern int rcu_cpu_stall_timeout;
>  extern int rcu_exp_cpu_stall_timeout;
> +extern int rcu_cpu_stall_cputime;
>  int rcu_jiffies_till_stall_check(void);
>  int rcu_exp_jiffies_till_stall_check(void);
>  
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index ed93ddb8203d42c..3921aacfd421ba9 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -866,6 +866,24 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  			rdp->rcu_iw_gp_seq = rnp->gp_seq;
>  			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
>  		}
> +
> +		if (rcu_cpu_stall_cputime && rdp->snap_record.gp_seq != rdp->gp_seq) {
> +			int cpu = rdp->cpu;
> +			struct rcu_snap_record *rsrp;
> +			struct kernel_cpustat *kcsp;
> +
> +			kcsp = &kcpustat_cpu(cpu);
> +
> +			rsrp = &rdp->snap_record;
> +			rsrp->cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
> +			rsrp->cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
> +			rsrp->cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
> +			rsrp->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
> +			rsrp->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);

Getting the sum of all CPU's IRQs, with even two iterations on all of them, look
costly. So I have to ask: why is this information useful and why can't we deduce
it from other CPUs stall reports?

I'm also asking because this rcu_cpu_stall_cputime is likely to be very useful for
distros, to the point that I expect it to be turned on by default as doing a
snapshot of kcpustat fields is cheap. But doing that wide CPU snapshot is
definetly going to be an unbearable overhead.

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 2/6] sched: Add helper kstat_cpu_softirqs_sum()
  2022-11-11 13:07 ` [PATCH v7 2/6] sched: Add helper kstat_cpu_softirqs_sum() Zhen Lei
@ 2022-11-14 11:42   ` Frederic Weisbecker
  2022-11-14 12:45     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 23+ messages in thread
From: Frederic Weisbecker @ 2022-11-14 11:42 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott

On Fri, Nov 11, 2022 at 09:07:05PM +0800, Zhen Lei wrote:
> Similar to kstat_cpu_irqs_sum(), it counts the sum of all software
> interrupts on a specified CPU.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  include/linux/kernel_stat.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
> index 90e2fdc17d79ff8..898076e173a928a 100644
> --- a/include/linux/kernel_stat.h
> +++ b/include/linux/kernel_stat.h
> @@ -67,6 +67,17 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
>         return kstat_cpu(cpu).softirqs[irq];
>  }
>  
> +static inline unsigned int kstat_cpu_softirqs_sum(int cpu)
> +{
> +	int i;
> +	unsigned int sum = 0;
> +
> +	for (i = 0; i < NR_SOFTIRQS; i++)
> +		sum += kstat_softirqs_cpu(i, cpu);

This should return a u64 or at least an unsigned long so that high
numbers of CPUs don't overflow easily.

Thanks.

> +
> +	return sum;
> +}
> +
>  /*
>   * Number of interrupts per specific IRQ source, since bootup
>   */
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 4/6] rcu: Add RCU stall diagnosis information
  2022-11-14 11:24   ` Frederic Weisbecker
@ 2022-11-14 12:32     ` Leizhen (ThunderTown)
  2022-11-14 12:46       ` Frederic Weisbecker
  0 siblings, 1 reply; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2022-11-14 12:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott



On 2022/11/14 19:24, Frederic Weisbecker wrote:
> On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
>> Because RCU CPU stall warnings are driven from the scheduling-clock
>> interrupt handler, a workload consisting of a very large number of
>> short-duration hardware interrupts can result in misleading stall-warning
>> messages.  On systems supporting only a single level of interrupts,
>> that is, where interrupts handlers cannot be interrupted, this can
>> produce misleading diagnostics.  The stack traces will show the
>> innocent-bystander interrupted task, not the interrupts that are
>> at the very least exacerbating the stall.
>>
>> This situation can be improved by displaying the number of interrupts
>> and the CPU time that they have consumed.  Diagnosing other types
>> of stalls can be eased by also providing the count of softirqs and
>> the CPU time that they consumed as well as the number of context
>> switches and the task-level CPU time consumed.
>>
>> Consider the following output given this change:
>>
>> rcu: INFO: rcu_preempt self-detected stall on CPU
>> rcu:     0-....: (1250 ticks this GP) <omitted>
>> rcu:          hardirqs   softirqs   csw/system
>> rcu:  number:      624         45            0
>> rcu: cputime:       69          1         2425   ==> 2500(ms)
>>
>> This output shows that the number of hard and soft interrupts is small,
>> there are no context switches, and the system takes up a lot of time. This
>> indicates that the current task is looping with preemption disabled.
>>
>> The impact on system performance is negligible because snapshot is
>> recorded only once for all continuous RCU stalls.
>>
>> This added debugging information is suppressed by default and can be
>> enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or
>> by booting with rcupdate.rcu_cpu_stall_cputime=1.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>> ---
>>  .../admin-guide/kernel-parameters.txt         |  6 ++++
>>  kernel/rcu/Kconfig.debug                      | 11 +++++++
>>  kernel/rcu/rcu.h                              |  1 +
>>  kernel/rcu/tree.c                             | 18 +++++++++++
>>  kernel/rcu/tree.h                             | 19 ++++++++++++
>>  kernel/rcu/tree_stall.h                       | 31 +++++++++++++++++++
>>  kernel/rcu/update.c                           |  2 ++
>>  7 files changed, 88 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 811b2e6d4672685..ee7d9d962591c5d 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -5084,6 +5084,12 @@
>>  			rcupdate.rcu_cpu_stall_timeout to be used (after
>>  			conversion from seconds to milliseconds).
>>  
>> +	rcupdate.rcu_cpu_stall_cputime= [KNL]
>> +			Provide statistics on the cputime and count of
>> +			interrupts and tasks during the sampling period. For
>> +			multiple continuous RCU stalls, all sampling periods
>> +			begin at half of the first RCU stall timeout.
>> +
>>  	rcupdate.rcu_expedited= [KNL]
>>  			Use expedited grace-period primitives, for
>>  			example, synchronize_rcu_expedited() instead
>> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
>> index 1b0c41d490f0588..025566a9ba44667 100644
>> --- a/kernel/rcu/Kconfig.debug
>> +++ b/kernel/rcu/Kconfig.debug
>> @@ -95,6 +95,17 @@ config RCU_EXP_CPU_STALL_TIMEOUT
>>  	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
>>  	  seconds to milliseconds.
>>  
>> +config RCU_CPU_STALL_CPUTIME
>> +	bool "Provide additional RCU stall debug information"
>> +	depends on RCU_STALL_COMMON
>> +	default n
>> +	help
>> +	  Collect statistics during the sampling period, such as the number of
>> +	  (hard interrupts, soft interrupts, task switches) and the cputime of
>> +	  (hard interrupts, soft interrupts, kernel tasks) are added to the
>> +	  RCU stall report. For multiple continuous RCU stalls, all sampling
>> +	  periods begin at half of the first RCU stall timeout.
>> +
>>  config RCU_TRACE
>>  	bool "Enable tracing for RCU"
>>  	depends on DEBUG_KERNEL
>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
>> index 96122f203187f39..4844dec36bddb48 100644
>> --- a/kernel/rcu/rcu.h
>> +++ b/kernel/rcu/rcu.h
>> @@ -231,6 +231,7 @@ extern int rcu_cpu_stall_ftrace_dump;
>>  extern int rcu_cpu_stall_suppress;
>>  extern int rcu_cpu_stall_timeout;
>>  extern int rcu_exp_cpu_stall_timeout;
>> +extern int rcu_cpu_stall_cputime;
>>  int rcu_jiffies_till_stall_check(void);
>>  int rcu_exp_jiffies_till_stall_check(void);
>>  
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index ed93ddb8203d42c..3921aacfd421ba9 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -866,6 +866,24 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>>  			rdp->rcu_iw_gp_seq = rnp->gp_seq;
>>  			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
>>  		}
>> +
>> +		if (rcu_cpu_stall_cputime && rdp->snap_record.gp_seq != rdp->gp_seq) {
>> +			int cpu = rdp->cpu;
>> +			struct rcu_snap_record *rsrp;
>> +			struct kernel_cpustat *kcsp;
>> +
>> +			kcsp = &kcpustat_cpu(cpu);
>> +
>> +			rsrp = &rdp->snap_record;
>> +			rsrp->cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
>> +			rsrp->cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
>> +			rsrp->cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
>> +			rsrp->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
>> +			rsrp->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);
> 
> Getting the sum of all CPU's IRQs, with even two iterations on all of them, look
> costly. So I have to ask: why is this information useful and why can't we deduce
> it from other CPUs stall reports?

Only the RCU stalled CPUs are recorded. Why all CPUs?

static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
{
	rcu_for_each_leaf_node(rnp) {
		if (rnp->qsmask == 0) {
			continue;
		}
		for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
			if (f(rdp))

> 
> I'm also asking because this rcu_cpu_stall_cputime is likely to be very useful for
> distros, to the point that I expect it to be turned on by default as doing a
> snapshot of kcpustat fields is cheap. But doing that wide CPU snapshot is
> definetly going to be an unbearable overhead.

I purposely added a print test, only the RCU stalled CPU would be taken snapshots and
calculated differentials.

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d1f0d857dc85df5..693e7c83bd17d1e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -872,6 +872,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
                        struct rcu_snap_record *rsrp;
                        struct kernel_cpustat *kcsp;

+                       printk("fixme: cpu=%d\n", smp_processor_id());
                        kcsp = &kcpustat_cpu(cpu);

                        rsrp = &rdp->snap_record;

> 
> Thanks.
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 2/6] sched: Add helper kstat_cpu_softirqs_sum()
  2022-11-14 11:42   ` Frederic Weisbecker
@ 2022-11-14 12:45     ` Leizhen (ThunderTown)
  2022-11-14 12:50       ` Frederic Weisbecker
  0 siblings, 1 reply; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2022-11-14 12:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott



On 2022/11/14 19:42, Frederic Weisbecker wrote:
> On Fri, Nov 11, 2022 at 09:07:05PM +0800, Zhen Lei wrote:
>> Similar to kstat_cpu_irqs_sum(), it counts the sum of all software
>> interrupts on a specified CPU.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  include/linux/kernel_stat.h | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
>> index 90e2fdc17d79ff8..898076e173a928a 100644
>> --- a/include/linux/kernel_stat.h
>> +++ b/include/linux/kernel_stat.h
>> @@ -67,6 +67,17 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
>>         return kstat_cpu(cpu).softirqs[irq];
>>  }
>>  
>> +static inline unsigned int kstat_cpu_softirqs_sum(int cpu)
>> +{
>> +	int i;
>> +	unsigned int sum = 0;
>> +
>> +	for (i = 0; i < NR_SOFTIRQS; i++)
>> +		sum += kstat_softirqs_cpu(i, cpu);
> 
> This should return a u64 or at least an unsigned long so that high
> numbers of CPUs don't overflow easily.

OK! Then I'll adjust the following types by the way.

struct kernel_stat {
        unsigned long irqs_sum;
        unsigned int softirqs[NR_SOFTIRQS];
};

static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)




> 
> Thanks.
> 
>> +
>> +	return sum;
>> +}
>> +
>>  /*
>>   * Number of interrupts per specific IRQ source, since bootup
>>   */
>> -- 
>> 2.25.1
>>
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 4/6] rcu: Add RCU stall diagnosis information
  2022-11-14 12:32     ` Leizhen (ThunderTown)
@ 2022-11-14 12:46       ` Frederic Weisbecker
  0 siblings, 0 replies; 23+ messages in thread
From: Frederic Weisbecker @ 2022-11-14 12:46 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott

On Mon, Nov 14, 2022 at 08:32:19PM +0800, Leizhen (ThunderTown) wrote:
> >> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> index ed93ddb8203d42c..3921aacfd421ba9 100644
> >> --- a/kernel/rcu/tree.c
> >> +++ b/kernel/rcu/tree.c
> >> @@ -866,6 +866,24 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >>  			rdp->rcu_iw_gp_seq = rnp->gp_seq;
> >>  			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
> >>  		}
> >> +
> >> +		if (rcu_cpu_stall_cputime && rdp->snap_record.gp_seq != rdp->gp_seq) {
> >> +			int cpu = rdp->cpu;
> >> +			struct rcu_snap_record *rsrp;
> >> +			struct kernel_cpustat *kcsp;
> >> +
> >> +			kcsp = &kcpustat_cpu(cpu);
> >> +
> >> +			rsrp = &rdp->snap_record;
> >> +			rsrp->cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
> >> +			rsrp->cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
> >> +			rsrp->cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
> >> +			rsrp->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
> >> +			rsrp->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);
> > 
> > Getting the sum of all CPU's IRQs, with even two iterations on all of them, look
> > costly. So I have to ask: why is this information useful and why can't we deduce
> > it from other CPUs stall reports?
> 
> Only the RCU stalled CPUs are recorded. Why all CPUs?

Bah, I misread kstat_cpu_softirqs_sum() kstat_cpu_irqs_sum() content. Sorry
about that, my brainfart... :-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 2/6] sched: Add helper kstat_cpu_softirqs_sum()
  2022-11-14 12:45     ` Leizhen (ThunderTown)
@ 2022-11-14 12:50       ` Frederic Weisbecker
  2022-11-14 14:26         ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 23+ messages in thread
From: Frederic Weisbecker @ 2022-11-14 12:50 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott

On Mon, Nov 14, 2022 at 08:45:26PM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/11/14 19:42, Frederic Weisbecker wrote:
> > On Fri, Nov 11, 2022 at 09:07:05PM +0800, Zhen Lei wrote:
> >> Similar to kstat_cpu_irqs_sum(), it counts the sum of all software
> >> interrupts on a specified CPU.
> >>
> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >> ---
> >>  include/linux/kernel_stat.h | 11 +++++++++++
> >>  1 file changed, 11 insertions(+)
> >>
> >> diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
> >> index 90e2fdc17d79ff8..898076e173a928a 100644
> >> --- a/include/linux/kernel_stat.h
> >> +++ b/include/linux/kernel_stat.h
> >> @@ -67,6 +67,17 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
> >>         return kstat_cpu(cpu).softirqs[irq];
> >>  }
> >>  
> >> +static inline unsigned int kstat_cpu_softirqs_sum(int cpu)
> >> +{
> >> +	int i;
> >> +	unsigned int sum = 0;
> >> +
> >> +	for (i = 0; i < NR_SOFTIRQS; i++)
> >> +		sum += kstat_softirqs_cpu(i, cpu);
> > 
> > This should return a u64 or at least an unsigned long so that high
> > numbers of CPUs don't overflow easily.
> 
> OK! Then I'll adjust the following types by the way.
> 
> struct kernel_stat {
>         unsigned long irqs_sum;
>         unsigned int softirqs[NR_SOFTIRQS];
> };
> 
> static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)

And I also misread kstat_softirqs_cpu() content, my bad. Although at some point
the sum of all softirqs within a single CPU might overflow in an int so I guess
that still applies...

Thanks.


> 
> 
> 
> 
> > 
> > Thanks.
> > 
> >> +
> >> +	return sum;
> >> +}
> >> +
> >>  /*
> >>   * Number of interrupts per specific IRQ source, since bootup
> >>   */
> >> -- 
> >> 2.25.1
> >>
> > .
> > 
> 
> -- 
> Regards,
>   Zhen Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 2/6] sched: Add helper kstat_cpu_softirqs_sum()
  2022-11-14 12:50       ` Frederic Weisbecker
@ 2022-11-14 14:26         ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2022-11-14 14:26 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott



On 2022/11/14 20:50, Frederic Weisbecker wrote:
> On Mon, Nov 14, 2022 at 08:45:26PM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/11/14 19:42, Frederic Weisbecker wrote:
>>> On Fri, Nov 11, 2022 at 09:07:05PM +0800, Zhen Lei wrote:
>>>> Similar to kstat_cpu_irqs_sum(), it counts the sum of all software
>>>> interrupts on a specified CPU.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>> ---
>>>>  include/linux/kernel_stat.h | 11 +++++++++++
>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
>>>> index 90e2fdc17d79ff8..898076e173a928a 100644
>>>> --- a/include/linux/kernel_stat.h
>>>> +++ b/include/linux/kernel_stat.h
>>>> @@ -67,6 +67,17 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
>>>>         return kstat_cpu(cpu).softirqs[irq];
>>>>  }
>>>>  
>>>> +static inline unsigned int kstat_cpu_softirqs_sum(int cpu)
>>>> +{
>>>> +	int i;
>>>> +	unsigned int sum = 0;
>>>> +
>>>> +	for (i = 0; i < NR_SOFTIRQS; i++)
>>>> +		sum += kstat_softirqs_cpu(i, cpu);
>>>
>>> This should return a u64 or at least an unsigned long so that high
>>> numbers of CPUs don't overflow easily.
>>
>> OK! Then I'll adjust the following types by the way.
>>
>> struct kernel_stat {
>>         unsigned long irqs_sum;
>>         unsigned int softirqs[NR_SOFTIRQS];
>> };
>>
>> static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
> 
> And I also misread kstat_softirqs_cpu() content, my bad. Although at some point
> the sum of all softirqs within a single CPU might overflow in an int so I guess
> that still applies...

Yes, this was discussed before. In fact, I think unsigned int and unsigned
long are all right.

> 
> Thanks.
> 
> 
>>
>>
>>
>>
>>>
>>> Thanks.
>>>
>>>> +
>>>> +	return sum;
>>>> +}
>>>> +
>>>>  /*
>>>>   * Number of interrupts per specific IRQ source, since bootup
>>>>   */
>>>> -- 
>>>> 2.25.1
>>>>
>>> .
>>>
>>
>> -- 
>> Regards,
>>   Zhen Lei
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 4/6] rcu: Add RCU stall diagnosis information
  2022-11-11 13:07 ` [PATCH v7 4/6] rcu: Add RCU stall diagnosis information Zhen Lei
  2022-11-14 11:24   ` Frederic Weisbecker
@ 2022-11-16 22:39   ` Frederic Weisbecker
  2022-11-17  1:57     ` Leizhen (ThunderTown)
  1 sibling, 1 reply; 23+ messages in thread
From: Frederic Weisbecker @ 2022-11-16 22:39 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott

On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
> @@ -262,6 +279,8 @@ struct rcu_data {
>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
> +	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
> +					    /* the first RCU stall timeout */

This should be under #ifdef CONFIG_RCU_CPU_STALL_CPUTIME

> +static void print_cpu_stat_info(int cpu)
> +{
> +	struct rcu_snap_record rsr, *rsrp;
> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +	struct kernel_cpustat *kcsp = &kcpustat_cpu(cpu);
> +
> +	if (!rcu_cpu_stall_cputime)
> +		return;
> +
> +	rsrp = &rdp->snap_record;
> +	if (rsrp->gp_seq != rdp->gp_seq)
> +		return;
> +
> +	rsr.cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
> +	rsr.cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
> +	rsr.cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
> +
> +	pr_err("\t         hardirqs   softirqs   csw/system\n");
> +	pr_err("\t number: %8ld %10d %12lld\n",
> +		kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs,
> +		kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs,
> +		nr_context_switches_cpu(cpu) - rsrp->nr_csw);
> +	pr_err("\tcputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
> +		div_u64(rsr.cputime_irq - rsrp->cputime_irq, NSEC_PER_MSEC),
> +		div_u64(rsr.cputime_softirq - rsrp->cputime_softirq, NSEC_PER_MSEC),
> +		div_u64(rsr.cputime_system - rsrp->cputime_system, NSEC_PER_MSEC),
> +		jiffies64_to_msecs(jiffies - rsrp->jiffies));

jiffies_to_msecs() should be enough.

Thanks.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 5/6] doc: Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information
  2022-11-11 13:07 ` [PATCH v7 5/6] doc: Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information Zhen Lei
@ 2022-11-16 22:55   ` Frederic Weisbecker
  2022-11-17  2:03     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 23+ messages in thread
From: Frederic Weisbecker @ 2022-11-16 22:55 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott

On Fri, Nov 11, 2022 at 09:07:08PM +0800, Zhen Lei wrote:
> +1. A CPU looping with interrupts disabled.::
> +
> +   rcu:          hardirqs   softirqs   csw/system
> +   rcu:  number:        0          0            0
> +65;6003;1c   rcu: cputime:        0          0            0   ==> 2500(ms)
> +
> +   Because interrupts have been disabled throughout the measurement
> +   interval, there are no interrupts and no context switches.
> +   Furthermore, because CPU time consumption was measured using interrupt
> +   handlers, the system CPU consumption is misleadingly measured as zero.
> +   This scenario will normally also have "(0 ticks this GP)" printed on
> +   this CPU's summary line.
> +
> +2. A CPU looping with bottom halves disabled.
> +
> +   This is similar to the previous example, but with non-zero number of
> +   and CPU time consumed by hard interrupts, along with non-zero CPU
> +   time consumed by in-kernel execution.::
> +
> +   rcu:          hardirqs   softirqs   csw/system
> +   rcu:  number:      624          0            0
> +   rcu: cputime:       49          0         2446   ==> 2500(ms)
> +
> +   The fact that there are zero softirqs gives a hint that these were
> +   disabled, perhaps via local_bh_disable().  It is of course possible
> +   that there were no softirqs, perhaps because all events that would
> +   result in softirq execution are confined to other CPUs.  In this case,
> +   the diagnosis should continue as shown in the next example.
> +
> +3. A CPU looping with preemption disabled.
> +
> +   Here, only the number of context switches is zero.::
> +
> +   rcu:          hardirqs   softirqs   csw/system
> +   rcu:  number:      624         45            0
> +   rcu: cputime:       69          1         2425   ==> 2500(ms)
> +
> +   This situation hints that the stalled CPU was looping with preemption
> +   disabled.
> +
> +4. No looping, but massive hard and soft interrupts.::
> +
> +   rcu:          hardirqs   softirqs   csw/system
> +   rcu:  number:       xx         xx            0
> +   rcu: cputime:       xx         xx            0   ==> 2500(ms)
> +
> +   Here, the number and CPU time of hard interrupts are all non-zero,
> +   but the number of context switches and the in-kernel CPU time consumed
> +   are zero. The number and cputime of soft interrupts will usually be
> +   non-zero, but could be zero, for example, if the CPU was spinning
> +   within a single hard interrupt handler.
> +
> +   If this type of RCU CPU stall warning can be reproduced, you can
> +   narrow it down by looking at /proc/interrupts or by writing code to
> +   trace each interrupt, for example, by referring to show_interrupts().

One last question I have. Usually all these informations can be deduced by
just looking at the stacktrace that comes along an RCU stall report. So on
which kind of situation the stacktrace is not enough?

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 4/6] rcu: Add RCU stall diagnosis information
  2022-11-16 22:39   ` Frederic Weisbecker
@ 2022-11-17  1:57     ` Leizhen (ThunderTown)
  2022-11-17 12:22       ` Frederic Weisbecker
  0 siblings, 1 reply; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2022-11-17  1:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott



On 2022/11/17 6:39, Frederic Weisbecker wrote:
> On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
>> @@ -262,6 +279,8 @@ struct rcu_data {
>>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
>>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
>>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
>> +	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
>> +					    /* the first RCU stall timeout */
> 
> This should be under #ifdef CONFIG_RCU_CPU_STALL_CPUTIME

This will not work for now because we also support boot option rcupdate.rcu_cpu_stall_cputime.

> 
>> +static void print_cpu_stat_info(int cpu)
>> +{
>> +	struct rcu_snap_record rsr, *rsrp;
>> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>> +	struct kernel_cpustat *kcsp = &kcpustat_cpu(cpu);
>> +
>> +	if (!rcu_cpu_stall_cputime)
>> +		return;
>> +
>> +	rsrp = &rdp->snap_record;
>> +	if (rsrp->gp_seq != rdp->gp_seq)
>> +		return;
>> +
>> +	rsr.cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
>> +	rsr.cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
>> +	rsr.cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
>> +
>> +	pr_err("\t         hardirqs   softirqs   csw/system\n");
>> +	pr_err("\t number: %8ld %10d %12lld\n",
>> +		kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs,
>> +		kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs,
>> +		nr_context_switches_cpu(cpu) - rsrp->nr_csw);
>> +	pr_err("\tcputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
>> +		div_u64(rsr.cputime_irq - rsrp->cputime_irq, NSEC_PER_MSEC),
>> +		div_u64(rsr.cputime_softirq - rsrp->cputime_softirq, NSEC_PER_MSEC),
>> +		div_u64(rsr.cputime_system - rsrp->cputime_system, NSEC_PER_MSEC),
>> +		jiffies64_to_msecs(jiffies - rsrp->jiffies));
> 
> jiffies_to_msecs() should be enough.

OK, thanks.

> 
> Thanks.
> 
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 5/6] doc: Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information
  2022-11-16 22:55   ` Frederic Weisbecker
@ 2022-11-17  2:03     ` Leizhen (ThunderTown)
  2022-11-17 12:23       ` Frederic Weisbecker
  0 siblings, 1 reply; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2022-11-17  2:03 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott



On 2022/11/17 6:55, Frederic Weisbecker wrote:
> On Fri, Nov 11, 2022 at 09:07:08PM +0800, Zhen Lei wrote:
>> +1. A CPU looping with interrupts disabled.::
>> +
>> +   rcu:          hardirqs   softirqs   csw/system
>> +   rcu:  number:        0          0            0
>> +65;6003;1c   rcu: cputime:        0          0            0   ==> 2500(ms)
>> +
>> +   Because interrupts have been disabled throughout the measurement
>> +   interval, there are no interrupts and no context switches.
>> +   Furthermore, because CPU time consumption was measured using interrupt
>> +   handlers, the system CPU consumption is misleadingly measured as zero.
>> +   This scenario will normally also have "(0 ticks this GP)" printed on
>> +   this CPU's summary line.
>> +
>> +2. A CPU looping with bottom halves disabled.
>> +
>> +   This is similar to the previous example, but with non-zero number of
>> +   and CPU time consumed by hard interrupts, along with non-zero CPU
>> +   time consumed by in-kernel execution.::
>> +
>> +   rcu:          hardirqs   softirqs   csw/system
>> +   rcu:  number:      624          0            0
>> +   rcu: cputime:       49          0         2446   ==> 2500(ms)
>> +
>> +   The fact that there are zero softirqs gives a hint that these were
>> +   disabled, perhaps via local_bh_disable().  It is of course possible
>> +   that there were no softirqs, perhaps because all events that would
>> +   result in softirq execution are confined to other CPUs.  In this case,
>> +   the diagnosis should continue as shown in the next example.
>> +
>> +3. A CPU looping with preemption disabled.
>> +
>> +   Here, only the number of context switches is zero.::
>> +
>> +   rcu:          hardirqs   softirqs   csw/system
>> +   rcu:  number:      624         45            0
>> +   rcu: cputime:       69          1         2425   ==> 2500(ms)
>> +
>> +   This situation hints that the stalled CPU was looping with preemption
>> +   disabled.
>> +
>> +4. No looping, but massive hard and soft interrupts.::
>> +
>> +   rcu:          hardirqs   softirqs   csw/system
>> +   rcu:  number:       xx         xx            0
>> +   rcu: cputime:       xx         xx            0   ==> 2500(ms)
>> +
>> +   Here, the number and CPU time of hard interrupts are all non-zero,
>> +   but the number of context switches and the in-kernel CPU time consumed
>> +   are zero. The number and cputime of soft interrupts will usually be
>> +   non-zero, but could be zero, for example, if the CPU was spinning
>> +   within a single hard interrupt handler.
>> +
>> +   If this type of RCU CPU stall warning can be reproduced, you can
>> +   narrow it down by looking at /proc/interrupts or by writing code to
>> +   trace each interrupt, for example, by referring to show_interrupts().
> 
> One last question I have. Usually all these informations can be deduced by
> just looking at the stacktrace that comes along an RCU stall report. So on
> which kind of situation the stacktrace is not enough?

Interrupt storm.

> 
> Thanks.
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 4/6] rcu: Add RCU stall diagnosis information
  2022-11-17  1:57     ` Leizhen (ThunderTown)
@ 2022-11-17 12:22       ` Frederic Weisbecker
  2022-11-17 13:25         ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 23+ messages in thread
From: Frederic Weisbecker @ 2022-11-17 12:22 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott

On Thu, Nov 17, 2022 at 09:57:18AM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/11/17 6:39, Frederic Weisbecker wrote:
> > On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
> >> @@ -262,6 +279,8 @@ struct rcu_data {
> >>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
> >>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
> >>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
> >> +	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
> >> +					    /* the first RCU stall timeout */
> > 
> > This should be under #ifdef CONFIG_RCU_CPU_STALL_CPUTIME
> 
> This will not work for now because we also support boot option
> rcupdate.rcu_cpu_stall_cputime.

I'm confused. If CONFIG_RCU_CPU_STALL_CPUTIME=n then rcupdate.rcu_cpu_stall_cputime has
no effect, right?

Thanks.

> 
> > 
> >> +static void print_cpu_stat_info(int cpu)
> >> +{
> >> +	struct rcu_snap_record rsr, *rsrp;
> >> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> >> +	struct kernel_cpustat *kcsp = &kcpustat_cpu(cpu);
> >> +
> >> +	if (!rcu_cpu_stall_cputime)
> >> +		return;
> >> +
> >> +	rsrp = &rdp->snap_record;
> >> +	if (rsrp->gp_seq != rdp->gp_seq)
> >> +		return;
> >> +
> >> +	rsr.cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
> >> +	rsr.cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
> >> +	rsr.cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
> >> +
> >> +	pr_err("\t         hardirqs   softirqs   csw/system\n");
> >> +	pr_err("\t number: %8ld %10d %12lld\n",
> >> +		kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs,
> >> +		kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs,
> >> +		nr_context_switches_cpu(cpu) - rsrp->nr_csw);
> >> +	pr_err("\tcputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
> >> +		div_u64(rsr.cputime_irq - rsrp->cputime_irq, NSEC_PER_MSEC),
> >> +		div_u64(rsr.cputime_softirq - rsrp->cputime_softirq, NSEC_PER_MSEC),
> >> +		div_u64(rsr.cputime_system - rsrp->cputime_system, NSEC_PER_MSEC),
> >> +		jiffies64_to_msecs(jiffies - rsrp->jiffies));
> > 
> > jiffies_to_msecs() should be enough.
> 
> OK, thanks.
> 
> > 
> > Thanks.
> > 
> > .
> > 
> 
> -- 
> Regards,
>   Zhen Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 5/6] doc: Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information
  2022-11-17  2:03     ` Leizhen (ThunderTown)
@ 2022-11-17 12:23       ` Frederic Weisbecker
  0 siblings, 0 replies; 23+ messages in thread
From: Frederic Weisbecker @ 2022-11-17 12:23 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott

On Thu, Nov 17, 2022 at 10:03:17AM +0800, Leizhen (ThunderTown) wrote:
> On 2022/11/17 6:55, Frederic Weisbecker wrote:
> > On Fri, Nov 11, 2022 at 09:07:08PM +0800, Zhen Lei wrote:
> > One last question I have. Usually all these informations can be deduced by
> > just looking at the stacktrace that comes along an RCU stall report. So on
> > which kind of situation the stacktrace is not enough?
> 
> Interrupt storm.

Now that makes sense :)

Thanks.

> 
> > 
> > Thanks.
> > .
> > 
> 
> -- 
> Regards,
>   Zhen Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 4/6] rcu: Add RCU stall diagnosis information
  2022-11-17 12:22       ` Frederic Weisbecker
@ 2022-11-17 13:25         ` Leizhen (ThunderTown)
  2022-11-17 14:26           ` Frederic Weisbecker
  0 siblings, 1 reply; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2022-11-17 13:25 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott



On 2022/11/17 20:22, Frederic Weisbecker wrote:
> On Thu, Nov 17, 2022 at 09:57:18AM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/11/17 6:39, Frederic Weisbecker wrote:
>>> On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
>>>> @@ -262,6 +279,8 @@ struct rcu_data {
>>>>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
>>>>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
>>>>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
>>>> +	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
>>>> +					    /* the first RCU stall timeout */
>>>
>>> This should be under #ifdef CONFIG_RCU_CPU_STALL_CPUTIME
>>
>> This will not work for now because we also support boot option
>> rcupdate.rcu_cpu_stall_cputime.
> 
> I'm confused. If CONFIG_RCU_CPU_STALL_CPUTIME=n then rcupdate.rcu_cpu_stall_cputime has
> no effect, right?

No, rcupdate.rcu_cpu_stall_cputime override CONFIG_RCU_CPU_STALL_CPUTIME. Because
the default value of CONFIG_RCU_CPU_STALL_CPUTIME is n, so in most cases, we need
rcupdate.rcu_cpu_stall_cputime as the escape route.

If CONFIG_RCU_CPU_STALL_CPUTIME=y is default, your suggestion is more appropriate.

> 
> Thanks.
> 
>>
>>>
>>>> +static void print_cpu_stat_info(int cpu)
>>>> +{
>>>> +	struct rcu_snap_record rsr, *rsrp;
>>>> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>>>> +	struct kernel_cpustat *kcsp = &kcpustat_cpu(cpu);
>>>> +
>>>> +	if (!rcu_cpu_stall_cputime)
>>>> +		return;
>>>> +
>>>> +	rsrp = &rdp->snap_record;
>>>> +	if (rsrp->gp_seq != rdp->gp_seq)
>>>> +		return;
>>>> +
>>>> +	rsr.cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
>>>> +	rsr.cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
>>>> +	rsr.cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
>>>> +
>>>> +	pr_err("\t         hardirqs   softirqs   csw/system\n");
>>>> +	pr_err("\t number: %8ld %10d %12lld\n",
>>>> +		kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs,
>>>> +		kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs,
>>>> +		nr_context_switches_cpu(cpu) - rsrp->nr_csw);
>>>> +	pr_err("\tcputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
>>>> +		div_u64(rsr.cputime_irq - rsrp->cputime_irq, NSEC_PER_MSEC),
>>>> +		div_u64(rsr.cputime_softirq - rsrp->cputime_softirq, NSEC_PER_MSEC),
>>>> +		div_u64(rsr.cputime_system - rsrp->cputime_system, NSEC_PER_MSEC),
>>>> +		jiffies64_to_msecs(jiffies - rsrp->jiffies));
>>>
>>> jiffies_to_msecs() should be enough.
>>
>> OK, thanks.
>>
>>>
>>> Thanks.
>>>
>>> .
>>>
>>
>> -- 
>> Regards,
>>   Zhen Lei
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 4/6] rcu: Add RCU stall diagnosis information
  2022-11-17 13:25         ` Leizhen (ThunderTown)
@ 2022-11-17 14:26           ` Frederic Weisbecker
  2022-11-18  2:03             ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 23+ messages in thread
From: Frederic Weisbecker @ 2022-11-17 14:26 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott

On Thu, Nov 17, 2022 at 09:25:44PM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/11/17 20:22, Frederic Weisbecker wrote:
> > On Thu, Nov 17, 2022 at 09:57:18AM +0800, Leizhen (ThunderTown) wrote:
> >>
> >>
> >> On 2022/11/17 6:39, Frederic Weisbecker wrote:
> >>> On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
> >>>> @@ -262,6 +279,8 @@ struct rcu_data {
> >>>>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
> >>>>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
> >>>>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
> >>>> +	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
> >>>> +					    /* the first RCU stall timeout */
> >>>
> >>> This should be under #ifdef CONFIG_RCU_CPU_STALL_CPUTIME
> >>
> >> This will not work for now because we also support boot option
> >> rcupdate.rcu_cpu_stall_cputime.
> > 
> > I'm confused. If CONFIG_RCU_CPU_STALL_CPUTIME=n then rcupdate.rcu_cpu_stall_cputime has
> > no effect, right?
> 
> No, rcupdate.rcu_cpu_stall_cputime override CONFIG_RCU_CPU_STALL_CPUTIME. Because
> the default value of CONFIG_RCU_CPU_STALL_CPUTIME is n, so in most cases, we need
> rcupdate.rcu_cpu_stall_cputime as the escape route.
> 
> If CONFIG_RCU_CPU_STALL_CPUTIME=y is default, your suggestion is more
> appropriate.

Oh ok I thought it was a support Kconfig switch.

Then please just mention that rcupdate.rcu_cpu_stall_cputime overrides
CONFIG_RCU_CPU_STALL_CPUTIME behaviour in the Kconfig help text.

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 4/6] rcu: Add RCU stall diagnosis information
  2022-11-17 14:26           ` Frederic Weisbecker
@ 2022-11-18  2:03             ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2022-11-18  2:03 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel, Robert Elliott



On 2022/11/17 22:26, Frederic Weisbecker wrote:
> On Thu, Nov 17, 2022 at 09:25:44PM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/11/17 20:22, Frederic Weisbecker wrote:
>>> On Thu, Nov 17, 2022 at 09:57:18AM +0800, Leizhen (ThunderTown) wrote:
>>>>
>>>>
>>>> On 2022/11/17 6:39, Frederic Weisbecker wrote:
>>>>> On Fri, Nov 11, 2022 at 09:07:07PM +0800, Zhen Lei wrote:
>>>>>> @@ -262,6 +279,8 @@ struct rcu_data {
>>>>>>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
>>>>>>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
>>>>>>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
>>>>>> +	struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
>>>>>> +					    /* the first RCU stall timeout */
>>>>>
>>>>> This should be under #ifdef CONFIG_RCU_CPU_STALL_CPUTIME
>>>>
>>>> This will not work for now because we also support boot option
>>>> rcupdate.rcu_cpu_stall_cputime.
>>>
>>> I'm confused. If CONFIG_RCU_CPU_STALL_CPUTIME=n then rcupdate.rcu_cpu_stall_cputime has
>>> no effect, right?
>>
>> No, rcupdate.rcu_cpu_stall_cputime override CONFIG_RCU_CPU_STALL_CPUTIME. Because
>> the default value of CONFIG_RCU_CPU_STALL_CPUTIME is n, so in most cases, we need
>> rcupdate.rcu_cpu_stall_cputime as the escape route.
>>
>> If CONFIG_RCU_CPU_STALL_CPUTIME=y is default, your suggestion is more
>> appropriate.
> 
> Oh ok I thought it was a support Kconfig switch.
> 
> Then please just mention that rcupdate.rcu_cpu_stall_cputime overrides
> CONFIG_RCU_CPU_STALL_CPUTIME behaviour in the Kconfig help text.

Okay, I'll add the description.

> 
> Thanks.
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-11-18  2:04 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-11 13:07 [PATCH v7 0/6] rcu: Add RCU stall diagnosis information Zhen Lei
2022-11-11 13:07 ` [PATCH v7 1/6] genirq: Fix the return type of kstat_cpu_irqs_sum() Zhen Lei
2022-11-11 13:07 ` [PATCH v7 2/6] sched: Add helper kstat_cpu_softirqs_sum() Zhen Lei
2022-11-14 11:42   ` Frederic Weisbecker
2022-11-14 12:45     ` Leizhen (ThunderTown)
2022-11-14 12:50       ` Frederic Weisbecker
2022-11-14 14:26         ` Leizhen (ThunderTown)
2022-11-11 13:07 ` [PATCH v7 3/6] sched: Add helper nr_context_switches_cpu() Zhen Lei
2022-11-11 13:07 ` [PATCH v7 4/6] rcu: Add RCU stall diagnosis information Zhen Lei
2022-11-14 11:24   ` Frederic Weisbecker
2022-11-14 12:32     ` Leizhen (ThunderTown)
2022-11-14 12:46       ` Frederic Weisbecker
2022-11-16 22:39   ` Frederic Weisbecker
2022-11-17  1:57     ` Leizhen (ThunderTown)
2022-11-17 12:22       ` Frederic Weisbecker
2022-11-17 13:25         ` Leizhen (ThunderTown)
2022-11-17 14:26           ` Frederic Weisbecker
2022-11-18  2:03             ` Leizhen (ThunderTown)
2022-11-11 13:07 ` [PATCH v7 5/6] doc: Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information Zhen Lei
2022-11-16 22:55   ` Frederic Weisbecker
2022-11-17  2:03     ` Leizhen (ThunderTown)
2022-11-17 12:23       ` Frederic Weisbecker
2022-11-11 13:07 ` [PATCH v7 6/6] rcu: Align the output of RCU stall Zhen Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).