* [PATCH v2 0/3] rcu: Add RCU stall diagnosis information
From: Zhen Lei @ 2022-10-22 12:45 UTC
  To: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel
  Cc: Zhen Lei

v1 --> v2:
1. Fixed a bug in the code: if the RCU stall is detected by another CPU,
   kcpustat_this_cpu cannot be used.
@@ -451,7 +451,7 @@ static void print_cpu_stat_info(int cpu)
        if (r->gp_seq != rdp->gp_seq)
                return;

-       cpustat = kcpustat_this_cpu->cpustat;
+       cpustat = kcpustat_cpu(cpu).cpustat;
2. Move the start point of statistics from rcu_stall_kick_kthreads() to
   rcu_implicit_dynticks_qs(), removing the dependency on irq_work.

v1:
In some extreme cases, such as an I/O pressure test, CPU usage may
reach 100%, causing an RCU stall. In such cases, the information
printed about current is not useful. Displaying the number and the
cputime of hard interrupts, soft interrupts, and context switches that
occur within half of the CPU stall timeout can help us make a general
judgment. In other cases, it helps us preliminarily determine whether
an infinite loop is running with local_irq, local_bh or preempt
disabled.

Zhen Lei (3):
  sched: Add helper kstat_cpu_softirqs_sum()
  sched: Add helper nr_context_switches_cpu()
  rcu: Add RCU stall diagnosis information

 include/linux/kernel_stat.h | 12 ++++++++++++
 kernel/rcu/tree.c           | 16 ++++++++++++++++
 kernel/rcu/tree.h           | 11 +++++++++++
 kernel/rcu/tree_stall.h     | 28 ++++++++++++++++++++++++++++
 kernel/sched/core.c         |  5 +++++
 5 files changed, 72 insertions(+)

-- 
2.25.1



* [PATCH v2 1/3] sched: Add helper kstat_cpu_softirqs_sum()
From: Zhen Lei @ 2022-10-22 12:45 UTC
  To: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel
  Cc: Zhen Lei

Similar to kstat_cpu_irqs_sum(), it counts the sum of all software
interrupts on a specified CPU.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 include/linux/kernel_stat.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index ddb5a358fd829f4..61d427c1962bf1c 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -67,6 +67,17 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
        return kstat_cpu(cpu).softirqs[irq];
 }
 
+static inline unsigned int kstat_cpu_softirqs_sum(int cpu)
+{
+	int i;
+	unsigned int sum = 0;
+
+	for (i = 0; i < NR_SOFTIRQS; i++)
+		sum += kstat_softirqs_cpu(i, cpu);
+
+	return sum;
+}
+
 /*
  * Number of interrupts per specific IRQ source, since bootup
  */
-- 
2.25.1



* [PATCH v2 2/3] sched: Add helper nr_context_switches_cpu()
From: Zhen Lei @ 2022-10-22 12:45 UTC
  To: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel
  Cc: Zhen Lei

Returns the number of context switches on the specified CPU, which can be
used to diagnose RCU stalls.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 include/linux/kernel_stat.h | 1 +
 kernel/sched/core.c         | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 61d427c1962bf1c..ac4f9302b559c9d 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -52,6 +52,7 @@ DECLARE_PER_CPU(struct kernel_cpustat, kernel_cpustat);
 #define kstat_cpu(cpu) per_cpu(kstat, cpu)
 #define kcpustat_cpu(cpu) per_cpu(kernel_cpustat, cpu)
 
+extern unsigned long long nr_context_switches_cpu(int cpu);
 extern unsigned long long nr_context_switches(void);
 
 extern unsigned int kstat_irqs_cpu(unsigned int irq, int cpu);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f419388c14ad95b..d020d3c1f0d6613 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5234,6 +5234,11 @@ bool single_task_running(void)
 }
 EXPORT_SYMBOL(single_task_running);
 
+unsigned long long nr_context_switches_cpu(int cpu)
+{
+	return cpu_rq(cpu)->nr_switches;
+}
+
 unsigned long long nr_context_switches(void)
 {
 	int i;
-- 
2.25.1



* [PATCH v2 3/3] rcu: Add RCU stall diagnosis information
From: Zhen Lei @ 2022-10-22 12:45 UTC
  To: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel
  Cc: Zhen Lei

In some extreme cases, such as an I/O pressure test, CPU usage may
reach 100%, causing an RCU stall. In such cases, the information
printed about current is not useful. Displaying the number and the
cputime of hard interrupts, soft interrupts, and context switches that
occur within half of the CPU stall timeout can help us make a general
judgment. In other cases, it helps us preliminarily determine whether
an infinite loop is running with local_irq, local_bh or preempt
disabled.

For example:
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:     0-....: (1250 ticks this GP) <omitted>
rcu:          hardirqs   softirqs   csw/system
rcu:  number:      624         45            0
rcu: cputime:       69          1         2425   ==> 2500(ms)

The example above shows that the number of hard and soft interrupts is
small, there are zero context switches, and system time dominates the
cputime. We can quickly conclude that the current task is stuck in an
infinite loop with preempt_disable() in effect.
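
For reference, a stall of this shape can be reproduced with a minimal
test module along the following lines. This is an illustrative sketch
only, not part of this series, and the module/function names are made
up:

#include <linux/module.h>
#include <linux/preempt.h>
#include <linux/processor.h>

static int __init stall_demo_init(void)
{
	/*
	 * Spin forever with preemption disabled but interrupts still
	 * enabled, so the scheduler tick keeps firing and RCU can
	 * self-detect the stall on this CPU after the stall timeout.
	 */
	preempt_disable();
	for (;;)
		cpu_relax();
	return 0;	/* never reached */
}
module_init(stall_demo_init);

MODULE_LICENSE("GPL");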

The impact on system performance is negligible because the snapshot is
recorded only once, after half of the CPU stall timeout has elapsed.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 kernel/rcu/tree.c       | 16 ++++++++++++++++
 kernel/rcu/tree.h       | 11 +++++++++++
 kernel/rcu/tree_stall.h | 28 ++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6bb8e72bc8151ef..56c49a3117e7a81 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -931,6 +931,22 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 			rdp->rcu_iw_gp_seq = rnp->gp_seq;
 			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
 		}
+
+		if (rdp->snap_record.gp_seq != rdp->gp_seq) {
+			u64 *cpustat;
+			struct rcu_snap_record *r;
+
+			cpustat = kcpustat_cpu(rdp->cpu).cpustat;
+
+			r = &rdp->snap_record;
+			r->cputime_irq     = cpustat[CPUTIME_IRQ];
+			r->cputime_softirq = cpustat[CPUTIME_SOFTIRQ];
+			r->cputime_system  = cpustat[CPUTIME_SYSTEM];
+			r->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
+			r->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);
+			r->nr_csw = nr_context_switches_cpu(rdp->cpu);
+			r->gp_seq = rdp->gp_seq;
+		}
 	}
 
 	return 0;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index d4a97e40ea9c3e2..fb3121d15cca6f8 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -158,6 +158,16 @@ union rcu_noqs {
 	u16 s; /* Set of bits, aggregate OR here. */
 };
 
+struct rcu_snap_record {
+	unsigned long	gp_seq;
+	u64		cputime_irq;
+	u64		cputime_softirq;
+	u64		cputime_system;
+	unsigned int	nr_hardirqs;
+	unsigned int	nr_softirqs;
+	unsigned long long nr_csw;
+};
+
 /* Per-CPU data for read-copy update. */
 struct rcu_data {
 	/* 1) quiescent-state and grace-period handling : */
@@ -262,6 +272,7 @@ struct rcu_data {
 	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
 	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
 	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
+	struct rcu_snap_record snap_record;
 
 	int cpu;
 };
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 5653560573e22d6..f8c9d0284d116a8 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -428,6 +428,32 @@ static bool rcu_is_rcuc_kthread_starving(struct rcu_data *rdp, unsigned long *jp
 	return j > 2 * HZ;
 }
 
+static void print_cpu_stat_info(int cpu)
+{
+	u64 *cpustat;
+	unsigned long half_timeout;
+	struct rcu_snap_record *r;
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
+	r = &rdp->snap_record;
+	if (r->gp_seq != rdp->gp_seq)
+		return;
+
+	cpustat = kcpustat_cpu(cpu).cpustat;
+	half_timeout = rcu_jiffies_till_stall_check() / 2;
+
+	pr_err("         hardirqs   softirqs   csw/system\n");
+	pr_err(" number: %8d %10d %12lld\n",
+		kstat_cpu_irqs_sum(cpu) - r->nr_hardirqs,
+		kstat_cpu_softirqs_sum(cpu) - r->nr_softirqs,
+		nr_context_switches_cpu(cpu) - r->nr_csw);
+	pr_err("cputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
+		div_u64(cpustat[CPUTIME_IRQ] - r->cputime_irq, NSEC_PER_MSEC),
+		div_u64(cpustat[CPUTIME_SOFTIRQ] - r->cputime_softirq, NSEC_PER_MSEC),
+		div_u64(cpustat[CPUTIME_SYSTEM] - r->cputime_system, NSEC_PER_MSEC),
+		jiffies64_to_msecs(half_timeout));
+}
+
 /*
  * Print out diagnostic information for the specified stalled CPU.
  *
@@ -484,6 +510,8 @@ static void print_cpu_stall_info(int cpu)
 	       data_race(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
 	       rcuc_starved ? buf : "",
 	       falsepositive ? " (false positive?)" : "");
+
+	print_cpu_stat_info(cpu);
 }
 
 /* Complain about starvation of grace-period kthread.  */
-- 
2.25.1



* Re: [PATCH v2 3/3] rcu: Add RCU stall diagnosis information
From: Paul E. McKenney @ 2022-10-27 17:33 UTC
  To: Zhen Lei
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel

On Sat, Oct 22, 2022 at 08:45:25PM +0800, Zhen Lei wrote:
> In some extreme cases, such as an I/O pressure test, CPU usage may
> reach 100%, causing an RCU stall. In such cases, the information
> printed about current is not useful. Displaying the number and the
> cputime of hard interrupts, soft interrupts, and context switches that
> occur within half of the CPU stall timeout can help us make a general
> judgment. In other cases, it helps us preliminarily determine whether
> an infinite loop is running with local_irq, local_bh or preempt
> disabled.
> 
> For example:
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu:     0-....: (1250 ticks this GP) <omitted>
> rcu:          hardirqs   softirqs   csw/system
> rcu:  number:      624         45            0
> rcu: cputime:       69          1         2425   ==> 2500(ms)
> 
> The example above shows that the number of hard and soft interrupts is
> small, there are zero context switches, and system time dominates the
> cputime. We can quickly conclude that the current task is stuck in an
> infinite loop with preempt_disable() in effect.
> 
> The impact on system performance is negligible because the snapshot is
> recorded only once, after half of the CPU stall timeout has elapsed.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  kernel/rcu/tree.c       | 16 ++++++++++++++++
>  kernel/rcu/tree.h       | 11 +++++++++++
>  kernel/rcu/tree_stall.h | 28 ++++++++++++++++++++++++++++
>  3 files changed, 55 insertions(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 6bb8e72bc8151ef..56c49a3117e7a81 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -931,6 +931,22 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  			rdp->rcu_iw_gp_seq = rnp->gp_seq;
>  			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
>  		}
> +
> +		if (rdp->snap_record.gp_seq != rdp->gp_seq) {
> +			u64 *cpustat;
> +			struct rcu_snap_record *r;
> +
> +			cpustat = kcpustat_cpu(rdp->cpu).cpustat;
> +
> +			r = &rdp->snap_record;
> +			r->cputime_irq     = cpustat[CPUTIME_IRQ];
> +			r->cputime_softirq = cpustat[CPUTIME_SOFTIRQ];
> +			r->cputime_system  = cpustat[CPUTIME_SYSTEM];
> +			r->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
> +			r->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);
> +			r->nr_csw = nr_context_switches_cpu(rdp->cpu);
> +			r->gp_seq = rdp->gp_seq;

This needs to be optional.  Yes, it is normally rarely executed, but
people who don't want the additional information should not pay the
price for it.

> +		}
>  	}
>  
>  	return 0;
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index d4a97e40ea9c3e2..fb3121d15cca6f8 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -158,6 +158,16 @@ union rcu_noqs {
>  	u16 s; /* Set of bits, aggregate OR here. */
>  };
>  
> +struct rcu_snap_record {
> +	unsigned long	gp_seq;
> +	u64		cputime_irq;
> +	u64		cputime_softirq;
> +	u64		cputime_system;
> +	unsigned int	nr_hardirqs;
> +	unsigned int	nr_softirqs;
> +	unsigned long long nr_csw;
> +};

Please add a comment saying what this is and what it is used for.

> +
>  /* Per-CPU data for read-copy update. */
>  struct rcu_data {
>  	/* 1) quiescent-state and grace-period handling : */
> @@ -262,6 +272,7 @@ struct rcu_data {
>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
> +	struct rcu_snap_record snap_record;

And here as well, please.

>  	int cpu;
>  };
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 5653560573e22d6..f8c9d0284d116a8 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -428,6 +428,32 @@ static bool rcu_is_rcuc_kthread_starving(struct rcu_data *rdp, unsigned long *jp
>  	return j > 2 * HZ;
>  }
>  
> +static void print_cpu_stat_info(int cpu)
> +{
> +	u64 *cpustat;
> +	unsigned long half_timeout;
> +	struct rcu_snap_record *r;

Let's please follow convention and call it "rsrp" rather than just "r".

> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
> +	r = &rdp->snap_record;
> +	if (r->gp_seq != rdp->gp_seq)
> +		return;
> +
> +	cpustat = kcpustat_cpu(cpu).cpustat;
> +	half_timeout = rcu_jiffies_till_stall_check() / 2;
> +
> +	pr_err("         hardirqs   softirqs   csw/system\n");
> +	pr_err(" number: %8d %10d %12lld\n",
> +		kstat_cpu_irqs_sum(cpu) - r->nr_hardirqs,
> +		kstat_cpu_softirqs_sum(cpu) - r->nr_softirqs,
> +		nr_context_switches_cpu(cpu) - r->nr_csw);
> +	pr_err("cputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
> +		div_u64(cpustat[CPUTIME_IRQ] - r->cputime_irq, NSEC_PER_MSEC),
> +		div_u64(cpustat[CPUTIME_SOFTIRQ] - r->cputime_softirq, NSEC_PER_MSEC),
> +		div_u64(cpustat[CPUTIME_SYSTEM] - r->cputime_system, NSEC_PER_MSEC),
> +		jiffies64_to_msecs(half_timeout));
> +}
> +
>  /*
>   * Print out diagnostic information for the specified stalled CPU.
>   *
> @@ -484,6 +510,8 @@ static void print_cpu_stall_info(int cpu)
>  	       data_race(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
>  	       rcuc_starved ? buf : "",
>  	       falsepositive ? " (false positive?)" : "");
> +
> +	print_cpu_stat_info(cpu);

Again, please make this conditional.  One way to do that is with a
Kconfig option.  Another is with a kernel boot parameter, as is done
with module_param() elsewhere in tree_stall.h.  Or if the parsing needs
to be fancy (it shouldn't) using kernel_param_ops as is done in tree.c.
Distros tend to like kernel boot parameters, while people dealing with
large numbers of devices tend to like Kconfig options.  Choose wisely.  ;-)

Please make this initially default-off.  If enough people enable it for
long enough, we can later switch it to default-on and maybe even later
to unconditional.  But let's start carefully.

							Thanx, Paul

>  }
>  
>  /* Complain about starvation of grace-period kthread.  */
> -- 
> 2.25.1
> 


* RE: [PATCH v2 1/3] sched: Add helper kstat_cpu_softirqs_sum()
From: Elliott, Robert (Servers) @ 2022-10-27 19:04 UTC
  To: Zhen Lei, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, rcu,
	linux-kernel


> Similar to kstat_cpu_irqs_sum(), it counts the sum of all software
> interrupts on a specified CPU.
> 
> diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
> @@ -67,6 +67,17 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
>         return kstat_cpu(cpu).softirqs[irq];
>  }
> 
> +static inline unsigned int kstat_cpu_softirqs_sum(int cpu)
> +{
> +	int i;
> +	unsigned int sum = 0;
> +
> +	for (i = 0; i < NR_SOFTIRQS; i++)
> +		sum += kstat_softirqs_cpu(i, cpu);
> +
> +	return sum;
> +}

In the function upon which this is based:

struct kernel_stat {
        unsigned long irqs_sum;
        unsigned int softirqs[NR_SOFTIRQS];
};

static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
{
        return kstat_cpu(cpu).irqs_sum;
}

kstat_cpu_irqs_sum returns an unsigned long as an unsigned int, which
could cause large values to be truncated. Should that return
unsigned long? The only existing caller is fs/proc/stat.c which
puts it into a u64:
        u64 sum = 0;
        ...
        sum             += kstat_cpu_irqs_sum(i);

The softirqs field is an unsigned int, so the new function doesn't have
this inconsistency.
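
A sketch of the suggested fix, for clarity: simply widen the return
type to match the irqs_sum field.

static inline unsigned long kstat_cpu_irqs_sum(unsigned int cpu)
{
        return kstat_cpu(cpu).irqs_sum;
}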



* Re: [PATCH v2 1/3] sched: Add helper kstat_cpu_softirqs_sum()
From: Leizhen (ThunderTown) @ 2022-10-28  2:38 UTC
  To: Elliott, Robert (Servers),
	Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel



On 2022/10/28 3:04, Elliott, Robert (Servers) wrote:
> 
>> Similar to kstat_cpu_irqs_sum(), it counts the sum of all software
>> interrupts on a specified CPU.
>>
>> diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
>> @@ -67,6 +67,17 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
>>         return kstat_cpu(cpu).softirqs[irq];
>>  }
>>
>> +static inline unsigned int kstat_cpu_softirqs_sum(int cpu)
>> +{
>> +	int i;
>> +	unsigned int sum = 0;
>> +
>> +	for (i = 0; i < NR_SOFTIRQS; i++)
>> +		sum += kstat_softirqs_cpu(i, cpu);
>> +
>> +	return sum;
>> +}
> 
> In the function upon which this is based:
> 
> struct kernel_stat {
>         unsigned long irqs_sum;
>         unsigned int softirqs[NR_SOFTIRQS];
> };
> 
> static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
> {
>         return kstat_cpu(cpu).irqs_sum;
> }
> 
> kstat_cpu_irqs_sum returns an unsigned long as an unsigned int, which
> could cause large values to be truncated. Should that return
> unsigned long? The only existing caller is fs/proc/stat.c which

This looks like a mistake introduced by:
commit f2c66cd8eeddedb4 ("/proc/stat: scalability of irq num per cpu")

I'll correct it to "unsigned long" in the next version. Thanks.

> puts it into a u64:
>         u64 sum = 0;
>         ...
>         sum             += kstat_cpu_irqs_sum(i);
> 
> The softirqs field is an unsigned int, so the new function doesn't have
> this inconsistency.

OK.

To be honest, I did the math. With CONFIG_HZ=250:
2^32 / 250 / 3600 / 24 / 365 = 0.545 years, i.e. less than a year to wrap.

So, in theory, for 32-bit processors, we should use "unsigned long long".
Of course, from a programming point of view, a 64-bit value consists of
two 32-bit words, so updating it is not atomic. I think that's probably
why the members of struct kernel_stat don't use u64.

However, it seems that the type of the softirqs member could now be
changed to unsigned long. Then, at least on 64-bit processors, there
would be no count overflow problem.
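
That is, something like the following sketch of the contemplated type
change (not something this series does):

struct kernel_stat {
	unsigned long irqs_sum;
	unsigned long softirqs[NR_SOFTIRQS];	/* was: unsigned int */
};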

> 
> .
> 

-- 
Regards,
  Zhen Lei


* Re: [PATCH v2 3/3] rcu: Add RCU stall diagnosis information
From: Leizhen (ThunderTown) @ 2022-10-28  3:16 UTC
  To: paulmck
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel



On 2022/10/28 1:33, Paul E. McKenney wrote:
> On Sat, Oct 22, 2022 at 08:45:25PM +0800, Zhen Lei wrote:
>> In some extreme cases, such as an I/O pressure test, CPU usage may
>> reach 100%, causing an RCU stall. In such cases, the information
>> printed about current is not useful. Displaying the number and the
>> cputime of hard interrupts, soft interrupts, and context switches that
>> occur within half of the CPU stall timeout can help us make a general
>> judgment. In other cases, it helps us preliminarily determine whether
>> an infinite loop is running with local_irq, local_bh or preempt
>> disabled.
>>
>> For example:
>> rcu: INFO: rcu_preempt self-detected stall on CPU
>> rcu:     0-....: (1250 ticks this GP) <omitted>
>> rcu:          hardirqs   softirqs   csw/system
>> rcu:  number:      624         45            0
>> rcu: cputime:       69          1         2425   ==> 2500(ms)
>>
>> The example above shows that the number of hard and soft interrupts is
>> small, there are zero context switches, and system time dominates the
>> cputime. We can quickly conclude that the current task is stuck in an
>> infinite loop with preempt_disable() in effect.
>>
>> The impact on system performance is negligible because the snapshot is
>> recorded only once, after half of the CPU stall timeout has elapsed.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  kernel/rcu/tree.c       | 16 ++++++++++++++++
>>  kernel/rcu/tree.h       | 11 +++++++++++
>>  kernel/rcu/tree_stall.h | 28 ++++++++++++++++++++++++++++
>>  3 files changed, 55 insertions(+)
>>
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index 6bb8e72bc8151ef..56c49a3117e7a81 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -931,6 +931,22 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>>  			rdp->rcu_iw_gp_seq = rnp->gp_seq;
>>  			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
>>  		}
>> +
>> +		if (rdp->snap_record.gp_seq != rdp->gp_seq) {
>> +			u64 *cpustat;
>> +			struct rcu_snap_record *r;
>> +
>> +			cpustat = kcpustat_cpu(rdp->cpu).cpustat;
>> +
>> +			r = &rdp->snap_record;
>> +			r->cputime_irq     = cpustat[CPUTIME_IRQ];
>> +			r->cputime_softirq = cpustat[CPUTIME_SOFTIRQ];
>> +			r->cputime_system  = cpustat[CPUTIME_SYSTEM];
>> +			r->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
>> +			r->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);
>> +			r->nr_csw = nr_context_switches_cpu(rdp->cpu);
>> +			r->gp_seq = rdp->gp_seq;
> 
> This needs to be optional.  Yes, it is normally rarely executed, but
> people who don't want the additional information should not pay the
> price for it.
> 
>> +		}
>>  	}
>>  
>>  	return 0;
>> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
>> index d4a97e40ea9c3e2..fb3121d15cca6f8 100644
>> --- a/kernel/rcu/tree.h
>> +++ b/kernel/rcu/tree.h
>> @@ -158,6 +158,16 @@ union rcu_noqs {
>>  	u16 s; /* Set of bits, aggregate OR here. */
>>  };
>>  
>> +struct rcu_snap_record {
>> +	unsigned long	gp_seq;
>> +	u64		cputime_irq;
>> +	u64		cputime_softirq;
>> +	u64		cputime_system;
>> +	unsigned int	nr_hardirqs;
>> +	unsigned int	nr_softirqs;
>> +	unsigned long long nr_csw;
>> +};
> 
> Please add a comment saying what this is and what it is used for.

OK, I will do it.

> 
>> +
>>  /* Per-CPU data for read-copy update. */
>>  struct rcu_data {
>>  	/* 1) quiescent-state and grace-period handling : */
>> @@ -262,6 +272,7 @@ struct rcu_data {
>>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
>>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
>>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
>> +	struct rcu_snap_record snap_record;
> 
> And here as well, please.

OK

> 
>>  	int cpu;
>>  };
>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
>> index 5653560573e22d6..f8c9d0284d116a8 100644
>> --- a/kernel/rcu/tree_stall.h
>> +++ b/kernel/rcu/tree_stall.h
>> @@ -428,6 +428,32 @@ static bool rcu_is_rcuc_kthread_starving(struct rcu_data *rdp, unsigned long *jp
>>  	return j > 2 * HZ;
>>  }
>>  
>> +static void print_cpu_stat_info(int cpu)
>> +{
>> +	u64 *cpustat;
>> +	unsigned long half_timeout;
>> +	struct rcu_snap_record *r;
> 
> Let's please follow convention and call it "rsrp" rather than just "r".

OK

> 
>> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>> +
>> +	r = &rdp->snap_record;
>> +	if (r->gp_seq != rdp->gp_seq)
>> +		return;
>> +
>> +	cpustat = kcpustat_cpu(cpu).cpustat;
>> +	half_timeout = rcu_jiffies_till_stall_check() / 2;
>> +
>> +	pr_err("         hardirqs   softirqs   csw/system\n");
>> +	pr_err(" number: %8d %10d %12lld\n",
>> +		kstat_cpu_irqs_sum(cpu) - r->nr_hardirqs,
>> +		kstat_cpu_softirqs_sum(cpu) - r->nr_softirqs,
>> +		nr_context_switches_cpu(cpu) - r->nr_csw);
>> +	pr_err("cputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
>> +		div_u64(cpustat[CPUTIME_IRQ] - r->cputime_irq, NSEC_PER_MSEC),
>> +		div_u64(cpustat[CPUTIME_SOFTIRQ] - r->cputime_softirq, NSEC_PER_MSEC),
>> +		div_u64(cpustat[CPUTIME_SYSTEM] - r->cputime_system, NSEC_PER_MSEC),
>> +		jiffies64_to_msecs(half_timeout));
>> +}
>> +
>>  /*
>>   * Print out diagnostic information for the specified stalled CPU.
>>   *
>> @@ -484,6 +510,8 @@ static void print_cpu_stall_info(int cpu)
>>  	       data_race(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
>>  	       rcuc_starved ? buf : "",
>>  	       falsepositive ? " (false positive?)" : "");
>> +
>> +	print_cpu_stat_info(cpu);
> 
> Again, please make this conditional.  One way to do that is with a
> Kconfig option.  Another is with a kernel boot parameter, as is done
> with module_param() elsewhere in tree_stall.h.  Or if the parsing needs
> to be fancy (it shouldn't) using kernel_param_ops as is done in tree.c.
> Distros tend to like kernel boot parameters, while people dealing with
> large numbers of devices tend to like Kconfig options.  Choose wisely.  ;-)

OK. I will add both a Kconfig option and a boot parameter, and let the
user choose freely.
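
A rough sketch of the shape that could take follows; the option and
parameter names are placeholders, not necessarily what v3 will use:

/* Kconfig: a new RCU_CPU_STALL_CPUTIME option, default n. */

/* kernel/rcu/tree_stall.h: */
int rcu_cpu_stall_cputime __read_mostly =
	IS_ENABLED(CONFIG_RCU_CPU_STALL_CPUTIME);
module_param(rcu_cpu_stall_cputime, int, 0644);

/* Then guard both the snapshot and the printout, e.g.: */
if (READ_ONCE(rcu_cpu_stall_cputime))
	print_cpu_stat_info(cpu);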

> 
> Please make this initially default-off.  If enough people enable it for
> long enough, we can later switch it to default-on and maybe even later
> to unconditional.  But let's start carefully.

Right. You're so far-sighted.

> 
> 							Thanx, Paul
> 
>>  }
>>  
>>  /* Complain about starvation of grace-period kthread.  */
>> -- 
>> 2.25.1
>>
> .
> 

-- 
Regards,
  Zhen Lei


* Re: [PATCH v2 1/3] sched: Add helper kstat_cpu_softirqs_sum()
From: Paul E. McKenney @ 2022-10-28 17:57 UTC
  To: Elliott, Robert (Servers)
  Cc: Zhen Lei, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel

On Thu, Oct 27, 2022 at 07:04:53PM +0000, Elliott, Robert (Servers) wrote:
> 
> > Similar to kstat_cpu_irqs_sum(), it counts the sum of all software
> > interrupts on a specified CPU.
> > 
> > diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
> > @@ -67,6 +67,17 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
> >         return kstat_cpu(cpu).softirqs[irq];
> >  }
> > 
> > +static inline unsigned int kstat_cpu_softirqs_sum(int cpu)
> > +{
> > +	int i;
> > +	unsigned int sum = 0;
> > +
> > +	for (i = 0; i < NR_SOFTIRQS; i++)
> > +		sum += kstat_softirqs_cpu(i, cpu);
> > +
> > +	return sum;
> > +}
> 
> In the function upon which this is based:
> 
> struct kernel_stat {
>         unsigned long irqs_sum;
>         unsigned int softirqs[NR_SOFTIRQS];
> };
> 
> static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
> {
>         return kstat_cpu(cpu).irqs_sum;
> }
> 
> kstat_cpu_irqs_sum returns an unsigned long as an unsigned int, which
> could cause large values to be truncated. Should that return
> unsigned long? The only existing caller is fs/proc/stat.c which
> puts it into a u64:
>         u64 sum = 0;
>         ...
>         sum             += kstat_cpu_irqs_sum(i);
> 
> The softirqs field is an unsigned int, so the new function doesn't have
> this inconsistency.

Good point!

Zhen Lei, thoughts?

							Thanx, Paul


* Re: [PATCH v2 1/3] sched: Add helper kstat_cpu_softirqs_sum()
From: Paul E. McKenney @ 2022-10-28 22:35 UTC
  To: Leizhen (ThunderTown)
  Cc: Elliott, Robert (Servers),
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel

On Fri, Oct 28, 2022 at 10:38:15AM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/10/28 3:04, Elliott, Robert (Servers) wrote:
> > 
> >> Similar to kstat_cpu_irqs_sum(), it counts the sum of all software
> >> interrupts on a specified CPU.
> >>
> >> diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
> >> @@ -67,6 +67,17 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
> >>         return kstat_cpu(cpu).softirqs[irq];
> >>  }
> >>
> >> +static inline unsigned int kstat_cpu_softirqs_sum(int cpu)
> >> +{
> >> +	int i;
> >> +	unsigned int sum = 0;
> >> +
> >> +	for (i = 0; i < NR_SOFTIRQS; i++)
> >> +		sum += kstat_softirqs_cpu(i, cpu);
> >> +
> >> +	return sum;
> >> +}
> > 
> > In the function upon which this is based:
> > 
> > struct kernel_stat {
> >         unsigned long irqs_sum;
> >         unsigned int softirqs[NR_SOFTIRQS];
> > };
> > 
> > static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
> > {
> >         return kstat_cpu(cpu).irqs_sum;
> > }
> > 
> > kstat_cpu_irqs_sum returns an unsigned long as an unsigned int, which
> > could cause large values to be truncated. Should that return
> > unsigned long? The only existing caller is fs/proc/stat.c which
> 
> This looks like a mistake introduced by:
> commit f2c66cd8eeddedb4 ("/proc/stat: scalability of irq num per cpu")
> 
> I'll correct it to "unsigned long" in the next version. Thanks.
> 
> > puts it into a u64:
> >         u64 sum = 0;
> >         ...
> >         sum             += kstat_cpu_irqs_sum(i);
> > 
> > The softirqs field is an unsigned int, so the new function doesn't have
> > this inconsistency.
> 
> OK.
> 
> To be honest, I did the math. With CONFIG_HZ=250:
> 2^32 / 250 / 3600 / 24 / 365 = 0.545 years, i.e. less than a year to wrap.

For this to be a problem, our RCU CPU stall warning would have to be
for a months-long grace period, even on systems with HZ=1000.  In almost
all cases, the system would have OOMed long before then.

> So, in theory, for 32-bit processors, we should use "unsigned long long".
> Of course, from a programming point of view, a 64-bit value consists of
> two 32-bit words, so updating it is not atomic. I think that's probably
> why the members of struct kernel_stat don't use u64.
> 
> However, it seems that the type of the softirqs member could now be
> changed to unsigned long. Then, at least on 64-bit processors, there
> would be no count overflow problem.

An unsigned long should suffice.  ;-)

							Thanx, Paul


* Re: [PATCH v2 3/3] rcu: Add RCU stall diagnosis information
From: Paul E. McKenney @ 2022-10-28 22:55 UTC
  To: Leizhen (ThunderTown)
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel

On Fri, Oct 28, 2022 at 11:16:55AM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/10/28 1:33, Paul E. McKenney wrote:
> > On Sat, Oct 22, 2022 at 08:45:25PM +0800, Zhen Lei wrote:
> >> In some extreme cases, such as an I/O pressure test, CPU usage may
> >> reach 100%, causing an RCU stall. In such cases, the information
> >> printed about current is not useful. Displaying the number and the
> >> cputime of hard interrupts, soft interrupts, and context switches that
> >> occur within half of the CPU stall timeout can help us make a general
> >> judgment. In other cases, it helps us preliminarily determine whether
> >> an infinite loop is running with local_irq, local_bh or preempt
> >> disabled.
> >>
> >> For example:
> >> rcu: INFO: rcu_preempt self-detected stall on CPU
> >> rcu:     0-....: (1250 ticks this GP) <omitted>
> >> rcu:          hardirqs   softirqs   csw/system
> >> rcu:  number:      624         45            0
> >> rcu: cputime:       69          1         2425   ==> 2500(ms)
> >>
> >> The example above shows that the number of hard and soft interrupts is
> >> small, there are zero context switches, and system time dominates the
> >> cputime. We can quickly conclude that the current task is stuck in an
> >> infinite loop with preempt_disable() in effect.
> >>
> >> The impact on system performance is negligible because the snapshot is
> >> recorded only once, after half of the CPU stall timeout has elapsed.
> >>
> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >> ---
> >>  kernel/rcu/tree.c       | 16 ++++++++++++++++
> >>  kernel/rcu/tree.h       | 11 +++++++++++
> >>  kernel/rcu/tree_stall.h | 28 ++++++++++++++++++++++++++++
> >>  3 files changed, 55 insertions(+)
> >>
> >> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> index 6bb8e72bc8151ef..56c49a3117e7a81 100644
> >> --- a/kernel/rcu/tree.c
> >> +++ b/kernel/rcu/tree.c
> >> @@ -931,6 +931,22 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >>  			rdp->rcu_iw_gp_seq = rnp->gp_seq;
> >>  			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
> >>  		}
> >> +
> >> +		if (rdp->snap_record.gp_seq != rdp->gp_seq) {
> >> +			u64 *cpustat;
> >> +			struct rcu_snap_record *r;
> >> +
> >> +			cpustat = kcpustat_cpu(rdp->cpu).cpustat;
> >> +
> >> +			r = &rdp->snap_record;
> >> +			r->cputime_irq     = cpustat[CPUTIME_IRQ];
> >> +			r->cputime_softirq = cpustat[CPUTIME_SOFTIRQ];
> >> +			r->cputime_system  = cpustat[CPUTIME_SYSTEM];
> >> +			r->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
> >> +			r->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);
> >> +			r->nr_csw = nr_context_switches_cpu(rdp->cpu);
> >> +			r->gp_seq = rdp->gp_seq;
> > 
> > This needs to be optional.  Yes, it is normally rarely executed, but
> > people who don't want the additional information should not pay the
> > price for it.
> > 
> >> +		}
> >>  	}
> >>  
> >>  	return 0;
> >> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> >> index d4a97e40ea9c3e2..fb3121d15cca6f8 100644
> >> --- a/kernel/rcu/tree.h
> >> +++ b/kernel/rcu/tree.h
> >> @@ -158,6 +158,16 @@ union rcu_noqs {
> >>  	u16 s; /* Set of bits, aggregate OR here. */
> >>  };
> >>  
> >> +struct rcu_snap_record {
> >> +	unsigned long	gp_seq;
> >> +	u64		cputime_irq;
> >> +	u64		cputime_softirq;
> >> +	u64		cputime_system;
> >> +	unsigned int	nr_hardirqs;
> >> +	unsigned int	nr_softirqs;
> >> +	unsigned long long nr_csw;
> >> +};
> > 
> > Please add a comment saying what this is and what it is used for.
> 
> OK, I will do it.
> 
> > 
> >> +
> >>  /* Per-CPU data for read-copy update. */
> >>  struct rcu_data {
> >>  	/* 1) quiescent-state and grace-period handling : */
> >> @@ -262,6 +272,7 @@ struct rcu_data {
> >>  	short rcu_onl_gp_flags;		/* ->gp_flags at last online. */
> >>  	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
> >>  	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
> >> +	struct rcu_snap_record snap_record;
> > 
> > And here as well, please.
> 
> OK
> 
> > 
> >>  	int cpu;
> >>  };
> >> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> >> index 5653560573e22d6..f8c9d0284d116a8 100644
> >> --- a/kernel/rcu/tree_stall.h
> >> +++ b/kernel/rcu/tree_stall.h
> >> @@ -428,6 +428,32 @@ static bool rcu_is_rcuc_kthread_starving(struct rcu_data *rdp, unsigned long *jp
> >>  	return j > 2 * HZ;
> >>  }
> >>  
> >> +static void print_cpu_stat_info(int cpu)
> >> +{
> >> +	u64 *cpustat;
> >> +	unsigned long half_timeout;
> >> +	struct rcu_snap_record *r;
> > 
> > Let's please follow convention and call it "rsrp" rather than just "r".
> 
> OK
> 
> > 
> >> +	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> >> +
> >> +	r = &rdp->snap_record;
> >> +	if (r->gp_seq != rdp->gp_seq)
> >> +		return;
> >> +
> >> +	cpustat = kcpustat_cpu(cpu).cpustat;
> >> +	half_timeout = rcu_jiffies_till_stall_check() / 2;
> >> +
> >> +	pr_err("         hardirqs   softirqs   csw/system\n");
> >> +	pr_err(" number: %8d %10d %12lld\n",
> >> +		kstat_cpu_irqs_sum(cpu) - r->nr_hardirqs,
> >> +		kstat_cpu_softirqs_sum(cpu) - r->nr_softirqs,
> >> +		nr_context_switches_cpu(cpu) - r->nr_csw);
> >> +	pr_err("cputime: %8lld %10lld %12lld   ==> %lld(ms)\n",
> >> +		div_u64(cpustat[CPUTIME_IRQ] - r->cputime_irq, NSEC_PER_MSEC),
> >> +		div_u64(cpustat[CPUTIME_SOFTIRQ] - r->cputime_softirq, NSEC_PER_MSEC),
> >> +		div_u64(cpustat[CPUTIME_SYSTEM] - r->cputime_system, NSEC_PER_MSEC),
> >> +		jiffies64_to_msecs(half_timeout));
> >> +}
> >> +
> >>  /*
> >>   * Print out diagnostic information for the specified stalled CPU.
> >>   *
> >> @@ -484,6 +510,8 @@ static void print_cpu_stall_info(int cpu)
> >>  	       data_race(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
> >>  	       rcuc_starved ? buf : "",
> >>  	       falsepositive ? " (false positive?)" : "");
> >> +
> >> +	print_cpu_stat_info(cpu);
> > 
> > Again, please make this conditional.  One way to do that is with a
> > Kconfig option.  Another is with a kernel boot parameter, as is done
> > with module_param() elsewhere in tree_stall.h.  Or if the parsing needs
> > to be fancy (it shouldn't) using kernel_param_ops as is done in tree.c.
> > Distros tend to like kernel boot parameters, while people dealing with
> > large numbers of devices tend to like Kconfig options.  Choose wisely.  ;-)
> 
> OK. I will add both a Kconfig option and a boot parameter, and let the
> user choose freely.

Works for me!  ;-)

							Thanx, Paul


* Re: [PATCH v2 1/3] sched: Add helper kstat_cpu_softirqs_sum()
From: Leizhen (ThunderTown) @ 2022-10-29  9:51 UTC
  To: paulmck
  Cc: Elliott, Robert (Servers),
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	rcu, linux-kernel



On 2022/10/29 6:35, Paul E. McKenney wrote:
> On Fri, Oct 28, 2022 at 10:38:15AM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/10/28 3:04, Elliott, Robert (Servers) wrote:
>>>
>>>> Similar to kstat_cpu_irqs_sum(), it counts the sum of all software
>>>> interrupts on a specified CPU.
>>>>
>>>> diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
>>>> @@ -67,6 +67,17 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
>>>>         return kstat_cpu(cpu).softirqs[irq];
>>>>  }
>>>>
>>>> +static inline unsigned int kstat_cpu_softirqs_sum(int cpu)
>>>> +{
>>>> +	int i;
>>>> +	unsigned int sum = 0;
>>>> +
>>>> +	for (i = 0; i < NR_SOFTIRQS; i++)
>>>> +		sum += kstat_softirqs_cpu(i, cpu);
>>>> +
>>>> +	return sum;
>>>> +}
>>>
>>> In the function upon which this is based:
>>>
>>> struct kernel_stat {
>>>         unsigned long irqs_sum;
>>>         unsigned int softirqs[NR_SOFTIRQS];
>>> };
>>>
>>> static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
>>> {
>>>         return kstat_cpu(cpu).irqs_sum;
>>> }
>>>
>>> kstat_cpu_irqs_sum returns an unsigned long as an unsigned int, which
>>> could cause large values to be truncated. Should that return
>>> unsigned long? The only existing caller is fs/proc/stat.c which
>>
>> This looks like a mistake introduced by:
>> commit f2c66cd8eeddedb4 ("/proc/stat: scalability of irq num per cpu")
>>
>> I'll correct it to "unsigned long" in the next version. Thanks.
>>
>>> puts it into a u64:
>>>         u64 sum = 0;
>>>         ...
>>>         sum             += kstat_cpu_irqs_sum(i);
>>>
>>> The softirqs field is an unsigned int, so the new function doesn't have
>>> this inconsistency.
>>
>> OK.
>>
>> To be honest, I did the math. With CONFIG_HZ=250:
>> 2^32 / 250 / 3600 / 24 / 365 = 0.545 years, i.e. less than a year to wrap.
> 
> For this to be a problem, our RCU CPU stall warning would have to be
> for a months-long grace period, even on systems with HZ=1000.  In almost
> all cases, the system would have OOMed long before then.

Yes.

> 
>> So, in theory, for 32-bit processors, we should use "unsigned long long".
>> Of course, from a programming point of view, a 64-bit value consists of
>> two 32-bit words, so updating it is not atomic. I think that's probably
>> why the members of struct kernel_stat don't use u64.
>>
>> However, it seems that the type of the softirqs member could now be
>> changed to unsigned long. Then, at least on 64-bit processors, there
>> would be no count overflow problem.
> 
> An unsigned long should suffice.  ;-)

include/linux/irqdesc.h:58:     unsigned int __percpu   *kstat_irqs;

I found another place where the hard interrupt count is stored as "unsigned int";
it is used by "/proc/interrupts". Presumably a user-mode program reads it
periodically and accumulates it into a 64-bit value. And of course, half a year
later, perhaps no one cares about the exact interrupt count anymore.

So, apart from what Elliott mentioned, I won't change the rest.

> 
> 							Thanx, Paul
> .
> 

-- 
Regards,
  Zhen Lei
