linux-kernel.vger.kernel.org archive mirror
* /proc/stat interrupt counter wrap-around
@ 2021-09-10  8:53 Alexei Lozovsky
  2021-09-11  3:48 ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexei Lozovsky
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-10  8:53 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

A monitoring dashboard caught my attention when it displayed weird
spikes in computed interrupt rates. A machine with constant network
load shows roughly 250k interrupts per second, then every 4-5 hours
there's a one-off spike to 11 billion.

Turns out, if you plot the interrupt counter, you get a graph like this:

                                     ###.                           
                                  ###   | ####.              #######
                  #####.     #####      ##    |     #########       
             #####     |   ##                 ######                
            #          ####                                         
        ####                                                        
       #                                                            
    ###                                                             

While monitoring tools are typically equipped to handle counter
wrap-arounds, they may not be ready to handle dips like this.


What is the impact
------------------

Not much, actually.

The counters always decrement by exactly 2^32 (which is suggestive),
so if you mask out the high bits of the counter and consider only
the low 32 bits, the value sequence actually makes sense, given an
appropriate sampling rate. (The period fits too: at ~250k interrupts
per second in total, some 32-bit counter in the sum wraps on average
every 2^32 / 250,000 ~= 17,200 seconds -- right in the observed
4-5 hour window.)

However, if you don't mask the value and assume it to be accurate --
well, that assumption is incorrect. The interrupt sums might look
correct and contain some big number, but that number can be
arbitrarily distant from the actual number of interrupts serviced
since boot.

This concerns only the total value of "intr" and "softirq" rows:

    intr    14390913189 32 11 0 0 238 0 0 0 0 0 0 0 88 0 [...]
    softirq 14625063745 0 596000256 300149 272619841 0 0 [...]
            ^^^^^^^^^^^
            these ones


Why this happens
----------------

The reason for such behaviour is that the "total" interrupt counters
presented by /proc/stat are actually computed by adding up per-interrupt
per-CPU counters. Most of these are "unsigned int", while some of them
are "unsigned long", and the accumulator is "u64". What a mess...

Individual counters are monotonically increasing (modulo wrapping);
however, if you add up values of different bit widths, the sum is
*not* guaranteed to be monotonically increasing: when a 32-bit
counter wraps from UINT_MAX to 0, the u64 accumulator drops by
almost 2^32, because the bits the counter lost were never recorded
anywhere else.
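
For illustration, here's a minimal userspace sketch of the effect
(toy counters, not the kernel's):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Two toy per-CPU counters, 32 bits wide like most
         * kstat fields; cpu0 is just about to wrap. */
        uint32_t cpu0 = UINT32_MAX - 1;
        uint32_t cpu1 = 100;

        uint64_t before = (uint64_t)cpu0 + cpu1;
        cpu0 += 2;      /* a monotonic increment that wraps to 0 */
        uint64_t after = (uint64_t)cpu0 + cpu1;

        /* "after" is smaller than "before" by 2^32 minus the
         * increment, even though each counter only ever moved
         * forward (mod 2^32). */
        printf("before = %llu\n", (unsigned long long)before);
        printf("after  = %llu\n", (unsigned long long)after);
        return 0;
    }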


What can be done
----------------

 1. Do nothing.

    Userspace can trivially compensate for this 'curious' behavior
    by masking out the high bits, observing only the low 32 bits
    (the sizeof(unsigned int) part), and taking care to handle
    wrap-arounds -- see the sketch after this list.

    This maintains the status quo, but the "issue" of interrupt sums
    not being quite accurate remains.


 2. Change the presentation type to the lowest common denominator.

    That is, unsigned int. Make the kernel mask out not-quite-accurate
    bits from the value it reports. Keep it that way until every
    underlying counter type is changed to something wider.

    The benefit here is that users that *are* ready to handle proper
    wrap-arounds will be able to handle them automagically without
    undocumented hacks (see option 1).

    This changes the observed value and will cause "unexpected"
    wrap-arounds to happen earlier in some use-cases, which might
    upset users that are not ready to handle them, or don't want
    to poll /proc/stat more frequently.

    It's debatable what's better: a lower-width value that might
    need to be polled more often, or a wider-width value that is
    not completely accurate.


 3. Change the interrupt counter types to be wider.

    A different take on the issue: instead of narrowing the presentation
    from faux-u64 to unsigned int, widen the interrupt counters from
    unsigned int to... something else:

    - u64             interrupt counters are 64-bit everywhere, period

    - unsigned long   interrupt counters are 64-bit if the platform
                      thinks that "long" is longer than "int"

    Whichever type is used, it must be the same for all interrupt
    counters across the kernel, and also the type used by /proc/stat
    to compute and display the sum of all these counters.

    The advantage here is that 64-bit counters will probably be enough
    for *anything* not to overflow before the heat death of the
    universe, making the wrap-around problem irrelevant.

    The disadvantage here is that some hardware counters are 32-bit,
    and you can't make them wider. Some platforms also don't have
    proper atomic support for 64-bit integers, making wider counters
    problematic to implement efficiently.
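
To illustrate option 1, the userspace workaround amounts to roughly
this (a sketch with a hypothetical helper name, assuming unsigned int
is 32 bits):

    #include <stdint.h>

    /* Treat the reported "intr" total as a 32-bit counter: keep only
     * the low 32 bits and diff consecutive samples in 32-bit
     * arithmetic, so a single wrap-around per sampling interval is
     * handled for free by modular subtraction. */
    uint32_t intr_delta(uint64_t prev_raw, uint64_t curr_raw)
    {
        return (uint32_t)curr_raw - (uint32_t)prev_raw;
    }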


So what do we do?
-----------------

I suggest wrapping the interrupt counter sum at "unsigned int", the
same type used for (most) individual counters. That makes for the
most predictable behavior.

I have a patch set cooking that does this.

Will this be of any interest? Or do you think changing the behavior
of /proc/stat will cause more trouble than it's worth?


Prior discussion
----------------

This question is by no means new; it has been discussed several times:

2019 - genirq, proc: Speedup /proc/stat interrupt statistics

    The issue of overflow and wrap-around was touched upon, with the
    suggestion that userspace should just deal with it. The idea of
    using u64 for the sum was brought up too, but it did not
    go anywhere.

https://lore.kernel.org/all/20190208143255.9dec696b15f03bf00f4c60c2@linux-foundation.org/
https://lore.kernel.org/all/3460540b50784dca813a57ddbbd41656@AcuMS.aculab.com/


2014 - Why do we still have 32 bit counters? Interrupt counters overflow within 50 days

    Discussion on whether it's appropriate to bump counter width to
    64 bits in order to avoid the overflow issues entirely.

https://lore.kernel.org/lkml/alpine.DEB.2.11.1410030435260.8324@gentwo.org/



* [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq"
  2021-09-10  8:53 /proc/stat interrupt counter wrap-around Alexei Lozovsky
@ 2021-09-11  3:48 ` Alexei Lozovsky
  2021-09-11  3:48   ` [PATCH 1/7] genirq: Use unsigned int for irqs_sum Alexei Lozovsky
                     ` (8 more replies)
  0 siblings, 9 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-11  3:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Here's a patch set that makes /proc/stat report total interrupt counts
as monotonically increasing values, just like individual counters for
interrupt types and CPUs are.

This is as if the sum was a shared counter that all CPUs increment
atomically together with their individual counters, with the sum
correctly and expectedly wrapping around to zero once it reaches
UINT_MAX.
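
A toy model of that equivalence (a userspace sketch, not kernel code):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        /* Say cpu0 has serviced 2^32 + 4 interrupts, so its 32-bit
         * counter wrapped and now reads 4; cpu1 has serviced 10. */
        uint32_t cpu0 = 4;
        uint32_t cpu1 = 10;

        /* A shared 32-bit counter bumped once per interrupt would
         * have wrapped too and read 14 -- exactly the 32-bit sum
         * of the individual counters. */
        uint32_t shared = 14;

        assert(cpu0 + cpu1 == shared);
        return 0;
    }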

I've also added some documentation bits to codify this behavior and
make it explicit that wrap-arounds must be expected and handled if
userspace wants to maintain an accurate total interrupt count for
whatever reason.

Alexei Lozovsky (7):
  genirq: Use unsigned int for irqs_sum
  powerpc/irq: arch_irq_stat_cpu() returns unsigned int
  x86/irq: arch_irq_stat_cpu() returns unsigned int
  x86/irq: arch_irq_stat() returns unsigned int
  proc/stat: Use unsigned int for "intr" sum
  proc/stat: Use unsigned int for "softirq" sum
  docs: proc.rst: stat: Note the interrupt counter wrap-around

 Documentation/filesystems/proc.rst | 7 +++++++
 arch/powerpc/include/asm/hardirq.h | 2 +-
 arch/powerpc/kernel/irq.c          | 4 ++--
 arch/x86/include/asm/hardirq.h     | 4 ++--
 arch/x86/kernel/irq.c              | 8 ++++----
 fs/proc/stat.c                     | 8 ++++----
 include/linux/kernel_stat.h        | 2 +-
 7 files changed, 21 insertions(+), 14 deletions(-)

-- 
2.25.1



* [PATCH 1/7] genirq: Use unsigned int for irqs_sum
  2021-09-11  3:48 ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexei Lozovsky
@ 2021-09-11  3:48   ` Alexei Lozovsky
  2021-09-11  3:48   ` [PATCH 2/7] powerpc/irq: arch_irq_stat_cpu() returns unsigned int Alexei Lozovsky
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-11  3:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

It's read as unsigned int via kstat_cpu_irqs_sum() for /proc/stat.
There is no point in having this counter wider than necessary.

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 include/linux/kernel_stat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 44ae1a7eb9e3..72818fb39ae8 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -36,7 +36,7 @@ struct kernel_cpustat {
 };
 
 struct kernel_stat {
-	unsigned long irqs_sum;
+	unsigned int irqs_sum;
 	unsigned int softirqs[NR_SOFTIRQS];
 };
 
-- 
2.25.1



* [PATCH 2/7] powerpc/irq: arch_irq_stat_cpu() returns unsigned int
  2021-09-11  3:48 ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexei Lozovsky
  2021-09-11  3:48   ` [PATCH 1/7] genirq: Use unsigned int for irqs_sum Alexei Lozovsky
@ 2021-09-11  3:48   ` Alexei Lozovsky
  2021-09-11  3:48   ` [PATCH 3/7] x86/irq: " Alexei Lozovsky
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-11  3:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

The interrupt counters this function sums up are all unsigned int
(see irq_cpustat_t). If the sum overflows, so be it: you should
monitor the counter and take note when it wraps around.

Summing up unsigned int values into u64 does not "handle" overflows:
once any of the individual counters overflows, the computed sum is
inaccurate anyway.

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 arch/powerpc/include/asm/hardirq.h | 2 +-
 arch/powerpc/kernel/irq.c          | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/hardirq.h b/arch/powerpc/include/asm/hardirq.h
index f133b5930ae1..5248adcb50b4 100644
--- a/arch/powerpc/include/asm/hardirq.h
+++ b/arch/powerpc/include/asm/hardirq.h
@@ -32,7 +32,7 @@ static inline void ack_bad_irq(unsigned int irq)
 	printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
 }
 
-extern u64 arch_irq_stat_cpu(unsigned int cpu);
+extern unsigned int arch_irq_stat_cpu(unsigned int cpu);
 #define arch_irq_stat_cpu	arch_irq_stat_cpu
 
 #endif /* _ASM_POWERPC_HARDIRQ_H */
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 551b653228c4..a100c967892e 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -645,9 +645,9 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 /*
  * /proc/stat helpers
  */
-u64 arch_irq_stat_cpu(unsigned int cpu)
+unsigned int arch_irq_stat_cpu(unsigned int cpu)
 {
-	u64 sum = per_cpu(irq_stat, cpu).timer_irqs_event;
+	unsigned int sum = per_cpu(irq_stat, cpu).timer_irqs_event;
 
 	sum += per_cpu(irq_stat, cpu).broadcast_irqs_event;
 	sum += per_cpu(irq_stat, cpu).pmu_irqs;
-- 
2.25.1



* [PATCH 3/7] x86/irq: arch_irq_stat_cpu() returns unsigned int
  2021-09-11  3:48 ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexei Lozovsky
  2021-09-11  3:48   ` [PATCH 1/7] genirq: Use unsigned int for irqs_sum Alexei Lozovsky
  2021-09-11  3:48   ` [PATCH 2/7] powerpc/irq: arch_irq_stat_cpu() returns unsigned int Alexei Lozovsky
@ 2021-09-11  3:48   ` Alexei Lozovsky
  2021-09-11  3:48   ` [PATCH 4/7] x86/irq: arch_irq_stat() " Alexei Lozovsky
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-11  3:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

As with the PowerPC version, the interrupt counters added up here
on x86 are all unsigned int (see irq_cpustat_t for x86), as are
mce_exception_count and mce_poll_count.

Summing up unsigned int values into u64 does not "handle" overflows:
once any of the individual counters overflows, the computed sum is
inaccurate anyway.

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 arch/x86/include/asm/hardirq.h | 2 +-
 arch/x86/kernel/irq.c          | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 275e7fd20310..461536b45391 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -54,7 +54,7 @@ DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
 
 extern void ack_bad_irq(unsigned int irq);
 
-extern u64 arch_irq_stat_cpu(unsigned int cpu);
+extern unsigned int arch_irq_stat_cpu(unsigned int cpu);
 #define arch_irq_stat_cpu	arch_irq_stat_cpu
 
 extern u64 arch_irq_stat(void);
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index e28f6a5d14f1..cefe1bc9f42c 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -188,9 +188,9 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 /*
  * /proc/stat helpers
  */
-u64 arch_irq_stat_cpu(unsigned int cpu)
+unsigned int arch_irq_stat_cpu(unsigned int cpu)
 {
-	u64 sum = irq_stats(cpu)->__nmi_count;
+	unsigned int sum = irq_stats(cpu)->__nmi_count;
 
 #ifdef CONFIG_X86_LOCAL_APIC
 	sum += irq_stats(cpu)->apic_timer_irqs;
-- 
2.25.1



* [PATCH 4/7] x86/irq: arch_irq_stat() returns unsigned int
  2021-09-11  3:48 ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexei Lozovsky
                     ` (2 preceding siblings ...)
  2021-09-11  3:48   ` [PATCH 3/7] x86/irq: " Alexei Lozovsky
@ 2021-09-11  3:48   ` Alexei Lozovsky
  2021-09-11  3:48   ` [PATCH 5/7] proc/stat: Use unsigned int for "intr" sum Alexei Lozovsky
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-11  3:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

irq_err_count is atomic_t; extending it to u64 does not make the
value any wider. It is still unsigned int underneath and will wrap
around like unsigned int.

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 arch/x86/include/asm/hardirq.h | 2 +-
 arch/x86/kernel/irq.c          | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 461536b45391..3b0d79e72871 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -57,7 +57,7 @@ extern void ack_bad_irq(unsigned int irq);
 extern unsigned int arch_irq_stat_cpu(unsigned int cpu);
 #define arch_irq_stat_cpu	arch_irq_stat_cpu
 
-extern u64 arch_irq_stat(void);
+extern unsigned int arch_irq_stat(void);
 #define arch_irq_stat		arch_irq_stat
 
 
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index cefe1bc9f42c..18691ed499d1 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -218,9 +218,9 @@ unsigned int arch_irq_stat_cpu(unsigned int cpu)
 	return sum;
 }
 
-u64 arch_irq_stat(void)
+unsigned int arch_irq_stat(void)
 {
-	u64 sum = atomic_read(&irq_err_count);
+	unsigned int sum = atomic_read(&irq_err_count);
 	return sum;
 }
 
-- 
2.25.1



* [PATCH 5/7] proc/stat: Use unsigned int for "intr" sum
  2021-09-11  3:48 ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexei Lozovsky
                     ` (3 preceding siblings ...)
  2021-09-11  3:48   ` [PATCH 4/7] x86/irq: arch_irq_stat() " Alexei Lozovsky
@ 2021-09-11  3:48   ` Alexei Lozovsky
  2021-09-11  3:48   ` [PATCH 6/7] proc/stat: Use unsigned int for "softirq" sum Alexei Lozovsky
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-11  3:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Now that all values collected into "sum" are unsigned int, make the
sum itself unsigned int so that it overflows consistently with the
individual components and thus retains monotonicity.

Since seq_put_decimal_ull() is a function, we don't have to explicitly
cast sum to unsigned long long: the implicit argument conversion takes
care of that (and the compiler will issue warnings if the types don't
agree).

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 fs/proc/stat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 6561a06ef905..d31b83b2a175 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -110,7 +110,7 @@ static int show_stat(struct seq_file *p, void *v)
 	int i, j;
 	u64 user, nice, system, idle, iowait, irq, softirq, steal;
 	u64 guest, guest_nice;
-	u64 sum = 0;
+	unsigned int sum = 0;
 	u64 sum_softirq = 0;
 	unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
 	struct timespec64 boottime;
@@ -192,7 +192,7 @@ static int show_stat(struct seq_file *p, void *v)
 		seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest_nice));
 		seq_putc(p, '\n');
 	}
-	seq_put_decimal_ull(p, "intr ", (unsigned long long)sum);
+	seq_put_decimal_ull(p, "intr ", sum);
 
 	show_all_irqs(p);
 
-- 
2.25.1



* [PATCH 6/7] proc/stat: Use unsigned int for "softirq" sum
  2021-09-11  3:48 ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexei Lozovsky
                     ` (4 preceding siblings ...)
  2021-09-11  3:48   ` [PATCH 5/7] proc/stat: Use unsigned int for "intr" sum Alexei Lozovsky
@ 2021-09-11  3:48   ` Alexei Lozovsky
  2021-09-11  3:48   ` [PATCH 7/7] docs: proc.rst: stat: Note the interrupt counter wrap-around Alexei Lozovsky
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-11  3:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Similarly to "intr" sum value, "softirq" sum is computed by adding up
unsigned int counters for each CPU returned by kstat_softirqs_cpu().
To preserve monotonicity, use the same integer type so that the sum
wraps around consistently. And just like before, this value does not
need to be explicitly casted into unsigned long long for display.

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 fs/proc/stat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index d31b83b2a175..b7a7de3cd822 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -111,7 +111,7 @@ static int show_stat(struct seq_file *p, void *v)
 	u64 user, nice, system, idle, iowait, irq, softirq, steal;
 	u64 guest, guest_nice;
 	unsigned int sum = 0;
-	u64 sum_softirq = 0;
+	unsigned int sum_softirq = 0;
 	unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
 	struct timespec64 boottime;
 
@@ -208,7 +208,7 @@ static int show_stat(struct seq_file *p, void *v)
 		nr_running(),
 		nr_iowait());
 
-	seq_put_decimal_ull(p, "softirq ", (unsigned long long)sum_softirq);
+	seq_put_decimal_ull(p, "softirq ", sum_softirq);
 
 	for (i = 0; i < NR_SOFTIRQS; i++)
 		seq_put_decimal_ull(p, " ", per_softirq_sums[i]);
-- 
2.25.1



* [PATCH 7/7] docs: proc.rst: stat: Note the interrupt counter wrap-around
  2021-09-11  3:48 ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexei Lozovsky
                     ` (5 preceding siblings ...)
  2021-09-11  3:48   ` [PATCH 6/7] proc/stat: Use unsigned int for "softirq" sum Alexei Lozovsky
@ 2021-09-11  3:48   ` Alexei Lozovsky
  2021-09-11  3:59     ` Randy Dunlap
  2021-09-12  9:30   ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexey Dobriyan
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
  8 siblings, 1 reply; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-11  3:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Let's make wrap-around documented behavior so that userspace has no
excuse for not handling it properly if it wants accurate values.

Both "intr" and "softirq" counters (as well as many others, actually)
can and will wrap around, given enough time since boot.

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 Documentation/filesystems/proc.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 042c418f4090..06a0e3aa2e0e 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1513,6 +1513,13 @@ interrupts serviced  including  unnumbered  architecture specific  interrupts;
 each  subsequent column is the  total for that particular numbered interrupt.
 Unnumbered interrupts are not shown, only summed into the total.
 
+.. note::
+
+   Interrupt counters on most platforms are 32-bit, including the total count.
+   Depending on the system load, ths values will sooner or later wrap around.
+   If you want accurate accouting of the rate and *real* number of interrupts
+   serviced, you should monitor the value closely and handle wrap-arounds.
+
 The "ctxt" line gives the total number of context switches across all CPUs.
 
 The "btime" line gives  the time at which the  system booted, in seconds since
-- 
2.25.1



* Re: [PATCH 7/7] docs: proc.rst: stat: Note the interrupt counter wrap-around
  2021-09-11  3:48   ` [PATCH 7/7] docs: proc.rst: stat: Note the interrupt counter wrap-around Alexei Lozovsky
@ 2021-09-11  3:59     ` Randy Dunlap
  0 siblings, 0 replies; 27+ messages in thread
From: Randy Dunlap @ 2021-09-11  3:59 UTC (permalink / raw)
  To: Alexei Lozovsky, Thomas Gleixner
  Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

On 9/10/21 8:48 PM, Alexei Lozovsky wrote:
> Let's make wrap-around documented behavior so that userspace has no
> excuse for not handling it properly if it wants accurate values.
> 
> Both "intr" and "softirq" counters (as well as many others, actually)
> can and will wrap around, given enough time since boot.
> 
> Signed-off-by: Alexei Lozovsky <me@ilammy.net>
> ---
>   Documentation/filesystems/proc.rst | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 042c418f4090..06a0e3aa2e0e 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -1513,6 +1513,13 @@ interrupts serviced  including  unnumbered  architecture specific  interrupts;
>   each  subsequent column is the  total for that particular numbered interrupt.
>   Unnumbered interrupts are not shown, only summed into the total.
>   
> +.. note::
> +
> +   Interrupt counters on most platforms are 32-bit, including the total count.
> +   Depending on the system load, ths values will sooner or later wrap around.

                                     these

> +   If you want accurate accouting of the rate and *real* number of interrupts

                            accounting

> +   serviced, you should monitor the value closely and handle wrap-arounds.
> +
>   The "ctxt" line gives the total number of context switches across all CPUs.
>   
>   The "btime" line gives  the time at which the  system booted, in seconds since
> 

thanks.
-- 
~Randy



* Re: [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq"
  2021-09-11  3:48 ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexei Lozovsky
                     ` (6 preceding siblings ...)
  2021-09-11  3:48   ` [PATCH 7/7] docs: proc.rst: stat: Note the interrupt counter wrap-around Alexei Lozovsky
@ 2021-09-12  9:30   ` Alexey Dobriyan
  2021-09-12 12:37     ` Alexei Lozovsky
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
  8 siblings, 1 reply; 27+ messages in thread
From: Alexey Dobriyan @ 2021-09-12  9:30 UTC (permalink / raw)
  To: Alexei Lozovsky; +Cc: Thomas Gleixner, Christoph Lameter, LKML, linux-fsdevel

On Sat, Sep 11, 2021 at 12:48:01PM +0900, Alexei Lozovsky wrote:
> Here's a patch set that makes /proc/stat report total interrupt counts
> as monotonically increasing values, just like individual counters for
> interrupt types and CPUs are.
> 
> This is as if the sum was a shared counter that all CPUs increment
> atomically together with their individual counters, with the sum
> correctly and expectedly wrapping around to zero once it reaches
> UINT_MAX.
> 
> I've also added some documentation bits to codify this behavior and
> make it explicit that wrap-arounds must be expected and handled if
> userspace wants to maintain an accurate total interrupt count for
> whatever reason.

How about making everything "unsigned long" or even "u64" like NIC
drivers do?


* Re: [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq"
  2021-09-12  9:30   ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexey Dobriyan
@ 2021-09-12 12:37     ` Alexei Lozovsky
  2021-09-14 14:11       ` Thomas Gleixner
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-12 12:37 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Thomas Gleixner, Christoph Lameter, LKML, linux-fsdevel

On Sun, Sep 12, 2021, at 18:30, Alexey Dobriyan wrote:
> How about making everything "unsigned long" or even "u64" like NIC
> drivers do?

I see some possible hurdles ahead:

- Not all architectures have atomic operations for 64-bit values

  All those "unsigned int" counters are incremented with __this_cpu_inc()
  which tries to use atomics if possible. Though, I'm not quite sure
  how this works for read side which does not seem to use atomic reads
  at all. I guess, just by the virtue of properly aligned 32-bit reads
  being atomic everywhere? If that's so, I think widening counters to
  64 bits will come with an asterisk.

- We'll need to update all counters to be 64-bit.

  Like, *everyone*. Every field that gets summed up needs to be 64-bit
  (or else wrap-arounds will be incorrect). Basically every counter in
  every irq_cpustat_t will need to become twice as wide. If that's
  a fine price to pay for accurate, full-width counters...

  Previously I thought that some of these counters even come from
  hardware, but now that I'm reviewing them, that does not seem to be
  the case. Thankfully.

So right now I don't see why it shouldn't be doable in theory.
I'll give it a shot, I guess, and see how it works in practice,
at least as far as the patches go (since I can't really test on all
architectures).


* Re: [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq"
  2021-09-12 12:37     ` Alexei Lozovsky
@ 2021-09-14 14:11       ` Thomas Gleixner
  2021-09-15  4:24         ` Alexei Lozovsky
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2021-09-14 14:11 UTC (permalink / raw)
  To: Alexei Lozovsky, Alexey Dobriyan; +Cc: Christoph Lameter, LKML, linux-fsdevel

On Sun, Sep 12 2021 at 21:37, Alexei Lozovsky wrote:
> On Sun, Sep 12, 2021, at 18:30, Alexey Dobriyan wrote:
>> How about making everything "unsigned long" or even "u64" like NIC
>> drivers do?
>
> I see some possible hurdles ahead:
>
> - Not all architectures have atomic operations for 64-bit values

This is not about atomics.

>   All those "unsigned int" counters are incremented with __this_cpu_inc()
>   which tries to use atomics if possible. Though, I'm not quite sure

It does not use atomics. It's a CPU local increment.

>   how this works for read side which does not seem to use atomic reads
>   at all. I guess, just by the virtue of properly aligned 32-bit reads
>   being atomic everywhere? If that's so, I think widening counters to
>   64 bits will come with an asterisk.

The stats are accumulated racily, i.e. the interrupt might be handled
and one of the per-CPU counters or irq_desc->tot_count incremented
concurrently.

On 32bit systems a 32bit load (as long as the compiler does not emit
load tearing) is always consistent even when there is a concurrent
increment going on. It either gets the old or the new value.

A 64bit read on a 32bit system is always two loads, which means that
a concurrent increment makes it possible to observe a half-updated
value. And no, you can't play reread tricks here without adding
barriers on weakly ordered architectures.
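
Spelled out as a deterministic, single-threaded simulation of that
interleaving (an illustrative sketch only, not how the kernel stores
anything):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* A u64 counter held as two 32-bit halves, the way a 32-bit
         * machine stores it; the counter currently reads 0xFFFFFFFF. */
        uint32_t lo = 0xFFFFFFFF, hi = 0;

        lo = 0;  /* writer: first half of the increment to 2^32 */

        /* A reader running between the two stores reconstructs 0:
         * neither the old value (2^32 - 1) nor the new one (2^32). */
        uint64_t seen = ((uint64_t)hi << 32) | lo;
        printf("torn read: %llu\n", (unsigned long long)seen);

        hi = 1;  /* writer: second half of the increment */
        return 0;
    }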

> - We'll need to update all counters to be 64-bit.
>
>   Like, *everyone*. Every field that gets summed up needs to be 64-bit
>   (or else wrap-arounds will be incorrect). Basically every counter in
>   every irq_cpustat_t will need to become twice as wide. If that's
>   a fine price to pay for accurate, full-width counters...

The storage size should not be a problem.

> So right now I don't see why it shouldn't be doable in theory.

So much for the theory :)

Thanks,

        tglx


* Re: [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq"
  2021-09-14 14:11       ` Thomas Gleixner
@ 2021-09-15  4:24         ` Alexei Lozovsky
  0 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15  4:24 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Thanks for vetting my ideas!

On Tue, Sep 14, 2021, at 23:11, Thomas Gleixner wrote:
> On Sun, Sep 12 2021 at 21:37, Alexei Lozovsky wrote:
>> On Sun, Sep 12, 2021, at 18:30, Alexey Dobriyan wrote:
>>> How about making everything "unsigned long" or even "u64" like NIC
>>> drivers do?
>> 
>> I see some possible hurdles ahead:
>> 
>> - Not all architectures have atomic operations for 64-bit values
> 
> This is not about atomics.

Yeah, I got mixed up in terminology. As you said, atomic
read-modify-write for the increment is not important here; what *is*
important is the absence of tearing in loads and stores.

If there is no tearing, we don't need any barriers to observe counters
that make sense. They might be slightly outdated, but we don't care
as long as they are observed to be monotonically increasing and
we don't see the low bits wrap before the high bits are updated
because a 64-bit store got split into two 32-bit ones.

That said, I believe this rules out updating counter types to u64
because on 32-bit platforms those will tear. However, we can use
unsigned long so that platforms with 64-bit native words get 64-bit
counters and platforms with 32-bit words stay with 32-bit counters
that wrap like they should.

I've checked this on Godbolt for a number of archs and it seems that
all of them will emit single loads and stores for unsigned long.
Well, except for 16-bit platforms, but those would certainly not use
PPC or x86 and procfs in the first place, so I think we can ignore
them for this matter.
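
(For reference, this is the kind of toy translation unit I was
looking at -- nothing kernel-specific, just inspecting the generated
loads and stores:)

    unsigned long counter;

    unsigned long counter_load(void)
    {
        return counter;         /* one word-sized load on every
                                 * target I tried */
    }

    void counter_store(unsigned long v)
    {
        counter = v;            /* likewise, one word-sized store */
    }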

> On 32bit systems a 32bit load (as long as the compiler does not emit
> load tearing) is always consistent even when there is a concurrent
> increment going on. It either gets the old or the new value.

Regarding tearing, I thought about wrapping counter reads in READ_ONCE()
to signal that they should be performed in one load. __this_cpu_inc()
should probably do WRITE_ONCE() for the sake of pairing, but that
should not be too important.

Is it a good idea to use READ_ONCE here?
Or just assume that compiler will not emit any weird loads?

(READ_ONCE does not strictly check that reads will not tear. Right now
it allows unsigned long long because reasons. But I guess it will enable
some extra debugging checks.)


* [PATCH v2 00/12] proc/stat: Maintain monotonicity of "intr" and "softirq"
  2021-09-11  3:48 ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexei Lozovsky
                     ` (7 preceding siblings ...)
  2021-09-12  9:30   ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexey Dobriyan
@ 2021-09-15 17:58   ` Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 01/12] genirq: Use READ_ONCE for IRQ counter reads Alexei Lozovsky
                       ` (11 more replies)
  8 siblings, 12 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15 17:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Here's a patch set that makes /proc/stat report total interrupt counts
as monotonically increasing values, just like individual counters for
interrupt types and CPUs are.

The total counters are not actually maintained but computed from
the individual counters. These counters must all have the same width
for their sum to wrap around in a correct and expected manner.

This patch set unifies all counters that are displayed by /proc/stat
and /proc/interrupts to use "unsigned long" (instead of "unsigned int"
values that get summed into u64, which causes problems for userspace
monitoring tools when the individual counters wrap around).

v1 -> v2:

  - Added READ_ONCE to all reads of per-CPU counters
  - Widened and unified all counters to "unsigned long"
    instead of clamping them all to "unsigned int"
  - Fixed typos in documentation

Since the scope of changes has expanded, I think these patches will
need to be seen and vetted by more people (get-maintainer.pl agrees)
but for now I'll keep the same CC list as for v1.

Unresolved questions:

  - Does the READ_ONCE addition have any merit without WRITE_ONCE?

  - Some counters turned out to be unaccounted for in the total sum.
    Should they be included there? Or are they omitted on purpose?

  - I haven't tested this on PowerPC since I don't have the hardware

Alexei Lozovsky (12):
  genirq: Use READ_ONCE for IRQ counter reads
  genirq: Use unsigned long for IRQ counters
  powerpc/irq: Use READ_ONCE for IRQ counter reads
  powerpc/irq: Use unsigned long for IRQ counters
  powerpc/irq: Use unsigned long for IRQ counter sum
  x86/irq: Use READ_ONCE for IRQ counter reads
  x86/irq: Use unsigned long for IRQ counters
  x86/irq: Use unsigned long for IRQ counters more
  x86/irq: Use unsigned long for IRQ counter sum
  proc/stat: Use unsigned long for "intr" sum
  proc/stat: Use unsigned long for "softirq" sum
  docs: proc.rst: stat: Note the interrupt counter wrap-around

 Documentation/filesystems/proc.rst |  8 +++
 arch/powerpc/include/asm/hardirq.h | 20 ++++----
 arch/powerpc/include/asm/paca.h    |  2 +-
 arch/powerpc/kernel/irq.c          | 42 +++++++--------
 arch/x86/include/asm/hardirq.h     | 26 +++++-----
 arch/x86/include/asm/hw_irq.h      |  4 +-
 arch/x86/include/asm/mce.h         |  4 +-
 arch/x86/kernel/apic/apic.c        |  2 +-
 arch/x86/kernel/apic/io_apic.c     |  4 +-
 arch/x86/kernel/cpu/mce/core.c     |  4 +-
 arch/x86/kernel/i8259.c            |  2 +-
 arch/x86/kernel/irq.c              | 82 +++++++++++++++---------------
 fs/proc/softirqs.c                 |  2 +-
 fs/proc/stat.c                     | 12 ++---
 include/linux/kernel_stat.h        | 10 ++--
 kernel/rcu/tree.h                  |  2 +-
 kernel/rcu/tree_stall.h            |  4 +-
 17 files changed, 119 insertions(+), 111 deletions(-)

-- 
2.25.1


* [PATCH v2 01/12] genirq: Use READ_ONCE for IRQ counter reads
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
@ 2021-09-15 17:58     ` Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 02/12] genirq: Use unsigned long for IRQ counters Alexei Lozovsky
                       ` (10 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15 17:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

IRQ counters are updated by each CPU independently; access to them
does not need to be synchronized, and it's okay to race, as long as
the counter is not seen going backwards (hopefully, cache coherency
takes care of that) and the stores and loads are not torn.

The last part is currently sorta-kinda expected to happen because
all the counters use "unsigned int", which is expected to fit into
a machine word and not be torn.

Make this expectation explicit by wrapping the reads in READ_ONCE.
Note that writes are typically performed via this_cpu_inc() and its
fellows, which do not do a matching WRITE_ONCE().

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 include/linux/kernel_stat.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 44ae1a7eb9e3..90f2e2faf999 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -61,7 +61,7 @@ static inline void kstat_incr_softirqs_this_cpu(unsigned int irq)
 
 static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
 {
-       return kstat_cpu(cpu).softirqs[irq];
+	return READ_ONCE(kstat_cpu(cpu).softirqs[irq]);
 }
 
 /*
@@ -74,7 +74,7 @@ extern unsigned int kstat_irqs_usr(unsigned int irq);
  */
 static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
 {
-	return kstat_cpu(cpu).irqs_sum;
+	return READ_ONCE(kstat_cpu(cpu).irqs_sum);
 }
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-- 
2.25.1



* [PATCH v2 02/12] genirq: Use unsigned long for IRQ counters
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 01/12] genirq: Use READ_ONCE for IRQ counter reads Alexei Lozovsky
@ 2021-09-15 17:58     ` Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 03/12] powerpc/irq: Use READ_ONCE for IRQ counter reads Alexei Lozovsky
                       ` (9 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15 17:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Widen the counters to unsigned long. In fact, some counters already
use unsigned long, and we merely make the other ones agree. That's
one of the reasons for the change: using the same type everywhere
means that the counter sum wraps around in a consistent manner,
allowing accurate accounting of the total number of interrupts.

Another aspect is simply widening the type on architectures where
this is possible (i.e., 64-bit ones). 32-bit architectures will keep
32-bit counters; 64-bit archs will be able to use 64-bit counters.
Since 64-bit counters have such a huge range, it's unlikely that they
will wrap around in the first place.

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 fs/proc/softirqs.c          | 2 +-
 fs/proc/stat.c              | 2 +-
 include/linux/kernel_stat.h | 6 +++---
 kernel/rcu/tree.h           | 2 +-
 kernel/rcu/tree_stall.h     | 4 ++--
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/proc/softirqs.c b/fs/proc/softirqs.c
index 12901dcf57e2..2bc05c23d22e 100644
--- a/fs/proc/softirqs.c
+++ b/fs/proc/softirqs.c
@@ -19,7 +19,7 @@ static int show_softirqs(struct seq_file *p, void *v)
 	for (i = 0; i < NR_SOFTIRQS; i++) {
 		seq_printf(p, "%12s:", softirq_to_name[i]);
 		for_each_possible_cpu(j)
-			seq_printf(p, " %10u", kstat_softirqs_cpu(i, j));
+			seq_printf(p, " %10lu", kstat_softirqs_cpu(i, j));
 		seq_putc(p, '\n');
 	}
 	return 0;
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 6561a06ef905..d9d89d7a959c 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -142,7 +142,7 @@ static int show_stat(struct seq_file *p, void *v)
 		sum		+= arch_irq_stat_cpu(i);
 
 		for (j = 0; j < NR_SOFTIRQS; j++) {
-			unsigned int softirq_stat = kstat_softirqs_cpu(j, i);
+			unsigned long softirq_stat = kstat_softirqs_cpu(j, i);
 
 			per_softirq_sums[j] += softirq_stat;
 			sum_softirq += softirq_stat;
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 90f2e2faf999..41541ce67dfa 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -37,7 +37,7 @@ struct kernel_cpustat {
 
 struct kernel_stat {
 	unsigned long irqs_sum;
-	unsigned int softirqs[NR_SOFTIRQS];
+	unsigned long softirqs[NR_SOFTIRQS];
 };
 
 DECLARE_PER_CPU(struct kernel_stat, kstat);
@@ -59,7 +59,7 @@ static inline void kstat_incr_softirqs_this_cpu(unsigned int irq)
 	__this_cpu_inc(kstat.softirqs[irq]);
 }
 
-static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
+static inline unsigned long kstat_softirqs_cpu(unsigned int irq, int cpu)
 {
 	return READ_ONCE(kstat_cpu(cpu).softirqs[irq]);
 }
@@ -72,7 +72,7 @@ extern unsigned int kstat_irqs_usr(unsigned int irq);
 /*
  * Number of interrupts per cpu, since bootup
  */
-static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
+static inline unsigned long kstat_cpu_irqs_sum(unsigned int cpu)
 {
 	return READ_ONCE(kstat_cpu(cpu).irqs_sum);
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 305cf6aeb408..94e4f022f995 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -242,7 +242,7 @@ struct rcu_data {
 	char rcu_cpu_has_work;
 
 	/* 7) Diagnostic data, including RCU CPU stall warnings. */
-	unsigned int softirq_snap;	/* Snapshot of softirq activity. */
+	unsigned long softirq_snap;	/* Snapshot of softirq activity. */
 	/* ->rcu_iw* fields protected by leaf rcu_node ->lock. */
 	struct irq_work rcu_iw;		/* Check for non-irq activity. */
 	bool rcu_iw_pending;		/* Is ->rcu_iw pending? */
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 6c76988cc019..35e67275b5b4 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -435,7 +435,7 @@ static void print_cpu_stall_info(int cpu)
 	delta = rcu_seq_ctr(rdp->mynode->gp_seq - rdp->rcu_iw_gp_seq);
 	falsepositive = rcu_is_gp_kthread_starving(NULL) &&
 			rcu_dynticks_in_eqs(rcu_dynticks_snap(rdp));
-	pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld/%#lx softirq=%u/%u fqs=%ld %s%s\n",
+	pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld/%#lx softirq=%lu/%lu fqs=%ld %s%s\n",
 	       cpu,
 	       "O."[!!cpu_online(cpu)],
 	       "o."[!!(rdp->grpmask & rdp->mynode->qsmaskinit)],
@@ -510,7 +510,7 @@ static void rcu_check_gp_kthread_expired_fqs_timer(void)
 		       data_race(rcu_state.gp_flags),
 		       gp_state_getname(RCU_GP_WAIT_FQS), RCU_GP_WAIT_FQS,
 		       gpk->__state);
-		pr_err("\tPossible timer handling issue on cpu=%d timer-softirq=%u\n",
+		pr_err("\tPossible timer handling issue on cpu=%d timer-softirq=%lu\n",
 		       cpu, kstat_softirqs_cpu(TIMER_SOFTIRQ, cpu));
 	}
 }
-- 
2.25.1



* [PATCH v2 03/12] powerpc/irq: Use READ_ONCE for IRQ counter reads
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 01/12] genirq: Use READ_ONCE for IRQ counter reads Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 02/12] genirq: Use unsigned long for IRQ counters Alexei Lozovsky
@ 2021-09-15 17:58     ` Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 04/12] powerpc/irq: Use unsigned long for IRQ counters Alexei Lozovsky
                       ` (8 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15 17:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Just like with the generic IRQ counters in the previous commit, wrap
accesses to the per-CPU counters from irq_cpustat_t in READ_ONCE.
Grab that one "hmi_irqs" oddball from struct paca_struct as well.
These memory loads should not be torn; let's make that explicit.

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 arch/powerpc/kernel/irq.c | 40 +++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 551b653228c4..e5082d8be700 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -581,52 +581,52 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 
 	seq_printf(p, "%*s: ", prec, "LOC");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(irq_stat, j).timer_irqs_event);
+		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).timer_irqs_event));
         seq_printf(p, "  Local timer interrupts for timer event device\n");
 
 	seq_printf(p, "%*s: ", prec, "BCT");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(irq_stat, j).broadcast_irqs_event);
+		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).broadcast_irqs_event));
 	seq_printf(p, "  Broadcast timer interrupts for timer event device\n");
 
 	seq_printf(p, "%*s: ", prec, "LOC");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(irq_stat, j).timer_irqs_others);
+		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).timer_irqs_others));
         seq_printf(p, "  Local timer interrupts for others\n");
 
 	seq_printf(p, "%*s: ", prec, "SPU");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(irq_stat, j).spurious_irqs);
+		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).spurious_irqs));
 	seq_printf(p, "  Spurious interrupts\n");
 
 	seq_printf(p, "%*s: ", prec, "PMI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(irq_stat, j).pmu_irqs);
+		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).pmu_irqs));
 	seq_printf(p, "  Performance monitoring interrupts\n");
 
 	seq_printf(p, "%*s: ", prec, "MCE");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(irq_stat, j).mce_exceptions);
+		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).mce_exceptions));
 	seq_printf(p, "  Machine check exceptions\n");
 
 #ifdef CONFIG_PPC_BOOK3S_64
 	if (cpu_has_feature(CPU_FTR_HVMODE)) {
 		seq_printf(p, "%*s: ", prec, "HMI");
 		for_each_online_cpu(j)
-			seq_printf(p, "%10u ", paca_ptrs[j]->hmi_irqs);
+			seq_printf(p, "%10u ", READ_ONCE(paca_ptrs[j]->hmi_irqs));
 		seq_printf(p, "  Hypervisor Maintenance Interrupts\n");
 	}
 #endif
 
 	seq_printf(p, "%*s: ", prec, "NMI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(irq_stat, j).sreset_irqs);
+		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).sreset_irqs));
 	seq_printf(p, "  System Reset interrupts\n");
 
 #ifdef CONFIG_PPC_WATCHDOG
 	seq_printf(p, "%*s: ", prec, "WDG");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(irq_stat, j).soft_nmi_irqs);
+		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).soft_nmi_irqs));
 	seq_printf(p, "  Watchdog soft-NMI interrupts\n");
 #endif
 
@@ -634,7 +634,7 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 	if (cpu_has_feature(CPU_FTR_DBELL)) {
 		seq_printf(p, "%*s: ", prec, "DBL");
 		for_each_online_cpu(j)
-			seq_printf(p, "%10u ", per_cpu(irq_stat, j).doorbell_irqs);
+			seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).doorbell_irqs));
 		seq_printf(p, "  Doorbell interrupts\n");
 	}
 #endif
@@ -647,22 +647,22 @@ int arch_show_interrupts(struct seq_file *p, int prec)
  */
 u64 arch_irq_stat_cpu(unsigned int cpu)
 {
-	u64 sum = per_cpu(irq_stat, cpu).timer_irqs_event;
+	u64 sum = READ_ONCE(per_cpu(irq_stat, cpu).timer_irqs_event);
 
-	sum += per_cpu(irq_stat, cpu).broadcast_irqs_event;
-	sum += per_cpu(irq_stat, cpu).pmu_irqs;
-	sum += per_cpu(irq_stat, cpu).mce_exceptions;
-	sum += per_cpu(irq_stat, cpu).spurious_irqs;
-	sum += per_cpu(irq_stat, cpu).timer_irqs_others;
+	sum += READ_ONCE(per_cpu(irq_stat, cpu).broadcast_irqs_event);
+	sum += READ_ONCE(per_cpu(irq_stat, cpu).pmu_irqs);
+	sum += READ_ONCE(per_cpu(irq_stat, cpu).mce_exceptions);
+	sum += READ_ONCE(per_cpu(irq_stat, cpu).spurious_irqs);
+	sum += READ_ONCE(per_cpu(irq_stat, cpu).timer_irqs_others);
 #ifdef CONFIG_PPC_BOOK3S_64
-	sum += paca_ptrs[cpu]->hmi_irqs;
+	sum += READ_ONCE(paca_ptrs[cpu]->hmi_irqs);
 #endif
-	sum += per_cpu(irq_stat, cpu).sreset_irqs;
+	sum += READ_ONCE(per_cpu(irq_stat, cpu).sreset_irqs);
 #ifdef CONFIG_PPC_WATCHDOG
-	sum += per_cpu(irq_stat, cpu).soft_nmi_irqs;
+	sum += READ_ONCE(per_cpu(irq_stat, cpu).soft_nmi_irqs);
 #endif
 #ifdef CONFIG_PPC_DOORBELL
-	sum += per_cpu(irq_stat, cpu).doorbell_irqs;
+	sum += READ_ONCE(per_cpu(irq_stat, cpu).doorbell_irqs);
 #endif
 
 	return sum;
-- 
2.25.1



* [PATCH v2 04/12] powerpc/irq: Use unsigned long for IRQ counters
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
                       ` (2 preceding siblings ...)
  2021-09-15 17:58     ` [PATCH v2 03/12] powerpc/irq: Use READ_ONCE for IRQ counter reads Alexei Lozovsky
@ 2021-09-15 17:58     ` Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 05/12] powerpc/irq: Use unsigned long for IRQ counter sum Alexei Lozovsky
                       ` (7 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15 17:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Widen the individual counters from irq_cpustat_t to unsigned long.
This makes their overflow virtually impossible within a reasonable
time frame on PowerPC64.

Switch "hmi_irqs" in paca to unsigned long as well since it's going
to be summed up with all those other counters and it must have the
same width for the addition to wrap around correctly.

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 arch/powerpc/include/asm/hardirq.h | 18 +++++++++---------
 arch/powerpc/include/asm/paca.h    |  2 +-
 arch/powerpc/kernel/irq.c          | 20 ++++++++++----------
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/hardirq.h b/arch/powerpc/include/asm/hardirq.h
index f133b5930ae1..efc5391f84fb 100644
--- a/arch/powerpc/include/asm/hardirq.h
+++ b/arch/powerpc/include/asm/hardirq.h
@@ -7,18 +7,18 @@
 
 typedef struct {
 	unsigned int __softirq_pending;
-	unsigned int timer_irqs_event;
-	unsigned int broadcast_irqs_event;
-	unsigned int timer_irqs_others;
-	unsigned int pmu_irqs;
-	unsigned int mce_exceptions;
-	unsigned int spurious_irqs;
-	unsigned int sreset_irqs;
+	unsigned long timer_irqs_event;
+	unsigned long broadcast_irqs_event;
+	unsigned long timer_irqs_others;
+	unsigned long pmu_irqs;
+	unsigned long mce_exceptions;
+	unsigned long spurious_irqs;
+	unsigned long sreset_irqs;
 #ifdef CONFIG_PPC_WATCHDOG
-	unsigned int soft_nmi_irqs;
+	unsigned long soft_nmi_irqs;
 #endif
 #ifdef CONFIG_PPC_DOORBELL
-	unsigned int doorbell_irqs;
+	unsigned long doorbell_irqs;
 #endif
 } ____cacheline_aligned irq_cpustat_t;
 
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index dc05a862e72a..97d78fcf5529 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -230,7 +230,7 @@ struct paca_struct {
 	u16 in_mce;
 	u8 hmi_event_available;		/* HMI event is available */
 	u8 hmi_p9_special_emu;		/* HMI P9 special emulation */
-	u32 hmi_irqs;			/* HMI irq stat */
+	unsigned long hmi_irqs;		/* HMI irq stat */
 #endif
 	u8 ftrace_enabled;		/* Hard disable ftrace */
 
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index e5082d8be700..9f39de114a0e 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -581,52 +581,52 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 
 	seq_printf(p, "%*s: ", prec, "LOC");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).timer_irqs_event));
+		seq_printf(p, "%10lu ", READ_ONCE(per_cpu(irq_stat, j).timer_irqs_event));
         seq_printf(p, "  Local timer interrupts for timer event device\n");
 
 	seq_printf(p, "%*s: ", prec, "BCT");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).broadcast_irqs_event));
+		seq_printf(p, "%10lu ", READ_ONCE(per_cpu(irq_stat, j).broadcast_irqs_event));
 	seq_printf(p, "  Broadcast timer interrupts for timer event device\n");
 
 	seq_printf(p, "%*s: ", prec, "LOC");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).timer_irqs_others));
+		seq_printf(p, "%10lu ", READ_ONCE(per_cpu(irq_stat, j).timer_irqs_others));
         seq_printf(p, "  Local timer interrupts for others\n");
 
 	seq_printf(p, "%*s: ", prec, "SPU");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).spurious_irqs));
+		seq_printf(p, "%10lu ", READ_ONCE(per_cpu(irq_stat, j).spurious_irqs));
 	seq_printf(p, "  Spurious interrupts\n");
 
 	seq_printf(p, "%*s: ", prec, "PMI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).pmu_irqs));
+		seq_printf(p, "%10lu ", READ_ONCE(per_cpu(irq_stat, j).pmu_irqs));
 	seq_printf(p, "  Performance monitoring interrupts\n");
 
 	seq_printf(p, "%*s: ", prec, "MCE");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).mce_exceptions));
+		seq_printf(p, "%10lu ", READ_ONCE(per_cpu(irq_stat, j).mce_exceptions));
 	seq_printf(p, "  Machine check exceptions\n");
 
 #ifdef CONFIG_PPC_BOOK3S_64
 	if (cpu_has_feature(CPU_FTR_HVMODE)) {
 		seq_printf(p, "%*s: ", prec, "HMI");
 		for_each_online_cpu(j)
-			seq_printf(p, "%10u ", READ_ONCE(paca_ptrs[j]->hmi_irqs));
+			seq_printf(p, "%10lu ", READ_ONCE(paca_ptrs[j]->hmi_irqs));
 		seq_printf(p, "  Hypervisor Maintenance Interrupts\n");
 	}
 #endif
 
 	seq_printf(p, "%*s: ", prec, "NMI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).sreset_irqs));
+		seq_printf(p, "%10lu ", READ_ONCE(per_cpu(irq_stat, j).sreset_irqs));
 	seq_printf(p, "  System Reset interrupts\n");
 
 #ifdef CONFIG_PPC_WATCHDOG
 	seq_printf(p, "%*s: ", prec, "WDG");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).soft_nmi_irqs));
+		seq_printf(p, "%10lu ", READ_ONCE(per_cpu(irq_stat, j).soft_nmi_irqs));
 	seq_printf(p, "  Watchdog soft-NMI interrupts\n");
 #endif
 
@@ -634,7 +634,7 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 	if (cpu_has_feature(CPU_FTR_DBELL)) {
 		seq_printf(p, "%*s: ", prec, "DBL");
 		for_each_online_cpu(j)
-			seq_printf(p, "%10u ", READ_ONCE(per_cpu(irq_stat, j).doorbell_irqs));
+			seq_printf(p, "%10lu ", READ_ONCE(per_cpu(irq_stat, j).doorbell_irqs));
 		seq_printf(p, "  Doorbell interrupts\n");
 	}
 #endif
-- 
2.25.1



* [PATCH v2 05/12] powerpc/irq: Use unsigned long for IRQ counter sum
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
                       ` (3 preceding siblings ...)
  2021-09-15 17:58     ` [PATCH v2 04/12] powerpc/irq: Use unsigned long for IRQ counters Alexei Lozovsky
@ 2021-09-15 17:58     ` Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 06/12] x86/irq: Use READ_ONCE for IRQ counter reads Alexei Lozovsky
                       ` (6 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15 17:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Now that all individual counters consistently use unsigned long, use the
same type for their sum. This ensures correct handling of wrap-around
(which is more important for 32-bit PowerPC at this point).

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 arch/powerpc/include/asm/hardirq.h | 2 +-
 arch/powerpc/kernel/irq.c          | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/hardirq.h b/arch/powerpc/include/asm/hardirq.h
index efc5391f84fb..4d3a7d66c299 100644
--- a/arch/powerpc/include/asm/hardirq.h
+++ b/arch/powerpc/include/asm/hardirq.h
@@ -32,7 +32,7 @@ static inline void ack_bad_irq(unsigned int irq)
 	printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
 }
 
-extern u64 arch_irq_stat_cpu(unsigned int cpu);
+extern unsigned long arch_irq_stat_cpu(unsigned int cpu);
 #define arch_irq_stat_cpu	arch_irq_stat_cpu
 
 #endif /* _ASM_POWERPC_HARDIRQ_H */
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 9f39de114a0e..1a0a161ee38b 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -645,9 +645,9 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 /*
  * /proc/stat helpers
  */
-u64 arch_irq_stat_cpu(unsigned int cpu)
+unsigned long arch_irq_stat_cpu(unsigned int cpu)
 {
-	u64 sum = READ_ONCE(per_cpu(irq_stat, cpu).timer_irqs_event);
+	unsigned long sum = READ_ONCE(per_cpu(irq_stat, cpu).timer_irqs_event);
 
 	sum += READ_ONCE(per_cpu(irq_stat, cpu).broadcast_irqs_event);
 	sum += READ_ONCE(per_cpu(irq_stat, cpu).pmu_irqs);
-- 
2.25.1



* [PATCH v2 06/12] x86/irq: Use READ_ONCE for IRQ counter reads
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
                       ` (4 preceding siblings ...)
  2021-09-15 17:58     ` [PATCH v2 05/12] powerpc/irq: Use unsigned long for IRQ counter sum Alexei Lozovsky
@ 2021-09-15 17:58     ` Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 07/12] x86/irq: Use unsigned long for IRQ counters Alexei Lozovsky
                       ` (5 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15 17:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Just like the generic IRQ counters, wrap accesses to the counters in
irq_cpustat_t in READ_ONCE to ensure these loads don't get torn.

mce_exception_count and mce_poll_count are also updated by each CPU
independently, and we don't want those loads to tear either.
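
For context, a minimal sketch of the difference (illustrative only,
not part of the patch):

    unsigned int v1 = irq_stats(cpu)->apic_timer_irqs;
    /* ^ plain load: the compiler is free to refetch or split it
     *   while another CPU increments the counter */
    unsigned int v2 = READ_ONCE(irq_stats(cpu)->apic_timer_irqs);
    /* ^ single volatile access: the snapshot cannot tear */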

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 arch/x86/kernel/irq.c | 69 ++++++++++++++++++++++---------------------
 1 file changed, 35 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index e28f6a5d14f1..4ff04ce22eb6 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -62,77 +62,77 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 
 	seq_printf(p, "%*s: ", prec, "NMI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->__nmi_count);
+		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->__nmi_count));
 	seq_puts(p, "  Non-maskable interrupts\n");
 #ifdef CONFIG_X86_LOCAL_APIC
 	seq_printf(p, "%*s: ", prec, "LOC");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->apic_timer_irqs);
+		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->apic_timer_irqs));
 	seq_puts(p, "  Local timer interrupts\n");
 
 	seq_printf(p, "%*s: ", prec, "SPU");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_spurious_count);
+		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->irq_spurious_count));
 	seq_puts(p, "  Spurious interrupts\n");
 	seq_printf(p, "%*s: ", prec, "PMI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->apic_perf_irqs);
+		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->apic_perf_irqs));
 	seq_puts(p, "  Performance monitoring interrupts\n");
 	seq_printf(p, "%*s: ", prec, "IWI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->apic_irq_work_irqs);
+		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->apic_irq_work_irqs));
 	seq_puts(p, "  IRQ work interrupts\n");
 	seq_printf(p, "%*s: ", prec, "RTR");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->icr_read_retry_count);
+		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->icr_read_retry_count));
 	seq_puts(p, "  APIC ICR read retries\n");
 	if (x86_platform_ipi_callback) {
 		seq_printf(p, "%*s: ", prec, "PLT");
 		for_each_online_cpu(j)
-			seq_printf(p, "%10u ", irq_stats(j)->x86_platform_ipis);
+			seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->x86_platform_ipis));
 		seq_puts(p, "  Platform interrupts\n");
 	}
 #endif
 #ifdef CONFIG_SMP
 	seq_printf(p, "%*s: ", prec, "RES");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_resched_count);
+		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->irq_resched_count));
 	seq_puts(p, "  Rescheduling interrupts\n");
 	seq_printf(p, "%*s: ", prec, "CAL");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_call_count);
+		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->irq_call_count));
 	seq_puts(p, "  Function call interrupts\n");
 	seq_printf(p, "%*s: ", prec, "TLB");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_tlb_count);
+		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->irq_tlb_count));
 	seq_puts(p, "  TLB shootdowns\n");
 #endif
 #ifdef CONFIG_X86_THERMAL_VECTOR
 	seq_printf(p, "%*s: ", prec, "TRM");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_thermal_count);
+		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->irq_thermal_count));
 	seq_puts(p, "  Thermal event interrupts\n");
 #endif
 #ifdef CONFIG_X86_MCE_THRESHOLD
 	seq_printf(p, "%*s: ", prec, "THR");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_threshold_count);
+		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->irq_threshold_count));
 	seq_puts(p, "  Threshold APIC interrupts\n");
 #endif
 #ifdef CONFIG_X86_MCE_AMD
 	seq_printf(p, "%*s: ", prec, "DFR");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_deferred_error_count);
+		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->irq_deferred_error_count));
 	seq_puts(p, "  Deferred Error APIC interrupts\n");
 #endif
 #ifdef CONFIG_X86_MCE
 	seq_printf(p, "%*s: ", prec, "MCE");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(mce_exception_count, j));
+		seq_printf(p, "%10u ", READ_ONCE(per_cpu(mce_exception_count, j)));
 	seq_puts(p, "  Machine check exceptions\n");
 	seq_printf(p, "%*s: ", prec, "MCP");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(mce_poll_count, j));
+		seq_printf(p, "%10u ", READ_ONCE(per_cpu(mce_poll_count, j)));
 	seq_puts(p, "  Machine check polls\n");
 #endif
 #ifdef CONFIG_X86_HV_CALLBACK_VECTOR
@@ -140,7 +140,7 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 		seq_printf(p, "%*s: ", prec, "HYP");
 		for_each_online_cpu(j)
 			seq_printf(p, "%10u ",
-				   irq_stats(j)->irq_hv_callback_count);
+				READ_ONCE(irq_stats(j)->irq_hv_callback_count));
 		seq_puts(p, "  Hypervisor callback interrupts\n");
 	}
 #endif
@@ -149,14 +149,14 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 		seq_printf(p, "%*s: ", prec, "HRE");
 		for_each_online_cpu(j)
 			seq_printf(p, "%10u ",
-				   irq_stats(j)->irq_hv_reenlightenment_count);
+				READ_ONCE(irq_stats(j)->irq_hv_reenlightenment_count));
 		seq_puts(p, "  Hyper-V reenlightenment interrupts\n");
 	}
 	if (test_bit(HYPERV_STIMER0_VECTOR, system_vectors)) {
 		seq_printf(p, "%*s: ", prec, "HVS");
 		for_each_online_cpu(j)
 			seq_printf(p, "%10u ",
-				   irq_stats(j)->hyperv_stimer0_count);
+				READ_ONCE(irq_stats(j)->hyperv_stimer0_count));
 		seq_puts(p, "  Hyper-V stimer0 interrupts\n");
 	}
 #endif
@@ -167,19 +167,20 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 #ifdef CONFIG_HAVE_KVM
 	seq_printf(p, "%*s: ", prec, "PIN");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->kvm_posted_intr_ipis);
+		seq_printf(p, "%10u ",
+			READ_ONCE(irq_stats(j)->kvm_posted_intr_ipis));
 	seq_puts(p, "  Posted-interrupt notification event\n");
 
 	seq_printf(p, "%*s: ", prec, "NPI");
 	for_each_online_cpu(j)
 		seq_printf(p, "%10u ",
-			   irq_stats(j)->kvm_posted_intr_nested_ipis);
+			READ_ONCE(irq_stats(j)->kvm_posted_intr_nested_ipis));
 	seq_puts(p, "  Nested posted-interrupt event\n");
 
 	seq_printf(p, "%*s: ", prec, "PIW");
 	for_each_online_cpu(j)
 		seq_printf(p, "%10u ",
-			   irq_stats(j)->kvm_posted_intr_wakeup_ipis);
+			READ_ONCE(irq_stats(j)->kvm_posted_intr_wakeup_ipis));
 	seq_puts(p, "  Posted-interrupt wakeup event\n");
 #endif
 	return 0;
@@ -190,30 +191,30 @@ int arch_show_interrupts(struct seq_file *p, int prec)
  */
 u64 arch_irq_stat_cpu(unsigned int cpu)
 {
-	u64 sum = irq_stats(cpu)->__nmi_count;
+	u64 sum = READ_ONCE(irq_stats(cpu)->__nmi_count);
 
 #ifdef CONFIG_X86_LOCAL_APIC
-	sum += irq_stats(cpu)->apic_timer_irqs;
-	sum += irq_stats(cpu)->irq_spurious_count;
-	sum += irq_stats(cpu)->apic_perf_irqs;
-	sum += irq_stats(cpu)->apic_irq_work_irqs;
-	sum += irq_stats(cpu)->icr_read_retry_count;
+	sum += READ_ONCE(irq_stats(cpu)->apic_timer_irqs);
+	sum += READ_ONCE(irq_stats(cpu)->irq_spurious_count);
+	sum += READ_ONCE(irq_stats(cpu)->apic_perf_irqs);
+	sum += READ_ONCE(irq_stats(cpu)->apic_irq_work_irqs);
+	sum += READ_ONCE(irq_stats(cpu)->icr_read_retry_count);
 	if (x86_platform_ipi_callback)
-		sum += irq_stats(cpu)->x86_platform_ipis;
+		sum += READ_ONCE(irq_stats(cpu)->x86_platform_ipis);
 #endif
 #ifdef CONFIG_SMP
-	sum += irq_stats(cpu)->irq_resched_count;
-	sum += irq_stats(cpu)->irq_call_count;
+	sum += READ_ONCE(irq_stats(cpu)->irq_resched_count);
+	sum += READ_ONCE(irq_stats(cpu)->irq_call_count);
 #endif
 #ifdef CONFIG_X86_THERMAL_VECTOR
-	sum += irq_stats(cpu)->irq_thermal_count;
+	sum += READ_ONCE(irq_stats(cpu)->irq_thermal_count);
 #endif
 #ifdef CONFIG_X86_MCE_THRESHOLD
-	sum += irq_stats(cpu)->irq_threshold_count;
+	sum += READ_ONCE(irq_stats(cpu)->irq_threshold_count);
 #endif
 #ifdef CONFIG_X86_MCE
-	sum += per_cpu(mce_exception_count, cpu);
-	sum += per_cpu(mce_poll_count, cpu);
+	sum += READ_ONCE(per_cpu(mce_exception_count, cpu));
+	sum += READ_ONCE(per_cpu(mce_poll_count, cpu));
 #endif
 	return sum;
 }
-- 
2.25.1



* [PATCH v2 07/12] x86/irq: Use unsigned long for IRQ counters
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
                       ` (5 preceding siblings ...)
  2021-09-15 17:58     ` [PATCH v2 06/12] x86/irq: Use READ_ONCE for IRQ counter reads Alexei Lozovsky
@ 2021-09-15 17:58     ` Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 08/12] x86/irq: Use unsigned long for IRQ counters more Alexei Lozovsky
                       ` (4 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15 17:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

As with PowerPC in the previous patches, bump the counters in
irq_cpustat_t to a wider type for better wrap-around handling on x86_64.
Not all of them, though: just the ones reported via procfs.

Also grab mce_exception_count and mce_poll_count, which are reported
alongside the counters from irq_cpustat_t.

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 arch/x86/include/asm/hardirq.h | 22 +++++++++++-----------
 arch/x86/include/asm/mce.h     |  4 ++--
 arch/x86/kernel/cpu/mce/core.c |  4 ++--
 arch/x86/kernel/irq.c          | 26 +++++++++++++-------------
 4 files changed, 28 insertions(+), 28 deletions(-)

So, about the "not all of them" part. What about all the other counters
that are present in irq_cpustat_t and printed by arch_show_interrupts()
for /proc/interrupts, but are not included in arch_irq_stat_cpu() and
don't show up in the total counter of /proc/stat?

These ones:

    kvm_posted_intr_ipis
    kvm_posted_intr_wakeup_ipis
    kvm_posted_intr_nested_ipis
    irq_tlb_count
    irq_deferred_error_count
    irq_hv_callback_count
    irq_hv_reenlightenment_count
    hyperv_stimer0_count

I have a feeling they should be included in the total interrupt
counter sum (and be widened to unsigned long as well). Should they?
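
If so, a rough sketch of what adding them to arch_irq_stat_cpu() might
look like (purely hypothetical, not part of this patch):

    #ifdef CONFIG_HAVE_KVM
    	sum += READ_ONCE(irq_stats(cpu)->kvm_posted_intr_ipis);
    	sum += READ_ONCE(irq_stats(cpu)->kvm_posted_intr_wakeup_ipis);
    	sum += READ_ONCE(irq_stats(cpu)->kvm_posted_intr_nested_ipis);
    #endif
    	sum += READ_ONCE(irq_stats(cpu)->irq_tlb_count);
    #ifdef CONFIG_X86_MCE_AMD
    	sum += READ_ONCE(irq_stats(cpu)->irq_deferred_error_count);
    #endif
    	/* ...and likewise for the Hyper-V callback counters. */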

diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 275e7fd20310..2dc9c076f611 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -9,30 +9,30 @@ typedef struct {
 #if IS_ENABLED(CONFIG_KVM_INTEL)
 	u8	     kvm_cpu_l1tf_flush_l1d;
 #endif
-	unsigned int __nmi_count;	/* arch dependent */
+	unsigned long __nmi_count;		/* arch dependent */
 #ifdef CONFIG_X86_LOCAL_APIC
-	unsigned int apic_timer_irqs;	/* arch dependent */
-	unsigned int irq_spurious_count;
-	unsigned int icr_read_retry_count;
+	unsigned long apic_timer_irqs;		/* arch dependent */
+	unsigned long irq_spurious_count;
+	unsigned long icr_read_retry_count;
 #endif
 #ifdef CONFIG_HAVE_KVM
 	unsigned int kvm_posted_intr_ipis;
 	unsigned int kvm_posted_intr_wakeup_ipis;
 	unsigned int kvm_posted_intr_nested_ipis;
 #endif
-	unsigned int x86_platform_ipis;	/* arch dependent */
-	unsigned int apic_perf_irqs;
-	unsigned int apic_irq_work_irqs;
+	unsigned long x86_platform_ipis;	/* arch dependent */
+	unsigned long apic_perf_irqs;
+	unsigned long apic_irq_work_irqs;
 #ifdef CONFIG_SMP
-	unsigned int irq_resched_count;
-	unsigned int irq_call_count;
+	unsigned long irq_resched_count;
+	unsigned long irq_call_count;
 #endif
 	unsigned int irq_tlb_count;
 #ifdef CONFIG_X86_THERMAL_VECTOR
-	unsigned int irq_thermal_count;
+	unsigned long irq_thermal_count;
 #endif
 #ifdef CONFIG_X86_MCE_THRESHOLD
-	unsigned int irq_threshold_count;
+	unsigned long irq_threshold_count;
 #endif
 #ifdef CONFIG_X86_MCE_AMD
 	unsigned int irq_deferred_error_count;
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 0607ec4f5091..65776c6a8478 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -255,8 +255,8 @@ bool mce_is_memory_error(struct mce *m);
 bool mce_is_correctable(struct mce *m);
 int mce_usable_address(struct mce *m);
 
-DECLARE_PER_CPU(unsigned, mce_exception_count);
-DECLARE_PER_CPU(unsigned, mce_poll_count);
+DECLARE_PER_CPU(unsigned long, mce_exception_count);
+DECLARE_PER_CPU(unsigned long, mce_poll_count);
 
 typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
 DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aadc085..38d418913a8f 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -63,7 +63,7 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
 
 #define SPINUNIT		100	/* 100ns */
 
-DEFINE_PER_CPU(unsigned, mce_exception_count);
+DEFINE_PER_CPU(unsigned long, mce_exception_count);
 
 DEFINE_PER_CPU_READ_MOSTLY(unsigned int, mce_num_banks);
 
@@ -718,7 +718,7 @@ static void mce_read_aux(struct mce *m, int i)
 	}
 }
 
-DEFINE_PER_CPU(unsigned, mce_poll_count);
+DEFINE_PER_CPU(unsigned long, mce_poll_count);
 
 /*
  * Poll for corrected events or events that happened before reset.
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 4ff04ce22eb6..9e47c2dd7ef9 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -62,45 +62,45 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 
 	seq_printf(p, "%*s: ", prec, "NMI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->__nmi_count));
+		seq_printf(p, "%10lu ", READ_ONCE(irq_stats(j)->__nmi_count));
 	seq_puts(p, "  Non-maskable interrupts\n");
 #ifdef CONFIG_X86_LOCAL_APIC
 	seq_printf(p, "%*s: ", prec, "LOC");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->apic_timer_irqs));
+		seq_printf(p, "%10lu ", READ_ONCE(irq_stats(j)->apic_timer_irqs));
 	seq_puts(p, "  Local timer interrupts\n");
 
 	seq_printf(p, "%*s: ", prec, "SPU");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->irq_spurious_count));
+		seq_printf(p, "%10lu ", READ_ONCE(irq_stats(j)->irq_spurious_count));
 	seq_puts(p, "  Spurious interrupts\n");
 	seq_printf(p, "%*s: ", prec, "PMI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->apic_perf_irqs));
+		seq_printf(p, "%10lu ", READ_ONCE(irq_stats(j)->apic_perf_irqs));
 	seq_puts(p, "  Performance monitoring interrupts\n");
 	seq_printf(p, "%*s: ", prec, "IWI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->apic_irq_work_irqs));
+		seq_printf(p, "%10lu ", READ_ONCE(irq_stats(j)->apic_irq_work_irqs));
 	seq_puts(p, "  IRQ work interrupts\n");
 	seq_printf(p, "%*s: ", prec, "RTR");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->icr_read_retry_count));
+		seq_printf(p, "%10lu ", READ_ONCE(irq_stats(j)->icr_read_retry_count));
 	seq_puts(p, "  APIC ICR read retries\n");
 	if (x86_platform_ipi_callback) {
 		seq_printf(p, "%*s: ", prec, "PLT");
 		for_each_online_cpu(j)
-			seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->x86_platform_ipis));
+			seq_printf(p, "%10lu ", READ_ONCE(irq_stats(j)->x86_platform_ipis));
 		seq_puts(p, "  Platform interrupts\n");
 	}
 #endif
 #ifdef CONFIG_SMP
 	seq_printf(p, "%*s: ", prec, "RES");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->irq_resched_count));
+		seq_printf(p, "%10lu ", READ_ONCE(irq_stats(j)->irq_resched_count));
 	seq_puts(p, "  Rescheduling interrupts\n");
 	seq_printf(p, "%*s: ", prec, "CAL");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->irq_call_count));
+		seq_printf(p, "%10lu ", READ_ONCE(irq_stats(j)->irq_call_count));
 	seq_puts(p, "  Function call interrupts\n");
 	seq_printf(p, "%*s: ", prec, "TLB");
 	for_each_online_cpu(j)
@@ -110,13 +110,13 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 #ifdef CONFIG_X86_THERMAL_VECTOR
 	seq_printf(p, "%*s: ", prec, "TRM");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->irq_thermal_count));
+		seq_printf(p, "%10lu ", READ_ONCE(irq_stats(j)->irq_thermal_count));
 	seq_puts(p, "  Thermal event interrupts\n");
 #endif
 #ifdef CONFIG_X86_MCE_THRESHOLD
 	seq_printf(p, "%*s: ", prec, "THR");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(irq_stats(j)->irq_threshold_count));
+		seq_printf(p, "%10lu ", READ_ONCE(irq_stats(j)->irq_threshold_count));
 	seq_puts(p, "  Threshold APIC interrupts\n");
 #endif
 #ifdef CONFIG_X86_MCE_AMD
@@ -128,11 +128,11 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 #ifdef CONFIG_X86_MCE
 	seq_printf(p, "%*s: ", prec, "MCE");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(per_cpu(mce_exception_count, j)));
+		seq_printf(p, "%10lu ", READ_ONCE(per_cpu(mce_exception_count, j)));
 	seq_puts(p, "  Machine check exceptions\n");
 	seq_printf(p, "%*s: ", prec, "MCP");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", READ_ONCE(per_cpu(mce_poll_count, j)));
+		seq_printf(p, "%10lu ", READ_ONCE(per_cpu(mce_poll_count, j)));
 	seq_puts(p, "  Machine check polls\n");
 #endif
 #ifdef CONFIG_X86_HV_CALLBACK_VECTOR
-- 
2.25.1


* [PATCH v2 08/12] x86/irq: Use unsigned long for IRQ counters more
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
                       ` (6 preceding siblings ...)
  2021-09-15 17:58     ` [PATCH v2 07/12] x86/irq: Use unsigned long for IRQ counters Alexei Lozovsky
@ 2021-09-15 17:58     ` Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 09/12] x86/irq: Use unsigned long for IRQ counter sum Alexei Lozovsky
                       ` (3 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15 17:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

These two stand apart since they are atomic_t, but they are also
reported by procfs and included in the total interrupt count. Convert
them to atomic_long_t so they are just as wide.

Strictly speaking, only irq_err_count is reported in the totals, but I
felt bad letting its friend miss out on becoming wider.
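
For reference, the atomic_long_t API mirrors atomic_t one-for-one, so
the conversion is mechanical (an illustrative sketch with a hypothetical
counter, not part of the patch):

    static atomic_long_t example_count = ATOMIC_LONG_INIT(0);

    atomic_long_inc(&example_count);            /* was: atomic_inc()  */
    long n = atomic_long_read(&example_count);  /* was: atomic_read() */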

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 arch/x86/include/asm/hw_irq.h  | 4 ++--
 arch/x86/kernel/apic/apic.c    | 2 +-
 arch/x86/kernel/apic/io_apic.c | 4 ++--
 arch/x86/kernel/i8259.c        | 2 +-
 arch/x86/kernel/irq.c          | 8 ++++----
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index d465ece58151..684310f723ac 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -111,8 +111,8 @@ static inline void unlock_vector_lock(void) {}
 #endif	/* CONFIG_X86_LOCAL_APIC */
 
 /* Statistics */
-extern atomic_t irq_err_count;
-extern atomic_t irq_mis_count;
+extern atomic_long_t irq_err_count;
+extern atomic_long_t irq_mis_count;
 
 extern void elcr_set_level_irq(unsigned int irq);
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index d262811ce14b..d0add3f1841b 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2220,7 +2220,7 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_error_interrupt)
 		apic_write(APIC_ESR, 0);
 	v = apic_read(APIC_ESR);
 	ack_APIC_irq();
-	atomic_inc(&irq_err_count);
+	atomic_long_inc(&irq_err_count);
 
 	apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x",
 		    smp_processor_id(), v);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 39224e035e47..038737f5fb88 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1693,7 +1693,7 @@ static unsigned int startup_ioapic_irq(struct irq_data *data)
 	return was_pending;
 }
 
-atomic_t irq_mis_count;
+atomic_long_t irq_mis_count;
 
 #ifdef CONFIG_GENERIC_PENDING_IRQ
 static bool io_apic_level_ack_pending(struct mp_chip_data *data)
@@ -1835,7 +1835,7 @@ static void ioapic_ack_level(struct irq_data *irq_data)
 	 * at the cpu.
 	 */
 	if (!(v & (1 << (i & 0x1f)))) {
-		atomic_inc(&irq_mis_count);
+		atomic_long_inc(&irq_mis_count);
 		eoi_ioapic_pin(cfg->vector, irq_data->chip_data);
 	}
 
diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
index 282b4ee1339f..7ef2facea165 100644
--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -211,7 +211,7 @@ static void mask_and_ack_8259A(struct irq_data *data)
 			       "spurious 8259A interrupt: IRQ%d.\n", irq);
 			spurious_irq_mask |= irqmask;
 		}
-		atomic_inc(&irq_err_count);
+		atomic_long_inc(&irq_err_count);
 		/*
 		 * Theoretically we do not have to handle this IRQ,
 		 * but in Linux this does not cause problems and is
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 9e47c2dd7ef9..6e7c6b4cebc1 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -29,7 +29,7 @@
 DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
 EXPORT_PER_CPU_SYMBOL(irq_stat);
 
-atomic_t irq_err_count;
+atomic_long_t irq_err_count;
 
 /*
  * 'what should we do if we get a hw irq event on an illegal vector'.
@@ -160,9 +160,9 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 		seq_puts(p, "  Hyper-V stimer0 interrupts\n");
 	}
 #endif
-	seq_printf(p, "%*s: %10u\n", prec, "ERR", atomic_read(&irq_err_count));
+	seq_printf(p, "%*s: %10lu\n", prec, "ERR", atomic_long_read(&irq_err_count));
 #if defined(CONFIG_X86_IO_APIC)
-	seq_printf(p, "%*s: %10u\n", prec, "MIS", atomic_read(&irq_mis_count));
+	seq_printf(p, "%*s: %10lu\n", prec, "MIS", atomic_long_read(&irq_mis_count));
 #endif
 #ifdef CONFIG_HAVE_KVM
 	seq_printf(p, "%*s: ", prec, "PIN");
@@ -221,7 +221,7 @@ u64 arch_irq_stat_cpu(unsigned int cpu)
 
 u64 arch_irq_stat(void)
 {
-	u64 sum = atomic_read(&irq_err_count);
+	u64 sum = atomic_long_read(&irq_err_count);
 	return sum;
 }
 
-- 
2.25.1



* [PATCH v2 09/12] x86/irq: Use unsigned long for IRQ counter sum
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
                       ` (7 preceding siblings ...)
  2021-09-15 17:58     ` [PATCH v2 08/12] x86/irq: Use unsigned long for IRQ counters more Alexei Lozovsky
@ 2021-09-15 17:58     ` Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 10/12] proc/stat: Use unsigned long for "intr" sum Alexei Lozovsky
                       ` (2 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15 17:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Now that all individual counters consistently use unsigned long, use the
same type for their sum. This ensures correct handling of wrap-around
(which is more important for i386 at this point).

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 arch/x86/include/asm/hardirq.h | 4 ++--
 arch/x86/kernel/irq.c          | 9 ++++-----
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 2dc9c076f611..c05b42c4d8cb 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -54,10 +54,10 @@ DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
 
 extern void ack_bad_irq(unsigned int irq);
 
-extern u64 arch_irq_stat_cpu(unsigned int cpu);
+extern unsigned long arch_irq_stat_cpu(unsigned int cpu);
 #define arch_irq_stat_cpu	arch_irq_stat_cpu
 
-extern u64 arch_irq_stat(void);
+extern unsigned long arch_irq_stat(void);
 #define arch_irq_stat		arch_irq_stat
 
 
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 6e7c6b4cebc1..2d889e26ae68 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -189,9 +189,9 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 /*
  * /proc/stat helpers
  */
-u64 arch_irq_stat_cpu(unsigned int cpu)
+unsigned long arch_irq_stat_cpu(unsigned int cpu)
 {
-	u64 sum = READ_ONCE(irq_stats(cpu)->__nmi_count);
+	unsigned long sum = READ_ONCE(irq_stats(cpu)->__nmi_count);
 
 #ifdef CONFIG_X86_LOCAL_APIC
 	sum += READ_ONCE(irq_stats(cpu)->apic_timer_irqs);
@@ -219,10 +219,9 @@ u64 arch_irq_stat_cpu(unsigned int cpu)
 	return sum;
 }
 
-u64 arch_irq_stat(void)
+unsigned long arch_irq_stat(void)
 {
-	u64 sum = atomic_long_read(&irq_err_count);
-	return sum;
+	return atomic_long_read(&irq_err_count);
 }
 
 static __always_inline void handle_irq(struct irq_desc *desc,
-- 
2.25.1



* [PATCH v2 10/12] proc/stat: Use unsigned long for "intr" sum
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
                       ` (8 preceding siblings ...)
  2021-09-15 17:58     ` [PATCH v2 09/12] x86/irq: Use unsigned long for IRQ counter sum Alexei Lozovsky
@ 2021-09-15 17:58     ` Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 11/12] proc/stat: Use unsigned long for "softirq" sum Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 12/12] docs: proc.rst: stat: Note the interrupt counter wrap-around Alexei Lozovsky
  11 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15 17:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Now that all the values collected into "sum" are unsigned long, make
the sum itself unsigned long so that it wraps around consistently with
the individual components and thus retains monotonicity.

Since seq_put_decimal_ull() is a prototyped function, we don't have to
explicitly cast sum to unsigned long long: the implicit argument
conversion takes care of that (and the compiler would complain if the
types were incompatible).
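
For reference, the declaration in include/linux/seq_file.h already
takes the widest type, so the call site needs no cast:

    void seq_put_decimal_ull(struct seq_file *m, const char *delimiter,
    			 unsigned long long num);

    seq_put_decimal_ull(p, "intr ", sum);  /* sum: unsigned long */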

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 fs/proc/stat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index d9d89d7a959c..3741d671ab0a 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -110,7 +110,7 @@ static int show_stat(struct seq_file *p, void *v)
 	int i, j;
 	u64 user, nice, system, idle, iowait, irq, softirq, steal;
 	u64 guest, guest_nice;
-	u64 sum = 0;
+	unsigned long sum = 0;
 	u64 sum_softirq = 0;
 	unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
 	struct timespec64 boottime;
@@ -192,7 +192,7 @@ static int show_stat(struct seq_file *p, void *v)
 		seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest_nice));
 		seq_putc(p, '\n');
 	}
-	seq_put_decimal_ull(p, "intr ", (unsigned long long)sum);
+	seq_put_decimal_ull(p, "intr ", sum);
 
 	show_all_irqs(p);
 
-- 
2.25.1



* [PATCH v2 11/12] proc/stat: Use unsigned long for "softirq" sum
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
                       ` (9 preceding siblings ...)
  2021-09-15 17:58     ` [PATCH v2 10/12] proc/stat: Use unsigned long for "intr" sum Alexei Lozovsky
@ 2021-09-15 17:58     ` Alexei Lozovsky
  2021-09-15 17:58     ` [PATCH v2 12/12] docs: proc.rst: stat: Note the interrupt counter wrap-around Alexei Lozovsky
  11 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15 17:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Similarly to the "intr" sum, the "softirq" sum is computed by adding up
the unsigned long per-CPU counters returned by kstat_softirqs_cpu().
To preserve monotonicity, use the same integer type so that the sum
wraps around consistently. And just like before, this value does not
need to be explicitly cast to unsigned long long for display.

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 fs/proc/stat.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 3741d671ab0a..4c6122453946 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -111,8 +111,8 @@ static int show_stat(struct seq_file *p, void *v)
 	u64 user, nice, system, idle, iowait, irq, softirq, steal;
 	u64 guest, guest_nice;
 	unsigned long sum = 0;
-	u64 sum_softirq = 0;
-	unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
+	unsigned long sum_softirq = 0;
+	unsigned long per_softirq_sums[NR_SOFTIRQS] = {0};
 	struct timespec64 boottime;
 
 	user = nice = system = idle = iowait =
@@ -208,7 +208,7 @@ static int show_stat(struct seq_file *p, void *v)
 		nr_running(),
 		nr_iowait());
 
-	seq_put_decimal_ull(p, "softirq ", (unsigned long long)sum_softirq);
+	seq_put_decimal_ull(p, "softirq ", sum_softirq);
 
 	for (i = 0; i < NR_SOFTIRQS; i++)
 		seq_put_decimal_ull(p, " ", per_softirq_sums[i]);
-- 
2.25.1



* [PATCH v2 12/12] docs: proc.rst: stat: Note the interrupt counter wrap-around
  2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
                       ` (10 preceding siblings ...)
  2021-09-15 17:58     ` [PATCH v2 11/12] proc/stat: Use unsigned long for "softirq" sum Alexei Lozovsky
@ 2021-09-15 17:58     ` Alexei Lozovsky
  11 siblings, 0 replies; 27+ messages in thread
From: Alexei Lozovsky @ 2021-09-15 17:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Alexey Dobriyan, Christoph Lameter, LKML, linux-fsdevel

Let's make wrap-around documented behavior so that userspace has no
excuse for not handling it properly when it wants accurate values.

On 32-bit platforms the "intr" and "softirq" counters can and will
wrap around, given enough time since boot. That can take hours or days,
depending on the load.

On 64-bit platforms these counters use 64-bit values, which are very
unlikely to overflow before the heat death of the universe, but it's
still technically possible.

Many other counters can wrap around too, but I'm not going to enumerate
all of them here. The interrupt counters are the most likely to overflow.
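
For example, a monitoring tool can compute wrap-safe deltas by doing
the subtraction in the counter's own width (an illustrative userspace
sketch, not part of the patch):

    #include <stdint.h>

    /* Correct even when curr has wrapped past prev, as long as fewer
     * than 2^32 interrupts occur between two samples. */
    static uint32_t delta32(uint32_t prev, uint32_t curr)
    {
    	return curr - prev;  /* unsigned arithmetic wraps modulo 2^32 */
    }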

Signed-off-by: Alexei Lozovsky <me@ilammy.net>
---
 Documentation/filesystems/proc.rst | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 042c418f4090..a33af0074838 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1513,6 +1513,14 @@ interrupts serviced  including  unnumbered  architecture specific  interrupts;
 each  subsequent column is the  total for that particular numbered interrupt.
 Unnumbered interrupts are not shown, only summed into the total.
 
+.. note::
+
+   On 32-bit platforms interrupt counters are 32-bit, including the total
+   count of all interrupts. Depending on the system load, these values will
+   sooner or later wrap around. If you want accurate accounting of the rate
+   and *actual* number of interrupts serviced, you should monitor the value
+   closely and handle wrap-arounds.
+
 The "ctxt" line gives the total number of context switches across all CPUs.
 
 The "btime" line gives  the time at which the  system booted, in seconds since
-- 
2.25.1



end of thread, other threads:[~2021-09-15 17:59 UTC | newest]

Thread overview: 27+ messages
2021-09-10  8:53 /proc/stat interrupt counter wrap-around Alexei Lozovsky
2021-09-11  3:48 ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 1/7] genirq: Use unsigned int for irqs_sum Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 2/7] powerpc/irq: arch_irq_stat_cpu() returns unsigned int Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 3/7] x86/irq: " Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 4/7] x86/irq: arch_irq_stat() " Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 5/7] proc/stat: Use unsigned int for "intr" sum Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 6/7] proc/stat: Use unsigned int for "softirq" sum Alexei Lozovsky
2021-09-11  3:48   ` [PATCH 7/7] docs: proc.rst: stat: Note the interrupt counter wrap-around Alexei Lozovsky
2021-09-11  3:59     ` Randy Dunlap
2021-09-12  9:30   ` [PATCH 0/7] proc/stat: Maintain monotonicity of "intr" and "softirq" Alexey Dobriyan
2021-09-12 12:37     ` Alexei Lozovsky
2021-09-14 14:11       ` Thomas Gleixner
2021-09-15  4:24         ` Alexei Lozovsky
2021-09-15 17:58   ` [PATCH v2 00/12] " Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 01/12] genirq: Use READ_ONCE for IRQ counter reads Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 02/12] genirq: Use unsigned long for IRQ counters Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 03/12] powerpc/irq: Use READ_ONCE for IRQ counter reads Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 04/12] powerpc/irq: Use unsigned long for IRQ counters Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 05/12] powerpc/irq: Use unsigned long for IRQ counter sum Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 06/12] x86/irq: Use READ_ONCE for IRQ counter reads Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 07/12] x86/irq: Use unsigned long for IRQ counters Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 08/12] x86/irq: Use unsigned long for IRQ counters more Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 09/12] x86/irq: Use unsigned long for IRQ counter sum Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 10/12] proc/stat: Use unsigned long for "intr" sum Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 11/12] proc/stat: Use unsigned long for "softirq" sum Alexei Lozovsky
2021-09-15 17:58     ` [PATCH v2 12/12] docs: proc.rst: stat: Note the interrupt counter wrap-around Alexei Lozovsky
