* [patch V2 0/2] genirq, proc: Speedup /proc/stat interrupt statistics
@ 2019-02-08 13:48 Thomas Gleixner
  2019-02-08 13:48 ` [patch V2 1/2] genirq: Avoid summation loops for /proc/stat Thomas Gleixner
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Thomas Gleixner @ 2019-02-08 13:48 UTC (permalink / raw)
  To: LKML
  Cc: Waiman Long, Matthew Wilcox, Andrew Morton, Alexey Dobriyan,
	Kees Cook, linux-fsdevel, Davidlohr Bueso, Miklos Szeredi,
	Daniel Colascione, Dave Chinner, Randy Dunlap, Marc Zyngier

Waiman reported that on large systems with a large number of interrupts the
readout of /proc/stat takes a long time to sum up the interrupt
statistics. In principle this is not a problem, but for unknown reasons
some enterprise quality software reads /proc/stat with a high frequency.

The reason for this is that interrupt statistics are accounted per cpu. So
the /proc/stat logic has to sum up the interrupt stats for each interrupt.

The following series addresses this by making the interrupt statistics code
in the core generate the sum directly and by making the loop in the
/proc/stat read function smarter.
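
To put a rough number on the cost being removed: every read of /proc/stat
has to perform on the order of nr_irqs * nr_cpu_ids additions. A userspace
model of the pre-series read side (a simplified sketch; the array stands in
for the real per-cpu kstat counters, and the sizes are taken from Waiman's
test system):

#include <stdint.h>

#define NR_IRQS		3016	/* sizes from Waiman's test system */
#define NR_CPUS		120

/* Model of the per interrupt, per cpu counters. */
static unsigned int kstat_irqs_percpu[NR_IRQS][NR_CPUS];

static uint64_t sum_all_irqs(void)
{
	uint64_t sum = 0;
	unsigned int irq, cpu;

	for (irq = 0; irq < NR_IRQS; irq++)
		for (cpu = 0; cpu < NR_CPUS; cpu++)
			sum += kstat_irqs_percpu[irq][cpu];
	return sum;
}

That is ~360,000 counter reads per /proc/stat read, or roughly 1.8 * 10^10
for a 50,000-iteration test run.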

V1 -> V2: Address review feedback: undo struct layout changes, make
      	  variables unsigned and add test results to the changelog.

Thanks,

        tglx

8<----------------
 fs/proc/stat.c          |   29 ++++++++++++++++++++++++++---
 include/linux/irqdesc.h |    1 +
 kernel/irq/chip.c       |   12 ++++++++++--
 kernel/irq/internals.h  |    8 +++++++-
 kernel/irq/irqdesc.c    |    7 ++++++-
 5 files changed, 50 insertions(+), 7 deletions(-)



* [patch V2 1/2] genirq: Avoid summation loops for /proc/stat
  2019-02-08 13:48 [patch V2 0/2] genirq, proc: Speedup /proc/stat interrupt statistics Thomas Gleixner
@ 2019-02-08 13:48 ` Thomas Gleixner
  2019-02-08 22:32   ` Andrew Morton
  2019-02-10 20:55   ` [tip:irq/core] genirq: " tip-bot for Thomas Gleixner
  2019-02-08 13:48 ` [patch V2 2/2] proc/stat: Make the interrupt statistics more efficient Thomas Gleixner
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Thomas Gleixner @ 2019-02-08 13:48 UTC (permalink / raw)
  To: LKML
  Cc: Waiman Long, Matthew Wilcox, Andrew Morton, Alexey Dobriyan,
	Kees Cook, linux-fsdevel, Davidlohr Bueso, Miklos Szeredi,
	Daniel Colascione, Dave Chinner, Randy Dunlap, Marc Zyngier

Waiman reported that on large systems with a large number of interrupts the
readout of /proc/stat takes a long time to sum up the interrupt
statistics. In principle this is not a problem, but for unknown reasons
some enterprise quality software reads /proc/stat with a high frequency.

The reason for this is that interrupt statistics are accounted per cpu. So
the /proc/stat logic has to sum up the interrupt stats for each interrupt.

This can be largely avoided for interrupts which are not marked as
'PER_CPU' interrupts by simply adding a per interrupt summation counter
which is incremented along with the per interrupt per cpu counter.

The PER_CPU interrupts need to avoid that and use only per cpu accounting
because they share the interrupt number and the interrupt descriptor and
concurrent updates would conflict or require unwanted synchronization.

Reported-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

8<-------------

v2: Undo the unintentional layout change of struct irq_desc.

 include/linux/irqdesc.h |    1 +
 kernel/irq/chip.c       |   12 ++++++++++--
 kernel/irq/internals.h  |    8 +++++++-
 kernel/irq/irqdesc.c    |    7 ++++++-
 4 files changed, 24 insertions(+), 4 deletions(-)


--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -65,6 +65,7 @@ struct irq_desc {
 	unsigned int		core_internal_state__do_not_mess_with_it;
 	unsigned int		depth;		/* nested irq disables */
 	unsigned int		wake_depth;	/* nested wake enables */
+	unsigned int		tot_count;
 	unsigned int		irq_count;	/* For detecting broken IRQs */
 	unsigned long		last_unhandled;	/* Aging timer for unhandled count */
 	unsigned int		irqs_unhandled;
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -855,7 +855,11 @@ void handle_percpu_irq(struct irq_desc *
 {
 	struct irq_chip *chip = irq_desc_get_chip(desc);
 
-	kstat_incr_irqs_this_cpu(desc);
+	/*
+	 * PER CPU interrupts are not serialized. Do not touch
+	 * desc->tot_count.
+	 */
+	__kstat_incr_irqs_this_cpu(desc);
 
 	if (chip->irq_ack)
 		chip->irq_ack(&desc->irq_data);
@@ -884,7 +888,11 @@ void handle_percpu_devid_irq(struct irq_
 	unsigned int irq = irq_desc_get_irq(desc);
 	irqreturn_t res;
 
-	kstat_incr_irqs_this_cpu(desc);
+	/*
+	 * PER CPU interrupts are not serialized. Do not touch
+	 * desc->tot_count.
+	 */
+	__kstat_incr_irqs_this_cpu(desc);
 
 	if (chip->irq_ack)
 		chip->irq_ack(&desc->irq_data);
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -242,12 +242,18 @@ static inline void irq_state_set_masked(
 
 #undef __irqd_to_state
 
-static inline void kstat_incr_irqs_this_cpu(struct irq_desc *desc)
+static inline void __kstat_incr_irqs_this_cpu(struct irq_desc *desc)
 {
 	__this_cpu_inc(*desc->kstat_irqs);
 	__this_cpu_inc(kstat.irqs_sum);
 }
 
+static inline void kstat_incr_irqs_this_cpu(struct irq_desc *desc)
+{
+	__kstat_incr_irqs_this_cpu(desc);
+	desc->tot_count++;
+}
+
 static inline int irq_desc_get_node(struct irq_desc *desc)
 {
 	return irq_common_data_get_node(&desc->irq_common_data);
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -119,6 +119,7 @@ static void desc_set_defaults(unsigned i
 	desc->depth = 1;
 	desc->irq_count = 0;
 	desc->irqs_unhandled = 0;
+	desc->tot_count = 0;
 	desc->name = NULL;
 	desc->owner = owner;
 	for_each_possible_cpu(cpu)
@@ -919,11 +920,15 @@ unsigned int kstat_irqs_cpu(unsigned int
 unsigned int kstat_irqs(unsigned int irq)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
-	int cpu;
 	unsigned int sum = 0;
+	int cpu;
 
 	if (!desc || !desc->kstat_irqs)
 		return 0;
+	if (!irq_settings_is_per_cpu_devid(desc) &&
+	    !irq_settings_is_per_cpu(desc))
+	    return desc->tot_count;
+
 	for_each_possible_cpu(cpu)
 		sum += *per_cpu_ptr(desc->kstat_irqs, cpu);
 	return sum;




* [patch V2 2/2] proc/stat: Make the interrupt statistics more efficient
  2019-02-08 13:48 [patch V2 0/2] genirq, proc: Speedup /proc/stat interrupt statistics Thomas Gleixner
  2019-02-08 13:48 ` [patch V2 1/2] genirq: Avoid summation loops for /proc/stat Thomas Gleixner
@ 2019-02-08 13:48 ` Thomas Gleixner
  2019-02-08 17:01   ` Alexey Dobriyan
  2019-02-10 20:55   ` [tip:irq/core] " tip-bot for Thomas Gleixner
  2019-02-08 15:20 ` [patch V2 0/2] genirq, proc: Speedup /proc/stat interrupt statistics Waiman Long
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Thomas Gleixner @ 2019-02-08 13:48 UTC (permalink / raw)
  To: LKML
  Cc: Waiman Long, Matthew Wilcox, Andrew Morton, Alexey Dobriyan,
	Kees Cook, linux-fsdevel, Davidlohr Bueso, Miklos Szeredi,
	Daniel Colascione, Dave Chinner, Randy Dunlap, Marc Zyngier

Waiman reported that on large systems with a large number of interrupts the
readout of /proc/stat takes a long time to sum up the interrupt
statistics. In principle this is not a problem, but for unknown reasons
some enterprise quality software reads /proc/stat with a high frequency.

The reason for this is that interrupt statistics are accounted per cpu. So
the /proc/stat logic has to sum up the interrupt stats for each interrupt.

The interrupt core now provides a per interrupt summary counter which can
be used to avoid the summation loops completely, except for interrupts
marked PER_CPU, which make up only a small fraction of the interrupt
space, if any.

Another simplification is to iterate only over the active interrupts and
skip the potentially large gaps in the interrupt number space, just
printing zeros for the gaps without going into the interrupt core in the
first place.
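
For reference, the output being generated is the "intr" line of /proc/stat:
the grand total followed by one column per interrupt number, where numbers
without an active interrupt still get an explicit zero. An illustrative
(made-up) excerpt:

   intr 126517693 44 9 0 0 0 0 0 0 3 0 0 0 ...

The batched zero printing in the patch below is what fills those gaps
cheaply.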

Waiman provided test results from a 4-socket IvyBridge-EX system (60-core
120-thread, 3016 irqs) executing a test program which reads /proc/stat
50,000 times:

Before:	18.436s (sys 18.380s)
After:   3.769s (sys  3.742s)
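
The test program itself is not part of this series; a minimal stand-in
which exercises the same path (an assumption about the shape of the test,
not Waiman's actual program) would be:

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	char buf[1 << 16];
	int i;

	for (i = 0; i < 50000; i++) {
		int fd = open("/proc/stat", O_RDONLY);

		if (fd < 0)
			return 1;
		/* Consume the whole file to force the full summation. */
		while (read(fd, buf, sizeof(buf)) > 0)
			;
		close(fd);
	}
	return 0;
}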

Reported-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---

v2: Make variables unsigned int. Add results to changelog.

 fs/proc/stat.c |   29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -79,6 +79,31 @@ static u64 get_iowait_time(int cpu)
 
 #endif
 
+static void show_irq_gap(struct seq_file *p, unsigned int gap)
+{
+	static const char zeros[] = " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0";
+
+	while (gap > 0) {
+		unsigned int inc;
+
+		inc = min_t(unsigned int, gap, ARRAY_SIZE(zeros) / 2);
+		seq_write(p, zeros, 2 * inc);
+		gap -= inc;
+	}
+}
+
+static void show_all_irqs(struct seq_file *p)
+{
+	unsigned int i, next = 0;
+
+	for_each_active_irq(i) {
+		show_irq_gap(p, i - next);
+		seq_put_decimal_ull(p, " ", kstat_irqs_usr(i));
+		next = i + 1;
+	}
+	show_irq_gap(p, nr_irqs - next);
+}
+
 static int show_stat(struct seq_file *p, void *v)
 {
 	int i, j;
@@ -156,9 +181,7 @@ static int show_stat(struct seq_file *p,
 	}
 	seq_put_decimal_ull(p, "intr ", (unsigned long long)sum);
 
-	/* sum again ? it could be updated? */
-	for_each_irq_nr(j)
-		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));
+	show_all_irqs(p);
 
 	seq_printf(p,
 		"\nctxt %llu\n"


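A note on the show_irq_gap() trick above: zeros[] holds sixteen " 0" pairs
plus the terminating NUL, so ARRAY_SIZE(zeros) / 2 evaluates to 16 and each
seq_write() emits up to sixteen zero columns (2 * inc bytes) at once. A
standalone userspace sketch of the same batching, with fwrite() standing in
for seq_write():

#include <stdio.h>

/* Emit 'gap' " 0" columns in batches, mirroring show_irq_gap(). */
static void emit_gap(FILE *p, unsigned int gap)
{
	static const char zeros[] = " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0";
	const unsigned int batch = sizeof(zeros) / 2;	/* 33 / 2 == 16 */

	while (gap > 0) {
		unsigned int inc = gap < batch ? gap : batch;

		fwrite(zeros, 1, 2 * inc, p);
		gap -= inc;
	}
}

int main(void)
{
	emit_gap(stdout, 40);	/* prints forty " 0" columns */
	putchar('\n');
	return 0;
}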


* Re: [patch V2 0/2] genirq, proc: Speedup /proc/stat interrupt statistics
  2019-02-08 13:48 [patch V2 0/2] genirq, proc: Speedup /proc/stat interrupt statistics Thomas Gleixner
  2019-02-08 13:48 ` [patch V2 1/2] genirq: Avoid summation loops for /proc/stat Thomas Gleixner
  2019-02-08 13:48 ` [patch V2 2/2] proc/stat: Make the interrupt statistics more efficient Thomas Gleixner
@ 2019-02-08 15:20 ` Waiman Long
  2019-02-08 17:01 ` Davidlohr Bueso
  2019-02-08 17:40 ` Marc Zyngier
  4 siblings, 0 replies; 14+ messages in thread
From: Waiman Long @ 2019-02-08 15:20 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Matthew Wilcox, Andrew Morton, Alexey Dobriyan, Kees Cook,
	linux-fsdevel, Davidlohr Bueso, Miklos Szeredi,
	Daniel Colascione, Dave Chinner, Randy Dunlap, Marc Zyngier

On 02/08/2019 08:48 AM, Thomas Gleixner wrote:
> Waiman reported that on large systems with a large number of interrupts the
> readout of /proc/stat takes a long time to sum up the interrupt
> statistics. In principle this is not a problem, but for unknown reasons
> some enterprise quality software reads /proc/stat with a high frequency.
>
> The reason for this is that interrupt statistics are accounted per cpu. So
> the /proc/stat logic has to sum up the interrupt stats for each interrupt.
>
> The following series addresses this by making the interrupt statistics code
> in the core generate the sum directly and by making the loop in the
> /proc/stat read function smarter.
>
> V1 -> V2: Address review feedback: undo struct layout changes, make
>       	  variables unsigned and add test results to the changelog.
>
> Thanks,
>
>         tglx
>
> 8<----------------
>  fs/proc/stat.c          |   29 ++++++++++++++++++++++++++---
>  include/linux/irqdesc.h |    1 +
>  kernel/irq/chip.c       |   12 ++++++++++--
>  kernel/irq/internals.h  |    8 +++++++-
>  kernel/irq/irqdesc.c    |    7 ++++++-
>  5 files changed, 50 insertions(+), 7 deletions(-)
>
Thanks for the patch.

Reviewed-by: Waiman Long <longman@redhat.com>

-Longman



* Re: [patch V2 2/2] proc/stat: Make the interrupt statistics more efficient
  2019-02-08 13:48 ` [patch V2 2/2] proc/stat: Make the interrupt statistics more efficient Thomas Gleixner
@ 2019-02-08 17:01   ` Alexey Dobriyan
  2019-02-10 20:55   ` [tip:irq/core] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: Alexey Dobriyan @ 2019-02-08 17:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Waiman Long, Matthew Wilcox, Andrew Morton, Kees Cook,
	linux-fsdevel, Davidlohr Bueso, Miklos Szeredi,
	Daniel Colascione, Dave Chinner, Randy Dunlap, Marc Zyngier

On Fri, Feb 08, 2019 at 02:48:04PM +0100, Thomas Gleixner wrote:
> -	for_each_irq_nr(j)
> -		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));
> +	show_all_irqs(p);

Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>


* Re: [patch V2 0/2] genirq, proc: Speedup /proc/stat interrupt statistics
  2019-02-08 13:48 [patch V2 0/2] genirq, proc: Speedup /proc/stat interrupt statistics Thomas Gleixner
                   ` (2 preceding siblings ...)
  2019-02-08 15:20 ` [patch V2 0/2] genirq, proc: Speedup /proc/stat interrupt statistics Waiman Long
@ 2019-02-08 17:01 ` Davidlohr Bueso
  2019-02-08 17:40 ` Marc Zyngier
  4 siblings, 0 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2019-02-08 17:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Waiman Long, Matthew Wilcox, Andrew Morton,
	Alexey Dobriyan, Kees Cook, linux-fsdevel, Miklos Szeredi,
	Daniel Colascione, Dave Chinner, Randy Dunlap, Marc Zyngier

On Fri, 08 Feb 2019, Thomas Gleixner wrote:

>Waiman reported that on large systems with a large number of interrupts the
>readout of /proc/stat takes a long time to sum up the interrupt
>statistics. In principle this is not a problem, but for unknown reasons
>some enterprise quality software reads /proc/stat with a high frequency.

:)

>
>The reason for this is that interrupt statistics are accounted per cpu. So
>the /proc/stat logic has to sum up the interrupt stats for each interrupt.
>
>The following series addresses this by making the interrupt statistics code
>in the core generate the sum directly and by making the loop in the
>/proc/stat read function smarter.
>
>V1 -> V2: Address review feedback: undo struct layout changes, make
>      	  variables unsigned and add test results to the changelog.
>
>Thanks,
>
>        tglx
>
>8<----------------
> fs/proc/stat.c          |   29 ++++++++++++++++++++++++++---
> include/linux/irqdesc.h |    1 +
> kernel/irq/chip.c       |   12 ++++++++++--
> kernel/irq/internals.h  |    8 +++++++-
> kernel/irq/irqdesc.c    |    7 ++++++-
> 5 files changed, 50 insertions(+), 7 deletions(-)

Reviewed-by: Davidlohr Bueso <dbueso@suse.de>


* Re: [patch V2 0/2] genirq, proc: Speedup /proc/stat interrupt statistics
  2019-02-08 13:48 [patch V2 0/2] genirq, proc: Speedup /proc/stat interrupt statistics Thomas Gleixner
                   ` (3 preceding siblings ...)
  2019-02-08 17:01 ` Davidlohr Bueso
@ 2019-02-08 17:40 ` Marc Zyngier
  4 siblings, 0 replies; 14+ messages in thread
From: Marc Zyngier @ 2019-02-08 17:40 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Waiman Long, Matthew Wilcox, Andrew Morton, Alexey Dobriyan,
	Kees Cook, linux-fsdevel, Davidlohr Bueso, Miklos Szeredi,
	Daniel Colascione, Dave Chinner, Randy Dunlap

On 08/02/2019 13:48, Thomas Gleixner wrote:
> Waiman reported that on large systems with a large number of interrupts the
> readout of /proc/stat takes a long time to sum up the interrupt
> statistics. In principle this is not a problem, but for unknown reasons
> some enterprise quality software reads /proc/stat with a high frequency.
> 
> The reason for this is that interrupt statistics are accounted per cpu. So
> the /proc/stat logic has to sum up the interrupt stats for each interrupt.
> 
> The following series addresses this by making the interrupt statistics code
> in the core generate the sum directly and by making the loop in the
> /proc/stat read function smarter.
> 
> V1 -> V2: Address review feedback: undo struct layout changes, make
>       	  variables unsigned and add test results to the changelog.
> 
> Thanks,
> 
>         tglx
> 
> 8<----------------
>  fs/proc/stat.c          |   29 ++++++++++++++++++++++++++---
>  include/linux/irqdesc.h |    1 +
>  kernel/irq/chip.c       |   12 ++++++++++--
>  kernel/irq/internals.h  |    8 +++++++-
>  kernel/irq/irqdesc.c    |    7 ++++++-
>  5 files changed, 50 insertions(+), 7 deletions(-)
> 

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [patch V2 1/2] genirq: Avoid summation loops for /proc/stat
  2019-02-08 13:48 ` [patch V2 1/2] genirq: Avoid summation loops for /proc/stat Thomas Gleixner
@ 2019-02-08 22:32   ` Andrew Morton
  2019-02-08 22:46     ` Waiman Long
  2019-02-10 20:55   ` [tip:irq/core] genirq: " tip-bot for Thomas Gleixner
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2019-02-08 22:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Waiman Long, Matthew Wilcox, Alexey Dobriyan, Kees Cook,
	linux-fsdevel, Davidlohr Bueso, Miklos Szeredi,
	Daniel Colascione, Dave Chinner, Randy Dunlap, Marc Zyngier

On Fri, 08 Feb 2019 14:48:03 +0100 Thomas Gleixner <tglx@linutronix.de> wrote:

> Waiman reported that on large systems with a large number of interrupts the
> readout of /proc/stat takes a long time to sum up the interrupt
> statistics. In principle this is not a problem, but for unknown reasons
> some enterprise quality software reads /proc/stat with a high frequency.
> 
> The reason for this is that interrupt statistics are accounted per cpu. So
> the /proc/stat logic has to sum up the interrupt stats for each interrupt.
> 
> This can be largely avoided for interrupts which are not marked as
> 'PER_CPU' interrupts by simply adding a per interrupt summation counter
> which is incremented along with the per interrupt per cpu counter.
> 
> The PER_CPU interrupts need to avoid that and use only per cpu accounting
> because they share the interrupt number and the interrupt descriptor and
> concurrent updates would conflict or require unwanted synchronization.
> 
> ...
>
> --- a/include/linux/irqdesc.h
> +++ b/include/linux/irqdesc.h
> @@ -65,6 +65,7 @@ struct irq_desc {
>  	unsigned int		core_internal_state__do_not_mess_with_it;
>  	unsigned int		depth;		/* nested irq disables */
>  	unsigned int		wake_depth;	/* nested wake enables */
> +	unsigned int		tot_count;

Confused.  Isn't this going to quickly overflow?




* Re: [patch V2 1/2] genirq: Avoid summation loops for /proc/stat
  2019-02-08 22:32   ` Andrew Morton
@ 2019-02-08 22:46     ` Waiman Long
  2019-02-08 23:21       ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Waiman Long @ 2019-02-08 22:46 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner
  Cc: LKML, Matthew Wilcox, Alexey Dobriyan, Kees Cook, linux-fsdevel,
	Davidlohr Bueso, Miklos Szeredi, Daniel Colascione, Dave Chinner,
	Randy Dunlap, Marc Zyngier

On 02/08/2019 05:32 PM, Andrew Morton wrote:
> On Fri, 08 Feb 2019 14:48:03 +0100 Thomas Gleixner <tglx@linutronix.de> wrote:
>
>> Waiman reported that on large systems with a large number of interrupts the
>> readout of /proc/stat takes a long time to sum up the interrupt
>> statistics. In principle this is not a problem, but for unknown reasons
>> some enterprise quality software reads /proc/stat with a high frequency.
>>
>> The reason for this is that interrupt statistics are accounted per cpu. So
>> the /proc/stat logic has to sum up the interrupt stats for each interrupt.
>>
>> This can be largely avoided for interrupts which are not marked as
>> 'PER_CPU' interrupts by simply adding a per interrupt summation counter
>> which is incremented along with the per interrupt per cpu counter.
>>
>> The PER_CPU interrupts need to avoid that and use only per cpu accounting
>> because they share the interrupt number and the interrupt descriptor and
>> concurrent updates would conflict or require unwanted synchronization.
>>
>> ...
>>
>> --- a/include/linux/irqdesc.h
>> +++ b/include/linux/irqdesc.h
>> @@ -65,6 +65,7 @@ struct irq_desc {
>>  	unsigned int		core_internal_state__do_not_mess_with_it;
>>  	unsigned int		depth;		/* nested irq disables */
>>  	unsigned int		wake_depth;	/* nested wake enables */
>> +	unsigned int		tot_count;
> Confused.  Isn't this going to quickly overflow?
>
>
All the current irq count computations for individual irqs use the
unsigned int type. Only the sum of all the irqs is u64. Yes, it is
possible for an individual irq count to exceed 32 bits given sufficient
uptime.  My PC has an uptime of 36 days and the highest irq count value
is 79,227,699. Given the current rate, the overflow will happen after
about 5 years. A larger server system may overflow in a much
shorter period. So maybe we should consider changing all the irq counts
to unsigned long then.
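
(Checking that estimate: 79,227,699 interrupts in 36 days is about 25.5
interrupts/s, and 2^32 / 25.5 is ~1.7 x 10^8 seconds, which is indeed
roughly 5.3 years.)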

Cheers,
Longman




* Re: [patch V2 1/2] genirq: Avoid summation loops for /proc/stat
  2019-02-08 22:46     ` Waiman Long
@ 2019-02-08 23:21       ` Andrew Morton
  2019-02-09  3:41         ` Matthew Wilcox
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2019-02-08 23:21 UTC (permalink / raw)
  To: Waiman Long
  Cc: Thomas Gleixner, LKML, Matthew Wilcox, Alexey Dobriyan,
	Kees Cook, linux-fsdevel, Davidlohr Bueso, Miklos Szeredi,
	Daniel Colascione, Dave Chinner, Randy Dunlap, Marc Zyngier

On Fri, 8 Feb 2019 17:46:39 -0500 Waiman Long <longman@redhat.com> wrote:

> On 02/08/2019 05:32 PM, Andrew Morton wrote:
> > On Fri, 08 Feb 2019 14:48:03 +0100 Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> >> Waiman reported that on large systems with a large number of interrupts the
> >> readout of /proc/stat takes a long time to sum up the interrupt
> >> statistics. In principle this is not a problem, but for unknown reasons
> >> some enterprise quality software reads /proc/stat with a high frequency.
> >>
> >> The reason for this is that interrupt statistics are accounted per cpu. So
> >> the /proc/stat logic has to sum up the interrupt stats for each interrupt.
> >>
> >> This can be largely avoided for interrupts which are not marked as
> >> 'PER_CPU' interrupts by simply adding a per interrupt summation counter
> >> which is incremented along with the per interrupt per cpu counter.
> >>
> >> The PER_CPU interrupts need to avoid that and use only per cpu accounting
> >> because they share the interrupt number and the interrupt descriptor and
> >> concurrent updates would conflict or require unwanted synchronization.
> >>
> >> ...
> >>
> >> --- a/include/linux/irqdesc.h
> >> +++ b/include/linux/irqdesc.h
> >> @@ -65,6 +65,7 @@ struct irq_desc {
> >>  	unsigned int		core_internal_state__do_not_mess_with_it;
> >>  	unsigned int		depth;		/* nested irq disables */
> >>  	unsigned int		wake_depth;	/* nested wake enables */
> >> +	unsigned int		tot_count;
> > Confused.  Isn't this going to quickly overflow?
> >
> >
> All the current irq count computations for individual irqs use the
> unsigned int type. Only the sum of all the irqs is u64. Yes, it is
> possible for an individual irq count to exceed 32 bits given sufficient
> uptime.  My PC has an uptime of 36 days and the highest irq count value
> is 79,227,699. Given the current rate, the overflow will happen after
> about 5 years. A larger server system may overflow in a much
> shorter period. So maybe we should consider changing all the irq counts
> to unsigned long then.

It sounds like it.  A 10kHz interrupt will overflow in 4 days...
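
(For the record: 2^32 / 10^4 per second is ~4.3 x 10^5 seconds, i.e.
between four and five days.)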


* Re: [patch V2 1/2] genirq: Avoid summation loops for /proc/stat
  2019-02-08 23:21       ` Andrew Morton
@ 2019-02-09  3:41         ` Matthew Wilcox
  2019-02-13 15:55           ` David Laight
  0 siblings, 1 reply; 14+ messages in thread
From: Matthew Wilcox @ 2019-02-09  3:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Waiman Long, Thomas Gleixner, LKML, Alexey Dobriyan, Kees Cook,
	linux-fsdevel, Davidlohr Bueso, Miklos Szeredi,
	Daniel Colascione, Dave Chinner, Randy Dunlap, Marc Zyngier

On Fri, Feb 08, 2019 at 03:21:51PM -0800, Andrew Morton wrote:
> It sounds like it.  A 10kHz interrupt will overflow in 4 days...

If you've got a 10kHz interrupt, you have a bigger problem.  Anything
happening 10,000 times a second is going to need interrupt mitigation
to perform acceptably.

More importantly, userspace can (and must) cope with wrapping.  This isn't
anything new from Thomas' patch.  As long as userspace is polling more
often than once a day, it's going to see a wrapped value before it wraps
again, so it won't miss 4 billion interrupts.
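
On the userspace side the usual wrap-tolerant idiom is plain unsigned
subtraction; a sketch (assuming at most one wrap between two samples):

#include <stdio.h>

/*
 * Delta between two samples of a 32-bit counter. Unsigned subtraction
 * is modulo 2^32, so a single wrap between samples is handled for free.
 */
static unsigned int irq_count_delta(unsigned int prev, unsigned int cur)
{
	return cur - prev;
}

int main(void)
{
	/* The counter wrapped from near UINT_MAX to a small value. */
	printf("%u\n", irq_count_delta(4294967290u, 10u));	/* prints 16 */
	return 0;
}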


* [tip:irq/core] genirq: Avoid summation loops for /proc/stat
  2019-02-08 13:48 ` [patch V2 1/2] genirq: Avoid summation loops for /proc/stat Thomas Gleixner
  2019-02-08 22:32   ` Andrew Morton
@ 2019-02-10 20:55   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-02-10 20:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: marc.zyngier, akpm, mingo, dbueso, david, willy, miklos, rdunlap,
	tglx, dave, adobriyan, dancol, keescook, longman, linux-kernel,
	hpa

Commit-ID:  1136b0728969901a091f0471968b2b76ed14d9ad
Gitweb:     https://git.kernel.org/tip/1136b0728969901a091f0471968b2b76ed14d9ad
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 8 Feb 2019 14:48:03 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 10 Feb 2019 21:34:45 +0100

genirq: Avoid summation loops for /proc/stat

Waiman reported that on large systems with a large number of interrupts the
readout of /proc/stat takes a long time to sum up the interrupt
statistics. In principle this is not a problem, but for unknown reasons
some enterprise quality software reads /proc/stat with a high frequency.

The reason for this is that interrupt statistics are accounted per cpu. So
the /proc/stat logic has to sum up the interrupt stats for each interrupt.

This can be largely avoided for interrupts which are not marked as
'PER_CPU' interrupts by simply adding a per interrupt summation counter
which is incremented along with the per interrupt per cpu counter.

The PER_CPU interrupts need to avoid that and use only per cpu accounting
because they share the interrupt number and the interrupt descriptor and
concurrent updates would conflict or require unwanted synchronization.

Reported-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Link: https://lkml.kernel.org/r/20190208135020.925487496@linutronix.de


---
 include/linux/irqdesc.h |  1 +
 kernel/irq/chip.c       | 12 ++++++++++--
 kernel/irq/internals.h  |  8 +++++++-
 kernel/irq/irqdesc.c    |  7 ++++++-
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index dd1e40ddac7d..875c41b23f20 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -65,6 +65,7 @@ struct irq_desc {
 	unsigned int		core_internal_state__do_not_mess_with_it;
 	unsigned int		depth;		/* nested irq disables */
 	unsigned int		wake_depth;	/* nested wake enables */
+	unsigned int		tot_count;
 	unsigned int		irq_count;	/* For detecting broken IRQs */
 	unsigned long		last_unhandled;	/* Aging timer for unhandled count */
 	unsigned int		irqs_unhandled;
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 34e969069488..e960c4f46ee0 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -855,7 +855,11 @@ void handle_percpu_irq(struct irq_desc *desc)
 {
 	struct irq_chip *chip = irq_desc_get_chip(desc);
 
-	kstat_incr_irqs_this_cpu(desc);
+	/*
+	 * PER CPU interrupts are not serialized. Do not touch
+	 * desc->tot_count.
+	 */
+	__kstat_incr_irqs_this_cpu(desc);
 
 	if (chip->irq_ack)
 		chip->irq_ack(&desc->irq_data);
@@ -884,7 +888,11 @@ void handle_percpu_devid_irq(struct irq_desc *desc)
 	unsigned int irq = irq_desc_get_irq(desc);
 	irqreturn_t res;
 
-	kstat_incr_irqs_this_cpu(desc);
+	/*
+	 * PER CPU interrupts are not serialized. Do not touch
+	 * desc->tot_count.
+	 */
+	__kstat_incr_irqs_this_cpu(desc);
 
 	if (chip->irq_ack)
 		chip->irq_ack(&desc->irq_data);
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index ca6afa267070..e74e7eea76cf 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -242,12 +242,18 @@ static inline void irq_state_set_masked(struct irq_desc *desc)
 
 #undef __irqd_to_state
 
-static inline void kstat_incr_irqs_this_cpu(struct irq_desc *desc)
+static inline void __kstat_incr_irqs_this_cpu(struct irq_desc *desc)
 {
 	__this_cpu_inc(*desc->kstat_irqs);
 	__this_cpu_inc(kstat.irqs_sum);
 }
 
+static inline void kstat_incr_irqs_this_cpu(struct irq_desc *desc)
+{
+	__kstat_incr_irqs_this_cpu(desc);
+	desc->tot_count++;
+}
+
 static inline int irq_desc_get_node(struct irq_desc *desc)
 {
 	return irq_common_data_get_node(&desc->irq_common_data);
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index ee062b7939d3..f98293d0e173 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -119,6 +119,7 @@ static void desc_set_defaults(unsigned int irq, struct irq_desc *desc, int node,
 	desc->depth = 1;
 	desc->irq_count = 0;
 	desc->irqs_unhandled = 0;
+	desc->tot_count = 0;
 	desc->name = NULL;
 	desc->owner = owner;
 	for_each_possible_cpu(cpu)
@@ -919,11 +920,15 @@ unsigned int kstat_irqs_cpu(unsigned int irq, int cpu)
 unsigned int kstat_irqs(unsigned int irq)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
-	int cpu;
 	unsigned int sum = 0;
+	int cpu;
 
 	if (!desc || !desc->kstat_irqs)
 		return 0;
+	if (!irq_settings_is_per_cpu_devid(desc) &&
+	    !irq_settings_is_per_cpu(desc))
+	    return desc->tot_count;
+
 	for_each_possible_cpu(cpu)
 		sum += *per_cpu_ptr(desc->kstat_irqs, cpu);
 	return sum;


* [tip:irq/core] proc/stat: Make the interrupt statistics more efficient
  2019-02-08 13:48 ` [patch V2 2/2] proc/stat: Make the interrupt statistics more efficient Thomas Gleixner
  2019-02-08 17:01   ` Alexey Dobriyan
@ 2019-02-10 20:55   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-02-10 20:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: keescook, adobriyan, linux-kernel, dave, marc.zyngier, rdunlap,
	dbueso, hpa, david, willy, mingo, miklos, tglx, dancol, akpm,
	longman

Commit-ID:  c2da3f1b711173b72378258496b49f74db7479de
Gitweb:     https://git.kernel.org/tip/c2da3f1b711173b72378258496b49f74db7479de
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 8 Feb 2019 14:48:04 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 10 Feb 2019 21:34:46 +0100

proc/stat: Make the interrupt statistics more efficient

Waiman reported that on large systems with a large number of interrupts the
readout of /proc/stat takes a long time to sum up the interrupt
statistics. In principle this is not a problem, but for unknown reasons
some enterprise quality software reads /proc/stat with a high frequency.

The reason for this is that interrupt statistics are accounted per cpu. So
the /proc/stat logic has to sum up the interrupt stats for each interrupt.

The interrupt core now provides a per interrupt summary counter which can
be used to avoid the summation loops completely, except for interrupts
marked PER_CPU, which make up only a small fraction of the interrupt
space, if any.

Another simplification is to iterate only over the active interrupts and
skip the potentially large gaps in the interrupt number space, just
printing zeros for the gaps without going into the interrupt core in the
first place.

Waiman provided test results from a 4-socket IvyBridge-EX system (60-core
120-thread, 3016 irqs) executing a test program which reads /proc/stat
50,000 times:

   Before: 18.436s (sys 18.380s)
   After:   3.769s (sys  3.742s)

Reported-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Link: https://lkml.kernel.org/r/20190208135021.013828701@linutronix.de

---
 fs/proc/stat.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 535eda7857cf..76175211b304 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -79,6 +79,31 @@ static u64 get_iowait_time(int cpu)
 
 #endif
 
+static void show_irq_gap(struct seq_file *p, unsigned int gap)
+{
+	static const char zeros[] = " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0";
+
+	while (gap > 0) {
+		unsigned int inc;
+
+		inc = min_t(unsigned int, gap, ARRAY_SIZE(zeros) / 2);
+		seq_write(p, zeros, 2 * inc);
+		gap -= inc;
+	}
+}
+
+static void show_all_irqs(struct seq_file *p)
+{
+	unsigned int i, next = 0;
+
+	for_each_active_irq(i) {
+		show_irq_gap(p, i - next);
+		seq_put_decimal_ull(p, " ", kstat_irqs_usr(i));
+		next = i + 1;
+	}
+	show_irq_gap(p, nr_irqs - next);
+}
+
 static int show_stat(struct seq_file *p, void *v)
 {
 	int i, j;
@@ -156,9 +181,7 @@ static int show_stat(struct seq_file *p, void *v)
 	}
 	seq_put_decimal_ull(p, "intr ", (unsigned long long)sum);
 
-	/* sum again ? it could be updated? */
-	for_each_irq_nr(j)
-		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));
+	show_all_irqs(p);
 
 	seq_printf(p,
 		"\nctxt %llu\n"


* RE: [patch V2 1/2] genirq: Avoid summation loops for /proc/stat
  2019-02-09  3:41         ` Matthew Wilcox
@ 2019-02-13 15:55           ` David Laight
  0 siblings, 0 replies; 14+ messages in thread
From: David Laight @ 2019-02-13 15:55 UTC (permalink / raw)
  To: 'Matthew Wilcox', Andrew Morton
  Cc: Waiman Long, Thomas Gleixner, LKML, Alexey Dobriyan, Kees Cook,
	linux-fsdevel, Davidlohr Bueso, Miklos Szeredi,
	Daniel Colascione, Dave Chinner, Randy Dunlap, Marc Zyngier

From: Matthew Wilcox
> Sent: 09 February 2019 03:41
> 
> On Fri, Feb 08, 2019 at 03:21:51PM -0800, Andrew Morton wrote:
> > It sounds like it.  A 10kHz interrupt will overflow in 4 days...
> 
> If you've got a 10kHz interrupt, you have a bigger problem.  Anything
> happening 10,000 times a second is going to need interrupt mitigation
> to perform acceptably.

Not necessarily - you may want the immediate interrupt for each
received ethernet packet.

> More importantly, userspace can (and must) cope with wrapping.  This isn't
> anything new from Thomas' patch.  As long as userspace is polling more
> often than once a day, it's going to see a wrapped value before it wraps
> again, so it won't miss 4 billion interrupts.

If userspace is expected to detect wraps, making the sum 64-bit is
pointless, confusing and stupid.
The code would have to mask off the high bits before determining
that the value has decreased and then adding in 2^32.

	David



