* Problem: scaling of /proc/stat on large systems
@ 2010-09-29 12:22 Jack Steiner
  2010-09-30  5:09 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: Jack Steiner @ 2010-09-29 12:22 UTC (permalink / raw)
  To: yinghai, mingo, akpm; +Cc: linux-kernel

I'm looking for suggestions on how to fix a scaling problem with access to
/proc/stat.

On a large x86_64 system (4096p, 256 nodes, 5530 IRQs), access to
/proc/stat takes too long -  more than 12 sec:

	# time cat /proc/stat >/dev/null
	real	12.630s
	user	 0.000s
	sys	12.629s

This affects top, ps (some variants), w, glibc (sysconf) and much more.


One of the items reported in /proc/stat is a total count of interrupts that
have been received. This calculation requires summation of the interrupts
received on each cpu (kstat_irqs_cpu()).

The data is kept in per-cpu arrays linked to each irq_desc. On a
4096p/5530IRQ system summing this data requires accessing ~90MB.


Deleting the summation of the kstat_irqs_cpu data eliminates the high
access time but is an API breakage that I assume is unacceptable.

Another possibility would be using delayed work (similar to vmstat_update)
that periodically sums the data into a single array. The disadvantage in
this approach is that there would be a delay between receipt of an
interrupt & its count appearing in /proc/stat. Is this an issue for anyone?
Another disadvantage is that it adds to the overall "noise" introduced by
kernel threads.

Is there a better approach to take?


--- jack

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Problem: scaling of /proc/stat on large systems
  2010-09-29 12:22 Problem: scaling of /proc/stat on large systems Jack Steiner
@ 2010-09-30  5:09 ` KAMEZAWA Hiroyuki
  2010-10-04 14:34   ` Jack Steiner
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-30  5:09 UTC (permalink / raw)
  To: Jack Steiner; +Cc: yinghai, mingo, akpm, linux-kernel

On Wed, 29 Sep 2010 07:22:06 -0500
Jack Steiner <steiner@sgi.com> wrote:

> I'm looking for suggestions on how to fix a scaling problem with access to
> /proc/stat.
> 
> On a large x86_64 system (4096p, 256 nodes, 5530 IRQs), access to
> /proc/stat takes too long -  more than 12 sec:
> 
> 	# time cat /proc/stat >/dev/null
> 	real	12.630s
> 	user	 0.000s
> 	sys	12.629s
> 
> This affects top, ps (some variants), w, glibc (sysconf) and much more.
> 
> 
> One of the items reported in /proc/stat is a total count of interrupts that
> have been received. This calculation requires summation of the interrupts
> received on each cpu (kstat_irqs_cpu()).
> 
> The data is kept in per-cpu arrays linked to each irq_desc. On a
> 4096p/5530IRQ system summing this data requires accessing ~90MB.
> 
Wow.

> 
> Deleting the summation of the kstat_irqs_cpu data eliminates the high
> access time but is an API breakage that I assume is unacceptible.
> 
> Another possibility would be using delayed work (similar to vmstat_update)
> that periodically sums the data into a single array. The disadvantage in
> this approach is that there would be a delay between receipt of an
> interrupt & it's count appearing /proc/stat. Is this an issue for anyone?
> Another disadvantage is that it adds to the overall "noise" introduced by
> kernel threads.
> 
> Is there a better approach to take?
> 

Hmm, this ? 
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

/proc/stat shows the total number of all interrupts delivered to each cpu. But
when the number of IRQs is very large, 'cat /proc/stat' takes a very long
time - more than 10 secs. This is because the sum of all irq events is
computed each time /proc/stat is read. This patch adds a per-cpu "sum of
all irqs" counter and reduces read costs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 fs/proc/stat.c              |    4 +---
 include/linux/kernel_stat.h |   14 ++++++++++++--
 2 files changed, 13 insertions(+), 5 deletions(-)

Index: mmotm-0922/fs/proc/stat.c
===================================================================
--- mmotm-0922.orig/fs/proc/stat.c
+++ mmotm-0922/fs/proc/stat.c
@@ -52,9 +52,7 @@ static int show_stat(struct seq_file *p,
 		guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
 		guest_nice = cputime64_add(guest_nice,
 			kstat_cpu(i).cpustat.guest_nice);
-		for_each_irq_nr(j) {
-			sum += kstat_irqs_cpu(j, i);
-		}
+		sum = kstat_cpu_irqs_sum(i);
 		sum += arch_irq_stat_cpu(i);
 
 		for (j = 0; j < NR_SOFTIRQS; j++) {
Index: mmotm-0922/include/linux/kernel_stat.h
===================================================================
--- mmotm-0922.orig/include/linux/kernel_stat.h
+++ mmotm-0922/include/linux/kernel_stat.h
@@ -33,6 +33,7 @@ struct kernel_stat {
 #ifndef CONFIG_GENERIC_HARDIRQS
        unsigned int irqs[NR_IRQS];
 #endif
+	unsigned long irqs_sum;
 	unsigned int softirqs[NR_SOFTIRQS];
 };
 
@@ -54,6 +55,7 @@ static inline void kstat_incr_irqs_this_
 					    struct irq_desc *desc)
 {
 	kstat_this_cpu.irqs[irq]++;
+	kstat_this_cpu.irqs_sum++;
 }
 
 static inline unsigned int kstat_irqs_cpu(unsigned int irq, int cpu)
@@ -65,8 +67,9 @@ static inline unsigned int kstat_irqs_cp
 extern unsigned int kstat_irqs_cpu(unsigned int irq, int cpu);
 #define kstat_irqs_this_cpu(DESC) \
 	((DESC)->kstat_irqs[smp_processor_id()])
-#define kstat_incr_irqs_this_cpu(irqno, DESC) \
-	((DESC)->kstat_irqs[smp_processor_id()]++)
+#define kstat_incr_irqs_this_cpu(irqno, DESC) do {\
+	((DESC)->kstat_irqs[smp_processor_id()]++);\
+	kstat_this_cpu.irqs_sum++;} while (0)
 
 #endif
 
@@ -94,6 +97,13 @@ static inline unsigned int kstat_irqs(un
 	return sum;
 }
 
+/*
+ * Number of interrupts per cpu, since bootup
+ */
+static inline unsigned long kstat_cpu_irqs_sum(unsigned int cpu)
+{
+	return kstat_cpu(cpu).irqs_sum;
+}
 
 /*
  * Lock/unlock the current runqueue - to extract task statistics:




* Re: Problem: scaling of /proc/stat on large systems
  2010-09-30  5:09 ` KAMEZAWA Hiroyuki
@ 2010-10-04 14:34   ` Jack Steiner
  2010-10-05  1:36     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: Jack Steiner @ 2010-10-04 14:34 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: yinghai, mingo, akpm, linux-kernel

On Thu, Sep 30, 2010 at 02:09:01PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 29 Sep 2010 07:22:06 -0500
> Jack Steiner <steiner@sgi.com> wrote:


I was able to run on the 4096p system over the weekend. The patch is a 
definite improvement & partially fixes the problem:

A "cat /proc/stat >/dev/null" improved:

        OLD:    real    12.627s
        NEW:    real     2.459s


A large part of the remaining overhead is in the second summation of irq
information:


    static int show_stat(struct seq_file *p, void *v)
        ...
        /* sum again ? it could be updated? */
        for_each_irq_nr(j) {
                per_irq_sum = 0;
                for_each_possible_cpu(i)
                        per_irq_sum += kstat_irqs_cpu(j, i);

                seq_printf(p, " %u", per_irq_sum);
        }

Can this be fixed using the same approach as in the current patch?


--- jack

> 
> > I'm looking for suggestions on how to fix a scaling problem with access to
> > /proc/stat.
> > 
> > On a large x86_64 system (4096p, 256 nodes, 5530 IRQs), access to
> > /proc/stat takes too long -  more than 12 sec:
> > 
> > 	# time cat /proc/stat >/dev/null
> > 	real	12.630s
> > 	user	 0.000s
> > 	sys	12.629s
> > 
> > This affects top, ps (some variants), w, glibc (sysconf) and much more.
> > 
> > 
> > One of the items reported in /proc/stat is a total count of interrupts that
> > have been received. This calculation requires summation of the interrupts
> > received on each cpu (kstat_irqs_cpu()).
> > 
> > The data is kept in per-cpu arrays linked to each irq_desc. On a
> > 4096p/5530IRQ system summing this data requires accessing ~90MB.
> > 
> Wow.
> 
> > 
> > Deleting the summation of the kstat_irqs_cpu data eliminates the high
> > access time but is an API breakage that I assume is unacceptible.
> > 
> > Another possibility would be using delayed work (similar to vmstat_update)
> > that periodically sums the data into a single array. The disadvantage in
> > this approach is that there would be a delay between receipt of an
> > interrupt & it's count appearing /proc/stat. Is this an issue for anyone?
> > Another disadvantage is that it adds to the overall "noise" introduced by
> > kernel threads.
> > 
> > Is there a better approach to take?
> > 
> 
> Hmm, this ? 
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> /proc/stat shows the total number of all interrupts to each cpu. But when
> the number of IRQs are very large, it take very long time and 'cat /proc/stat'
> takes more than 10 secs. This is because sum of all irq events are counted
> when /proc/stat is read. This patch adds "sum of all irq" counter percpu
> and reduce read costs.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  fs/proc/stat.c              |    4 +---
>  include/linux/kernel_stat.h |   14 ++++++++++++--
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> Index: mmotm-0922/fs/proc/stat.c
> ===================================================================
> --- mmotm-0922.orig/fs/proc/stat.c
> +++ mmotm-0922/fs/proc/stat.c
> @@ -52,9 +52,7 @@ static int show_stat(struct seq_file *p,
>  		guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
>  		guest_nice = cputime64_add(guest_nice,
>  			kstat_cpu(i).cpustat.guest_nice);
> -		for_each_irq_nr(j) {
> -			sum += kstat_irqs_cpu(j, i);
> -		}
> +		sum = kstat_cpu_irqs_sum(i);
>  		sum += arch_irq_stat_cpu(i);
>  
>  		for (j = 0; j < NR_SOFTIRQS; j++) {
> Index: mmotm-0922/include/linux/kernel_stat.h
> ===================================================================
> --- mmotm-0922.orig/include/linux/kernel_stat.h
> +++ mmotm-0922/include/linux/kernel_stat.h
> @@ -33,6 +33,7 @@ struct kernel_stat {
>  #ifndef CONFIG_GENERIC_HARDIRQS
>         unsigned int irqs[NR_IRQS];
>  #endif
> +	unsigned long irqs_sum;
>  	unsigned int softirqs[NR_SOFTIRQS];
>  };
>  
> @@ -54,6 +55,7 @@ static inline void kstat_incr_irqs_this_
>  					    struct irq_desc *desc)
>  {
>  	kstat_this_cpu.irqs[irq]++;
> +	kstat_this_cpu.irqs_sum++;
>  }
>  
>  static inline unsigned int kstat_irqs_cpu(unsigned int irq, int cpu)
> @@ -65,8 +67,9 @@ static inline unsigned int kstat_irqs_cp
>  extern unsigned int kstat_irqs_cpu(unsigned int irq, int cpu);
>  #define kstat_irqs_this_cpu(DESC) \
>  	((DESC)->kstat_irqs[smp_processor_id()])
> -#define kstat_incr_irqs_this_cpu(irqno, DESC) \
> -	((DESC)->kstat_irqs[smp_processor_id()]++)
> +#define kstat_incr_irqs_this_cpu(irqno, DESC) do {\
> +	((DESC)->kstat_irqs[smp_processor_id()]++);\
> +	kstat_this_cpu.irqs_sum++;} while (0)
>  
>  #endif
>  
> @@ -94,6 +97,13 @@ static inline unsigned int kstat_irqs(un
>  	return sum;
>  }
>  
> +/*
> + * Number of interrupts per cpu, since bootup
> + */
> +static inline unsigned long kstat_cpu_irqs_sum(unsigned int cpu)
> +{
> +	return kstat_cpu(cpu).irqs_sum;
> +}
>  
>  /*
>   * Lock/unlock the current runqueue - to extract task statistics:
> 


* Re: Problem: scaling of /proc/stat on large systems
  2010-10-04 14:34   ` Jack Steiner
@ 2010-10-05  1:36     ` KAMEZAWA Hiroyuki
  2010-10-05  8:19       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-05  1:36 UTC (permalink / raw)
  To: Jack Steiner; +Cc: yinghai, mingo, akpm, linux-kernel

On Mon, 4 Oct 2010 09:34:15 -0500
Jack Steiner <steiner@sgi.com> wrote:

> On Thu, Sep 30, 2010 at 02:09:01PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 29 Sep 2010 07:22:06 -0500
> > Jack Steiner <steiner@sgi.com> wrote:
> 
> 
> I was able to run on the 4096p system over the weekend. The patch is a 
> definite improvement & partially fixes the problem:
> 
> A "cat /proc/stat >/dev/null" improved:
> 
>         OLD:    real    12.627s
>         NEW:    real     2.459
> 
> 
Thank you.

> A large part of the remaining overhead is in the second summation 
>  of irq information:
> 
> 
>     static int show_stat(struct seq_file *p, void *v)
>         ...
>         /* sum again ? it could be updated? */
>         for_each_irq_nr(j) {
>                 per_irq_sum = 0;
>                 for_each_possible_cpu(i)
>                         per_irq_sum += kstat_irqs_cpu(j, i);
> 
>                 seq_printf(p, " %u", per_irq_sum);
>         }
> 
> Can this be fixed using the same approach as in the current patch?
> 
> 

I guess this requires a different approach, such as a per-cpu counter plus
a threshold, like vmstat[] or lib/percpu_counter. But people probably
don't want to touch a shared counter in IRQ context.

Also, this path does a radix-tree lookup for each possible cpu. I guess
implementing a helper that sums a given irq's counts with a single
radix-tree lookup will reduce overhead. If that's not enough, we'll have
to make the counter imprecise. I'll write another patch.


Thanks,
-Kame



* Re: Problem: scaling of /proc/stat on large systems
  2010-10-05  1:36     ` KAMEZAWA Hiroyuki
@ 2010-10-05  8:19       ` KAMEZAWA Hiroyuki
  2010-10-08 16:35         ` Jack Steiner
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-05  8:19 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Jack Steiner, yinghai, mingo, akpm, linux-kernel

On Tue, 5 Oct 2010 10:36:50 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> I guess this requres different approarch as per-cpu counter + threshould.
> like vmstat[] or lib/percpu_counter. 
> Maybe people don't like to access shared counter in IRQ.
> 
> But, this seems to call radixtree-lookup for the # of possible cpus.
> I guess impleimenting a call to calculate a sum of irqs in a radix-tree
> lookup will reduce overhead. If it's not enough, we'll have to make the
> counter not-precise. I'll write an another patch.
> 

How about this ? This is an add-on patch.
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

In /proc/stat, the count for each IRQ is produced by summing that irq's
events over all cpus. But we can make use of kstat_irqs() instead.

kstat_irqs() makes a sum of IRQ events per cpu; if !CONFIG_GENERIC_HARDIRQS
it's not a big cost (both the number of cpus and the number of irqs are
small).

If a system is very big, it does

	for_each_irq()
		for_each_cpu()
			- look up a radix tree
			- read desc->irq_stat[cpu]

This is not efficient. This patch adds a kstat_irqs() implementation for
CONFIG_GENERIC_HARDIRQS and changes the calculation to

	for_each_irq()
		look up radix tree
		for_each_cpu()
			- read desc->irq_stat[cpu]

which reduces the cost.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 fs/proc/stat.c              |    9 ++-------
 include/linux/kernel_stat.h |    5 +++++
 kernel/irq/handle.c         |   16 ++++++++++++++++
 3 files changed, 23 insertions(+), 7 deletions(-)

Index: mmotm-0928/fs/proc/stat.c
===================================================================
--- mmotm-0928.orig/fs/proc/stat.c
+++ mmotm-0928/fs/proc/stat.c
@@ -108,13 +108,8 @@ static int show_stat(struct seq_file *p,
 	seq_printf(p, "intr %llu", (unsigned long long)sum);
 
 	/* sum again ? it could be updated? */
-	for_each_irq_nr(j) {
-		per_irq_sum = 0;
-		for_each_possible_cpu(i)
-			per_irq_sum += kstat_irqs_cpu(j, i);
-
-		seq_printf(p, " %u", per_irq_sum);
-	}
+	for_each_irq_nr(j)
+		seq_printf(p, " %u", kstat_irqs(j));
 
 	seq_printf(p,
 		"\nctxt %llu\n"
Index: mmotm-0928/include/linux/kernel_stat.h
===================================================================
--- mmotm-0928.orig/include/linux/kernel_stat.h
+++ mmotm-0928/include/linux/kernel_stat.h
@@ -62,6 +62,7 @@ static inline unsigned int kstat_irqs_cp
 {
        return kstat_cpu(cpu).irqs[irq];
 }
+
 #else
 #include <linux/irq.h>
 extern unsigned int kstat_irqs_cpu(unsigned int irq, int cpu);
@@ -86,6 +87,7 @@ static inline unsigned int kstat_softirq
 /*
  * Number of interrupts per specific IRQ source, since bootup
  */
+#ifndef CONFIG_GENERIC_HARDIRQS
 static inline unsigned int kstat_irqs(unsigned int irq)
 {
 	unsigned int sum = 0;
@@ -96,6 +98,9 @@ static inline unsigned int kstat_irqs(un
 
 	return sum;
 }
+#else
+extern unsigned int kstat_irqs(unsigned int irq);
+#endif
 
 /*
  * Number of interrupts per cpu, since bootup
Index: mmotm-0928/kernel/irq/handle.c
===================================================================
--- mmotm-0928.orig/kernel/irq/handle.c
+++ mmotm-0928/kernel/irq/handle.c
@@ -553,3 +553,19 @@ unsigned int kstat_irqs_cpu(unsigned int
 }
 EXPORT_SYMBOL(kstat_irqs_cpu);
 
+#ifdef CONFIG_GENERIC_HARDIRQS
+unsigned int kstat_irqs(unsigned int irq)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+	int cpu;
+	int sum = 0;
+
+	if (!desc)
+		return 0;
+
+	for_each_possible_cpu(cpu)
+		sum += desc->kstat_irqs[cpu];
+	return sum;
+}
+EXPORT_SYMBOL(kstat_irqs);
+#endif



* Re: Problem: scaling of /proc/stat on large systems
  2010-10-05  8:19       ` KAMEZAWA Hiroyuki
@ 2010-10-08 16:35         ` Jack Steiner
  2010-10-12  0:09           ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: Jack Steiner @ 2010-10-08 16:35 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: yinghai, mingo, akpm, linux-kernel

On Tue, Oct 05, 2010 at 05:19:07PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 5 Oct 2010 10:36:50 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > I guess this requres different approarch as per-cpu counter + threshould.
> > like vmstat[] or lib/percpu_counter. 
> > Maybe people don't like to access shared counter in IRQ.
> > 
> > But, this seems to call radixtree-lookup for the # of possible cpus.
> > I guess impleimenting a call to calculate a sum of irqs in a radix-tree
> > lookup will reduce overhead. If it's not enough, we'll have to make the
> > counter not-precise. I'll write an another patch.
> > 
> 
> How about this ? This is an add-on patch.

Nice!!

The combination of the 2 patches solves the problem.
The timings are (4096p, 256 nodes, 4592 irqs):

	# time cat /proc/stat > /dev/null

	Baseline:		12.627 sec
	Patch1  :		 2.459 sec
	Patch 1 + Patch 2:	  .561 sec


Acked-by: Jack Steiner <steiner@sgi.com>


Thanks!!
--- jack


> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> In /proc/stat, the number of per-IRQ event is shown by making a sum
> each irq's events on all cpus. But we can make use of kstat_irqs().
> 
> kstat_irqs() make a sum of IRQ events per cpu, if !CONFIG_GENERIC_HARDIRQ,
> it's not a big cost. (Both of the number of cpus and irqs are small.)
> 
> If a system is very big, it does
> 
> 	for_each_irq()
> 		for_each_cpu()
> 			- look up a radix tree
> 			- read desc->irq_stat[cpu]
> This seems not efficient. This patch adds kstat_irqs() for CONFIG_GENRIC_HARDIRQ
> and change the calculation as
> 
> 	for_each_irq()
> 		look up radix tree
> 		for_each_cpu()
> 			- read desc->irq_stat[cpu]
> 
> and reduces cost.
> 
> Signged-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  fs/proc/stat.c              |    9 ++-------
>  include/linux/kernel_stat.h |    5 +++++
>  kernel/irq/handle.c         |   16 ++++++++++++++++
>  3 files changed, 23 insertions(+), 7 deletions(-)
> 
> Index: mmotm-0928/fs/proc/stat.c
> ===================================================================
> --- mmotm-0928.orig/fs/proc/stat.c
> +++ mmotm-0928/fs/proc/stat.c
> @@ -108,13 +108,8 @@ static int show_stat(struct seq_file *p,
>  	seq_printf(p, "intr %llu", (unsigned long long)sum);
>  
>  	/* sum again ? it could be updated? */
> -	for_each_irq_nr(j) {
> -		per_irq_sum = 0;
> -		for_each_possible_cpu(i)
> -			per_irq_sum += kstat_irqs_cpu(j, i);
> -
> -		seq_printf(p, " %u", per_irq_sum);
> -	}
> +	for_each_irq_nr(j)
> +		seq_printf(p, " %u", kstat_irqs(j));
>  
>  	seq_printf(p,
>  		"\nctxt %llu\n"
> Index: mmotm-0928/include/linux/kernel_stat.h
> ===================================================================
> --- mmotm-0928.orig/include/linux/kernel_stat.h
> +++ mmotm-0928/include/linux/kernel_stat.h
> @@ -62,6 +62,7 @@ static inline unsigned int kstat_irqs_cp
>  {
>         return kstat_cpu(cpu).irqs[irq];
>  }
> +
>  #else
>  #include <linux/irq.h>
>  extern unsigned int kstat_irqs_cpu(unsigned int irq, int cpu);
> @@ -86,6 +87,7 @@ static inline unsigned int kstat_softirq
>  /*
>   * Number of interrupts per specific IRQ source, since bootup
>   */
> +#ifndef CONFIG_GENERIC_HARDIRQS
>  static inline unsigned int kstat_irqs(unsigned int irq)
>  {
>  	unsigned int sum = 0;
> @@ -96,6 +98,9 @@ static inline unsigned int kstat_irqs(un
>  
>  	return sum;
>  }
> +#else
> +extern unsigned int kstat_irqs(unsigned int irq);
> +#endif
>  
>  /*
>   * Number of interrupts per cpu, since bootup
> Index: mmotm-0928/kernel/irq/handle.c
> ===================================================================
> --- mmotm-0928.orig/kernel/irq/handle.c
> +++ mmotm-0928/kernel/irq/handle.c
> @@ -553,3 +553,19 @@ unsigned int kstat_irqs_cpu(unsigned int
>  }
>  EXPORT_SYMBOL(kstat_irqs_cpu);
>  
> +#ifdef CONFIG_GENERIC_HARDIRQS
> +unsigned int kstat_irqs(unsigned int irq)
> +{
> +	struct irq_desc *desc = irq_to_desc(irq);
> +	int cpu;
> +	int sum = 0;
> +
> +	if (!desc)
> +		return 0;
> +
> +	for_each_possible_cpu(cpu)
> +		sum += desc->kstat_irqs[cpu];
> +	return sum;
> +}
> +EXPORT_SYMBOL(kstat_irqs);
> +#endif


* Re: Problem: scaling of /proc/stat on large systems
  2010-10-08 16:35         ` Jack Steiner
@ 2010-10-12  0:09           ` KAMEZAWA Hiroyuki
  2010-10-12  0:22             ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-12  0:09 UTC (permalink / raw)
  To: Jack Steiner; +Cc: yinghai, mingo, akpm, linux-kernel

On Fri, 8 Oct 2010 11:35:57 -0500
Jack Steiner <steiner@sgi.com> wrote:

> On Tue, Oct 05, 2010 at 05:19:07PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 5 Oct 2010 10:36:50 +0900
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > 
> > > I guess this requres different approarch as per-cpu counter + threshould.
> > > like vmstat[] or lib/percpu_counter. 
> > > Maybe people don't like to access shared counter in IRQ.
> > > 
> > > But, this seems to call radixtree-lookup for the # of possible cpus.
> > > I guess impleimenting a call to calculate a sum of irqs in a radix-tree
> > > lookup will reduce overhead. If it's not enough, we'll have to make the
> > > counter not-precise. I'll write an another patch.
> > > 
> > 
> > How about this ? This is an add-on patch.
> 
> Nice!!
> 
> The combination of the 2 patches solves the problem.
> The timings are (4096p, 256 nodes, 4592 irqs):
> 
> 	# time cat /proc/stat > /dev/null
> 
> 	Baseline:		12.627 sec
> 	Patch1  :		 2.459 sec
> 	Patch 1 + Patch 2:	  .561 sec
> 
> 
> Acked-by: Jack Steiner <steiner@sgi.com>
> 

Thank you for testing. I'll post again if necessary.

-Kame




* Re: Problem: scaling of /proc/stat on large systems
  2010-10-12  0:09           ` KAMEZAWA Hiroyuki
@ 2010-10-12  0:22             ` Andrew Morton
  2010-10-12  1:02               ` KAMEZAWA Hiroyuki
  2010-10-12  2:37               ` [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2010-10-12  0:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Jack Steiner, yinghai, mingo, linux-kernel

On Tue, 12 Oct 2010 09:09:07 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> > > 
> > > How about this ? This is an add-on patch.
> > 
> > Nice!!
> > 
> > The combination of the 2 patches solves the problem.
> > The timings are (4096p, 256 nodes, 4592 irqs):
> > 
> > 	# time cat /proc/stat > /dev/null
> > 
> > 	Baseline:		12.627 sec
> > 	Patch1  :		 2.459 sec
> > 	Patch 1 + Patch 2:	  .561 sec
> > 
> > 
> > Acked-by: Jack Steiner <steiner@sgi.com>
> > 
> 
> Thank you for testing. I'll post again if necessary.

Yes please.  This has been going on for two weeks so memories need
refreshing.  Also we have no patch title and no changelog which
includes the testing results.

I could stitch all that together of course, but it's best that it all
be done afresh, I think.



* Re: Problem: scaling of /proc/stat on large systems
  2010-10-12  0:22             ` Andrew Morton
@ 2010-10-12  1:02               ` KAMEZAWA Hiroyuki
  2010-10-12  2:37               ` [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-12  1:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jack Steiner, yinghai, mingo, linux-kernel

On Mon, 11 Oct 2010 17:22:26 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Tue, 12 Oct 2010 09:09:07 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > > > 
> > > > How about this ? This is an add-on patch.
> > > 
> > > Nice!!
> > > 
> > > The combination of the 2 patches solves the problem.
> > > The timings are (4096p, 256 nodes, 4592 irqs):
> > > 
> > > 	# time cat /proc/stat > /dev/null
> > > 
> > > 	Baseline:		12.627 sec
> > > 	Patch1  :		 2.459 sec
> > > 	Patch 1 + Patch 2:	  .561 sec
> > > 
> > > 
> > > Acked-by: Jack Steiner <steiner@sgi.com>
> > > 
> > 
> > Thank you for testing. I'll post again if necessary.
> 
> Yes please.  This has been going on for two weeks so memories need
> refreshing.  Also we have no patch title and no changelog which
> includes the testing results.
> 
> I could stitch all that together of course, but it's best that it all
> be done afresh, I think.
> 
ok, will do today.

-Kame



* [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter
  2010-10-12  0:22             ` Andrew Morton
  2010-10-12  1:02               ` KAMEZAWA Hiroyuki
@ 2010-10-12  2:37               ` KAMEZAWA Hiroyuki
  2010-10-12  2:39                 ` [PATCH 2/2] improve footprint of kstat_irqs() for large system's /proc/stat KAMEZAWA Hiroyuki
  2010-10-12  3:05                 ` [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter Yinghai Lu
  1 sibling, 2 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-12  2:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jack Steiner, yinghai, mingo, linux-kernel

Jack Steiner reported slowness of /proc/stat on a large system.
This patch set tries to improve it.

> The combination of the 2 patches solves the problem.
> The timings are (4096p, 256 nodes, 4592 irqs):
> 
> 	# time cat /proc/stat > /dev/null
> 
> 	Baseline:		12.627 sec
> 	Patch1  :		 2.459 sec
> 	Patch 1 + Patch 2:	  .561 sec

please review.

==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Problem: 'cat /proc/stat' is too slow on a very big system.

/proc/stat shows the total number of all interrupts delivered to each cpu. But
when the number of IRQs is very large, 'cat /proc/stat' takes a very long
time - more than 10 secs. This is because the sum of all irq events is
computed each time /proc/stat is read. This patch adds a per-cpu "sum of
all irqs" counter and updates it as events occur.

The cost of reading /proc/stat is important because it's used by major
applications such as 'top', 'ps', 'w', etc.

A test on a host (4096 cpus, 256 nodes, 4592 irqs) shows

 %time cat /proc/stat > /dev/null
 Before Patch:  12.627 sec
 After  Patch:  2.459 sec

Tested-by: Jack Steiner <steiner@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 fs/proc/stat.c              |    4 +---
 include/linux/kernel_stat.h |   14 ++++++++++++--
 2 files changed, 13 insertions(+), 5 deletions(-)

Index: linux-2.6.36-rc7/fs/proc/stat.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/stat.c
+++ linux-2.6.36-rc7/fs/proc/stat.c
@@ -52,9 +52,7 @@ static int show_stat(struct seq_file *p,
 		guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
 		guest_nice = cputime64_add(guest_nice,
 			kstat_cpu(i).cpustat.guest_nice);
-		for_each_irq_nr(j) {
-			sum += kstat_irqs_cpu(j, i);
-		}
+		sum = kstat_cpu_irqs_sum(i);
 		sum += arch_irq_stat_cpu(i);
 
 		for (j = 0; j < NR_SOFTIRQS; j++) {
Index: linux-2.6.36-rc7/include/linux/kernel_stat.h
===================================================================
--- linux-2.6.36-rc7.orig/include/linux/kernel_stat.h
+++ linux-2.6.36-rc7/include/linux/kernel_stat.h
@@ -33,6 +33,7 @@ struct kernel_stat {
 #ifndef CONFIG_GENERIC_HARDIRQS
        unsigned int irqs[NR_IRQS];
 #endif
+	unsigned long irqs_sum;
 	unsigned int softirqs[NR_SOFTIRQS];
 };
 
@@ -54,6 +55,7 @@ static inline void kstat_incr_irqs_this_
 					    struct irq_desc *desc)
 {
 	kstat_this_cpu.irqs[irq]++;
+	kstat_this_cpu.irqs_sum++;
 }
 
 static inline unsigned int kstat_irqs_cpu(unsigned int irq, int cpu)
@@ -65,8 +67,9 @@ static inline unsigned int kstat_irqs_cp
 extern unsigned int kstat_irqs_cpu(unsigned int irq, int cpu);
 #define kstat_irqs_this_cpu(DESC) \
 	((DESC)->kstat_irqs[smp_processor_id()])
-#define kstat_incr_irqs_this_cpu(irqno, DESC) \
-	((DESC)->kstat_irqs[smp_processor_id()]++)
+#define kstat_incr_irqs_this_cpu(irqno, DESC) do {\
+	((DESC)->kstat_irqs[smp_processor_id()]++);\
+	kstat_this_cpu.irqs_sum++; } while (0)
 
 #endif
 
@@ -94,6 +97,13 @@ static inline unsigned int kstat_irqs(un
 	return sum;
 }
 
+/*
+ * Number of interrupts per cpu, since bootup
+ */
+static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
+{
+	return kstat_cpu(cpu).irqs_sum;
+}
 
 /*
  * Lock/unlock the current runqueue - to extract task statistics:



* [PATCH 2/2] improve footprint of kstat_irqs() for large system's /proc/stat
  2010-10-12  2:37               ` [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter KAMEZAWA Hiroyuki
@ 2010-10-12  2:39                 ` KAMEZAWA Hiroyuki
  2010-10-12  3:05                 ` [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter Yinghai Lu
  1 sibling, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-12  2:39 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Jack Steiner, yinghai, mingo, linux-kernel

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Problem: 'cat /proc/stat' takes too long on a very big system.

In /proc/stat, the count for each IRQ is produced by summing that irq's
events over all cpus. We can make use of kstat_irqs(); this patch
replaces the open-coded loop with it, as a first step.

The remaining problem is the performance of kstat_irqs() itself. If
!CONFIG_GENERIC_HARDIRQS it's not very slow, but with
CONFIG_GENERIC_HARDIRQS it is slow because the logic is not efficient.

With CONFIG_GENERIC_HARDIRQS, it does

	for_each_irq()
		for_each_cpu()
			- look up a radix tree
			- read desc->irq_stat[cpu]

This is not efficient. This patch adds a kstat_irqs() implementation for
CONFIG_GENERIC_HARDIRQS and changes the calculation to

	for_each_irq()
		look up radix tree
		for_each_cpu()
			- read desc->irq_stat[cpu]

This reduces the cost of scanning.

A test on a (4096 cpus, 256 nodes, 4592 irqs) host (by Jack Steiner)

%time cat /proc/stat > /dev/null

Before Patch:	 2.459 sec
After Patch :	  .561 sec


Tested-by: Jack Steiner <steiner@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 fs/proc/stat.c              |    9 ++-------
 include/linux/kernel_stat.h |    4 ++++
 kernel/irq/handle.c         |   16 ++++++++++++++++
 3 files changed, 22 insertions(+), 7 deletions(-)

Index: linux-2.6.36-rc7/fs/proc/stat.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/stat.c
+++ linux-2.6.36-rc7/fs/proc/stat.c
@@ -108,13 +108,8 @@ static int show_stat(struct seq_file *p,
 	seq_printf(p, "intr %llu", (unsigned long long)sum);
 
 	/* sum again ? it could be updated? */
-	for_each_irq_nr(j) {
-		per_irq_sum = 0;
-		for_each_possible_cpu(i)
-			per_irq_sum += kstat_irqs_cpu(j, i);
-
-		seq_printf(p, " %u", per_irq_sum);
-	}
+	for_each_irq_nr(j)
+		seq_printf(p, " %u", kstat_irqs(j));
 
 	seq_printf(p,
 		"\nctxt %llu\n"
Index: linux-2.6.36-rc7/include/linux/kernel_stat.h
===================================================================
--- linux-2.6.36-rc7.orig/include/linux/kernel_stat.h
+++ linux-2.6.36-rc7/include/linux/kernel_stat.h
@@ -86,6 +86,7 @@ static inline unsigned int kstat_softirq
 /*
  * Number of interrupts per specific IRQ source, since bootup
  */
+#ifndef CONFIG_GENERIC_HARDIRQS
 static inline unsigned int kstat_irqs(unsigned int irq)
 {
 	unsigned int sum = 0;
@@ -96,6 +97,9 @@ static inline unsigned int kstat_irqs(un
 
 	return sum;
 }
+#else
+extern unsigned int kstat_irqs(unsigned int irq);
+#endif
 
 /*
  * Number of interrupts per cpu, since bootup
Index: linux-2.6.36-rc7/kernel/irq/handle.c
===================================================================
--- linux-2.6.36-rc7.orig/kernel/irq/handle.c
+++ linux-2.6.36-rc7/kernel/irq/handle.c
@@ -554,3 +554,19 @@ unsigned int kstat_irqs_cpu(unsigned int
 }
 EXPORT_SYMBOL(kstat_irqs_cpu);
 
+#ifdef CONFIG_GENERIC_HARDIRQS
+unsigned int kstat_irqs(unsigned int irq)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+	int cpu;
+	int sum = 0;
+
+	if (!desc)
+		return 0;
+
+	for_each_possible_cpu(cpu)
+		sum += desc->kstat_irqs[cpu];
+	return sum;
+}
+EXPORT_SYMBOL(kstat_irqs);
+#endif



* Re: [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter
  2010-10-12  2:37               ` [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter KAMEZAWA Hiroyuki
  2010-10-12  2:39                 ` [PATCH 2/2] improve footprint of kstat_irqs() for large system's /proc/stat KAMEZAWA Hiroyuki
@ 2010-10-12  3:05                 ` Yinghai Lu
  2010-10-12  3:11                   ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 13+ messages in thread
From: Yinghai Lu @ 2010-10-12  3:05 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Andrew Morton, Jack Steiner, mingo, linux-kernel

On 10/11/2010 07:37 PM, KAMEZAWA Hiroyuki wrote:
> Jack Steiner reported slowness of /proc/stat on a large system.
> This patch set tries to improve it.
> 
>> The combination of the 2 patches solves the problem.
>> The timings are (4096p, 256 nodes, 4592 irqs):
>>
>> 	# time cat /proc/stat > /dev/null
>>
>> 	Baseline:		12.627 sec
>> 	Patch1  :		 2.459 sec
>> 	Patch 1 + Patch 2:	  .561 sec
> 
> please review.
> 
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Problem: 'cat /proc/stat' is too slow on a very big system.
> 
> /proc/stat shows the total number of all interrupts delivered to each cpu.
> But when the number of IRQs is very large, it takes a very long time and
> 'cat /proc/stat' can take more than 10 secs. This is because the sum of all
> irq events is computed each time /proc/stat is read. This patch adds a
> percpu "sum of all irqs" counter and updates it as events occur.
> 
> The cost of reading /proc/stat is important because it's used by major
> applications such as 'top', 'ps', 'w', etc.
> 
> A test on a host (4096 cpus, 256 nodes, 4592 irqs) shows
> 
>  %time cat /proc/stat > /dev/null
>  Before Patch:  12.627 sec
>  After  Patch:  2.459 sec
> 
> Tested-by: Jack Steiner <steiner@sgi.com>
> Acked-by: Jack Steiner <steiner@sgi.com>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  fs/proc/stat.c              |    4 +---
>  include/linux/kernel_stat.h |   14 ++++++++++++--
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6.36-rc7/fs/proc/stat.c
> ===================================================================
> --- linux-2.6.36-rc7.orig/fs/proc/stat.c
> +++ linux-2.6.36-rc7/fs/proc/stat.c
> @@ -52,9 +52,7 @@ static int show_stat(struct seq_file *p,
>  		guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
>  		guest_nice = cputime64_add(guest_nice,
>  			kstat_cpu(i).cpustat.guest_nice);
> -		for_each_irq_nr(j) {
> -			sum += kstat_irqs_cpu(j, i);
> -		}
> +		sum = kstat_cpu_irqs_sum(i);

should be 
+		sum += kstat_cpu_irqs_sum(i);

>  		sum += arch_irq_stat_cpu(i);
>  
>  		for (j = 0; j < NR_SOFTIRQS; j++) {
> Index: linux-2.6.36-rc7/include/linux/kernel_stat.h
> ===================================================================
> --- linux-2.6.36-rc7.orig/include/linux/kernel_stat.h
> +++ linux-2.6.36-rc7/include/linux/kernel_stat.h
> @@ -33,6 +33,7 @@ struct kernel_stat {
>  #ifndef CONFIG_GENERIC_HARDIRQS
>         unsigned int irqs[NR_IRQS];
>  #endif
> +	unsigned long irqs_sum;
>  	unsigned int softirqs[NR_SOFTIRQS];
>  };
>  
> @@ -54,6 +55,7 @@ static inline void kstat_incr_irqs_this_
>  					    struct irq_desc *desc)
>  {
>  	kstat_this_cpu.irqs[irq]++;
> +	kstat_this_cpu.irqs_sum++;
>  }
>  
>  static inline unsigned int kstat_irqs_cpu(unsigned int irq, int cpu)
> @@ -65,8 +67,9 @@ static inline unsigned int kstat_irqs_cp
>  extern unsigned int kstat_irqs_cpu(unsigned int irq, int cpu);
>  #define kstat_irqs_this_cpu(DESC) \
>  	((DESC)->kstat_irqs[smp_processor_id()])
> -#define kstat_incr_irqs_this_cpu(irqno, DESC) \
> -	((DESC)->kstat_irqs[smp_processor_id()]++)
> +#define kstat_incr_irqs_this_cpu(irqno, DESC) do {\
> +	((DESC)->kstat_irqs[smp_processor_id()]++);\
> +	kstat_this_cpu.irqs_sum++; } while (0)
>  
>  #endif
>  
> @@ -94,6 +97,13 @@ static inline unsigned int kstat_irqs(un
>  	return sum;
>  }
>  
> +/*
> + * Number of interrupts per cpu, since bootup
> + */
> +static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
> +{
> +	return kstat_cpu(cpu).irqs_sum;
> +}
>  
>  /*
>   * Lock/unlock the current runqueue - to extract task statistics:



* Re: [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter
  2010-10-12  3:05                 ` [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter Yinghai Lu
@ 2010-10-12  3:11                   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-12  3:11 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Andrew Morton, Jack Steiner, mingo, linux-kernel

On Mon, 11 Oct 2010 20:05:59 -0700
Yinghai Lu <yinghai@kernel.org> wrote:
			kstat_cpu(i).cpustat.guest_nice);
> > -		for_each_irq_nr(j) {
> > -			sum += kstat_irqs_cpu(j, i);
> > -		}
> > +		sum = kstat_cpu_irqs_sum(i);
> 
> should be 
> +		sum += kstat_cpu_irqs_sum(i);
> 
Ouch...thanks.
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Problem: 'cat /proc/stat' is too slow on a very big system.

/proc/stat shows the total number of all interrupts delivered to each cpu.
But when the number of IRQs is very large, it takes a very long time and
'cat /proc/stat' can take more than 10 secs. This is because the sum of all
irq events is computed each time /proc/stat is read. This patch adds a
percpu "sum of all irqs" counter and updates it as events occur.

The cost of reading /proc/stat is important because it's used by major
applications such as 'top', 'ps', 'w', etc.

A test on a host (4096 cpus, 256 nodes, 4592 irqs) shows

 %time cat /proc/stat > /dev/null
 Before Patch:  12.627 sec
 After  Patch:  2.459 sec

Changelog v1->v2:
 - fixed a wrong update of the value "sum" ('=' instead of '+=').

Tested-by: Jack Steiner <steiner@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 fs/proc/stat.c              |    4 +---
 include/linux/kernel_stat.h |   14 ++++++++++++--
 2 files changed, 13 insertions(+), 5 deletions(-)

Index: linux-2.6.36-rc7/fs/proc/stat.c
===================================================================
--- linux-2.6.36-rc7.orig/fs/proc/stat.c
+++ linux-2.6.36-rc7/fs/proc/stat.c
@@ -52,9 +52,7 @@ static int show_stat(struct seq_file *p,
 		guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
 		guest_nice = cputime64_add(guest_nice,
 			kstat_cpu(i).cpustat.guest_nice);
-		for_each_irq_nr(j) {
-			sum += kstat_irqs_cpu(j, i);
-		}
+		sum += kstat_cpu_irqs_sum(i);
 		sum += arch_irq_stat_cpu(i);
 
 		for (j = 0; j < NR_SOFTIRQS; j++) {
Index: linux-2.6.36-rc7/include/linux/kernel_stat.h
===================================================================
--- linux-2.6.36-rc7.orig/include/linux/kernel_stat.h
+++ linux-2.6.36-rc7/include/linux/kernel_stat.h
@@ -33,6 +33,7 @@ struct kernel_stat {
 #ifndef CONFIG_GENERIC_HARDIRQS
        unsigned int irqs[NR_IRQS];
 #endif
+	unsigned long irqs_sum;
 	unsigned int softirqs[NR_SOFTIRQS];
 };
 
@@ -54,6 +55,7 @@ static inline void kstat_incr_irqs_this_
 					    struct irq_desc *desc)
 {
 	kstat_this_cpu.irqs[irq]++;
+	kstat_this_cpu.irqs_sum++;
 }
 
 static inline unsigned int kstat_irqs_cpu(unsigned int irq, int cpu)
@@ -65,8 +67,9 @@ static inline unsigned int kstat_irqs_cp
 extern unsigned int kstat_irqs_cpu(unsigned int irq, int cpu);
 #define kstat_irqs_this_cpu(DESC) \
 	((DESC)->kstat_irqs[smp_processor_id()])
-#define kstat_incr_irqs_this_cpu(irqno, DESC) \
-	((DESC)->kstat_irqs[smp_processor_id()]++)
+#define kstat_incr_irqs_this_cpu(irqno, DESC) do {\
+	((DESC)->kstat_irqs[smp_processor_id()]++);\
+	kstat_this_cpu.irqs_sum++; } while (0)
 
 #endif
 
@@ -94,6 +97,13 @@ static inline unsigned int kstat_irqs(un
 	return sum;
 }
 
+/*
+ * Number of interrupts per cpu, since bootup
+ */
+static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
+{
+	return kstat_cpu(cpu).irqs_sum;
+}
 
 /*
  * Lock/unlock the current runqueue - to extract task statistics:



end of thread, other threads:[~2010-10-12  3:16 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-29 12:22 Problem: scaling of /proc/stat on large systems Jack Steiner
2010-09-30  5:09 ` KAMEZAWA Hiroyuki
2010-10-04 14:34   ` Jack Steiner
2010-10-05  1:36     ` KAMEZAWA Hiroyuki
2010-10-05  8:19       ` KAMEZAWA Hiroyuki
2010-10-08 16:35         ` Jack Steiner
2010-10-12  0:09           ` KAMEZAWA Hiroyuki
2010-10-12  0:22             ` Andrew Morton
2010-10-12  1:02               ` KAMEZAWA Hiroyuki
2010-10-12  2:37               ` [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter KAMEZAWA Hiroyuki
2010-10-12  2:39                 ` [PATCH 2/2] improve footprint of kstat_irqs() for large system's /proc/stat KAMEZAWA Hiroyuki
2010-10-12  3:05                 ` [PATCH 1/2] fix slowness of /proc/stat per-cpu IRQ sum calculation on large system by a new counter Yinghai Lu
2010-10-12  3:11                   ` KAMEZAWA Hiroyuki
