linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)
@ 2008-03-26 21:11 Alan Mayer
  2008-03-26 21:23 ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Mayer @ 2008-03-26 21:11 UTC (permalink / raw)
  To: torvalds, mingo
  Cc: linux-kernel list, Robin Holt, Jack Steiner, Russ Anderson


Subject: [PATCH] x86_64: resize NR_IRQS for large machines

From: Alan Mayer <ajm@sgi.com>

On machines with very large numbers of cpus, tables that are dimensioned
by NR_IRQS get very large, especially the irq_desc table.  They are also
very sparsely used.  When the cpu count is > MAX_IO_APICS, use MAX_IO_APICS
to set NR_IRQS, otherwise use NR_CPUS.

Signed-off-by: Alan Mayer <ajm@sgi.com>

Reviewed-by: Christoph Lameter <clameter@sgi.com>

---

===================================================================
--- v2.6.25-rc6.orig/include/asm-x86/irq_64.h	2008-03-19 16:52:52.000000000 -0500
+++ v2.6.25-rc6/include/asm-x86/irq_64.h	2008-03-26 14:02:32.000000000 -0500
@@ -10,6 +10,8 @@
  *	<tomsoft@informatik.tu-chemnitz.de>
  */
 
+#include <asm/apicdef.h>
+
 #define TIMER_IRQ 0
 
 /*
@@ -31,7 +33,11 @@
 
 #define FIRST_SYSTEM_VECTOR	0xef   /* duplicated in hw_irq.h */
 
-#define NR_IRQS (NR_VECTORS + (32 *NR_CPUS))
+#if NR_CPUS < MAX_IO_APICS
+#define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
+#else
+#define NR_IRQS (NR_VECTORS + (32 * MAX_IO_APICS))
+#endif
 #define NR_IRQ_VECTORS NR_IRQS
 
 static __inline__ int irq_canonicalize(int irq)
Index: v2.6.25-rc6/include/linux/kernel_stat.h
===================================================================
--- v2.6.25-rc6.orig/include/linux/kernel_stat.h	2008-03-19 16:53:00.000000000 -0500
+++ v2.6.25-rc6/include/linux/kernel_stat.h	2008-03-20 11:12:27.000000000 -0500
@@ -1,11 +1,11 @@
 #ifndef _LINUX_KERNEL_STAT_H
 #define _LINUX_KERNEL_STAT_H
 
-#include <asm/irq.h>
 #include <linux/smp.h>
 #include <linux/threads.h>
 #include <linux/percpu.h>
 #include <linux/cpumask.h>
+#include <asm/irq.h>
 #include <asm/cputime.h>
 
 /*
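
The effect of the NR_IRQS change in the patch above can be checked with a little arithmetic. The following is only a sketch: the values used for NR_VECTORS and MAX_IO_APICS are assumptions for illustration and should be checked against the headers of the tree being built, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values, not taken from the patch itself: verify them against
 * asm/apicdef.h and the x86_64 irq headers of your tree. */
#define NR_VECTORS   256UL
#define MAX_IO_APICS 128UL

/* Old sizing: grows linearly with the configured CPU count. */
unsigned long nr_irqs_old(unsigned long nr_cpus)
{
	assert(nr_cpus > 0);
	return NR_VECTORS + 32UL * nr_cpus;
}

/* New sizing from the patch: capped once NR_CPUS reaches MAX_IO_APICS. */
unsigned long nr_irqs_new(unsigned long nr_cpus)
{
	assert(nr_cpus > 0);
	if (nr_cpus < MAX_IO_APICS)
		return NR_VECTORS + 32UL * nr_cpus;
	return NR_VECTORS + 32UL * MAX_IO_APICS;
}

/* Print both sizes for one configuration. */
void show_nr_irqs(unsigned long nr_cpus)
{
	printf("NR_CPUS=%lu old=%lu new=%lu\n",
	       nr_cpus, nr_irqs_old(nr_cpus), nr_irqs_new(nr_cpus));
}
```

With these assumed constants, a 4096-cpu configuration drops from 131328 table entries to 4352, while any configuration below MAX_IO_APICS cpus is unchanged.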



* Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)
  2008-03-26 21:11 [PATCH] x86_64: resize NR_IRQS for large machines (re-submit) Alan Mayer
@ 2008-03-26 21:23 ` Ingo Molnar
  2008-03-26 21:40   ` Alan Mayer
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2008-03-26 21:23 UTC (permalink / raw)
  To: Alan Mayer
  Cc: torvalds, linux-kernel list, Robin Holt, Jack Steiner, Russ Anderson


* Alan Mayer <ajm@sgi.com> wrote:

> On machines with very large numbers of cpus, tables that are 
> dimensioned by NR_IRQS get very large, especially the irq_desc table.  
> They are also very sparsely used.  When the cpu count is > 
> MAX_IO_APICS, use MAX_IO_APICS to set NR_IRQS, otherwise use NR_CPUS.

thanks Alan, applied this in place of the other patch.

this bit is still ugly:

> -#define NR_IRQS (NR_VECTORS + (32 *NR_CPUS))
> +#if NR_CPUS < MAX_IO_APICS
> +#define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
> +#else
> +#define NR_IRQS (NR_VECTORS + (32 * MAX_IO_APICS))
> +#endif
>  #define NR_IRQ_VECTORS NR_IRQS

but it doesn't really depart from the current status quo of huge 
[NR_IRQS] arrays either. Patches that make NR_IRQS a variable are 
welcome :)

	Ingo


* Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)
  2008-03-26 21:23 ` Ingo Molnar
@ 2008-03-26 21:40   ` Alan Mayer
  2008-03-26 22:24     ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Mayer @ 2008-03-26 21:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alan Mayer, torvalds, linux-kernel list, Robin Holt,
	Jack Steiner, Russ Anderson

On Wed, 26 Mar 2008, Ingo Molnar wrote:

> 
> * Alan Mayer <ajm@sgi.com> wrote:
> 
> > On machines with very large numbers of cpus, tables that are 
> > dimensioned by NR_IRQS get very large, especially the irq_desc table.  
> > They are also very sparsely used.  When the cpu count is > 
> > MAX_IO_APICS, use MAX_IO_APICS to set NR_IRQS, otherwise use NR_CPUS.
> 
> thanks Alan, applied this in place of the other patch.
> 
> this bit is still ugly:
> 
> > -#define NR_IRQS (NR_VECTORS + (32 *NR_CPUS))
> > +#if NR_CPUS < MAX_IO_APICS
> > +#define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
> > +#else
> > +#define NR_IRQS (NR_VECTORS + (32 * MAX_IO_APICS))
> > +#endif
> >  #define NR_IRQ_VECTORS NR_IRQS
> 
> but it doesn't really depart from the current status quo of huge 
> [NR_IRQS] arrays either. Patches that make NR_IRQS a variable are 
> welcome :)
> 
> 	Ingo
> 

Good luck with that.  If I come up with something that's elegant enough
to make it worth the risk, I'll let you know.  Changing NR_IRQS to a variable
touches every arch and a lot of drivers.  Someone is bound to choke
on it, so it has to be something worth fighting for.

		--ajm

Lately it occurs to me,
What a long, strange trip it's been.
--
Alan J. Mayer
SGI
ajm@sgi.com
WORK: 651-683-3131
HOME: 651-407-0134
--



* Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)
  2008-03-26 21:40   ` Alan Mayer
@ 2008-03-26 22:24     ` Ingo Molnar
  2008-03-27 16:16       ` Alan Mayer
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2008-03-26 22:24 UTC (permalink / raw)
  To: Alan Mayer
  Cc: torvalds, linux-kernel list, Robin Holt, Jack Steiner, Russ Anderson


* Alan Mayer <ajm@sgi.com> wrote:

> > but it doesn't really depart from the current status quo of huge 
> > [NR_IRQS] arrays either. Patches that make NR_IRQS a variable are 
> > welcome :)
> 
> Good luck with that.  If I come up with something that's elegant 
> enough to make it worth the risk, I'll let you know.  Changing NR_IRQS 
> to a variable touches every arch and a lot of drivers.  Someone is 
> bound to choke on it, so it has to be something worth fighting for.

well, i don't think it has to be (or should be) an all-or-nothing 
patch, given the complexity and risks involved.

- we should first introduce a nr_irqs variable and a Kconfig switch 
  (say CONFIG_ARCH_HAS_DYNAMIC_NR_IRQS) for architectures to toggle. If 
  the switch is toggled, nr_irqs is a variable, otherwise it's a carbon 
  copy of NR_IRQS. Some array-definition, declaration and initialization 
  wrappers are provided as well.

- then the core code, x86 and most drivers can be converted to nr_irqs.
  The switch might initially even be user-selectable if
  CONFIG_DEBUG_KERNEL, to ease regression testing.

- other architectures will follow one by one, fixing their
  arch-dependent drivers as well in the process

- finally we get rid of the wrappers.

	Ingo
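
The first step of the plan above (an nr_irqs name that is either a variable or a carbon copy of NR_IRQS, plus array wrappers) could look roughly like the following. This is a userspace sketch under stated assumptions, not kernel code: the wrapper names DEFINE_IRQ_ARRAY and INIT_IRQ_ARRAY, the irq_count table, and the NR_IRQS value are all hypothetical, and calloc() stands in for whatever boot-time allocator the kernel would really use.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_IRQS 224  /* placeholder compile-time limit, illustration only */

#ifdef CONFIG_ARCH_HAS_DYNAMIC_NR_IRQS
/* Switch on: nr_irqs is a real variable the arch may set at boot. */
unsigned int nr_irqs = NR_IRQS;
#define DEFINE_IRQ_ARRAY(type, name)  type *name
#define INIT_IRQ_ARRAY(name) \
	(((name) = calloc(nr_irqs, sizeof(*(name)))) != NULL)
#else
/* Switch off: nr_irqs is a carbon copy of NR_IRQS. */
#define nr_irqs NR_IRQS
#define DEFINE_IRQ_ARRAY(type, name)  type name[NR_IRQS]
#define INIT_IRQ_ARRAY(name)  (1)
#endif

/* A per-irq table declared through the wrapper (hypothetical name). */
DEFINE_IRQ_ARRAY(unsigned int, irq_count);

/* Returns 0 on success, -1 if the table could not be allocated. */
int irq_count_init(void)
{
	return INIT_IRQ_ARRAY(irq_count) ? 0 : -1;
}

/* Callers bound-check against nr_irqs, never against NR_IRQS. */
void irq_count_inc(unsigned int irq)
{
	assert(irq < nr_irqs);
	irq_count[irq]++;
}
```

Code written this way compiles the same whether or not the switch is defined, which is what would let architectures convert one at a time before the wrappers are finally removed.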


* Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)
  2008-03-26 22:24     ` Ingo Molnar
@ 2008-03-27 16:16       ` Alan Mayer
  2008-03-27 16:33         ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Mayer @ 2008-03-27 16:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alan Mayer, torvalds, linux-kernel list, Robin Holt,
	Jack Steiner, Russ Anderson

> well, i don't think it has to be (or should be) an all-or-nothing 
> patch, given the complexity and risks involved.
> 
> - we should first introduce a nr_irqs variable and a Kconfig switch 
>   (say CONFIG_ARCH_HAS_DYNAMIC_NR_IRQS) for architectures to toggle. If 
>   the switch is toggled, nr_irqs is a variable, otherwise it's a carbon 
>   copy of NR_IRQS. Some array-definition, declaration and initialization 
>   wrappers are provided as well.
> 
> - then the core code, x86 and most drivers can be converted to nr_irqs.
>   The switch might initially even be user-selectable if
>   CONFIG_DEBUG_KERNEL, to ease regression testing.
> 
> - other architectures will follow one by one, fixing their
>   arch-dependent drivers as well in the process
> 
> - finally we get rid of the wrappers.
> 
> 	Ingo
> 

Okay, let's see if I understand this.

First patch introduces a config switch and a variable, nr_irqs, that is
set to NR_IRQS.  It also dynamically allocates the currently statically
allocated arrays that are dimensioned by NR_IRQS.  It also initializes
these dynamically allocated data structures.  This is all done under
the config switch, initially off by default.  

Second patch changes core code, x86 and most drivers to use nr_irqs.
This patch will also introduce a calculation of nr_irqs, based on
interrupt sources, that is a better estimate of the number of irqs
in the running system than just picking a guaranteed not-to-exceed
value that may be too big.
Is there a way to identify which drivers need to be addressed?

Then, test the crap out of it.  

Other architectures will follow, with the work being done by people
familiar with those architectures.

Clean up anything that's left over that's now been made unnecessary by
the conversion by everyone.  Including the config option?

Do I have the gist of it?
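
As a rough illustration of the second step summarized above (computing nr_irqs from the interrupt sources actually present, then sizing the tables from it), here is a minimal userspace sketch. Everything in it is an assumption for illustration: the function names, the irq_desc_sketch structure, and the idea that legacy irqs plus IO-APIC pins are the only sources counted. A real implementation would have to account for whatever other sources the platform has.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's per-irq descriptor. */
struct irq_desc_sketch {
	unsigned int status;
	unsigned int depth;
};

/* Sum the legacy irqs plus the pins of each discovered IO-APIC. */
unsigned int estimate_nr_irqs(unsigned int nr_legacy,
			      const unsigned int *pins, size_t nr_ioapics)
{
	unsigned int total = nr_legacy;
	size_t i;

	for (i = 0; i < nr_ioapics; i++)
		total += pins[i];
	return total;
}

/* Allocate and zero a table with one entry per estimated irq. */
struct irq_desc_sketch *alloc_irq_table(unsigned int nr_irqs)
{
	assert(nr_irqs > 0);
	return calloc(nr_irqs, sizeof(struct irq_desc_sketch));
}
```

The point of the sketch is only that the table size follows what was discovered at boot rather than a guaranteed not-to-exceed compile-time bound.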

		--ajm

We are star dust,
We are golden,
We are caught in the Devil's bargain.
--
Alan J. Mayer
SGI
ajm@sgi.com
WORK: 651-683-3131
HOME: 651-407-0134
--



* Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)
  2008-03-27 16:16       ` Alan Mayer
@ 2008-03-27 16:33         ` Ingo Molnar
  2008-03-27 16:53           ` Alan Mayer
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2008-03-27 16:33 UTC (permalink / raw)
  To: Alan Mayer
  Cc: torvalds, linux-kernel list, Robin Holt, Jack Steiner, Russ Anderson


* Alan Mayer <ajm@sgi.com> wrote:

> > well, i don't think it has to be (or should be) an all-or-nothing 
> > patch, given the complexity and risks involved.
> > 
> > - we should first introduce a nr_irqs variable and a Kconfig switch 
> >   (say CONFIG_ARCH_HAS_DYNAMIC_NR_IRQS) for architectures to toggle. If 
> >   the switch is toggled, nr_irqs is a variable, otherwise it's a carbon 
> >   copy of NR_IRQS. Some array-definition, declaration and initialization 
> >   wrappers are provided as well.
> > 
> > - then the core code, x86 and most drivers can be converted to nr_irqs.
> >   The switch might initially even be user-selectable if
> >   CONFIG_DEBUG_KERNEL, to ease regression testing.
> > 
> > - other architectures will follow one by one, fixing their
> >   arch-dependent drivers as well in the process
> > 
> > - finally we get rid of the wrappers.
> > 
> > 	Ingo
> > 
> 
> Okay, let's see if I understand this.
> 
> First patch introduces a config switch and a variable, nr_irqs, that is
> set to NR_IRQS.  It also dynamically allocates the currently statically
> allocated arrays that are dimensioned by NR_IRQS.  It also initializes
> these dynamically allocated data structures.  This is all done under
> the config switch, initially off by default.  
> 
> Second patch changes core code, x86 and most drivers to use nr_irqs.
> This patch will also introduce a calculation of nr_irqs, based on
> interrupt sources, that is a better estimate of the number of irqs
> in the running system than just picking a guaranteed not-to-exceed
> value that may be too big.
> Is there a way to identify which drivers need to be addressed?
> 
> Then, test the crap out of it.  
> 
> Other architectures will follow, with the work being done by people
> familiar with those architectures.
> 
> Clean up anything that's left over that's now been made unnecessary by
> the conversion by everyone.  Including the config option?
> 
> Do I have the gist of it?

i think you got it right, yes. But ... this is just a quick first-look 
suggestion from me, YMMV. Maybe you'll find it much easier to just 
convert everything at once. I tend to do things more gradually: in my 
experience it's very hard and time-consuming to change the world all at 
once. It's hard for you, the developer (you don't know whether it works 
until you have a very substantial amount of code written, while in a 
more gradual approach things can be converted one by one), and it's 
hard for users and fellow kernel hackers.

	Ingo


* Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)
  2008-03-27 16:33         ` Ingo Molnar
@ 2008-03-27 16:53           ` Alan Mayer
  2008-03-28  9:32             ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Mayer @ 2008-03-27 16:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alan Mayer, torvalds, linux-kernel list, Robin Holt,
	Jack Steiner, Russ Anderson

> i think you got it right, yes. But ... this is just a quick first-look 
> suggestion from me, YMMV. Maybe you'll find it much easier to just 
> convert everything at once. I tend to do things more gradually: in my 
> experience it's very hard and time-consuming to change the world all at 
> once. It's hard for you, the developer (you don't know whether it works 
> until you have a very substantial amount of code written, while in a 
> more gradual approach things can be converted one by one), and it's 
> hard for users and fellow kernel hackers.
> 
> 	Ingo
> 

I think I'll take a crack at it.  Doing it in phases means I can invest
a little less time and still give everyone a chance to see if they like
it.  And, the initial stuff seems like the area where people will be the
most likely to have problems and/or suggestions.  I have no idea how
long it will take to get something out there.  Stay tuned.

		--ajm


Lately it occurs to me,
What a long, strange trip it's been.
--
Alan J. Mayer
SGI
ajm@sgi.com
WORK: 651-683-3131
HOME: 651-407-0134
--



* Re: [PATCH] x86_64: resize NR_IRQS for large machines (re-submit)
  2008-03-27 16:53           ` Alan Mayer
@ 2008-03-28  9:32             ` Ingo Molnar
  0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2008-03-28  9:32 UTC (permalink / raw)
  To: Alan Mayer
  Cc: torvalds, linux-kernel list, Robin Holt, Jack Steiner, Russ Anderson


* Alan Mayer <ajm@sgi.com> wrote:

> > i think you got it right, yes. But ... this is just a quick 
> > first-look suggestion from me, YMMV. Maybe you'll find it much 
> > easier to just convert everything at once. I tend to do things 
> > more gradually: in my experience it's very hard and time-consuming 
> > to change the world all at once. It's hard for you, the developer 
> > (you don't know whether it works until you have a very substantial 
> > amount of code written, while in a more gradual approach things 
> > can be converted one by one), and it's hard for users and fellow 
> > kernel hackers.
> 
> I think I'll take a crack at it.  Doing it in phases means I can 
> invest a little less time and still give everyone a chance to see if 
> they like it.  And, the initial stuff seems like the area where people 
> will be the most likely to have problems and/or suggestions.  I have 
> no idea how long it will take to get something out there.  Stay tuned.

great! If your initial target for this is x86 (which has certainly the 
most twisted IRQ architecture of all architectures) then we'd be glad to 
host and test your patches in x86.git/latest - even if they touch 
drivers and other core code. Once it's proven enough it could be 
rehosted to -mm.

	Ingo
