linux-kernel.vger.kernel.org archive mirror
* Re: [RFC][PATCH] i386: Per node IDT
       [not found] <Pine.LNX.4.61.0507101617240.16055@montezuma.fsmlabs.com.suse.lists.linux.kernel>
@ 2005-07-11  1:59 ` Andi Kleen
  2005-07-11  4:02   ` Arjan van de Ven
  2005-07-11 13:34   ` Zwane Mwaikambo
  0 siblings, 2 replies; 18+ messages in thread
From: Andi Kleen @ 2005-07-11  1:59 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: linux-kernel


Why per node? Why not go the whole way and make it per CPU?

I would also not define it statically, but allocate it at boot time
in node local memory.

-Andi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11  1:59 ` [RFC][PATCH] i386: Per node IDT Andi Kleen
@ 2005-07-11  4:02   ` Arjan van de Ven
  2005-07-11  4:08     ` Andi Kleen
  2005-07-11 14:09     ` Zwane Mwaikambo
  2005-07-11 13:34   ` Zwane Mwaikambo
  1 sibling, 2 replies; 18+ messages in thread
From: Arjan van de Ven @ 2005-07-11  4:02 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Zwane Mwaikambo, linux-kernel

On Mon, 2005-07-11 at 03:59 +0200, Andi Kleen wrote:
> Why per node? Why not go the whole way and make it per CPU?

Agreed, for two reasons:
1) Per cpu allows for even more devices and better cache locality.
2) While few people have a NUMA system, many have an SMP system, so you
get a lot more testing.


> I would also not define it statically, but allocate it at boot time
> in node local memory.

This is probably trickier, so I would suggest doing it in a second
step.



* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11  4:02   ` Arjan van de Ven
@ 2005-07-11  4:08     ` Andi Kleen
  2005-07-11 14:09     ` Zwane Mwaikambo
  1 sibling, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2005-07-11  4:08 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Andi Kleen, Zwane Mwaikambo, linux-kernel

> > I would also not define it statically, but allocate it at boot time
> > in node local memory.
> 
> this is probably more tricky so I would suggest doing this in a second
> step.

Not sure it's that tricky.  Otherwise he'll waste a lot of memory.

-Andi


* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11  1:59 ` [RFC][PATCH] i386: Per node IDT Andi Kleen
  2005-07-11  4:02   ` Arjan van de Ven
@ 2005-07-11 13:34   ` Zwane Mwaikambo
  2005-07-11 15:03     ` Brian Gerst
  1 sibling, 1 reply; 18+ messages in thread
From: Zwane Mwaikambo @ 2005-07-11 13:34 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

On Sun, 11 Jul 2005, Andi Kleen wrote:

> Why per node? Why not go the whole way and make it per CPU?
> 
> I would also not define it statically, but allocate it at boot time
> in node local memory.

I went per node so that it would have minimal/zero impact in the no-node
case. It would also simplify hotplug cpu: once a cpu in a node goes down,
we still have other participating processors capable of handling its
devices without having to do too much migration work. I'll definitely
incorporate the node local allocations. However, for some i386 systems we
might be forced to stick some additional IDTs on node 0, since the IDTR
will only take 32bit addresses and we could end up with only highmem on
some nodes.

Thanks for the feedback,
	Zwane


* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11 14:09     ` Zwane Mwaikambo
@ 2005-07-11 14:05       ` Arjan van de Ven
  2005-07-11 15:17       ` Kenji Kaneshige
  1 sibling, 0 replies; 18+ messages in thread
From: Arjan van de Ven @ 2005-07-11 14:05 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Andi Kleen, linux-kernel

On Mon, 2005-07-11 at 08:09 -0600, Zwane Mwaikambo wrote:
> On Mon, 11 Jul 2005, Arjan van de Ven wrote:
> 
> > On Mon, 2005-07-11 at 03:59 +0200, Andi Kleen wrote:
> > > Why per node? Why not go the whole way and make it per CPU?
> > 
> > Agreed, for two reasons even
> > 1) Per cpu allows for even more devices and cache locality
> > 2) While few people have a NUMA system, many have an SMP system so you
> > get a lot more testing.
> 
> Agreed, the first version was a per cpu one simply so that i could test it 
> on a normal SMP system. Andi seems to be of the same opinion, what do you 
> think of the hotplug cpu case (explained in previous email)?

You need to cope with hotplug of entire nodes anyway, or hot-unplug of
the last cpu of a node. In fact I bet that the administration needed
will be LESS in the per cpu case (since you know it's the only one)
than in the per node case.




* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11  4:02   ` Arjan van de Ven
  2005-07-11  4:08     ` Andi Kleen
@ 2005-07-11 14:09     ` Zwane Mwaikambo
  2005-07-11 14:05       ` Arjan van de Ven
  2005-07-11 15:17       ` Kenji Kaneshige
  1 sibling, 2 replies; 18+ messages in thread
From: Zwane Mwaikambo @ 2005-07-11 14:09 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Andi Kleen, linux-kernel

On Mon, 11 Jul 2005, Arjan van de Ven wrote:

> On Mon, 2005-07-11 at 03:59 +0200, Andi Kleen wrote:
> > Why per node? Why not go the whole way and make it per CPU?
> 
> Agreed, for two reasons even
> 1) Per cpu allows for even more devices and cache locality
> 2) While few people have a NUMA system, many have an SMP system so you
> get a lot more testing.

Agreed, the first version was a per cpu one, simply so that I could test it
on a normal SMP system. Andi seems to be of the same opinion. What do you
think of the hotplug cpu case (explained in my previous email)?

Thanks Arjan,
	Zwane



* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11 13:34   ` Zwane Mwaikambo
@ 2005-07-11 15:03     ` Brian Gerst
  2005-07-11 15:21       ` Zwane Mwaikambo
  0 siblings, 1 reply; 18+ messages in thread
From: Brian Gerst @ 2005-07-11 15:03 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Andi Kleen, linux-kernel

Zwane Mwaikambo wrote:
> On Sun, 11 Jul 2005, Andi Kleen wrote:
> 
> 
>>Why per node? Why not go the whole way and make it per CPU?
>>
>>I would also not define it statically, but allocate it at boot time
>>in node local memory.
> 
> 
> I went per node so that it would be minimal/zero impact for the no-node 
> case, it would also simplify hotplug cpu since once a cpu in a node goes 
> down, we still have other participating processors capable of handling 
> its devices without having to do too much migration work. I'll definitely 
> incorporate the node local allocations however, for some i386 systems we 
> might be forced to stick some additional IDTs on node 0 since the IDTR 
> will only take 32bit addresses and we could end up with only highmem on 
> some nodes.

Doesn't the IDTR take a virtual address?  It has to or else the f00f bug 
fix wouldn't work.

--
				Brian Gerst


* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11 14:09     ` Zwane Mwaikambo
  2005-07-11 14:05       ` Arjan van de Ven
@ 2005-07-11 15:17       ` Kenji Kaneshige
  1 sibling, 0 replies; 18+ messages in thread
From: Kenji Kaneshige @ 2005-07-11 15:17 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Arjan van de Ven, Andi Kleen, linux-kernel

Hi,

> Agreed, the first version was a per cpu one simply so that i could test it 
> on a normal SMP system. Andi seems to be of the same opinion, what do you 
> think of the hotplug cpu case (explained in previous email)?

I think we need to migrate interrupts to another CPU
in the hotplug CPU case. Even when we use the per node approach,
we need to consider interrupt migration between nodes,
because all CPUs on the node could be hot-removed.

Thanks,
Kenji Kaneshige


* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11 15:03     ` Brian Gerst
@ 2005-07-11 15:21       ` Zwane Mwaikambo
  2005-07-11 16:39         ` Andi Kleen
  0 siblings, 1 reply; 18+ messages in thread
From: Zwane Mwaikambo @ 2005-07-11 15:21 UTC (permalink / raw)
  To: Brian Gerst; +Cc: Andi Kleen, linux-kernel

On Mon, 11 Jul 2005, Brian Gerst wrote:

> Zwane Mwaikambo wrote:
> > On Sun, 11 Jul 2005, Andi Kleen wrote:
> > 
> > 
> > > Why per node? Why not go the whole way and make it per CPU?
> > > 
> > > I would also not define it statically, but allocate it at boot time
> > > in node local memory.
> > 
> > 
> > I went per node so that it would be minimal/zero impact for the no-node
> > case, it would also simplify hotplug cpu since once a cpu in a node goes
> > down, we still have other participating processors capable of handling its
> > devices without having to do too much migration work. I'll definitely
> > incorporate the node local allocations however, for some i386 systems we
> > might be forced to stick some additional IDTs on node 0 since the IDTR will
> > only take 32bit addresses and we could end up with only highmem on some
> > nodes.
> 
> Doesn't the IDTR take a virtual address?  It has to or else the f00f bug fix
> wouldn't work.

Yes, you're right. I wasn't quite awake when I replied; thanks for
correcting that.

	Zwane



* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11 15:21       ` Zwane Mwaikambo
@ 2005-07-11 16:39         ` Andi Kleen
  0 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2005-07-11 16:39 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Brian Gerst, Andi Kleen, linux-kernel

> Yes you're right, i wasn't quite awake when i replied, thanks for 
> correcting that.

You would need to allocate it using vmalloc if you wanted
to put it node local, eating up precious TLB entries.

Anyway, i386 NUMA is so broken in that regard that I wouldn't
worry about node local allocation. Just allocate it somewhere, just
not statically.

I was only worried about the static memory consumption of the kernel. An IDT
is quite big, and NR_CPUS tends to be too.

-Andi
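
The memory concern above can be put into rough numbers. The following is
a minimal sketch, not code from the patch: it assumes the 256-entry,
8-byte-per-entry IDT that head.S sizes as IDT_ENTRIES*8, and the CPU and
node counts in the checks are hypothetical examples. The function names
are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* From the patch context: the i386 IDT has 256 entries of 8 bytes each. */
#define IDT_ENTRIES     256
#define IDT_ENTRY_BYTES 8

/* Bytes of static storage needed for `count` statically defined IDTs. */
size_t idt_static_bytes(size_t count)
{
    return count * IDT_ENTRIES * IDT_ENTRY_BYTES;
}

/* Example configurations; the node and CPU counts are hypothetical. */
void demo_idt_footprint(void)
{
    assert(idt_static_bytes(1) == 2048);       /* one IDT: 2 KiB        */
    assert(idt_static_bytes(8) == 16384);      /* 8 nodes: 16 KiB       */
    assert(idt_static_bytes(255) == 522240);   /* 255 CPUs: ~510 KiB    */
}
```

With one table per possible CPU, the static cost scales with the
configured maximum rather than with the CPUs actually present, which is
the argument for allocating the tables at boot time instead.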


* Re: [RFC][PATCH] i386: Per node IDT
  2005-08-07  1:13       ` Zwane Mwaikambo
@ 2005-08-07 10:47         ` Oleg Nesterov
  0 siblings, 0 replies; 18+ messages in thread
From: Oleg Nesterov @ 2005-08-07 10:47 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Linux Kernel

Zwane Mwaikambo wrote:
> 
> On Mon, 11 Jul 2005, Oleg Nesterov wrote:
> 
> > Please note that entry.S:BUILD_INTERRUPT() also does this trick:
> >       pushl $nr-256;
> >
> > so it should be changed as well.
> 
> I was making these changes and noticed that those were for the various SMP
> interrupts so they are real vectors. These will always remain within the
> 256 range.

Yes, you are right. I suggested it for consistency only.
Afaics, none of the smp_xxx_interrupt handlers uses this value, so
it is safe to change the pushed number.

But it is a very minor issue indeed, and you are probably
right not to do so.

Oleg.


* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11 15:19     ` Oleg Nesterov
@ 2005-08-07  1:13       ` Zwane Mwaikambo
  2005-08-07 10:47         ` Oleg Nesterov
  0 siblings, 1 reply; 18+ messages in thread
From: Zwane Mwaikambo @ 2005-08-07  1:13 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Linux Kernel

On Mon, 11 Jul 2005, Oleg Nesterov wrote:

> Oleg Nesterov wrote:
> > 
> > Probably it makes sense to change it to
> >         pushl $vector - 0xFFFF - 1
> > 
> 
> Please note that entry.S:BUILD_INTERRUPT() also does this trick:
> 	pushl $nr-256;
> 
> so it should be changed as well.

I was making these changes and noticed that those are for the various SMP
interrupts, so they are real vectors. These will always remain within the
256 range.

	Zwane



* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11 14:44   ` Oleg Nesterov
  2005-07-11 15:05     ` Zwane Mwaikambo
@ 2005-07-11 15:19     ` Oleg Nesterov
  2005-08-07  1:13       ` Zwane Mwaikambo
  1 sibling, 1 reply; 18+ messages in thread
From: Oleg Nesterov @ 2005-07-11 15:19 UTC (permalink / raw)
  To: Zwane Mwaikambo, linux-kernel, Andi Kleen, Arjan van de Ven

Oleg Nesterov wrote:
> 
> Probably it makes sense to change it to
>         pushl $vector - 0xFFFF - 1
> 

Please note that entry.S:BUILD_INTERRUPT() also does this trick:
	pushl $nr-256;

so it should be changed as well.

Oleg.


* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11 14:44   ` Oleg Nesterov
@ 2005-07-11 15:05     ` Zwane Mwaikambo
  2005-07-11 15:19     ` Oleg Nesterov
  1 sibling, 0 replies; 18+ messages in thread
From: Zwane Mwaikambo @ 2005-07-11 15:05 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Andi Kleen, Arjan van de Ven

Hi Oleg,

On Mon, 11 Jul 2005, Oleg Nesterov wrote:

> > The change is so that we can send IRQs higher than 256 to do_IRQ. That 
> > looks like it tries to check if we came in via system_call since we'd save 
> > the system call number as orig_eax. Now that i think about it, doesn't 
> > that path always get taken when we interrupt userspace and have pending 
> > signals on return from interrupt?
> 
> As far as I can see, we always have orig_eax < 0 on interrupt, because
> 
> irq_entries_start:
> 	pushl $vector-256	<-----  orig_eax
> 	jmp common_interrupt
> 
> and NR_IRQS < 256. So if we have pending signals on return from interrupt,
> do_signal() will not corrupt userspace registers when regs->eax == -ERESTART...
> accidentally.
> 
> Probably it makes sense to change it to
> 	pushl $vector - 0xFFFF - 1
> 
> and in do_IRQ()
> 	int irq = regs->orig_eax & 0xFFFF
> 
> if you need to send IRQs higher than 256 to do_IRQ.

Good catch, thanks, I'll change that!

	Zwane



* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11 14:03 ` Zwane Mwaikambo
@ 2005-07-11 14:44   ` Oleg Nesterov
  2005-07-11 15:05     ` Zwane Mwaikambo
  2005-07-11 15:19     ` Oleg Nesterov
  0 siblings, 2 replies; 18+ messages in thread
From: Oleg Nesterov @ 2005-07-11 14:44 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: linux-kernel, Andi Kleen, Arjan van de Ven

Hello Zwane,

Zwane Mwaikambo wrote:
>
> > On Mon, 11 Jul 2005, Oleg Nesterov wrote:
> >
> > Could you explain this change? I think it breaks do_signal/handle_signal,
> > they check orig_eax >= 0 to handle -ERESTARTSYS:
> >
> > 	/* Are we from a system call? */
> > 	if (regs->orig_eax >= 0) {
> > 		/* If so, check system call restarting.. */
> > 		switch (regs->eax) {
> > 		        case -ERESTART_RESTARTBLOCK:
> > 			case -ERESTARTNOHAND:
>
> The change is so that we can send IRQs higher than 256 to do_IRQ. That 
> looks like it tries to check if we came in via system_call since we'd save 
> the system call number as orig_eax. Now that i think about it, doesn't 
> that path always get taken when we interrupt userspace and have pending 
> signals on return from interrupt?

As far as I can see, we always have orig_eax < 0 on interrupt, because

irq_entries_start:
	pushl $vector-256	<-----  orig_eax
	jmp common_interrupt

and NR_IRQS < 256. So if we have pending signals on return from interrupt,
do_signal() will not corrupt userspace registers when
regs->eax == -ERESTART... by accident.

Probably it makes sense to change it to
	pushl $vector - 0xFFFF - 1

and in do_IRQ()
	int irq = regs->orig_eax & 0xFFFF

if you need to send IRQs higher than 256 to do_IRQ.

Oleg.
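
The encoding suggested above can be checked in user-space C. This is a
minimal sketch, not kernel code: encode_orig_eax() and decode_irq() are
hypothetical helper names standing in for the pushl in the entry stub and
the mask in do_IRQ(), and it assumes a two's complement int, as on i386.

```c
#include <assert.h>

/* Value the entry stub would push: vector - 0xFFFF - 1, i.e. vector - 0x10000.
 * This is negative for every vector in the range 0..0xFFFF. */
int encode_orig_eax(int vector)
{
    return vector - 0xFFFF - 1;
}

/* Recover the IRQ number as the proposed do_IRQ() would, by keeping the
 * low 16 bits (assumes two's complement representation). */
int decode_irq(int orig_eax)
{
    return orig_eax & 0xFFFF;
}

void demo_orig_eax_encoding(void)
{
    /* An IRQ above 255 still yields a negative orig_eax and round-trips. */
    assert(encode_orig_eax(300) < 0);
    assert(decode_irq(encode_orig_eax(300)) == 300);

    /* The boundary values behave the same way. */
    assert(decode_irq(encode_orig_eax(0)) == 0);
    assert(encode_orig_eax(0xFFFF) == -1);
    assert(decode_irq(encode_orig_eax(0xFFFF)) == 0xFFFF);

    /* For comparison, the old "$vector-256" form goes non-negative at 256. */
    assert(300 - 256 >= 0);
}
```

The wider bias keeps the sign bit set for up to 65536 IRQs, so the
orig_eax >= 0 test in do_signal() continues to distinguish interrupts
from system calls.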


* Re: [RFC][PATCH] i386: Per node IDT
  2005-07-11 12:28 Oleg Nesterov
@ 2005-07-11 14:03 ` Zwane Mwaikambo
  2005-07-11 14:44   ` Oleg Nesterov
  0 siblings, 1 reply; 18+ messages in thread
From: Zwane Mwaikambo @ 2005-07-11 14:03 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Andi Kleen, Arjan van de Ven

Hello Oleg,

On Mon, 11 Jul 2005, Oleg Nesterov wrote:

> Zwane Mwaikambo wrote:
> >
> > --- linux-2.6.13-rc1-mm1/arch/i386/kernel/entry.S	3 Jul 2005 13:20:43 -0000	1.1.1.1
> > +++ linux-2.6.13-rc1-mm1/arch/i386/kernel/entry.S	10 Jul 2005 22:33:37 -0000
> > -
> > +/* Build the IRQ entry stubs */
> >  vector=0
> > -ENTRY(irq_entries_start)
> > +	.align IRQ_STUB_SIZE,0x90
> > +ENTRY(interrupt)
> >  .rept NR_IRQS
> >  	ALIGN
> > -1:	pushl $vector-256
> > +	pushl $vector
> >  	jmp common_interrupt
> >
> >  [...snip...]
> >
> > --- linux-2.6.13-rc1-mm1/arch/i386/kernel/irq.c	3 Jul 2005 13:20:43 -0000	1.1.1.1
> > +++ linux-2.6.13-rc1-mm1/arch/i386/kernel/irq.c	4 Jul 2005 21:39:56 -0000
> > @@ -53,8 +53,7 @@ static union irq_ctx *softirq_ctx[NR_CPU
> >   */
> >  fastcall unsigned int do_IRQ(struct pt_regs *regs)
> >  {	
> > -	/* high bits used in ret_from_ code */
> > -	int irq = regs->orig_eax & 0xff;
> > +	int irq = regs->orig_eax;
> 
> Could you explain this change? I think it breaks do_signal/handle_signal,
> they check orig_eax >= 0 to handle -ERESTARTSYS:
> 
> 	/* Are we from a system call? */
> 	if (regs->orig_eax >= 0) {
> 		/* If so, check system call restarting.. */
> 		switch (regs->eax) {
> 		        case -ERESTART_RESTARTBLOCK:
> 			case -ERESTARTNOHAND:

The change is so that we can send IRQs higher than 256 to do_IRQ. That
check looks like it tries to determine whether we came in via system_call,
since we'd save the system call number as orig_eax. Now that I think about
it, doesn't that path always get taken when we interrupt userspace and have
pending signals on return from interrupt?


* Re: [RFC][PATCH] i386: Per node IDT
@ 2005-07-11 12:28 Oleg Nesterov
  2005-07-11 14:03 ` Zwane Mwaikambo
  0 siblings, 1 reply; 18+ messages in thread
From: Oleg Nesterov @ 2005-07-11 12:28 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: linux-kernel, Andi Kleen, Arjan van de Ven

Zwane Mwaikambo wrote:
>
> --- linux-2.6.13-rc1-mm1/arch/i386/kernel/entry.S	3 Jul 2005 13:20:43 -0000	1.1.1.1
> +++ linux-2.6.13-rc1-mm1/arch/i386/kernel/entry.S	10 Jul 2005 22:33:37 -0000
> -
> +/* Build the IRQ entry stubs */
>  vector=0
> -ENTRY(irq_entries_start)
> +	.align IRQ_STUB_SIZE,0x90
> +ENTRY(interrupt)
>  .rept NR_IRQS
>  	ALIGN
> -1:	pushl $vector-256
> +	pushl $vector
>  	jmp common_interrupt
>
>  [...snip...]
>
> --- linux-2.6.13-rc1-mm1/arch/i386/kernel/irq.c	3 Jul 2005 13:20:43 -0000	1.1.1.1
> +++ linux-2.6.13-rc1-mm1/arch/i386/kernel/irq.c	4 Jul 2005 21:39:56 -0000
> @@ -53,8 +53,7 @@ static union irq_ctx *softirq_ctx[NR_CPU
>   */
>  fastcall unsigned int do_IRQ(struct pt_regs *regs)
>  {	
> -	/* high bits used in ret_from_ code */
> -	int irq = regs->orig_eax & 0xff;
> +	int irq = regs->orig_eax;

Could you explain this change? I think it breaks do_signal/handle_signal,
which check orig_eax >= 0 to handle -ERESTARTSYS:

	/* Are we from a system call? */
	if (regs->orig_eax >= 0) {
		/* If so, check system call restarting.. */
		switch (regs->eax) {
		        case -ERESTART_RESTARTBLOCK:
			case -ERESTARTNOHAND:

Oleg.
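
The concern can be illustrated outside the kernel. The following is a
minimal sketch under stated assumptions: looks_like_syscall() is a
hypothetical helper that mirrors only the orig_eax >= 0 test quoted above,
and the vector number is an arbitrary example.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the test quoted from do_signal(): a non-negative orig_eax is
 * taken to mean that the kernel was entered through a system call. */
bool looks_like_syscall(long orig_eax)
{
    return orig_eax >= 0;
}

void demo_orig_eax_check(void)
{
    int vector = 5; /* hypothetical interrupt vector */

    /* Old stub, "pushl $vector-256": negative, so not treated as a syscall. */
    assert(!looks_like_syscall(vector - 256));

    /* Patched stub, "pushl $vector": non-negative, so an interrupt would be
     * indistinguishable from a system call by this test alone. */
    assert(looks_like_syscall(vector));
}
```

If the check passes on an interrupt, the restart handling would look at
regs->eax, which at that point holds whatever the interrupted code left
there rather than a system call return value.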


* [RFC][PATCH] i386: Per node IDT
@ 2005-07-10 22:41 Zwane Mwaikambo
  0 siblings, 0 replies; 18+ messages in thread
From: Zwane Mwaikambo @ 2005-07-10 22:41 UTC (permalink / raw)
  To: Protasevich, Natalie
  Cc: Raj, Ashok, Linux Kernel, Andi Kleen, Bjorn Helgaas, Len Brown

As most are aware, there is a growing need for more devices on i386/x86_64
based platforms and, with that, support for interrupt servicing for all
these devices. The proliferation of MSI based devices will also drive that
requirement higher, since some devices require multiple vectors. Natalie
and others have worked on ways of alleviating this recently, but I'd like
to put the following forward as well, which should be able to work with
other methodologies in place.

The general idea behind it is to set up an IDT per node, shared between
all processors on that node, with the definition of 'node' currently based
on the NUMA topology. This could, of course, be changed in future to some
form of interrupt handling domain/node for finer control over the number
of participating cpus in a node. The following patch is a functioning
proof of concept, tested on a 32 processor, 8 node NUMA system with 320
irq lines, and I believe Natalie tested it with 576 interrupts.

There is basic MSI support (it'll boot), although I haven't added node
awareness to it yet. I'd like to collect opinions on the general approach.
The patch is currently i386 only, but adding x86_64, for example, should be
easy.

Thanks
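
The per node idea can be illustrated with a small user-space model. This
is a minimal sketch, not the patch itself: the two-dimensional table is
shaped like the vector_irq[MAX_NUMNODES][NR_VECTORS] array in the diff
below, but bind_vector() and lookup_irq() are hypothetical helpers, and
the node count and IRQ numbers are made-up examples.

```c
#include <assert.h>

#define MAX_NUMNODES 8    /* hypothetical node count for this sketch */
#define NR_VECTORS   256  /* one IDT holds 256 vectors */

/* Per-node vector -> IRQ map; -1 marks an unused slot. */
static int vector_irq[MAX_NUMNODES][NR_VECTORS];

/* Mark every (node, vector) slot as unused. */
void vector_irq_init(void)
{
    for (int n = 0; n < MAX_NUMNODES; n++)
        for (int v = 0; v < NR_VECTORS; v++)
            vector_irq[n][v] = -1;
}

/* Record that `vector` on `node` services `irq`. Returns 0 on success,
 * -1 if an argument is out of range or the slot is already taken. */
int bind_vector(int node, int vector, int irq)
{
    if (node < 0 || node >= MAX_NUMNODES || vector < 0 || vector >= NR_VECTORS)
        return -1;
    if (vector_irq[node][vector] != -1)
        return -1;
    vector_irq[node][vector] = irq;
    return 0;
}

/* Look up the IRQ serviced by `vector` on `node`, or -1 if there is none. */
int lookup_irq(int node, int vector)
{
    if (node < 0 || node >= MAX_NUMNODES || vector < 0 || vector >= NR_VECTORS)
        return -1;
    return vector_irq[node][vector];
}

void demo_per_node_vectors(void)
{
    vector_irq_init();

    /* The same vector number can serve different IRQs on different nodes. */
    assert(bind_vector(0, 0x31, 40) == 0);
    assert(bind_vector(1, 0x31, 300) == 0);
    assert(lookup_irq(0, 0x31) == 40);
    assert(lookup_irq(1, 0x31) == 300);

    /* Within one node a vector is still exclusive. */
    assert(bind_vector(0, 0x31, 41) == -1);
    assert(lookup_irq(2, 0x31) == -1);
}
```

Because each node has its own table, a vector number only needs to be
unique within a node, which is how the total number of serviceable
interrupts can exceed what a single 256-entry IDT allows.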

 arch/i386/kernel/cpu/common.c                      |   31 +++++
 arch/i386/kernel/entry.S                           |   19 ---
 arch/i386/kernel/head.S                            |   12 +-
 arch/i386/kernel/i8259.c                           |    2
 arch/i386/kernel/io_apic.c                         |  112 +++++++++++++--------
 arch/i386/kernel/irq.c                             |    3
 arch/i386/kernel/smpboot.c                         |    2
 arch/i386/kernel/traps.c                           |   41 +++++--
 arch/i386/mm/fault.c                               |    6 -
 drivers/pci/msi.c                                  |    6 -
 drivers/pci/msi.h                                  |    1
 include/asm-i386/cpu.h                             |    3
 include/asm-i386/desc.h                            |    5
 include/asm-i386/hw_irq.h                          |    2
 include/asm-i386/io_apic.h                         |    2
 include/asm-i386/mach-default/irq_vectors_limits.h |    8 +
 include/asm-i386/mach-visws/irq_vectors.h          |    3
 include/asm-i386/mach-voyager/irq_vectors.h        |    3
 include/asm-i386/segment.h                         |    2
 19 files changed, 176 insertions(+), 87 deletions(-)

Index: linux-2.6.13-rc1-mm1/arch/i386/kernel/entry.S
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/arch/i386/kernel/entry.S,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 entry.S
--- linux-2.6.13-rc1-mm1/arch/i386/kernel/entry.S	3 Jul 2005 13:20:43 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/arch/i386/kernel/entry.S	10 Jul 2005 22:33:37 -0000
@@ -407,27 +407,18 @@ syscall_badsys:
 	FIXUP_ESPFIX_STACK \
 28:	popl %eax;
 
-/*
- * Build the entry stubs and pointer table with
- * some assembler magic.
- */
-.data
-ENTRY(interrupt)
-.text
-
+/* Build the IRQ entry stubs */
 vector=0
-ENTRY(irq_entries_start)
+	.align IRQ_STUB_SIZE,0x90
+ENTRY(interrupt)
 .rept NR_IRQS
 	ALIGN
-1:	pushl $vector-256
+	pushl $vector
 	jmp common_interrupt
-.data
-	.long 1b
-.text
+	.align IRQ_STUB_SIZE,0x90
 vector=vector+1
 .endr
 
-	ALIGN
 common_interrupt:
 	SAVE_ALL
 	movl %esp,%eax
Index: linux-2.6.13-rc1-mm1/arch/i386/kernel/head.S
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/arch/i386/kernel/head.S,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 head.S
--- linux-2.6.13-rc1-mm1/arch/i386/kernel/head.S	3 Jul 2005 13:20:43 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/arch/i386/kernel/head.S	4 Jul 2005 21:39:56 -0000
@@ -11,6 +11,7 @@
 #include <linux/config.h>
 #include <linux/threads.h>
 #include <linux/linkage.h>
+#include <linux/numa.h>
 #include <asm/segment.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
@@ -300,7 +301,7 @@ is386:	movl $2,%ecx		# set MP
 
 	call check_x87
 	lgdt cpu_gdt_descr
-	lidt idt_descr
+	lidt node_idt_descr		# we switch to per node IDTs later
 	ljmp $(__KERNEL_CS),$1f
 1:	movl $(__KERNEL_DS),%eax	# reload all the segment registers
 	movl %eax,%ss			# after changing gdt.
@@ -366,7 +367,7 @@ setup_idt:
 	movw %dx,%ax		/* selector = 0x0010 = cs */
 	movw $0x8E00,%dx	/* interrupt gate - dpl=0, present */
 
-	lea idt_table,%edi
+	lea node_idt_table,%edi
 	mov $256,%ecx
 rp_sidt:
 	movl %eax,(%edi)
@@ -441,7 +442,7 @@ int_msg:
  */
 
 .globl boot_gdt_descr
-.globl idt_descr
+.globl node_idt_descr
 .globl cpu_gdt_descr
 
 	ALIGN
@@ -452,9 +453,10 @@ boot_gdt_descr:
 	.long boot_gdt_table - __PAGE_OFFSET
 
 	.word 0				# 32-bit align idt_desc.address
-idt_descr:
+node_idt_descr:
 	.word IDT_ENTRIES*8-1		# idt contains 256 entries
-	.long idt_table
+	.long node_idt_table
+	.fill MAX_NUMNODES-1,8,0
 
 # boot GDT descriptor (later on used by CPU#0):
 	.word 0				# 32 bit align gdt_desc.address
Index: linux-2.6.13-rc1-mm1/arch/i386/kernel/i8259.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/arch/i386/kernel/i8259.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 i8259.c
--- linux-2.6.13-rc1-mm1/arch/i386/kernel/i8259.c	3 Jul 2005 13:20:44 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/arch/i386/kernel/i8259.c	10 Jul 2005 21:21:44 -0000
@@ -412,7 +412,7 @@ void __init init_IRQ(void)
 	 * us. (some of these will be overridden and become
 	 * 'special' SMP interrupts)
 	 */
-	for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) {
+	for (i = 0; i < (NR_DEVICE_VECTORS); i++) {
 		int vector = FIRST_EXTERNAL_VECTOR + i;
 		if (i >= NR_IRQS)
 			break;
Index: linux-2.6.13-rc1-mm1/arch/i386/kernel/io_apic.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/arch/i386/kernel/io_apic.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 io_apic.c
--- linux-2.6.13-rc1-mm1/arch/i386/kernel/io_apic.c	3 Jul 2005 13:20:43 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/arch/i386/kernel/io_apic.c	10 Jul 2005 21:20:50 -0000
@@ -78,12 +78,13 @@ static struct irq_pin_list {
 	int apic, pin, next;
 } irq_2_pin[PIN_MAP_SIZE];
 
-int vector_irq[NR_VECTORS] = { [0 ... NR_VECTORS - 1] = -1};
+int vector_irq[MAX_NUMNODES][NR_VECTORS] =
+	{ [0 ... MAX_NUMNODES-1][0 ... NR_VECTORS - 1] = -1 };
 #ifdef CONFIG_PCI_MSI
-#define vector_to_irq(vector) 	\
-	(platform_legacy_irq(vector) ? vector : vector_irq[vector])
+#define vector_to_irq(node, vector) 	\
+	(platform_legacy_irq(vector) ? vector : vector_irq[node][vector])
 #else
-#define vector_to_irq(vector)	(vector)
+#define vector_to_irq(node, vector)	(vector)
 #endif
 
 /*
@@ -1120,31 +1121,43 @@ static inline int IO_APIC_irq_trigger(in
 
 /* irq_vectors is indexed by the sum of all RTEs in all I/O APICs. */
 u8 irq_vector[NR_IRQ_VECTORS] = { FIRST_DEVICE_VECTOR , 0 };
+u8 vector_allocated[MAX_NUMNODES][FIRST_SYSTEM_VECTOR];
 
-int assign_irq_vector(int irq)
+int assign_irq_vector(int irq, int node)
 {
-	static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
+	static u8 current_vector[MAX_NUMNODES] = {[0 ... MAX_NUMNODES-1] =
+		FIRST_DEVICE_VECTOR};
+	static int offset[MAX_NUMNODES];
+	int vector;
 
-	BUG_ON(irq >= NR_IRQ_VECTORS);
-	if (irq != AUTO_ASSIGN && IO_APIC_VECTOR(irq) > 0)
-		return IO_APIC_VECTOR(irq);
+	vector = IO_APIC_VECTOR(irq);
+	if ((vector > 0) && (irq != AUTO_ASSIGN)) {
+		vector_allocated[node][vector] = 1;
+		return vector;
+	}
 next:
-	current_vector += 8;
-	if (current_vector == SYSCALL_VECTOR)
+	current_vector[node] += 8;
+	if (current_vector[node] == SYSCALL_VECTOR)
 		goto next;
-
-	if (current_vector >= FIRST_SYSTEM_VECTOR) {
-		offset++;
-		if (!(offset%8))
-			return -ENOSPC;
-		current_vector = FIRST_DEVICE_VECTOR + offset;
+	
+	if (current_vector[node] >= FIRST_SYSTEM_VECTOR) {
+		offset[node] = (offset[node] + 1) & 7;
+		current_vector[node] = FIRST_DEVICE_VECTOR + offset[node];
 	}
 
-	vector_irq[current_vector] = irq;
+	if (current_vector[node] == FIRST_SYSTEM_VECTOR)
+		return -ENOSPC;
+
+	vector = current_vector[node];
+	vector_irq[node][vector] = irq;
+	if (vector_allocated[node][vector])
+		goto next;
+	
+	vector_allocated[node][vector] = 1;
 	if (irq != AUTO_ASSIGN)
-		IO_APIC_VECTOR(irq) = current_vector;
+		IO_APIC_VECTOR(irq) = vector;
 
-	return current_vector;
+	return vector;
 }
 
 static struct hw_interrupt_type ioapic_level_type;
@@ -1154,7 +1167,7 @@ static struct hw_interrupt_type ioapic_e
 #define IOAPIC_EDGE	0
 #define IOAPIC_LEVEL	1
 
-static inline void ioapic_register_intr(int irq, int vector, unsigned long trigger)
+static inline void ioapic_register_intr(int node, int irq, int vector, unsigned long trigger)
 {
 	if (use_pci_vector() && !platform_legacy_irq(irq)) {
 		if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
@@ -1162,21 +1175,21 @@ static inline void ioapic_register_intr(
 			irq_desc[vector].handler = &ioapic_level_type;
 		else
 			irq_desc[vector].handler = &ioapic_edge_type;
-		set_intr_gate(vector, interrupt[vector]);
+		node_set_intr_gate(node, vector, interrupt[vector]);
 	} else	{
 		if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
 				trigger == IOAPIC_LEVEL)
 			irq_desc[irq].handler = &ioapic_level_type;
 		else
 			irq_desc[irq].handler = &ioapic_edge_type;
-		set_intr_gate(vector, interrupt[irq]);
+		node_set_intr_gate(node, vector, interrupt[irq]);
 	}
 }
 
 static void __init setup_IO_APIC_irqs(void)
 {
 	struct IO_APIC_route_entry entry;
-	int apic, pin, idx, irq, first_notcon = 1, vector;
+	int apic, pin, idx, irq, first_notcon = 1, vector, bus, node;
 	unsigned long flags;
 
 	apic_printk(APIC_VERBOSE, KERN_DEBUG "init IO_APIC IRQs\n");
@@ -1192,8 +1205,6 @@ static void __init setup_IO_APIC_irqs(vo
 		entry.delivery_mode = INT_DELIVERY_MODE;
 		entry.dest_mode = INT_DEST_MODE;
 		entry.mask = 0;				/* enable IRQ */
-		entry.dest.logical.logical_dest = 
-					cpu_mask_to_apicid(TARGET_CPUS);
 
 		idx = find_irq_entry(apic,pin,mp_INT);
 		if (idx == -1) {
@@ -1212,12 +1223,22 @@ static void __init setup_IO_APIC_irqs(vo
 		entry.trigger = irq_trigger(idx);
 		entry.polarity = irq_polarity(idx);
 
+		bus = mp_irqs[idx].mpc_srcbus;
+		node = mp_bus_id_to_node[bus];
+		entry.dest.logical.logical_dest = cpu_mask_to_apicid(node_to_cpumask(node));
+
 		if (irq_trigger(idx)) {
 			entry.trigger = 1;
 			entry.mask = 1;
 		}
 
 		irq = pin_2_irq(idx, apic, pin);
+		if (irq >= NR_IRQS) {
+			apic_printk(APIC_VERBOSE, KERN_DEBUG
+				"IO-APIC: out of IRQS node%d/bus%d/ioapic%d/irq%d\n",
+					node, bus, apic, irq);
+			continue;
+		}
 		/*
 		 * skip adding the timer int on secondary nodes, which causes
 		 * a small but painful rift in the time-space continuum
@@ -1231,9 +1252,12 @@ static void __init setup_IO_APIC_irqs(vo
 			continue;
 
 		if (IO_APIC_IRQ(irq)) {
-			vector = assign_irq_vector(irq);
+			vector = assign_irq_vector(irq, node);
+			if (vector < 0)
+				continue;
+
 			entry.vector = vector;
-			ioapic_register_intr(irq, vector, IOAPIC_AUTO);
+			ioapic_register_intr(node, irq, vector, IOAPIC_AUTO);
 		
 			if (!apic && (irq < 16))
 				disable_8259A_irq(irq);
@@ -1928,14 +1952,14 @@ static void end_level_ioapic_irq (unsign
 #ifdef CONFIG_PCI_MSI
 static unsigned int startup_edge_ioapic_vector(unsigned int vector)
 {
-	int irq = vector_to_irq(vector);
+	int irq = vector_to_irq(cpu_to_node(smp_processor_id()), vector);
 
 	return startup_edge_ioapic_irq(irq);
 }
 
 static void ack_edge_ioapic_vector(unsigned int vector)
 {
-	int irq = vector_to_irq(vector);
+	int irq = vector_to_irq(cpu_to_node(smp_processor_id()), vector);
 
 	move_irq(vector);
 	ack_edge_ioapic_irq(irq);
@@ -1943,14 +1967,14 @@ static void ack_edge_ioapic_vector(unsig
 
 static unsigned int startup_level_ioapic_vector (unsigned int vector)
 {
-	int irq = vector_to_irq(vector);
+	int irq = vector_to_irq(cpu_to_node(smp_processor_id()), vector);
 
 	return startup_level_ioapic_irq (irq);
 }
 
 static void end_level_ioapic_vector (unsigned int vector)
 {
-	int irq = vector_to_irq(vector);
+	int irq = vector_to_irq(cpu_to_node(smp_processor_id()), vector);
 
 	move_irq(vector);
 	end_level_ioapic_irq(irq);
@@ -1958,14 +1982,14 @@ static void end_level_ioapic_vector (uns
 
 static void mask_IO_APIC_vector (unsigned int vector)
 {
-	int irq = vector_to_irq(vector);
+	int irq = vector_to_irq(cpu_to_node(smp_processor_id()), vector);
 
 	mask_IO_APIC_irq(irq);
 }
 
 static void unmask_IO_APIC_vector (unsigned int vector)
 {
-	int irq = vector_to_irq(vector);
+	int irq = vector_to_irq(cpu_to_node(smp_processor_id()), vector);
 
 	unmask_IO_APIC_irq(irq);
 }
@@ -1974,7 +1998,8 @@ static void unmask_IO_APIC_vector (unsig
 static void set_ioapic_affinity_vector (unsigned int vector,
 					cpumask_t cpu_mask)
 {
-	int irq = vector_to_irq(vector);
+	int node = cpu_to_node(first_cpu(cpu_mask));
+	int irq = vector_to_irq(node, vector);
 
 	set_native_irq_info(vector, cpu_mask);
 	set_ioapic_affinity_irq(irq, cpu_mask);
@@ -2035,7 +2060,7 @@ static inline void init_IO_APIC_traps(vo
 		int tmp = irq;
 		if (use_pci_vector()) {
 			if (!platform_legacy_irq(tmp))
-				if ((tmp = vector_to_irq(tmp)) == -1)
+				if ((tmp = vector_to_irq(0, tmp)) == -1) /* FIXME - zwane */
 					continue;
 		}
 		if (IO_APIC_IRQ(tmp) && !IO_APIC_VECTOR(tmp)) {
@@ -2181,7 +2206,8 @@ static inline void check_timer(void)
 	 * get/set the timer IRQ vector:
 	 */
 	disable_8259A_irq(0);
-	vector = assign_irq_vector(0);
+	vector = assign_irq_vector(0, cpu_to_node(smp_processor_id()));
+	/* This gets reserved on all nodes as FIRST_DEVICE_VECTOR */
 	set_intr_gate(vector, interrupt[0]);
 
 	/*
@@ -2528,6 +2554,7 @@ int io_apic_set_pci_routing (int ioapic,
 {
 	struct IO_APIC_route_entry entry;
 	unsigned long flags;
+	int node, bus;
 
 	if (!IO_APIC_IRQ(irq)) {
 		printk(KERN_ERR "IOAPIC[%d]: Invalid reference to IRQ 0\n",
@@ -2545,7 +2572,6 @@ int io_apic_set_pci_routing (int ioapic,
 
 	entry.delivery_mode = INT_DELIVERY_MODE;
 	entry.dest_mode = INT_DEST_MODE;
-	entry.dest.logical.logical_dest = cpu_mask_to_apicid(TARGET_CPUS);
 	entry.trigger = edge_level;
 	entry.polarity = active_high_low;
 	entry.mask  = 1;
@@ -2555,15 +2581,19 @@ int io_apic_set_pci_routing (int ioapic,
 	 */
 	if (irq >= 16)
 		add_pin_to_irq(irq, ioapic, pin);
-
-	entry.vector = assign_irq_vector(irq);
+	bus = mp_irqs[pin].mpc_srcbus;
+	node = mp_bus_id_to_node[bus];
+	entry.dest.logical.logical_dest = cpu_mask_to_apicid(node_to_cpumask(node));
+	entry.vector = assign_irq_vector(irq, node);
+	if (entry.vector < 0)
+		return -ENOSPC;
 
 	apic_printk(APIC_DEBUG, KERN_DEBUG "IOAPIC[%d]: Set PCI routing entry "
 		"(%d-%d -> 0x%x -> IRQ %d Mode:%i Active:%i)\n", ioapic,
 		mp_ioapics[ioapic].mpc_apicid, pin, entry.vector, irq,
 		edge_level, active_high_low);
 
-	ioapic_register_intr(irq, entry.vector, edge_level);
+	ioapic_register_intr(node, irq, entry.vector, edge_level);
 
 	if (!ioapic && (irq < 16))
 		disable_8259A_irq(irq);
Index: linux-2.6.13-rc1-mm1/arch/i386/kernel/irq.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/arch/i386/kernel/irq.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 irq.c
--- linux-2.6.13-rc1-mm1/arch/i386/kernel/irq.c	3 Jul 2005 13:20:43 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/arch/i386/kernel/irq.c	4 Jul 2005 21:39:56 -0000
@@ -53,8 +53,7 @@ static union irq_ctx *softirq_ctx[NR_CPU
  */
 fastcall unsigned int do_IRQ(struct pt_regs *regs)
 {	
-	/* high bits used in ret_from_ code */
-	int irq = regs->orig_eax & 0xff;
+	int irq = regs->orig_eax;
 #ifdef CONFIG_4KSTACKS
 	union irq_ctx *curctx, *irqctx;
 	u32 *isp;
Index: linux-2.6.13-rc1-mm1/arch/i386/kernel/smpboot.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/arch/i386/kernel/smpboot.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smpboot.c
--- linux-2.6.13-rc1-mm1/arch/i386/kernel/smpboot.c	3 Jul 2005 13:20:43 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/arch/i386/kernel/smpboot.c	10 Jul 2005 21:23:15 -0000
@@ -53,6 +53,7 @@
 #include <asm/tlbflush.h>
 #include <asm/desc.h>
 #include <asm/arch_hooks.h>
+#include <asm/cpu.h>
 
 #include <mach_apic.h>
 #include <mach_wakecpu.h>
@@ -483,6 +484,7 @@ static void __devinit start_secondary(vo
 	 */
 	cpu_init();
 	smp_callin();
+	setup_cpu_idt();
 	while (!cpu_isset(smp_processor_id(), smp_commenced_mask))
 		rep_nop();
 	setup_secondary_APIC_clock();
Index: linux-2.6.13-rc1-mm1/arch/i386/kernel/traps.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/arch/i386/kernel/traps.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 traps.c
--- linux-2.6.13-rc1-mm1/arch/i386/kernel/traps.c	3 Jul 2005 13:20:43 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/arch/i386/kernel/traps.c	4 Jul 2005 21:39:56 -0000
@@ -51,6 +51,7 @@
 #include <asm/smp.h>
 #include <asm/arch_hooks.h>
 #include <asm/kdebug.h>
+#include <asm/cpu.h>
 
 #include <linux/irq.h>
 #include <linux/module.h>
@@ -70,7 +71,8 @@ char ignore_fpu_irq = 0;
  * F0 0F bug workaround.. We have a special link segment
  * for this.
  */
-struct desc_struct idt_table[256] __attribute__((__section__(".data.idt"))) = { {0, 0}, };
+struct desc_struct node_idt_table[MAX_NUMNODES][IDT_ENTRIES]
+	__attribute__((__section__(".data.idt"))) = {[0 ... MAX_NUMNODES-1] = {{0, 0}, }};
 
 asmlinkage void divide_error(void);
 asmlinkage void debug(void);
@@ -1085,14 +1087,16 @@ asmlinkage void math_emulate(long arg)
 #ifdef CONFIG_X86_F00F_BUG
 void __init trap_init_f00f_bug(void)
 {
-	__set_fixmap(FIX_F00F_IDT, __pa(&idt_table), PAGE_KERNEL_RO);
+	int node = cpu_to_node(smp_processor_id());
+
+	__set_fixmap(FIX_F00F_IDT, __pa(&node_idt_table[node]), PAGE_KERNEL_RO);
 
 	/*
 	 * Update the IDT descriptor and reload the IDT so that
 	 * it uses the read-only mapped virtual address.
 	 */
-	idt_descr.address = fix_to_virt(FIX_F00F_IDT);
-	__asm__ __volatile__("lidt %0" : : "m" (idt_descr));
+	node_idt_descr[node].address = fix_to_virt(FIX_F00F_IDT);
+	__asm__ __volatile__("lidt %0" : : "m" (node_idt_descr[node]));
 }
 #endif
 
@@ -1111,14 +1115,21 @@ do { \
 
 
 /*
- * This needs to use 'idt_table' rather than 'idt', and
+ * This needs to use 'node_idt_table' rather than 'idt', and
  * thus use the _nonmapped_ version of the IDT, as the
  * Pentium F0 0F bugfix can have resulted in the mapped
  * IDT being write-protected.
  */
+void node_set_intr_gate(unsigned int node, unsigned int n, void *addr)
+{
+	_set_gate(&node_idt_table[node][n],14,0,addr,__KERNEL_CS);
+}
+
 void set_intr_gate(unsigned int n, void *addr)
 {
-	_set_gate(idt_table+n,14,0,addr,__KERNEL_CS);
+	int node;
+	for (node = 0; node < MAX_NUMNODES; node++)
+		node_set_intr_gate(node, n, addr);
 }
 
 /*
@@ -1126,22 +1137,30 @@ void set_intr_gate(unsigned int n, void 
  */
 static inline void set_system_intr_gate(unsigned int n, void *addr)
 {
-	_set_gate(idt_table+n, 14, 3, addr, __KERNEL_CS);
+	int node;
+	for (node = 0; node < MAX_NUMNODES; node++)
+		_set_gate(&node_idt_table[node][n], 14, 3, addr, __KERNEL_CS);
 }
 
 static void __init set_trap_gate(unsigned int n, void *addr)
 {
-	_set_gate(idt_table+n,15,0,addr,__KERNEL_CS);
+	int node;
+	for (node = 0; node < MAX_NUMNODES; node++)
+		_set_gate(&node_idt_table[node][n],15,0,addr,__KERNEL_CS);
 }
 
 static void __init set_system_gate(unsigned int n, void *addr)
 {
-	_set_gate(idt_table+n,15,3,addr,__KERNEL_CS);
+	int node;
+	for (node = 0; node < MAX_NUMNODES; node++)
+		_set_gate(&node_idt_table[node][n],15,3,addr,__KERNEL_CS);
 }
 
 static void __init set_task_gate(unsigned int n, unsigned int gdt_entry)
 {
-	_set_gate(idt_table+n,5,0,0,(gdt_entry<<3));
+	int node;
+	for (node = 0; node < MAX_NUMNODES; node++)
+		_set_gate(&node_idt_table[node][n],5,0,0,(gdt_entry<<3));
 }
 #ifdef CONFIG_KGDB
 void set_intr_usr_gate(unsigned int n, void *addr)
@@ -1194,6 +1213,8 @@ void __init trap_init(void)
 
 	set_system_gate(SYSCALL_VECTOR,&system_call);
 
+	setup_node_idts();
+
 	/*
 	 * Should be a barrier for any external CPU state.
 	 */
Index: linux-2.6.13-rc1-mm1/arch/i386/kernel/cpu/common.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/arch/i386/kernel/cpu/common.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 common.c
--- linux-2.6.13-rc1-mm1/arch/i386/kernel/cpu/common.c	3 Jul 2005 13:20:44 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/arch/i386/kernel/cpu/common.c	4 Jul 2005 21:42:08 -0000
@@ -562,10 +562,38 @@ void __init early_cpu_init(void)
 	disable_pse = 1;
 #endif
 }
+
+/*
+ * Copy the boot node's IDT to all other nodes. Currently only the
+ * device I/O interrupt entries differ between nodes.
+ */
+void __devinit setup_node_idts(void)
+{
+	int node = MAX_NUMNODES;
+
+	/* we can skip setting up node0 since it's done in head.S */
+	while (--node) {
+		memcpy(node_idt_table[node], node_idt_table[0], IDT_SIZE);
+		node_idt_descr[node].size = IDT_SIZE - 1;
+		node_idt_descr[node].address = (unsigned long)node_idt_table[node];
+	}
+}
+
+void __devinit setup_cpu_idt(void)
+{
+	int cpu = smp_processor_id(), node = cpu_to_node(cpu);
+
+	printk(KERN_DEBUG "CPU%d IDT at 0x%08lx\n", 
+		cpu, node_idt_descr[node].address);
+
+	/* reload the idt on all processors as they come up */
+	__asm__ __volatile__("lidt %0" : : "m" (node_idt_descr[node]));
+}
+
 /*
  * cpu_init() initializes state that is per-CPU. Some data is already
  * initialized (naturally) in the bootstrap process, such as the GDT
- * and IDT. We reload them nevertheless, this function acts as a
+ * (the IDT is handled separately). We reload it nevertheless, this function acts as a
  * 'CPU state barrier', nothing should get across.
  */
 void __devinit cpu_init(void)
@@ -614,7 +642,6 @@ void __devinit cpu_init(void)
 		GDT_ENTRY_TLS_ENTRIES * 8);
 
 	__asm__ __volatile__("lgdt %0" : : "m" (cpu_gdt_descr[cpu]));
-	__asm__ __volatile__("lidt %0" : : "m" (idt_descr));
 
 	/*
 	 * Delete NT
Index: linux-2.6.13-rc1-mm1/arch/i386/mm/fault.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/arch/i386/mm/fault.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 fault.c
--- linux-2.6.13-rc1-mm1/arch/i386/mm/fault.c	3 Jul 2005 13:20:44 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/arch/i386/mm/fault.c	4 Jul 2005 21:39:56 -0000
@@ -400,9 +400,9 @@ bad_area_nosemaphore:
 	 * Pentium F0 0F C7 C8 bug workaround.
 	 */
 	if (boot_cpu_data.f00f_bug) {
-		unsigned long nr;
-		
-		nr = (address - idt_descr.address) >> 3;
+		unsigned long nr, node;
+		node = cpu_to_node(smp_processor_id());
+		nr = (address - node_idt_descr[node].address) >> 3;
 
 		if (nr == 6) {
 			do_invalid_op(regs, 0);
Index: linux-2.6.13-rc1-mm1/drivers/pci/msi.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/drivers/pci/msi.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 msi.c
--- linux-2.6.13-rc1-mm1/drivers/pci/msi.c	3 Jul 2005 13:20:28 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/drivers/pci/msi.c	10 Jul 2005 21:24:44 -0000
@@ -330,7 +330,7 @@ static int assign_msi_vector(void)
 
 		return free_vector;
 	}
-	vector = assign_irq_vector(AUTO_ASSIGN);
+	vector = assign_irq_vector(AUTO_ASSIGN, 0); /* FIXME - Zwane */
 	last_alloc_vector = vector;
 	if (vector  == LAST_DEVICE_VECTOR)
 		new_vector_avail = 0;
@@ -344,7 +344,7 @@ static int get_new_vector(void)
 	int vector;
 
 	if ((vector = assign_msi_vector()) > 0)
-		set_intr_gate(vector, interrupt[vector]);
+		set_intr_gate(vector, interrupt[vector]); /* FIXME - Zwane */
 
 	return vector;
 }
@@ -368,7 +368,7 @@ static int msi_init(void)
 		printk(KERN_WARNING "PCI: MSI cache init failed\n");
 		return status;
 	}
-	last_alloc_vector = assign_irq_vector(AUTO_ASSIGN);
+	last_alloc_vector = assign_irq_vector(AUTO_ASSIGN, 0); /* FIXME - Zwane */
 	if (last_alloc_vector < 0) {
 		pci_msi_enable = 0;
 		printk(KERN_WARNING "PCI: No interrupt vectors available for MSI\n");
Index: linux-2.6.13-rc1-mm1/drivers/pci/msi.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/drivers/pci/msi.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 msi.h
--- linux-2.6.13-rc1-mm1/drivers/pci/msi.h	3 Jul 2005 13:20:28 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/drivers/pci/msi.h	10 Jul 2005 21:20:18 -0000
@@ -19,7 +19,6 @@
 #define NR_HP_RESERVED_VECTORS 	20
 
 extern int vector_irq[NR_VECTORS];
-extern void (*interrupt[NR_IRQS])(void);
 extern int pci_vector_resources(int last, int nr_released);
 
 #ifdef CONFIG_SMP
Index: linux-2.6.13-rc1-mm1/include/asm-i386/cpu.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/include/asm-i386/cpu.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 cpu.h
--- linux-2.6.13-rc1-mm1/include/asm-i386/cpu.h	3 Jul 2005 13:21:15 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/include/asm-i386/cpu.h	4 Jul 2005 21:43:58 -0000
@@ -17,5 +17,8 @@ extern int arch_register_cpu(int num);
 extern void arch_unregister_cpu(int);
 #endif
 
+extern void __devinit setup_cpu_idt(void);
+extern void __devinit setup_node_idts(void);
+
 DECLARE_PER_CPU(int, cpu_state);
 #endif /* _ASM_I386_CPU_H_ */
Index: linux-2.6.13-rc1-mm1/include/asm-i386/desc.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/include/asm-i386/desc.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 desc.h
--- linux-2.6.13-rc1-mm1/include/asm-i386/desc.h	3 Jul 2005 13:21:15 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/include/asm-i386/desc.h	4 Jul 2005 21:39:57 -0000
@@ -2,6 +2,7 @@
 #define __ARCH_DESC_H
 
 #include <asm/ldt.h>
+#include <asm/numnodes.h>
 #include <asm/segment.h>
 
 #define CPU_16BIT_STACK_SIZE 1024
@@ -15,6 +16,7 @@
 #include <asm/mmu.h>
 
 extern struct desc_struct cpu_gdt_table[GDT_ENTRIES];
+extern struct desc_struct node_idt_table[MAX_NUMNODES][IDT_ENTRIES];
 DECLARE_PER_CPU(struct desc_struct, cpu_gdt_table[GDT_ENTRIES]);
 
 DECLARE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);
@@ -25,7 +27,7 @@ struct Xgt_desc_struct {
 	unsigned short pad;
 } __attribute__ ((packed));
 
-extern struct Xgt_desc_struct idt_descr, cpu_gdt_descr[NR_CPUS];
+extern struct Xgt_desc_struct node_idt_descr[MAX_NUMNODES], cpu_gdt_descr[NR_CPUS];
 
 #define load_TR_desc() __asm__ __volatile__("ltr %%ax"::"a" (GDT_ENTRY_TSS*8))
 #define load_LDT_desc() __asm__ __volatile__("lldt %%ax"::"a" (GDT_ENTRY_LDT*8))
@@ -36,6 +38,7 @@ extern struct Xgt_desc_struct idt_descr,
  */
 extern struct desc_struct default_ldt[];
 extern void set_intr_gate(unsigned int irq, void * addr);
+extern void node_set_intr_gate(unsigned int node, unsigned int vector, void * addr);
 
 #define _set_tssldt_desc(n,addr,limit,type) \
 __asm__ __volatile__ ("movw %w3,0(%2)\n\t" \
Index: linux-2.6.13-rc1-mm1/include/asm-i386/hw_irq.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/include/asm-i386/hw_irq.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 hw_irq.h
--- linux-2.6.13-rc1-mm1/include/asm-i386/hw_irq.h	3 Jul 2005 13:21:15 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/include/asm-i386/hw_irq.h	10 Jul 2005 21:16:51 -0000
@@ -29,7 +29,7 @@ extern u8 irq_vector[NR_IRQ_VECTORS];
 #define IO_APIC_VECTOR(irq)	(irq_vector[irq])
 #define AUTO_ASSIGN		-1
 
-extern void (*interrupt[NR_IRQS])(void);
+extern char interrupt[NR_IRQS][IRQ_STUB_SIZE];
 
 #ifdef CONFIG_SMP
 fastcall void reschedule_interrupt(void);
Index: linux-2.6.13-rc1-mm1/include/asm-i386/io_apic.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/include/asm-i386/io_apic.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 io_apic.h
--- linux-2.6.13-rc1-mm1/include/asm-i386/io_apic.h	3 Jul 2005 13:21:15 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/include/asm-i386/io_apic.h	4 Jul 2005 21:39:57 -0000
@@ -208,6 +208,6 @@ extern int (*ioapic_renumber_irq)(int io
 #define io_apic_assign_pci_irqs 0
 #endif
 
-extern int assign_irq_vector(int irq);
+extern int assign_irq_vector(int irq, int node);
 
 #endif
Index: linux-2.6.13-rc1-mm1/include/asm-i386/segment.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/include/asm-i386/segment.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 segment.h
--- linux-2.6.13-rc1-mm1/include/asm-i386/segment.h	3 Jul 2005 13:21:15 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/include/asm-i386/segment.h	4 Jul 2005 21:39:57 -0000
@@ -97,5 +97,5 @@
  * of tasks we can have..
  */
 #define IDT_ENTRIES 256
-
+#define IDT_SIZE (IDT_ENTRIES * 8)
 #endif
Index: linux-2.6.13-rc1-mm1/include/asm-i386/mach-default/irq_vectors_limits.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/include/asm-i386/mach-default/irq_vectors_limits.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 irq_vectors_limits.h
--- linux-2.6.13-rc1-mm1/include/asm-i386/mach-default/irq_vectors_limits.h	3 Jul 2005 13:21:15 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/include/asm-i386/mach-default/irq_vectors_limits.h	10 Jul 2005 20:13:50 -0000
@@ -2,11 +2,13 @@
 #define _ASM_IRQ_VECTORS_LIMITS_H
 
 #ifdef CONFIG_PCI_MSI
-#define NR_IRQS FIRST_SYSTEM_VECTOR
+#define NR_IRQS 224
+#define IRQ_STUB_SIZE 16
 #define NR_IRQ_VECTORS NR_IRQS
 #else
 #ifdef CONFIG_X86_IO_APIC
 #define NR_IRQS 224
+#define IRQ_STUB_SIZE 16
 # if (224 >= 32 * NR_CPUS)
 # define NR_IRQ_VECTORS NR_IRQS
 # else
@@ -14,8 +16,12 @@
 # endif
 #else
 #define NR_IRQS 16
+#define IRQ_STUB_SIZE 8
 #define NR_IRQ_VECTORS NR_IRQS
 #endif
 #endif
 
+/* number of vectors available for external interrupts in Linux */
+#define NR_DEVICE_VECTORS	190
+
 #endif /* _ASM_IRQ_VECTORS_LIMITS_H */
Index: linux-2.6.13-rc1-mm1/include/asm-i386/mach-visws/irq_vectors.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/include/asm-i386/mach-visws/irq_vectors.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 irq_vectors.h
--- linux-2.6.13-rc1-mm1/include/asm-i386/mach-visws/irq_vectors.h	3 Jul 2005 13:21:15 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/include/asm-i386/mach-visws/irq_vectors.h	4 Jul 2005 21:39:57 -0000
@@ -52,7 +52,10 @@
  */
 #define NR_VECTORS 256
 #define NR_IRQS 224
+#define IRQ_STUB_SIZE 16
 #define NR_IRQ_VECTORS NR_IRQS
+/* number of vectors available for external interrupts in Linux */
+#define NR_DEVICE_VECTORS	190
 
 #define FPU_IRQ			13
 
Index: linux-2.6.13-rc1-mm1/include/asm-i386/mach-voyager/irq_vectors.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.13-rc1-mm1/include/asm-i386/mach-voyager/irq_vectors.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 irq_vectors.h
--- linux-2.6.13-rc1-mm1/include/asm-i386/mach-voyager/irq_vectors.h	3 Jul 2005 13:21:15 -0000	1.1.1.1
+++ linux-2.6.13-rc1-mm1/include/asm-i386/mach-voyager/irq_vectors.h	4 Jul 2005 21:39:57 -0000
@@ -57,7 +57,10 @@
 
 #define NR_VECTORS 256
 #define NR_IRQS 224
+#define IRQ_STUB_SIZE 16
 #define NR_IRQ_VECTORS NR_IRQS
+/* number of vectors available for external interrupts in Linux */
+#define NR_DEVICE_VECTORS	190
 
 #define FPU_IRQ				13
 

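For readers following the patch, here is a minimal user-space sketch of the idea behind the new assign_irq_vector(irq, node) and vector_to_irq(node, vector) signatures: each node keeps its own vector space, so the same vector number can map to different IRQs on different nodes. Everything prefixed model_ or MODEL_ is hypothetical and exists only for this illustration. The allocator simply hands out vectors sequentially, which is an assumption made for brevity and not the allocation policy of the patch.

```c
#include <assert.h>

/* Illustrative constants only; none of these come from the patch. */
#define MODEL_NODES         2     /* stand-in for MAX_NUMNODES */
#define MODEL_VECTORS       256   /* one slot per IDT entry */
#define MODEL_FIRST_VECTOR  0x31  /* assumed first device vector */
#define MODEL_LAST_VECTOR   0xee  /* assumed last device vector */

/* Per-node reverse map: vector -> irq, -1 when the vector is unused. */
static int model_vector_irq[MODEL_NODES][MODEL_VECTORS];
/* Next vector to hand out on each node. */
static int model_next_vector[MODEL_NODES];

/* Mark every vector on every node as free. */
void model_init(void)
{
	int node, vector;

	for (node = 0; node < MODEL_NODES; node++) {
		for (vector = 0; vector < MODEL_VECTORS; vector++)
			model_vector_irq[node][vector] = -1;
		model_next_vector[node] = MODEL_FIRST_VECTOR;
	}
}

/*
 * Hand out the next free vector on one node and record which irq owns it.
 * Returns -1 once that node has no device vectors left.
 */
int model_assign_irq_vector(int irq, int node)
{
	int vector;

	assert(node >= 0 && node < MODEL_NODES);
	if (model_next_vector[node] > MODEL_LAST_VECTOR)
		return -1;

	vector = model_next_vector[node]++;
	model_vector_irq[node][vector] = irq;
	return vector;
}

/* Look up the irq behind a vector as seen from one node. */
int model_vector_to_irq(int node, int vector)
{
	assert(node >= 0 && node < MODEL_NODES);
	assert(vector >= 0 && vector < MODEL_VECTORS);
	return model_vector_irq[node][vector];
}
```

In this model, exhausting the vectors on one node leaves the other node's vectors untouched, which is the reason a caller has to say which node it is asking about when translating a vector back to an IRQ.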