RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision

All of lore.kernel.org
 help / color / mirror / Atom feed

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-27 19:32 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-04-27 19:32 UTC (permalink / raw)
  To: Andi Kleen, Protasevich, Natalie
  Cc: sergio, Kimball Murray, linux-kernel, akpm, kmurray, linux-acpi

>But I guess using GSI/vector internally only would be fine.

The last time I tried to name a variable "gsi" instead of "irq",
Linus launched into a tirade that "GSI" doesn't mean anything to him,
or anybody else that googles it.  On the other hand "IRQ" means
something
to everybody, and if you google it you find all kinds of interesting
interrupt-related things.

My point was that "IRQ" means so many "interrupt related" things to
different people in different contexts, that it is effectively
meaningless.

But Linus was not swayed.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-27 19:32 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-04-27 19:32 UTC (permalink / raw)
  To: Andi Kleen, Protasevich, Natalie
  Cc: sergio, Kimball Murray, linux-kernel, akpm, kmurray, linux-acpi

>But I guess using GSI/vector internally only would be fine.

The last time I tried to name a variable "gsi" instead of "irq",
Linus launched into a tirade that "GSI" doesn't mean anything to him,
or anybody else that googles it.  On the other hand "IRQ" means
something
to everybody, and if you google it you find all kinds of interesting
interrupt-related things.

My point was that "IRQ" means so many "interrupt related" things to
different people in different contexts, that it is effectively
meaningless.

But Linus was not swayed.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* hello
  2006-04-27 19:32 ` Brown, Len
  (?)
@ 2006-04-28  8:06 ` Teo
  -1 siblings, 0 replies; 67+ messages in thread
From: Teo @ 2006-04-28  8:06 UTC (permalink / raw)
  To: Brown, Len; +Cc: linux-acpi

Hi there!

I signed to linux-acpi mailing list a day ago, because
on my machine kernel was in panic. The main reason is
somewhere in ACPI package. If I disable ACPI then it
works.

Although I am relatively new to Linux world I might be
able to help testing.

In case you find my post off-topic or for any reason
don't want seeing me around here please let me know.

Kind regards,
Teo

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-09  5:14 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-05-09  5:14 UTC (permalink / raw)
  To: Brown, Len, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

>> >I have tested this algorithm and it worked just fine for me... I 
> >> >used the following compression code in mp_register_gsi():
> >> >
> >> >
> >> >int     irqs_used = 0;
> >> >int     gsi_to_irq[NR_IRQS] = { [0 ... NR_IRQS-1] = -1 };
> >> >...
> >> >
> >> >        if (triggering == ACPI_LEVEL_SENSITIVE) {
> >> >                if (gsi > NR_IRQS) {
> >> >                        int i;
> >> >                        printk("NBP: looking for unused IRQ\n");
> >> >                        for (i = nr_ioapic_registers[0]; i 
> >< NR_IRQS;
> >> i++) {
> >> >                                if (gsi_to_irq[i] == -1) {
> >> >                                        gsi_to_irq[i] = gsi;
> >> >                                        gsi = i;
> >> >                                        break;
> >> >                                }
> >> >                        }
> >> >                        if (i >= NR_IRQS) {
> >> >                                printk(KERN_ERR "GSI %u is 
> >> too high\n",
> >> ...
> >> >                                return gsi;
> >> >                        }
> >> >                } else
> >> >                        gsi_to_irq[gsi] = gsi;
> >> >        }
> >> 
> >> the problem with this code as it stands is that 
> >> acpi/mp_register_gsi() can be called with gsi in any order.  
> >> So it is possible for the compression code above to select 
> >> gsi_to_irq[n] and later for the non-compression path to 
> >> over-write gsi_to_irq[n].
> >> 
> >
> >It should always find the entry it's done the first one around 
> >actually, the first thing in mp_register_gsi() will check for it I
> think.
> 
> No, the top of mp_register_gsi() is based on the real GSI
> (before re-numbering) and the real IOAPIC pin number.
> It prevents re-programming the real IOAPIC pin (a dubious 
> optimization,
> IMHO).
> Yes, if it finds a 2nd call results in the same pin, it returns
> gsi_to_irq[i],
> but that isn't the failure case here.
> 
> a failure case is like so:
> say the 1st IOAPIC has 24 pins, so the 0th RTE of the 2nd APIC is #24,
> and this happens:
> 
> mp_register_gsi(NR_IRQS+1);
> 	gsi_to_irq[24] is free, so it will get claimed
> 	by the IRQ that wants to talk to GSI NR_IRQS+1
> 
> mp_register_gsi(24);
> 	gsi_to_irq[24] was not -1 here, it was NR_IRQS+1,
> 	but that will get over-written to be 24.
> 
> So now you have multiple independent interrupt sources that think
> they own IRQ 24.

But if the first one was say 280, then gsi_to_irq[24]=280. The other one
will be looking for gsi_to_irq[XX]=24. That should make differentiation
I hope. Don't you think? I will go through this some more...

--Natalie

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-09  5:14 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-05-09  5:14 UTC (permalink / raw)
  To: Brown, Len, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

>> >I have tested this algorithm and it worked just fine for me... I 
> >> >used the following compression code in mp_register_gsi():
> >> >
> >> >
> >> >int     irqs_used = 0;
> >> >int     gsi_to_irq[NR_IRQS] = { [0 ... NR_IRQS-1] = -1 };
> >> >...
> >> >
> >> >        if (triggering == ACPI_LEVEL_SENSITIVE) {
> >> >                if (gsi > NR_IRQS) {
> >> >                        int i;
> >> >                        printk("NBP: looking for unused IRQ\n");
> >> >                        for (i = nr_ioapic_registers[0]; i 
> >< NR_IRQS;
> >> i++) {
> >> >                                if (gsi_to_irq[i] == -1) {
> >> >                                        gsi_to_irq[i] = gsi;
> >> >                                        gsi = i;
> >> >                                        break;
> >> >                                }
> >> >                        }
> >> >                        if (i >= NR_IRQS) {
> >> >                                printk(KERN_ERR "GSI %u is 
> >> too high\n",
> >> ...
> >> >                                return gsi;
> >> >                        }
> >> >                } else
> >> >                        gsi_to_irq[gsi] = gsi;
> >> >        }
> >> 
> >> the problem with this code as it stands is that 
> >> acpi/mp_register_gsi() can be called with gsi in any order.  
> >> So it is possible for the compression code above to select 
> >> gsi_to_irq[n] and later for the non-compression path to 
> >> over-write gsi_to_irq[n].
> >> 
> >
> >It should always find the entry it's done the first one around 
> >actually, the first thing in mp_register_gsi() will check for it I
> think.
> 
> No, the top of mp_register_gsi() is based on the real GSI
> (before re-numbering) and the real IOAPIC pin number.
> It prevents re-programming the real IOAPIC pin (a dubious 
> optimization,
> IMHO).
> Yes, if it finds a 2nd call results in the same pin, it returns
> gsi_to_irq[i],
> but that isn't the failure case here.
> 
> a failure case is like so:
> say the 1st IOAPIC has 24 pins, so the 0th RTE of the 2nd APIC is #24,
> and this happens:
> 
> mp_register_gsi(NR_IRQS+1);
> 	gsi_to_irq[24] is free, so it will get claimed
> 	by the IRQ that wants to talk to GSI NR_IRQS+1
> 
> mp_register_gsi(24);
> 	gsi_to_irq[24] was not -1 here, it was NR_IRQS+1,
> 	but that will get over-written to be 24.
> 
> So now you have multiple independent interrupt sources that think
> they own IRQ 24.

But if the first one was say 280, then gsi_to_irq[24]=280. The other one
will be looking for gsi_to_irq[XX]=24. That should make differentiation
I hope. Don't you think? I will go through this some more...

--Natalie

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-09  4:25 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-09  4:25 UTC (permalink / raw)
  To: Protasevich, Natalie, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

>> >I have tested this algorithm and it worked just fine for 
>> >me... I used 
>> >the following compression code in mp_register_gsi():
>> >
>> >
>> >int     irqs_used = 0;
>> >int     gsi_to_irq[NR_IRQS] = { [0 ... NR_IRQS-1] = -1 };
>> >...
>> >
>> >        if (triggering == ACPI_LEVEL_SENSITIVE) {
>> >                if (gsi > NR_IRQS) {
>> >                        int i;
>> >                        printk("NBP: looking for unused IRQ\n");
>> >                        for (i = nr_ioapic_registers[0]; i 
>< NR_IRQS;
>> i++) {
>> >                                if (gsi_to_irq[i] == -1) {
>> >                                        gsi_to_irq[i] = gsi;
>> >                                        gsi = i;
>> >                                        break;
>> >                                }
>> >                        }
>> >                        if (i >= NR_IRQS) {
>> >                                printk(KERN_ERR "GSI %u is 
>> too high\n",
>> ...
>> >                                return gsi;
>> >                        }
>> >                } else
>> >                        gsi_to_irq[gsi] = gsi;
>> >        }
>> 
>> the problem with this code as it stands is that 
>> acpi/mp_register_gsi() can be called with gsi in any order.  
>> So it is possible for the compression code above to select 
>> gsi_to_irq[n] and later for the non-compression path to 
>> over-write gsi_to_irq[n].
>> 
>
>It should always find the entry it's done the first one around 
>actually, the first thing in mp_register_gsi() will check for it I
think.

No, the top of mp_register_gsi() is based on the real GSI
(before re-numbering) and the real IOAPIC pin number.
It prevents re-programming the real IOAPIC pin (a dubious optimization,
IMHO).
Yes, if it finds a 2nd call results in the same pin, it returns
gsi_to_irq[i],
but that isn't the failure case here.

a failure case is like so:
say the 1st IOAPIC has 24 pins, so the 0th RTE of the 2nd APIC is #24,
and this happens:

mp_register_gsi(NR_IRQS+1);
	gsi_to_irq[24] is free, so it will get claimed
	by the IRQ that wants to talk to GSI NR_IRQS+1

mp_register_gsi(24);
	gsi_to_irq[24] was not -1 here, it was NR_IRQS+1,
	but that will get over-written to be 24.

So now you have multiple independent interrupt sources that think
they own IRQ 24.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-09  4:25 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-09  4:25 UTC (permalink / raw)
  To: Protasevich, Natalie, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

>> >I have tested this algorithm and it worked just fine for 
>> >me... I used 
>> >the following compression code in mp_register_gsi():
>> >
>> >
>> >int     irqs_used = 0;
>> >int     gsi_to_irq[NR_IRQS] = { [0 ... NR_IRQS-1] = -1 };
>> >...
>> >
>> >        if (triggering == ACPI_LEVEL_SENSITIVE) {
>> >                if (gsi > NR_IRQS) {
>> >                        int i;
>> >                        printk("NBP: looking for unused IRQ\n");
>> >                        for (i = nr_ioapic_registers[0]; i 
>< NR_IRQS;
>> i++) {
>> >                                if (gsi_to_irq[i] == -1) {
>> >                                        gsi_to_irq[i] = gsi;
>> >                                        gsi = i;
>> >                                        break;
>> >                                }
>> >                        }
>> >                        if (i >= NR_IRQS) {
>> >                                printk(KERN_ERR "GSI %u is 
>> too high\n",
>> ...
>> >                                return gsi;
>> >                        }
>> >                } else
>> >                        gsi_to_irq[gsi] = gsi;
>> >        }
>> 
>> the problem with this code as it stands is that 
>> acpi/mp_register_gsi() can be called with gsi in any order.  
>> So it is possible for the compression code above to select 
>> gsi_to_irq[n] and later for the non-compression path to 
>> over-write gsi_to_irq[n].
>> 
>
>It should always find the entry it's done the first one around 
>actually, the first thing in mp_register_gsi() will check for it I
think.

No, the top of mp_register_gsi() is based on the real GSI
(before re-numbering) and the real IOAPIC pin number.
It prevents re-programming the real IOAPIC pin (a dubious optimization,
IMHO).
Yes, if it finds a 2nd call results in the same pin, it returns
gsi_to_irq[i],
but that isn't the failure case here.

a failure case is like so:
say the 1st IOAPIC has 24 pins, so the 0th RTE of the 2nd APIC is #24,
and this happens:

mp_register_gsi(NR_IRQS+1);
	gsi_to_irq[24] is free, so it will get claimed
	by the IRQ that wants to talk to GSI NR_IRQS+1

mp_register_gsi(24);
	gsi_to_irq[24] was not -1 here, it was NR_IRQS+1,
	but that will get over-written to be 24.

So now you have multiple independent interrupt sources that think
they own IRQ 24.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-09  3:10 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-05-09  3:10 UTC (permalink / raw)
  To: Brown, Len, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

> <6>IOAPIC[64]: apic_id 127, version 32, address 0xfec49000, GSI
> 1728-1751
> 
> wow, big box!

Yea, it's decent :) I asked for the big one since this way we can test
IRQs "wrapping  around".

> 
> >I have tested this algorithm and it worked just fine for 
> me... I used 
> >the following compression code in mp_register_gsi():
> >
> >
> >int     irqs_used = 0;
> >int     gsi_to_irq[NR_IRQS] = { [0 ... NR_IRQS-1] = -1 };
> >...
> >
> >        if (triggering == ACPI_LEVEL_SENSITIVE) {
> >                if (gsi > NR_IRQS) {
> >                        int i;
> >                        printk("NBP: looking for unused IRQ\n");
> >                        for (i = nr_ioapic_registers[0]; i < NR_IRQS;
> i++) {
> >                                if (gsi_to_irq[i] == -1) {
> >                                        gsi_to_irq[i] = gsi;
> >                                        gsi = i;
> >                                        break;
> >                                }
> >                        }
> >                        if (i >= NR_IRQS) {
> >                                printk(KERN_ERR "GSI %u is 
> too high\n",
> ...
> >                                return gsi;
> >                        }
> >                } else
> >                        gsi_to_irq[gsi] = gsi;
> >        }
> 
> the problem with this code as it stands is that 
> acpi/mp_register_gsi() can be called with gsi in any order.  
> So it is possible for the compression code above to select 
> gsi_to_irq[n] and later for the non-compression path to 
> over-write gsi_to_irq[n].
> 

It should always find the entry it's done the first one around actually,
the first thing in mp_register_gsi() will check for it I think.

> Also, I would prefer that this code be in 
> ioapic_renumber_irq(), as I think it is unnecessarily complex 
> to re-number, and then re-number again.
> (gsi_irq_sharing() is a separate discussion)

Sure - for i386, but Len, we would have to bring it into x86_64 and make
this thing above (or something like this) to be a default handler, yes. 

> 
> I think this should be model-specific, but if you feel that 
> handling (gsi > NR_IRQS) here is important in the generic 
> case, then I'm fine with this being a default 
> ioapic_renumber_irq() handler that a platform can augment/override.
>

OK.. besides note that only pretty large system will wrap around to get
unused IRQs, the rest of the world is getting pretty much default
behavior, pre-compression time so to speak :)
 
> Re: returning error
> At one point acpi_register_gsi() was able to return an error.
> However, that was broken when acpi_gsi_to_irq() was created, 
> and then broken worse when that routine was modified with 
> gsi_irq_sharing().  if you BUG() on i > NR_IRQS, that would 
> be consistent wth gsi_irq_sharing().

Ok, sounds good. I will put something together according to the above
and hopefully test it soon.

Thanks,
--Natalie
 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-09  3:10 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-05-09  3:10 UTC (permalink / raw)
  To: Brown, Len, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

> <6>IOAPIC[64]: apic_id 127, version 32, address 0xfec49000, GSI
> 1728-1751
> 
> wow, big box!

Yea, it's decent :) I asked for the big one since this way we can test
IRQs "wrapping  around".

> 
> >I have tested this algorithm and it worked just fine for 
> me... I used 
> >the following compression code in mp_register_gsi():
> >
> >
> >int     irqs_used = 0;
> >int     gsi_to_irq[NR_IRQS] = { [0 ... NR_IRQS-1] = -1 };
> >...
> >
> >        if (triggering == ACPI_LEVEL_SENSITIVE) {
> >                if (gsi > NR_IRQS) {
> >                        int i;
> >                        printk("NBP: looking for unused IRQ\n");
> >                        for (i = nr_ioapic_registers[0]; i < NR_IRQS;
> i++) {
> >                                if (gsi_to_irq[i] == -1) {
> >                                        gsi_to_irq[i] = gsi;
> >                                        gsi = i;
> >                                        break;
> >                                }
> >                        }
> >                        if (i >= NR_IRQS) {
> >                                printk(KERN_ERR "GSI %u is 
> too high\n",
> ...
> >                                return gsi;
> >                        }
> >                } else
> >                        gsi_to_irq[gsi] = gsi;
> >        }
> 
> the problem with this code as it stands is that 
> acpi/mp_register_gsi() can be called with gsi in any order.  
> So it is possible for the compression code above to select 
> gsi_to_irq[n] and later for the non-compression path to 
> over-write gsi_to_irq[n].
> 

It should always find the entry it's done the first one around actually,
the first thing in mp_register_gsi() will check for it I think.

> Also, I would prefer that this code be in 
> ioapic_renumber_irq(), as I think it is unnecessarily complex 
> to re-number, and then re-number again.
> (gsi_irq_sharing() is a separate discussion)

Sure - for i386, but Len, we would have to bring it into x86_64 and make
this thing above (or something like this) to be a default handler, yes. 

> 
> I think this should be model-specific, but if you feel that 
> handling (gsi > NR_IRQS) here is important in the generic 
> case, then I'm fine with this being a default 
> ioapic_renumber_irq() handler that a platform can augment/override.
>

OK.. besides note that only pretty large system will wrap around to get
unused IRQs, the rest of the world is getting pretty much default
behavior, pre-compression time so to speak :)
 
> Re: returning error
> At one point acpi_register_gsi() was able to return an error.
> However, that was broken when acpi_gsi_to_irq() was created, 
> and then broken worse when that routine was modified with 
> gsi_irq_sharing().  if you BUG() on i > NR_IRQS, that would 
> be consistent wth gsi_irq_sharing().

Ok, sounds good. I will put something together according to the above
and hopefully test it soon.

Thanks,
--Natalie
 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-08 21:51 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-08 21:51 UTC (permalink / raw)
  To: Protasevich, Natalie, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

<6>IOAPIC[64]: apic_id 127, version 32, address 0xfec49000, GSI
1728-1751

wow, big box!

>I have tested this algorithm and it worked just fine for me... I used
>the following compression code in mp_register_gsi():
>
>
>int     irqs_used = 0;
>int     gsi_to_irq[NR_IRQS] = { [0 ... NR_IRQS-1] = -1 };
>...
>
>        if (triggering == ACPI_LEVEL_SENSITIVE) {
>                if (gsi > NR_IRQS) {
>                        int i;
>                        printk("NBP: looking for unused IRQ\n");
>                        for (i = nr_ioapic_registers[0]; i < NR_IRQS;
i++) {
>                                if (gsi_to_irq[i] == -1) {
>                                        gsi_to_irq[i] = gsi;
>                                        gsi = i;
>                                        break;
>                                }
>                        }
>                        if (i >= NR_IRQS) {
>                                printk(KERN_ERR "GSI %u is too high\n",
...
>                                return gsi;
>                        }
>                } else
>                        gsi_to_irq[gsi] = gsi;
>        }

the problem with this code as it stands is that acpi/mp_register_gsi()
can be called with gsi in any order.  So it is possible
for the compression code above to select gsi_to_irq[n]
and later for the non-compression path to over-write gsi_to_irq[n].

Also, I would prefer that this code be in ioapic_renumber_irq(), as I
think
it is unnecessarily complex to re-number, and then re-number again.
(gsi_irq_sharing() is a separate discussion)

I think this should be model-specific, but if you feel that handling
(gsi > NR_IRQS) here is important in the generic case, then I'm
fine with this being a default ioapic_renumber_irq() handler
that a platform can augment/override.

Re: returning error
At one point acpi_register_gsi() was able to return an error.
However, that was broken when acpi_gsi_to_irq() was created,
and then broken worse when that routine was modified
with gsi_irq_sharing().  if you BUG() on i > NR_IRQS,
that would be consistent wth gsi_irq_sharing().

thanks,
-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-08 21:51 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-08 21:51 UTC (permalink / raw)
  To: Protasevich, Natalie, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

<6>IOAPIC[64]: apic_id 127, version 32, address 0xfec49000, GSI
1728-1751

wow, big box!

>I have tested this algorithm and it worked just fine for me... I used
>the following compression code in mp_register_gsi():
>
>
>int     irqs_used = 0;
>int     gsi_to_irq[NR_IRQS] = { [0 ... NR_IRQS-1] = -1 };
>...
>
>        if (triggering == ACPI_LEVEL_SENSITIVE) {
>                if (gsi > NR_IRQS) {
>                        int i;
>                        printk("NBP: looking for unused IRQ\n");
>                        for (i = nr_ioapic_registers[0]; i < NR_IRQS;
i++) {
>                                if (gsi_to_irq[i] == -1) {
>                                        gsi_to_irq[i] = gsi;
>                                        gsi = i;
>                                        break;
>                                }
>                        }
>                        if (i >= NR_IRQS) {
>                                printk(KERN_ERR "GSI %u is too high\n",
...
>                                return gsi;
>                        }
>                } else
>                        gsi_to_irq[gsi] = gsi;
>        }

the problem with this code as it stands is that acpi/mp_register_gsi()
can be called with gsi in any order.  So it is possible
for the compression code above to select gsi_to_irq[n]
and later for the non-compression path to over-write gsi_to_irq[n].

Also, I would prefer that this code be in ioapic_renumber_irq(), as I
think
it is unnecessarily complex to re-number, and then re-number again.
(gsi_irq_sharing() is a separate discussion)

I think this should be model-specific, but if you feel that handling
(gsi > NR_IRQS) here is important in the generic case, then I'm
fine with this being a default ioapic_renumber_irq() handler
that a platform can augment/override.

Re: returning error
At one point acpi_register_gsi() was able to return an error.
However, that was broken when acpi_gsi_to_irq() was created,
and then broken worse when that routine was modified
with gsi_irq_sharing().  if you BUG() on i > NR_IRQS,
that would be consistent wth gsi_irq_sharing().

thanks,
-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-08 18:37 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-05-08 18:37 UTC (permalink / raw)
  To: Brown, Len, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

> > Natalie,
> > Regarding the "IRQ compression" in mp_register_gsi()....
> > 
> > Some time ago we invented ioapic_renumber_irq() to handle the case 
> > where the ES7000 BIOS is missing the INTERRUPT_OVERRIDES needed to 
> > tell that legacy IRQs used pins > 15, and PCI interrupts
> used pins <
> > 16.
> > 
> > In that case, the ES7000 uses es7000_rename_gsi() to simply
> re-number
> > the IRQs for the lower pins to some unused numbers above
> the highest
> > GSI in the system -- emulating what the BIOS should have done.
> > 
> > Later, GSI compression was also added to mp_register_gsi():
> > 
> > if (triggering == ACPI_LEVEL_SENSITIVE) {
> > 	gsi = pci_irq++;
> > 	...
> > }
> > 
> > This GSI "compression" was intended to handle systems which have a 
> > large number of sparsely populated IOAPICs.  On these systems, the 
> > maximum GSI is well above NR_IRQS.  While there can be a
> large number
> > of devices on these boxes, the total in use is actually still below 
> > NR_IRQs -- so squeezing the GSIs into the IRQ numbers works.
> > 
> > The problem with doing this is that it also compresses the
> IRQ numbers
> > for the other 99.99% of the systems on earth, causing
> complexity and
> > bugs, such as several mentioned on this thread.
> > 
> > I'm thinking that it would be simpler for the big iron that
> has GSI's
> > > NR_IRQs, to simply use the existing
> > ioapci_renumber_irq() hook to do whatever compression or re-naming 
> > they fancy.
> > This will also allow for simpler systems to remain simple and use 
> > identity irq = gsi numbering.
> > (don't get me started on CONFIG_PCI_MSI, that is an
> independent issue)
> > 
> > Also, as we discussed, it would probably be cleaner even on those 
> > large systems if the compression algorithm did not re-name
> every GSI,
> > but only did re-names when it really has to (GSI > NR_IRQS).  eg.
> > 
> > < gsi = pci_irq++
> > ---
> > > if (GSI < NR_IRQS)
> > >    gsi = irq;
> > > else
> > >    BUG_ON(irq_used >= NR_IRQS)
> > >    gsi = platform specific algorithm to grab the unused IRQs that
> > >    can never have any real devices attached.
> > > irqs_used++;
> > 
> > If we had done this, then we wouldn't have needed kimbal's patch.
> > If we do this, then we can delete that workaround on top of 
> > workaround.
> > 
> > thoughts?
> 
> Sure sounds like a great idea. I just signed up for a large partition 
> (> 1700 GSIs) to try this over the weekend.
> The problem with oapic_renumber_gsi() is that it didn't actually 
> migrate to x86_64 because it doesn't have sub-archs so far. But using 
> actual number of interrupts vs. range is definitely promising idea.

I have tested this algorithm and it worked just fine for me... I used
the following compression code in mp_register_gsi():


int     irqs_used = 0;
int     gsi_to_irq[NR_IRQS] = { [0 ... NR_IRQS-1] = -1 };
...

        if (triggering == ACPI_LEVEL_SENSITIVE) {
                if (gsi > NR_IRQS) {
                        int i;
                        printk("NBP: looking for unused IRQ\n");
                        for (i = nr_ioapic_registers[0]; i < NR_IRQS;
i++) {
                                if (gsi_to_irq[i] == -1) {
                                        gsi_to_irq[i] = gsi;
                                        gsi = i;
                                        break;
                                }
                        }
                        if (i >= NR_IRQS) {
                                printk(KERN_ERR "GSI %u is too high\n",
gsi); <== could ne BUG here, 
 
but Im not sure if you want 
 
to panic the system or just 
 
not register those extra devices
                                return gsi;
                        }
                } else
                        gsi_to_irq[gsi] = gsi;
        }

- only starting looking for unused IRQ after getting large IRQ numbers,
and not looking for free ones on the first io-apic avoiding all dangers
of it :) (and plus some code in io_apic.c to record used entries in
gsi_to_irq[]).
Would that be better interim solution? It so, I would also like Kimball
and someone with VIA chipset handy to test this of course...

Thanks,
--Natalie

> I was also thinking that all the
> complexity is actually located on the first IO-APIC, this may have 
> some potential for compression only coming into play on non-zero 
> IO-APICs. But using the total number of IRQs used is definitely the 
> most logical thing to try...I will test this shortly - Thanks, 
> --Natalie
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-08 18:37 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-05-08 18:37 UTC (permalink / raw)
  To: Brown, Len, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

> > Natalie,
> > Regarding the "IRQ compression" in mp_register_gsi()....
> > 
> > Some time ago we invented ioapic_renumber_irq() to handle the case 
> > where the ES7000 BIOS is missing the INTERRUPT_OVERRIDES needed to 
> > tell that legacy IRQs used pins > 15, and PCI interrupts
> used pins <
> > 16.
> > 
> > In that case, the ES7000 uses es7000_rename_gsi() to simply
> re-number
> > the IRQs for the lower pins to some unused numbers above
> the highest
> > GSI in the system -- emulating what the BIOS should have done.
> > 
> > Later, GSI compression was also added to mp_register_gsi():
> > 
> > if (triggering == ACPI_LEVEL_SENSITIVE) {
> > 	gsi = pci_irq++;
> > 	...
> > }
> > 
> > This GSI "compression" was intended to handle systems which have a 
> > large number of sparsely populated IOAPICs.  On these systems, the 
> > maximum GSI is well above NR_IRQS.  While there can be a
> large number
> > of devices on these boxes, the total in use is actually still below 
> > NR_IRQs -- so squeezing the GSIs into the IRQ numbers works.
> > 
> > The problem with doing this is that it also compresses the
> IRQ numbers
> > for the other 99.99% of the systems on earth, causing
> complexity and
> > bugs, such as several mentioned on this thread.
> > 
> > I'm thinking that it would be simpler for the big iron that
> has GSI's
> > > NR_IRQs, to simply use the existing
> > ioapci_renumber_irq() hook to do whatever compression or re-naming 
> > they fancy.
> > This will also allow for simpler systems to remain simple and use 
> > identity irq = gsi numbering.
> > (don't get me started on CONFIG_PCI_MSI, that is an
> independent issue)
> > 
> > Also, as we discussed, it would probably be cleaner even on those 
> > large systems if the compression algorithm did not re-name
> every GSI,
> > but only did re-names when it really has to (GSI > NR_IRQS).  eg.
> > 
> > < gsi = pci_irq++
> > ---
> > > if (GSI < NR_IRQS)
> > >    gsi = irq;
> > > else
> > >    BUG_ON(irq_used >= NR_IRQS)
> > >    gsi = platform specific algorithm to grab the unused IRQs that
> > >    can never have any real devices attached.
> > > irqs_used++;
> > 
> > If we had done this, then we wouldn't have needed kimbal's patch.
> > If we do this, then we can delete that workaround on top of 
> > workaround.
> > 
> > thoughts?
> 
> Sure sounds like a great idea. I just signed up for a large partition 
> (> 1700 GSIs) to try this over the weekend.
> The problem with oapic_renumber_gsi() is that it didn't actually 
> migrate to x86_64 because it doesn't have sub-archs so far. But using 
> actual number of interrupts vs. range is definitely promising idea.

I have tested this algorithm and it worked just fine for me... I used
the following compression code in mp_register_gsi():


int     irqs_used = 0;
int     gsi_to_irq[NR_IRQS] = { [0 ... NR_IRQS-1] = -1 };
...

        if (triggering == ACPI_LEVEL_SENSITIVE) {
                if (gsi > NR_IRQS) {
                        int i;
                        printk("NBP: looking for unused IRQ\n");
                        for (i = nr_ioapic_registers[0]; i < NR_IRQS;
i++) {
                                if (gsi_to_irq[i] == -1) {
                                        gsi_to_irq[i] = gsi;
                                        gsi = i;
                                        break;
                                }
                        }
                        if (i >= NR_IRQS) {
                                printk(KERN_ERR "GSI %u is too high\n",
gsi); <== could ne BUG here, 
 
but Im not sure if you want 
 
to panic the system or just 
 
not register those extra devices
                                return gsi;
                        }
                } else
                        gsi_to_irq[gsi] = gsi;
        }

- only starting looking for unused IRQ after getting large IRQ numbers,
and not looking for free ones on the first io-apic avoiding all dangers
of it :) (and plus some code in io_apic.c to record used entries in
gsi_to_irq[]).
Would that be better interim solution? It so, I would also like Kimball
and someone with VIA chipset handy to test this of course...

Thanks,
--Natalie

> I was also thinking that all the
> complexity is actually located on the first IO-APIC, this may have 
> some potential for compression only coming into play on non-zero 
> IO-APICs. But using the total number of IRQs used is definitely the 
> most logical thing to try...I will test this shortly - Thanks, 
> --Natalie
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-06  6:42 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-05-06  6:42 UTC (permalink / raw)
  To: Brown, Len, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

> Natalie,
> Regarding the "IRQ compression" in mp_register_gsi()....
> 
> Some time ago we invented ioapic_renumber_irq() to handle the 
> case where the ES7000 BIOS is missing the INTERRUPT_OVERRIDES 
> needed to tell that legacy IRQs used pins > 15, and PCI 
> interrupts used pins < 16.
> 
> In that case, the ES7000 uses es7000_rename_gsi() to simply 
> re-number the IRQs for the lower pins to some unused numbers 
> above the highest GSI in the system -- emulating what the 
> BIOS should have done.
> 
> Later, GSI compression was also added to mp_register_gsi():
> 
> if (triggering == ACPI_LEVEL_SENSITIVE)
> {
> 	gsi = pci_irq++;
> 	...
> }
> 
> This GSI "compression" was intended to handle systems which 
> have a large number of sparsely populated IOAPICs.  On these 
> systems, the maximum GSI is well above NR_IRQS.  While there 
> can be a large number of devices on these boxes, the total in 
> use is actually still below NR_IRQs -- so squeezing the GSIs 
> into the IRQ numbers works.
> 
> The problem with doing this is that it also compresses the 
> IRQ numbers for the other 99.99% of the systems on earth, 
> causing complexity and bugs, such as several mentioned on this thread.
> 
> I'm thinking that it would be simpler for the big iron that 
> has GSI's > NR_IRQs, to simply use the existing 
> ioapci_renumber_irq() hook to do whatever compression or 
> re-naming they fancy.
> This will also allow for simpler systems to remain simple and 
> use identity irq = gsi numbering.
> (don't get me started on CONFIG_PCI_MSI, that is an independent issue)
> 
> Also, as we discussed, it would probably be cleaner even on 
> those large systems if the compression algorithm did not 
> re-name every GSI, but only did re-names when it really has 
> to (GSI > NR_IRQS).  eg.
> 
> < gsi = pci_irq++
> ---
> > if (GSI < NR_IRQS)
> >    gsi = irq;
> > else
> >    BUG_ON(irq_used >= NR_IRQS)
> >    gsi = platform specific algorithm to grab the unused IRQs that
> >    can never have any real devices attached.
> > irqs_used++;
> 
> If we had done this, then we wouldn't have needed kimbal's patch.
> If we do this, then we can delete that workaround on top of 
> workaround.
> 
> thoughts?

Sure sounds like a great idea. I just signed up for a large partition (>
1700 GSIs) to try this over the weekend.
The problem with oapic_renumber_gsi() is that it didn't actually migrate
to x86_64 because it doesn't have sub-archs so far. But using actual
number of interrupts vs. range is definitely promising idea. I was also
thinking that all the complexity is actually located on the first
IO-APIC, this may have some potential for compression only coming into
play on non-zero IO-APICs. But using the total number of IRQs used is
definitely the most logical thing to try...I will test this shortly -
Thanks, --Natalie 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-06  6:42 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-05-06  6:42 UTC (permalink / raw)
  To: Brown, Len, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

> Natalie,
> Regarding the "IRQ compression" in mp_register_gsi()....
> 
> Some time ago we invented ioapic_renumber_irq() to handle the 
> case where the ES7000 BIOS is missing the INTERRUPT_OVERRIDES 
> needed to tell that legacy IRQs used pins > 15, and PCI 
> interrupts used pins < 16.
> 
> In that case, the ES7000 uses es7000_rename_gsi() to simply 
> re-number the IRQs for the lower pins to some unused numbers 
> above the highest GSI in the system -- emulating what the 
> BIOS should have done.
> 
> Later, GSI compression was also added to mp_register_gsi():
> 
> if (triggering == ACPI_LEVEL_SENSITIVE)
> {
> 	gsi = pci_irq++;
> 	...
> }
> 
> This GSI "compression" was intended to handle systems which 
> have a large number of sparsely populated IOAPICs.  On these 
> systems, the maximum GSI is well above NR_IRQS.  While there 
> can be a large number of devices on these boxes, the total in 
> use is actually still below NR_IRQs -- so squeezing the GSIs 
> into the IRQ numbers works.
> 
> The problem with doing this is that it also compresses the 
> IRQ numbers for the other 99.99% of the systems on earth, 
> causing complexity and bugs, such as several mentioned on this thread.
> 
> I'm thinking that it would be simpler for the big iron that 
> has GSI's > NR_IRQs, to simply use the existing 
> ioapci_renumber_irq() hook to do whatever compression or 
> re-naming they fancy.
> This will also allow for simpler systems to remain simple and 
> use identity irq = gsi numbering.
> (don't get me started on CONFIG_PCI_MSI, that is an independent issue)
> 
> Also, as we discussed, it would probably be cleaner even on 
> those large systems if the compression algorithm did not 
> re-name every GSI, but only did re-names when it really has 
> to (GSI > NR_IRQS).  eg.
> 
> < gsi = pci_irq++
> ---
> > if (GSI < NR_IRQS)
> >    gsi = irq;
> > else
> >    BUG_ON(irq_used >= NR_IRQS)
> >    gsi = platform specific algorithm to grab the unused IRQs that
> >    can never have any real devices attached.
> > irqs_used++;
> 
> If we had done this, then we wouldn't have needed kimbal's patch.
> If we do this, then we can delete that workaround on top of 
> workaround.
> 
> thoughts?

Sure sounds like a great idea. I just signed up for a large partition (>
1700 GSIs) to try this over the weekend.
The problem with oapic_renumber_gsi() is that it didn't actually migrate
to x86_64 because it doesn't have sub-archs so far. But using actual
number of interrupts vs. range is definitely promising idea. I was also
thinking that all the complexity is actually located on the first
IO-APIC, this may have some potential for compression only coming into
play on non-zero IO-APICs. But using the total number of IRQs used is
definitely the most logical thing to try...I will test this shortly -
Thanks, --Natalie 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-06  6:18 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-06  6:18 UTC (permalink / raw)
  To: Protasevich, Natalie, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

Natalie,
Regarding the "IRQ compression" in mp_register_gsi()....

Some time ago we invented ioapic_renumber_irq() to handle the case
where the ES7000 BIOS is missing the INTERRUPT_OVERRIDES needed
to tell that legacy IRQs used pins > 15, and PCI interrupts used
pins < 16.

In that case, the ES7000 uses es7000_rename_gsi() to simply re-number
the IRQs for the lower pins to some unused numbers above the highest
GSI in the system -- emulating what the BIOS should have done.

Later, GSI compression was also added to mp_register_gsi():

if (triggering == ACPI_LEVEL_SENSITIVE)
{
	gsi = pci_irq++;
	...
}

This GSI "compression" was intended to handle systems which have a large
number of sparsely populated IOAPICs.  On these systems, the maximum
GSI is well above NR_IRQS.  While there can be a large number of devices
on these boxes, the total in use is actually still below NR_IRQs --
so squeezing the GSIs into the IRQ numbers works.

The problem with doing this is that it also compresses the IRQ numbers
for the other 99.99% of the systems on earth, causing complexity
and bugs, such as several mentioned on this thread.

I'm thinking that it would be simpler for the big iron that has
GSI's > NR_IRQs, to simply use the existing ioapci_renumber_irq() hook
to do whatever compression or re-naming they fancy.
This will also allow for simpler systems to remain simple
and use identity irq = gsi numbering.
(don't get me started on CONFIG_PCI_MSI, that is an independent issue)

Also, as we discussed, it would probably be cleaner even on those large
systems if the compression algorithm did not re-name every GSI, but only
did re-names when it really has to (GSI > NR_IRQS).  eg.

< gsi = pci_irq++
---
> if (GSI < NR_IRQS)
>    gsi = irq;
> else
>    BUG_ON(irq_used >= NR_IRQS)
>    gsi = platform specific algorithm to grab the unused IRQs that
>    can never have any real devices attached.
> irqs_used++;

If we had done this, then we wouldn't have needed kimbal's patch.
If we do this, then we can delete that workaround on top of workaround.

thoughts?
-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-06  6:18 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-06  6:18 UTC (permalink / raw)
  To: Protasevich, Natalie, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

Natalie,
Regarding the "IRQ compression" in mp_register_gsi()....

Some time ago we invented ioapic_renumber_irq() to handle the case
where the ES7000 BIOS is missing the INTERRUPT_OVERRIDES needed
to tell that legacy IRQs used pins > 15, and PCI interrupts used
pins < 16.

In that case, the ES7000 uses es7000_rename_gsi() to simply re-number
the IRQs for the lower pins to some unused numbers above the highest
GSI in the system -- emulating what the BIOS should have done.

Later, GSI compression was also added to mp_register_gsi():

if (triggering == ACPI_LEVEL_SENSITIVE)
{
	gsi = pci_irq++;
	...
}

This GSI "compression" was intended to handle systems which have a large
number of sparsely populated IOAPICs.  On these systems, the maximum
GSI is well above NR_IRQS.  While there can be a large number of devices
on these boxes, the total in use is actually still below NR_IRQs --
so squeezing the GSIs into the IRQ numbers works.

The problem with doing this is that it also compresses the IRQ numbers
for the other 99.99% of the systems on earth, causing complexity
and bugs, such as several mentioned on this thread.

I'm thinking that it would be simpler for the big iron that has
GSI's > NR_IRQs, to simply use the existing ioapci_renumber_irq() hook
to do whatever compression or re-naming they fancy.
This will also allow for simpler systems to remain simple
and use identity irq = gsi numbering.
(don't get me started on CONFIG_PCI_MSI, that is an independent issue)

Also, as we discussed, it would probably be cleaner even on those large
systems if the compression algorithm did not re-name every GSI, but only
did re-names when it really has to (GSI > NR_IRQS).  eg.

< gsi = pci_irq++
---
> if (GSI < NR_IRQS)
>    gsi = irq;
> else
>    BUG_ON(irq_used >= NR_IRQS)
>    gsi = platform specific algorithm to grab the unused IRQs that
>    can never have any real devices attached.
> irqs_used++;

If we had done this, then we wouldn't have needed kimbal's patch.
If we do this, then we can delete that workaround on top of workaround.

thoughts?
-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-04 16:42 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-05-04 16:42 UTC (permalink / raw)
  To: Brown, Len, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

> 
> I think what we can do in the short term is to make these 
> workarounds not have any effect on the systems which don't 
> need them.  This means searching like gsi_irq_sharing() does, 
> instead of always compressing like mp_register_gsi() does.  
> It also means not printing dmesg about vector sharing when no 
> sharing is actually happening.
> 
OK that means Kimball should test the kernel with gsi_irq_sharing() and
without the compression code in mp_register_gsi() which should work for
him (and certainly for ES7000). I am not sure about VIA though, since
they still want PCI IRQs below 16. That means moving the VIA workaround
(and subsequently the one for Stratus :) to gsi_irq_sharing() I suspect.

--Natalie

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-04 16:42 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-05-04 16:42 UTC (permalink / raw)
  To: Brown, Len, Eric W. Biederman
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

> 
> I think what we can do in the short term is to make these 
> workarounds not have any effect on the systems which don't 
> need them.  This means searching like gsi_irq_sharing() does, 
> instead of always compressing like mp_register_gsi() does.  
> It also means not printing dmesg about vector sharing when no 
> sharing is actually happening.
> 
OK that means Kimball should test the kernel with gsi_irq_sharing() and
without the compression code in mp_register_gsi() which should work for
him (and certainly for ES7000). I am not sure about VIA though, since
they still want PCI IRQs below 16. That means moving the VIA workaround
(and subsequently the one for Stratus :) to gsi_irq_sharing() I suspect.

--Natalie

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-04 16:04 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-04 16:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

 
>>>>> We should never have had a problem with un-connected 
>interrupt lines
>>>>> consuming vectors, as the vectors are handed out at run-time
>>>>> only when interrupts are requested by drivers.
>>>>
>>>>Incorrect.  By being requested by drivers I assume you mean by
>>>>request_irq.  assign_irq_vector is called long before request_irq
>>>>and in fact there is currently not a call from request_irq to
>>>>assign_irq_vector.  Look where assign_irq_vector is called in 
>>>io_apic.c
>>
>> Hmmm, on Natalie's ping, I looked at this again.
>>
>> setup_IO_APIC_irqs() actually consumes only 16 vectors.
>> This is because it only sets up the IRQs for pins
>> found with find_irq_entry(), which searches mp_irqs[]
>> which at this point contains only the legacy identify mappings.
>
>In the case of ACPI.  I think the mptable case has all of information
>in mp_irqs at that point.

Agreed, I just sent a note on this, but apparently it "crossed
in the mail" with yours.  The key point about MPS is that MPS
should not describe pins that can never be connected -- so that isn't
quite as bad as handing out a vector for every RTE, which is what the
code appears to do on first read...

>> The PNP code then takes a swing at things, and it registers
>> a handful of gsi's, but they are all duplicates of the legacy
>> mappings already set-up, so no additional vectors are consumed.
>>
>> So on a big system, the large quantity of vectors will not 
>be consumed
>> at
>> IOAPIC initialization time, but later at device probe time.
>> Not via request_irq(), but via pci_enable_device():
>>
>> pci_enable_device()
>>  pci_enable_device_bars()
>>   pcibios_enable_device()
>>    pcibios_enable_irq() -> acpi_pci_irq_enable()
>>     acpi_register_gsi()
>>      mp_register_gsi()
>>       io_apic_set_pci_routing()
>>        entry.vector = assign_irq_vector(irq)
>>
>> So except for the legacy IRQs, we are already allocating the vectors
>> on-demand, and that doesn't need to be fixed.
>
>I agree with the fact, we do allocate the vectors on-demand.
>Since the allocation is not allowed to fail, and because
>it seems to be an accident of the implementation rather
>than a deliberate implementation detail I still think it
>needs to be fixed so the code is less brittle.

Yeah, it isn't clear that this has any advantage over assigning
the vector at request_irq() time where one would expect to see it.
Though some might consider "currently working" an advantage:-)

>But if we are not afraid of breaking machines with more
>that 243 interrupt sources (which currently force ioapic/pin
>combinations to share irqs today) it does mean we can move
>the removal of the irq to gsi mapping up, in the patch series.  We
>first need to raise the limit on the number of IRQs on x86.

No, I don't think we have the license to intentionally break big
machines
that are currently working.  In the long run, these two big-machine
hacks should go away:

mp_register_gsi()
	/*
	 * For PCI devices, assign IRQs in order, avoiding gaps
	 * due to unused I/O APIC pins.
	 */
	...

io_apic_set_pci_routing() (x86_64 only upstream, i386 too on SuSE)
	irq = gsi_irq_sharing(irq)


I think what we can do in the short term is to make these workarounds
not have any effect on the systems which don't need them.  This means
searching like gsi_irq_sharing() does, instead of always compressing
like mp_register_gsi() does.  It also means not printing dmesg
about vector sharing when no sharing is actually happening.


>We can't raise the limit on the number of IRQs when msi is
>enabled but enabling CONFIG_PCI_MSI already has problem in this
>area and that is an independent fix.
>
>Since we unnecessarily break big machines I think this needs to sit
>in -mm until the window for 2.6.18 opens up.  By that time we should
>be able to get all of the pieces in to keep the largest of machines
>working well into the future.
>
>That includes fixing msi to allocate irqs, fixing the irq affinity
>calls to be race free, and making the vector allocation to be
>a per cpu allocation.
>
>I still think we need to do everything I outlined previously if
>for no other reason than to leave us with a maintainable code
>base that works for obvious reasons.  But since the dependencies
>are not as bad as they previously appeared, it does mean we can
>take the patches in a different order.

Based on past history of the un-intended impact of interrupt changes,
(eg. what started this thread)
I would suggest that only the simplest things go into 2.6.18
and that the larger changes stay in -mm for all of 2.6.18
and targtet 2.6.19.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-04 16:04 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-04 16:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

 
>>>>> We should never have had a problem with un-connected 
>interrupt lines
>>>>> consuming vectors, as the vectors are handed out at run-time
>>>>> only when interrupts are requested by drivers.
>>>>
>>>>Incorrect.  By being requested by drivers I assume you mean by
>>>>request_irq.  assign_irq_vector is called long before request_irq
>>>>and in fact there is currently not a call from request_irq to
>>>>assign_irq_vector.  Look where assign_irq_vector is called in 
>>>io_apic.c
>>
>> Hmmm, on Natalie's ping, I looked at this again.
>>
>> setup_IO_APIC_irqs() actually consumes only 16 vectors.
>> This is because it only sets up the IRQs for pins
>> found with find_irq_entry(), which searches mp_irqs[]
>> which at this point contains only the legacy identify mappings.
>
>In the case of ACPI.  I think the mptable case has all of information
>in mp_irqs at that point.

Agreed, I just sent a note on this, but apparently it "crossed
in the mail" with yours.  The key point about MPS is that MPS
should not describe pins that can never be connected -- so that isn't
quite as bad as handing out a vector for every RTE, which is what the
code appears to do on first read...

>> The PNP code then takes a swing at things, and it registers
>> a handful of gsi's, but they are all duplicates of the legacy
>> mappings already set-up, so no additional vectors are consumed.
>>
>> So on a big system, the large quantity of vectors will not 
>be consumed
>> at
>> IOAPIC initialization time, but later at device probe time.
>> Not via request_irq(), but via pci_enable_device():
>>
>> pci_enable_device()
>>  pci_enable_device_bars()
>>   pcibios_enable_device()
>>    pcibios_enable_irq() -> acpi_pci_irq_enable()
>>     acpi_register_gsi()
>>      mp_register_gsi()
>>       io_apic_set_pci_routing()
>>        entry.vector = assign_irq_vector(irq)
>>
>> So except for the legacy IRQs, we are already allocating the vectors
>> on-demand, and that doesn't need to be fixed.
>
>I agree with the fact, we do allocate the vectors on-demand.
>Since the allocation is not allowed to fail, and because
>it seems to be an accident of the implementation rather
>than a deliberate implementation detail I still think it
>needs to be fixed so the code is less brittle.

Yeah, it isn't clear that this has any advantage over assigning
the vector at request_irq() time where one would expect to see it.
Though some might consider "currently working" an advantage:-)

>But if we are not afraid of breaking machines with more
>that 243 interrupt sources (which currently force ioapic/pin
>combinations to share irqs today) it does mean we can move
>the removal of the irq to gsi mapping up, in the patch series.  We
>first need to raise the limit on the number of IRQs on x86.

No, I don't think we have the license to intentionally break big
machines
that are currently working.  In the long run, these two big-machine
hacks should go away:

mp_register_gsi()
	/*
	 * For PCI devices, assign IRQs in order, avoiding gaps
	 * due to unused I/O APIC pins.
	 */
	...

io_apic_set_pci_routing() (x86_64 only upstream, i386 too on SuSE)
	irq = gsi_irq_sharing(irq)


I think what we can do in the short term is to make these workarounds
not have any effect on the systems which don't need them.  This means
searching like gsi_irq_sharing() does, instead of always compressing
like mp_register_gsi() does.  It also means not printing dmesg
about vector sharing when no sharing is actually happening.


>We can't raise the limit on the number of IRQs when msi is
>enabled but enabling CONFIG_PCI_MSI already has problem in this
>area and that is an independent fix.
>
>Since we unnecessarily break big machines I think this needs to sit
>in -mm until the window for 2.6.18 opens up.  By that time we should
>be able to get all of the pieces in to keep the largest of machines
>working well into the future.
>
>That includes fixing msi to allocate irqs, fixing the irq affinity
>calls to be race free, and making the vector allocation to be
>a per cpu allocation.
>
>I still think we need to do everything I outlined previously if
>for no other reason than to leave us with a maintainable code
>base that works for obvious reasons.  But since the dependencies
>are not as bad as they previously appeared, it does mean we can
>take the patches in a different order.

Based on past history of the un-intended impact of interrupt changes,
(eg. what started this thread)
I would suggest that only the simplest things go into 2.6.18
and that the larger changes stay in -mm for all of 2.6.18
and targtet 2.6.19.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-05-04 16:04 ` Brown, Len
@ 2006-05-04 22:41   ` Eric W. Biederman
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2006-05-04 22:41 UTC (permalink / raw)
  To: Brown, Len
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

"Brown, Len" <len.brown@intel.com> writes:

>>In the case of ACPI.  I think the mptable case has all of information
>>in mp_irqs at that point.
>
> Agreed, I just sent a note on this, but apparently it "crossed
> in the mail" with yours.  The key point about MPS is that MPS
> should not describe pins that can never be connected -- so that isn't
> quite as bad as handing out a vector for every RTE, which is what the
> code appears to do on first read...

Makes sense.

>>I agree with the fact, we do allocate the vectors on-demand.
>>Since the allocation is not allowed to fail, and because
>>it seems to be an accident of the implementation rather
>>than a deliberate implementation detail I still think it
>>needs to be fixed so the code is less brittle.
>
> Yeah, it isn't clear that this has any advantage over assigning
> the vector at request_irq() time where one would expect to see it.
> Though some might consider "currently working" an advantage:-)

Right.  This will have to wait until I can start sending patches.

>>But if we are not afraid of breaking machines with more
>>that 243 interrupt sources (which currently force ioapic/pin
>>combinations to share irqs today) it does mean we can move
>>the removal of the irq to gsi mapping up, in the patch series.  We
>>first need to raise the limit on the number of IRQs on x86.
>
> No, I don't think we have the license to intentionally break big
> machines
> that are currently working.

No but it may be ok. At an intermediate step in a series of patches.
Although ideally even that would never happen.

> In the long run, these two big-machine
> hacks should go away:
>
> mp_register_gsi()
> 	/*
> 	 * For PCI devices, assign IRQs in order, avoiding gaps
> 	 * due to unused I/O APIC pins.
> 	 */
> 	...
>
> io_apic_set_pci_routing() (x86_64 only upstream, i386 too on SuSE)
> 	irq = gsi_irq_sharing(irq)
>
>
> I think what we can do in the short term is to make these workarounds
> not have any effect on the systems which don't need them.  This means
> searching like gsi_irq_sharing() does, instead of always compressing
> like mp_register_gsi() does.  It also means not printing dmesg
> about vector sharing when no sharing is actually happening.

I'm a long run kind of guy :)
As soon as I clean up my proof of concept code and send it
out I will have both of those killed.

> Based on past history of the un-intended impact of interrupt changes,
> (eg. what started this thread)
> I would suggest that only the simplest things go into 2.6.18
> and that the larger changes stay in -mm for all of 2.6.18
> and targtet 2.6.19.

That makes sense.  I'm in no hurry :)  Mostly my intention was
that this is not 2.6.17 material and whatever short term hacks
are needed to make 2.6.17 work  need to go in now.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-04 22:41   ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2006-05-04 22:41 UTC (permalink / raw)
  To: Brown, Len
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

"Brown, Len" <len.brown@intel.com> writes:

>>In the case of ACPI.  I think the mptable case has all of information
>>in mp_irqs at that point.
>
> Agreed, I just sent a note on this, but apparently it "crossed
> in the mail" with yours.  The key point about MPS is that MPS
> should not describe pins that can never be connected -- so that isn't
> quite as bad as handing out a vector for every RTE, which is what the
> code appears to do on first read...

Makes sense.

>>I agree with the fact, we do allocate the vectors on-demand.
>>Since the allocation is not allowed to fail, and because
>>it seems to be an accident of the implementation rather
>>than a deliberate implementation detail I still think it
>>needs to be fixed so the code is less brittle.
>
> Yeah, it isn't clear that this has any advantage over assigning
> the vector at request_irq() time where one would expect to see it.
> Though some might consider "currently working" an advantage:-)

Right.  This will have to wait until I can start sending patches.

>>But if we are not afraid of breaking machines with more
>>that 243 interrupt sources (which currently force ioapic/pin
>>combinations to share irqs today) it does mean we can move
>>the removal of the irq to gsi mapping up, in the patch series.  We
>>first need to raise the limit on the number of IRQs on x86.
>
> No, I don't think we have the license to intentionally break big
> machines
> that are currently working.

No but it may be ok. At an intermediate step in a series of patches.
Although ideally even that would never happen.

> In the long run, these two big-machine
> hacks should go away:
>
> mp_register_gsi()
> 	/*
> 	 * For PCI devices, assign IRQs in order, avoiding gaps
> 	 * due to unused I/O APIC pins.
> 	 */
> 	...
>
> io_apic_set_pci_routing() (x86_64 only upstream, i386 too on SuSE)
> 	irq = gsi_irq_sharing(irq)
>
>
> I think what we can do in the short term is to make these workarounds
> not have any effect on the systems which don't need them.  This means
> searching like gsi_irq_sharing() does, instead of always compressing
> like mp_register_gsi() does.  It also means not printing dmesg
> about vector sharing when no sharing is actually happening.

I'm a long run kind of guy :)
As soon as I clean up my proof of concept code and send it
out I will have both of those killed.

> Based on past history of the un-intended impact of interrupt changes,
> (eg. what started this thread)
> I would suggest that only the simplest things go into 2.6.18
> and that the larger changes stay in -mm for all of 2.6.18
> and targtet 2.6.19.

That makes sense.  I'm in no hurry :)  Mostly my intention was
that this is not 2.6.17 material and whatever short term hacks
are needed to make 2.6.17 work  need to go in now.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-04 15:33 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-04 15:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

>>>> We should never have had a problem with un-connected 
>interrupt lines
>>>> consuming vectors, as the vectors are handed out at run-time
>>>> only when interrupts are requested by drivers.
>>>
>>>Incorrect.  By being requested by drivers I assume you mean by
>>>request_irq.  assign_irq_vector is called long before request_irq
>>>and in fact there is currently not a call from request_irq to
>>>assign_irq_vector.  Look where assign_irq_vector is called in 
>>io_apic.c
>
>Hmmm, on Natalie's ping, I looked at this again.
>
>setup_IO_APIC_irqs() actually consumes only 16 vectors.
>This is because it only sets up the IRQs for pins
>found with find_irq_entry(), which searches mp_irqs[]
>which at this point contains only the legacy identify mappings.

I should clarify, since Natalie pointed out that some systems
run in MPS mode...

If the system is booted with "acpi=off", then mp_irqs[] includes
everything advertised by the MPS tables, and so setup_IO_APIC_irqs()
does hand out vectors to un-occupied IOAPIC RTEs.  However, if the
MPS tables are accurate enough to skip RTEs that are not used
on that machine, then it hands out vectors for RTEs which
_could_ have devices on them.

On the ES7000 there are potentially lots of IOAPICS with pins
that can't possibly be used.
If only low-numbered RTEs were used on these IOAPICs,
then it would be simple to put in a hook to pretend the
upper entries do not exist, and compress the GSI name space.
But apparently they use some high numbered RTEs, so
we go ahead and allocate the full size of the IOAPIC
in what becomes a very sparse and > NR_IRQS GSI name space.

Of course if one is going to compress this name space,
it would be possible to put a platform dependent hook in
where a GSI is translated into an apic:pin pair...
This would be much cleaner than the current compression...

>The PNP code then takes a swing at things, and it registers
>a handful of gsi's, but they are all duplicates of the legacy
>mappings already set-up, so no additional vectors are consumed.
>
>So on a big system, the large quantity of vectors will not be 
>consumed at
>IOAPIC initialization time, but later at device probe time.
>Not via request_irq(), but via pci_enable_device():
>
>pci_enable_device()
> pci_enable_device_bars()
>  pcibios_enable_device()
>   pcibios_enable_irq() -> acpi_pci_irq_enable()
>    acpi_register_gsi()
>     mp_register_gsi()
>      io_apic_set_pci_routing()
>       entry.vector = assign_irq_vector(irq)

I should qualify this...
If the system is booted with "acpi=off" and MPS is used,
then mp_irqs[] includes everything that is in MPS, and thus
setup_IO_APIC_irqs()
>So except for the legacy IRQs, we are already allocating the vectors
>on-demand, and that doesn't need to be fixed.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-04 15:33 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-04 15:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

>>>> We should never have had a problem with un-connected 
>interrupt lines
>>>> consuming vectors, as the vectors are handed out at run-time
>>>> only when interrupts are requested by drivers.
>>>
>>>Incorrect.  By being requested by drivers I assume you mean by
>>>request_irq.  assign_irq_vector is called long before request_irq
>>>and in fact there is currently not a call from request_irq to
>>>assign_irq_vector.  Look where assign_irq_vector is called in 
>>io_apic.c
>
>Hmmm, on Natalie's ping, I looked at this again.
>
>setup_IO_APIC_irqs() actually consumes only 16 vectors.
>This is because it only sets up the IRQs for pins
>found with find_irq_entry(), which searches mp_irqs[]
>which at this point contains only the legacy identify mappings.

I should clarify, since Natalie pointed out that some systems
run in MPS mode...

If the system is booted with "acpi=off", then mp_irqs[] includes
everything advertised by the MPS tables, and so setup_IO_APIC_irqs()
does hand out vectors to un-occupied IOAPIC RTEs.  However, if the
MPS tables are accurate enough to skip RTEs that are not used
on that machine, then it hands out vectors for RTEs which
_could_ have devices on them.

On the ES7000 there are potentially lots of IOAPICS with pins
that can't possibly be used.
If only low-numbered RTEs were used on these IOAPICs,
then it would be simple to put in a hook to pretend the
upper entries do not exist, and compress the GSI name space.
But apparently they use some high numbered RTEs, so
we go ahead and allocate the full size of the IOAPIC
in what becomes a very sparse and > NR_IRQS GSI name space.

Of course if one is going to compress this name space,
it would be possible to put a platform dependent hook in
where a GSI is translated into an apic:pin pair...
This would be much cleaner than the current compression...

>The PNP code then takes a swing at things, and it registers
>a handful of gsi's, but they are all duplicates of the legacy
>mappings already set-up, so no additional vectors are consumed.
>
>So on a big system, the large quantity of vectors will not be 
>consumed at
>IOAPIC initialization time, but later at device probe time.
>Not via request_irq(), but via pci_enable_device():
>
>pci_enable_device()
> pci_enable_device_bars()
>  pcibios_enable_device()
>   pcibios_enable_irq() -> acpi_pci_irq_enable()
>    acpi_register_gsi()
>     mp_register_gsi()
>      io_apic_set_pci_routing()
>       entry.vector = assign_irq_vector(irq)

I should qualify this...
If the system is booted with "acpi=off" and MPS is used,
then mp_irqs[] includes everything that is in MPS, and thus
setup_IO_APIC_irqs()
>So except for the legacy IRQs, we are already allocating the vectors
>on-demand, and that doesn't need to be fixed.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-04  5:07 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-04  5:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

>>> We should never have had a problem with un-connected interrupt lines
>>> consuming vectors, as the vectors are handed out at run-time
>>> only when interrupts are requested by drivers.
>>
>>Incorrect.  By being requested by drivers I assume you mean by
>>request_irq.  assign_irq_vector is called long before request_irq
>>and in fact there is currently not a call from request_irq to
>>assign_irq_vector.  Look where assign_irq_vector is called in 
>io_apic.c

Hmmm, on Natalie's ping, I looked at this again.

setup_IO_APIC_irqs() actually consumes only 16 vectors.
This is because it only sets up the IRQs for pins
found with find_irq_entry(), which searches mp_irqs[]
which at this point contains only the legacy identify mappings.

The PNP code then takes a swing at things, and it registers
a handful of gsi's, but they are all duplicates of the legacy
mappings already set-up, so no additional vectors are consumed.

So on a big system, the large quantity of vectors will not be consumed
at
IOAPIC initialization time, but later at device probe time.
Not via request_irq(), but via pci_enable_device():

pci_enable_device()
 pci_enable_device_bars()
  pcibios_enable_device()
   pcibios_enable_irq() -> acpi_pci_irq_enable()
    acpi_register_gsi()
     mp_register_gsi()
      io_apic_set_pci_routing()
       entry.vector = assign_irq_vector(irq)

So except for the legacy IRQs, we are already allocating the vectors
on-demand, and that doesn't need to be fixed.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-04  5:07 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-04  5:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

>>> We should never have had a problem with un-connected interrupt lines
>>> consuming vectors, as the vectors are handed out at run-time
>>> only when interrupts are requested by drivers.
>>
>>Incorrect.  By being requested by drivers I assume you mean by
>>request_irq.  assign_irq_vector is called long before request_irq
>>and in fact there is currently not a call from request_irq to
>>assign_irq_vector.  Look where assign_irq_vector is called in 
>io_apic.c

Hmmm, on Natalie's ping, I looked at this again.

setup_IO_APIC_irqs() actually consumes only 16 vectors.
This is because it only sets up the IRQs for pins
found with find_irq_entry(), which searches mp_irqs[]
which at this point contains only the legacy identify mappings.

The PNP code then takes a swing at things, and it registers
a handful of gsi's, but they are all duplicates of the legacy
mappings already set-up, so no additional vectors are consumed.

So on a big system, the large quantity of vectors will not be consumed
at
IOAPIC initialization time, but later at device probe time.
Not via request_irq(), but via pci_enable_device():

pci_enable_device()
 pci_enable_device_bars()
  pcibios_enable_device()
   pcibios_enable_irq() -> acpi_pci_irq_enable()
    acpi_register_gsi()
     mp_register_gsi()
      io_apic_set_pci_routing()
       entry.vector = assign_irq_vector(irq)

So except for the legacy IRQs, we are already allocating the vectors
on-demand, and that doesn't need to be fixed.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-05-04  5:07 ` Brown, Len
@ 2006-05-04 15:31   ` Eric W. Biederman
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2006-05-04 15:31 UTC (permalink / raw)
  To: Brown, Len
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

"Brown, Len" <len.brown@intel.com> writes:

>>>> We should never have had a problem with un-connected interrupt lines
>>>> consuming vectors, as the vectors are handed out at run-time
>>>> only when interrupts are requested by drivers.
>>>
>>>Incorrect.  By being requested by drivers I assume you mean by
>>>request_irq.  assign_irq_vector is called long before request_irq
>>>and in fact there is currently not a call from request_irq to
>>>assign_irq_vector.  Look where assign_irq_vector is called in 
>>io_apic.c
>
> Hmmm, on Natalie's ping, I looked at this again.
>
> setup_IO_APIC_irqs() actually consumes only 16 vectors.
> This is because it only sets up the IRQs for pins
> found with find_irq_entry(), which searches mp_irqs[]
> which at this point contains only the legacy identify mappings.

In the case of ACPI.  I think the mptable case has all of information
in mp_irqs at that point.

> The PNP code then takes a swing at things, and it registers
> a handful of gsi's, but they are all duplicates of the legacy
> mappings already set-up, so no additional vectors are consumed.
>
> So on a big system, the large quantity of vectors will not be consumed
> at
> IOAPIC initialization time, but later at device probe time.
> Not via request_irq(), but via pci_enable_device():
>
> pci_enable_device()
>  pci_enable_device_bars()
>   pcibios_enable_device()
>    pcibios_enable_irq() -> acpi_pci_irq_enable()
>     acpi_register_gsi()
>      mp_register_gsi()
>       io_apic_set_pci_routing()
>        entry.vector = assign_irq_vector(irq)
>
> So except for the legacy IRQs, we are already allocating the vectors
> on-demand, and that doesn't need to be fixed.

I agree with the fact, we do allocate the vectors on-demand.
Since the allocation is not allowed to fail, and because
it seems to be an accident of the implementation rather
than a deliberate implementation detail I still think it
needs to be fixed so the code is less brittle.

But if we are not afraid of breaking machines with more
that 243 interrupt sources (which currently force ioapic/pin
combinations to share irqs today) it does mean we can move
the removal of the irq to gsi mapping up, in the patch series.  We
first need to raise the limit on the number of IRQs on x86.

We can't raise the limit on the number of IRQs when msi is
enabled but enabling CONFIG_PCI_MSI already has problem in this
area and that is an independent fix.

Since we unnecessarily break big machines I think this needs to sit
in -mm until the window for 2.6.18 opens up.  By that time we should
be able to get all of the pieces in to keep the largest of machines
working well into the future.

That includes fixing msi to allocate irqs, fixing the irq affinity
calls to be race free, and making the vector allocation to be
a per cpu allocation.

I still think we need to do everything I outlined previously if
for no other reason than to leave us with a maintainable code
base that works for obvious reasons.  But since the dependencies
are not as bad as they previously appeared, it does mean we can
take the patches in a different order.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-04 15:31   ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2006-05-04 15:31 UTC (permalink / raw)
  To: Brown, Len
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

"Brown, Len" <len.brown@intel.com> writes:

>>>> We should never have had a problem with un-connected interrupt lines
>>>> consuming vectors, as the vectors are handed out at run-time
>>>> only when interrupts are requested by drivers.
>>>
>>>Incorrect.  By being requested by drivers I assume you mean by
>>>request_irq.  assign_irq_vector is called long before request_irq
>>>and in fact there is currently not a call from request_irq to
>>>assign_irq_vector.  Look where assign_irq_vector is called in 
>>io_apic.c
>
> Hmmm, on Natalie's ping, I looked at this again.
>
> setup_IO_APIC_irqs() actually consumes only 16 vectors.
> This is because it only sets up the IRQs for pins
> found with find_irq_entry(), which searches mp_irqs[]
> which at this point contains only the legacy identify mappings.

In the case of ACPI.  I think the mptable case has all of information
in mp_irqs at that point.

> The PNP code then takes a swing at things, and it registers
> a handful of gsi's, but they are all duplicates of the legacy
> mappings already set-up, so no additional vectors are consumed.
>
> So on a big system, the large quantity of vectors will not be consumed
> at
> IOAPIC initialization time, but later at device probe time.
> Not via request_irq(), but via pci_enable_device():
>
> pci_enable_device()
>  pci_enable_device_bars()
>   pcibios_enable_device()
>    pcibios_enable_irq() -> acpi_pci_irq_enable()
>     acpi_register_gsi()
>      mp_register_gsi()
>       io_apic_set_pci_routing()
>        entry.vector = assign_irq_vector(irq)
>
> So except for the legacy IRQs, we are already allocating the vectors
> on-demand, and that doesn't need to be fixed.

I agree with the fact, we do allocate the vectors on-demand.
Since the allocation is not allowed to fail, and because
it seems to be an accident of the implementation rather
than a deliberate implementation detail I still think it
needs to be fixed so the code is less brittle.

But if we are not afraid of breaking machines with more
that 243 interrupt sources (which currently force ioapic/pin
combinations to share irqs today) it does mean we can move
the removal of the irq to gsi mapping up, in the patch series.  We
first need to raise the limit on the number of IRQs on x86.

We can't raise the limit on the number of IRQs when msi is
enabled but enabling CONFIG_PCI_MSI already has problem in this
area and that is an independent fix.

Since we unnecessarily break big machines I think this needs to sit
in -mm until the window for 2.6.18 opens up.  By that time we should
be able to get all of the pieces in to keep the largest of machines
working well into the future.

That includes fixing msi to allocate irqs, fixing the irq affinity
calls to be race free, and making the vector allocation to be
a per cpu allocation.

I still think we need to do everything I outlined previously if
for no other reason than to leave us with a maintainable code
base that works for obvious reasons.  But since the dependencies
are not as bad as they previously appeared, it does mean we can
take the patches in a different order.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-02 23:52 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-05-02 23:52 UTC (permalink / raw)
  To: Eric W. Biederman, Andi Kleen, Brown, Len, akpm
  Cc: sergio, Kimball Murray, linux-kernel, kmurray, linux-acpi

> On Tuesday 02 May 2006 09:41, Brown, Len wrote:
> 
> > You are right.  This code is wrong.
> > It makes absolutely no sense to reserve vectors in advance 
> for every 
> > RTE in the IOAPIC when we don't even know if they are going to be 
> > used.
> > 
> > This is clearly a holdover from the early IOAPIC/MPS days 
> when we were 
> > talking about 4 to 8 non-legacy RTEs.
> 
> Yes I agree. A lot of the IO-APIC code could probably need 
> some renovation.
>  
> > This is where the big system vector shortage problem should be 
> > addressed.
> 
> If we go to per CPU IDTs it will be much less pressing, but 
> still a good idea.
> 

I've been trying to put together something on per-CPU IDTs also, and got
entangled in all those issues that you just discussed... Dynamically
assigning vectors to IRQs sounds like great improvement of the mechanism
(why would they need to be static really), and vectors are right
entities to manipulate with - while keeping IRQs stable and representing
the interrupts sources clearly to the user.
Len, I suppose it would make sense to accept the Kimball's timer fix for
now, before the whole re-implementation happens?

--Natalie

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-02 23:52 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-05-02 23:52 UTC (permalink / raw)
  To: Eric W. Biederman, Andi Kleen, Brown, Len, akpm
  Cc: sergio, Kimball Murray, linux-kernel, kmurray, linux-acpi

> On Tuesday 02 May 2006 09:41, Brown, Len wrote:
> 
> > You are right.  This code is wrong.
> > It makes absolutely no sense to reserve vectors in advance 
> for every 
> > RTE in the IOAPIC when we don't even know if they are going to be 
> > used.
> > 
> > This is clearly a holdover from the early IOAPIC/MPS days 
> when we were 
> > talking about 4 to 8 non-legacy RTEs.
> 
> Yes I agree. A lot of the IO-APIC code could probably need 
> some renovation.
>  
> > This is where the big system vector shortage problem should be 
> > addressed.
> 
> If we go to per CPU IDTs it will be much less pressing, but 
> still a good idea.
> 

I've been trying to put together something on per-CPU IDTs also, and got
entangled in all those issues that you just discussed... Dynamically
assigning vectors to IRQs sounds like great improvement of the mechanism
(why would they need to be static really), and vectors are right
entities to manipulate with - while keeping IRQs stable and representing
the interrupts sources clearly to the user.
Len, I suppose it would make sense to accept the Kimball's timer fix for
now, before the whole re-implementation happens?

--Natalie

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-02  7:41 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-02  7:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

>IRQ is an interrupt request, from a device.
>Currently they can come in over a pin, or over a packet.
>IRQs at the source can be shared.

Well stated.

>For an IRQ number it is just an enumeration of interrupt sources.
>Ideally in a stable manner so that the assigned number does not
>depend on compile options, or which version of the kernel you
>are running.

While desirable, the "stable" part is not guaranteed.
On an ACPI system with PCI Interrupt Link devices,
request_irq() goes into the pci-enable-device path,
which ends up asking ACPI what interrupt pin this device uses.
But the pin is programmable, so we pick one from the list
of possible, program the router accordingly, and this is what
gets returned to request_irq().  ACPI PCI Interrupt Link Devices,
of course, are just a wrapper for what x86 legacy calls PIRQ routers.

In practice, if you boot the same or similar kernel on the
same configuration, you get 'stable', but if something changes
such as enabling/disabling a device or loading an additional device
driver
or otherwise tinkering with probe order, then all bets are off,
since the sequence of selecting and
balancing possible device interrupt lines between available IRQs has
changed.

>> Re: irq making sense
>> Depends on the defintion, I suppose.  I'm still looking for one
>> that matches how Linux uses the term.
>
>The global irq_desc array is the core of any definition.
>That is how linux knows about irqs.  Beyond that we ideally
>have stable numbers across reboots, and recompiles, and hardware
>addition and removal.  Stable numbers are not really possible
>but we can come quite close.

Okay, I'm good with irq_desc, Linux's generic IRQ definition,
being the center of the universe.  I'm also good with the index
into irq_desc[] being the "IRQ number" show in /proc/interrupts.
I guess I was thinking from a HW-specific point of view this
afternoon...

I don't like vector numbers being exposed in /proc/interrupts
on x86.  On an ACPI-enabled system, I think that they should be
the GSI.  This is consistent with the 0-15 legacy definitions
being pin numbers (which I think we must keep), and consistent
with the 16-n numbers being IOAPIC pin numbers (+offset),
(which I think we should have kept).

Of course, to do this simply, we'd have to select the irq_desc[i]
with i = gsi, and not i = vector.

>> We should never have had a problem with un-connected interrupt lines
>> consuming vectors, as the vectors are handed out at run-time
>> only when interrupts are requested by drivers.
>
>Incorrect.  By being requested by drivers I assume you mean by
>request_irq.  assign_irq_vector is called long before request_irq
>and in fact there is currently not a call from request_irq to
>assign_irq_vector.  Look where assign_irq_vector is called in io_apic.c

You are right.  This code is wrong.
It makes absolutely no sense to reserve vectors in advance
for every RTE in the IOAPIC when we don't even know if they
are going to be used.

This is clearly a holdover from the early IOAPIC/MPS days
when we were talking about 4 to 8 non-legacy RTEs.

This is where the big system vector shortage problem
should be addressed.

>I think there is a legitimate case for legacy edge triggered
>interrupts to request a vector sooner, as there are so many races in
>the enable/disable paths. 

I don't follow you on this part.

>>>Problem 2.
>>>  Some systems have more than 224 GSIs that are actually 
>>>  connected to devices.
>>>  There are three possible ways to handle this case. 
>>>  - Fail after we run out of vectors.
>>>  - Share a vector.
>>>
>>>  - Allow vectors of different cpus to handle different irqs.
>>>    The is the most elegant and scalable, and Natalie's suggestion.
>>
>> So here we allow the same vector to be used by different IOAPICS,
>> or IOAPIC pins, but have them it directed to different CPUs
>> who have per-cpu tables to vector to different devices?
>>
>> A practical workaround?  Certainly.
>> An elegant solution?  No, that would require better hardware;-)
>
>Agreed. But we lost elegant at the point of on demand assigning of
>vectors to irqs.
>
>> The problem with this workaround is going to be choose a policy
>> of where to direct what, and how to move things if interrupts
>> become un-balanced.
>
>The directing problem already exists.  In general an I/O apic can
>only direct an irq at a specific cpu, and linux already supports
>cpusets on which an irq may be delivered.
>
>But yes on systems with 8 or fewer cpus the I/O apics can do the
>directing themselves and it is likely we still want to handle that
>case.

I don't think it is important to optimize for <= 8 CPUs
having hundreds of unique interrupts.  Systems with huge
numbers of interrutps will have more than 8 CPUs.  If they
don't, then the should function, but being optimal isn't
what the administrator had in mind.  With multiple cores
being added to systems, this becomes even more true
over time.

>>>So what would be a path to get there from here?
>>>- Fix the irq set_affinity code so that it makes the changes to
>>>  irqs when the interrupt is actually disabled.
>>>  Calling desc->handler->disable, desc->handler->enable
>>>  does not work immediately so the current code is racy.
>>>  Which shows up very badly if you attempt to change the irq vector,
>>>  and may cause rare problems today
>>>
>>>- Modify the MSI code to allocate irqs and not vectors.
>>>  So we don't have two paths through the code, for no good reason.
>>
>> Modify the MSI code to allocate "something to do
>> with a specific interrupt".  Hmmm, but it does this already:-)
>>
>> Seriously, aren't the interrupt vectors, or perhaps the
>> vector/cpu pair, the fundamental resource being allocated here?
>
>So there are two pieces.  struct irq_desc that describes
>an interrupt source, and the vector that catches the interrupt
>source and tells the cpu about it.  
> 
>To be able to manage an irq source we need to enumerate and give it a
>struct irq_desc entry in the global irq_desc array.  One of the
>operations we perform after that point is to allocate it a vector so
>we can catch that irq source.  But other operations such as reporting,
>and the assignment of cpu affinity also happen based on the entry
>in the irq_desc array.
>
>Except for architecture implementations of irq routing the vectors
>that we catch irqs with should be completely invisible.  The ideal
>architecture would not even have a separate concept and except for
>the msi code linux has not exported the concept of vectors outside
>of the architecture implementations.

Agreed.

>> what about per/irq kstats?
>> would you keep stats on a per-vector basis
>> and translate back to irqs for /proc/interrupts?
>
>No.  What we have on a per irq basis is what makes sense
>and I would leave that untouched.

Agreed.

>> what about when the same vector is independently
>> used by multiple CPUS?
>
>Accounting by irq makes sense.

Agreed.

>>>- Remove the current irq compression.
>>
>> maybe this can be moved up in the to-do list?
>
>Not much.  
>- The current set_affinity code has a real bug.
>- The MSI mismatch is worse and fixing it reduces testing later. 
>- We need to raise the number of irqs we can support.  Either
>  by vastly increasing the number of stubs, whose number looks
>  hard to control at compile time, or by turning this into
>  a data problem.
>
>>>- Move irq vector allocation/deallocation into request_irq, 
>>>and free_irq.
>>>
>>>- Make vector_irq per_cpu.
>>>
>>>- Modify assign_irq to allocate different vectors on different cpus,
>>>  to fail if we cannot find a free irq vector somewhere in the
>>>  irqs cpu set, and call assign_irq from set_affinity so 
>when we change
>>>  cpus we can allocate a different irq.
>>>
>>>I have proof of concept level patches for everything thing 
>>>except making MSI actually allocate irqs, but that should be straight
>> forward.
>>
>> except that irqs should be allocating vectors,
>> the fundamental currency here, and MSI should
>> be allocating the same, no?
>
>Not quite.  The MSI code should first allocate
>an irq_desc entry get an IRQ number.

You're right.  I agree.

>Then when the driver calls request_irq the architecture specific
>code should arrange for __do_IRQ to be called with the appropriate
>irq number when the interrupt happens.  The architecture specific
>code needs to do this by setting the MSI address and the MSI data to
>an architecture specific values, that on x86 will involve the a
>vector.
>

>It is reassuring to see that the way I want to make the code work tends
>to be they way you already think the code works.  I can't be too far
>off in reducing the complexity.

Yeah, thanks for pointing out that the vectors are assigned at
IOAPIC init-time.  I hadn't noticed that, and it needs to be addressed.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-02  7:41 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-02  7:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

>IRQ is an interrupt request, from a device.
>Currently they can come in over a pin, or over a packet.
>IRQs at the source can be shared.

Well stated.

>For an IRQ number it is just an enumeration of interrupt sources.
>Ideally in a stable manner so that the assigned number does not
>depend on compile options, or which version of the kernel you
>are running.

While desirable, the "stable" part is not guaranteed.
On an ACPI system with PCI Interrupt Link devices,
request_irq() goes into the pci-enable-device path,
which ends up asking ACPI what interrupt pin this device uses.
But the pin is programmable, so we pick one from the list
of possible, program the router accordingly, and this is what
gets returned to request_irq().  ACPI PCI Interrupt Link Devices,
of course, are just a wrapper for what x86 legacy calls PIRQ routers.

In practice, if you boot the same or similar kernel on the
same configuration, you get 'stable', but if something changes
such as enabling/disabling a device or loading an additional device
driver
or otherwise tinkering with probe order, then all bets are off,
since the sequence of selecting and
balancing possible device interrupt lines between available IRQs has
changed.

>> Re: irq making sense
>> Depends on the defintion, I suppose.  I'm still looking for one
>> that matches how Linux uses the term.
>
>The global irq_desc array is the core of any definition.
>That is how linux knows about irqs.  Beyond that we ideally
>have stable numbers across reboots, and recompiles, and hardware
>addition and removal.  Stable numbers are not really possible
>but we can come quite close.

Okay, I'm good with irq_desc, Linux's generic IRQ definition,
being the center of the universe.  I'm also good with the index
into irq_desc[] being the "IRQ number" show in /proc/interrupts.
I guess I was thinking from a HW-specific point of view this
afternoon...

I don't like vector numbers being exposed in /proc/interrupts
on x86.  On an ACPI-enabled system, I think that they should be
the GSI.  This is consistent with the 0-15 legacy definitions
being pin numbers (which I think we must keep), and consistent
with the 16-n numbers being IOAPIC pin numbers (+offset),
(which I think we should have kept).

Of course, to do this simply, we'd have to select the irq_desc[i]
with i = gsi, and not i = vector.

>> We should never have had a problem with un-connected interrupt lines
>> consuming vectors, as the vectors are handed out at run-time
>> only when interrupts are requested by drivers.
>
>Incorrect.  By being requested by drivers I assume you mean by
>request_irq.  assign_irq_vector is called long before request_irq
>and in fact there is currently not a call from request_irq to
>assign_irq_vector.  Look where assign_irq_vector is called in io_apic.c

You are right.  This code is wrong.
It makes absolutely no sense to reserve vectors in advance
for every RTE in the IOAPIC when we don't even know if they
are going to be used.

This is clearly a holdover from the early IOAPIC/MPS days
when we were talking about 4 to 8 non-legacy RTEs.

This is where the big system vector shortage problem
should be addressed.

>I think there is a legitimate case for legacy edge triggered
>interrupts to request a vector sooner, as there are so many races in
>the enable/disable paths. 

I don't follow you on this part.

>>>Problem 2.
>>>  Some systems have more than 224 GSIs that are actually 
>>>  connected to devices.
>>>  There are three possible ways to handle this case. 
>>>  - Fail after we run out of vectors.
>>>  - Share a vector.
>>>
>>>  - Allow vectors of different cpus to handle different irqs.
>>>    The is the most elegant and scalable, and Natalie's suggestion.
>>
>> So here we allow the same vector to be used by different IOAPICS,
>> or IOAPIC pins, but have them it directed to different CPUs
>> who have per-cpu tables to vector to different devices?
>>
>> A practical workaround?  Certainly.
>> An elegant solution?  No, that would require better hardware;-)
>
>Agreed. But we lost elegant at the point of on demand assigning of
>vectors to irqs.
>
>> The problem with this workaround is going to be choose a policy
>> of where to direct what, and how to move things if interrupts
>> become un-balanced.
>
>The directing problem already exists.  In general an I/O apic can
>only direct an irq at a specific cpu, and linux already supports
>cpusets on which an irq may be delivered.
>
>But yes on systems with 8 or fewer cpus the I/O apics can do the
>directing themselves and it is likely we still want to handle that
>case.

I don't think it is important to optimize for <= 8 CPUs
having hundreds of unique interrupts.  Systems with huge
numbers of interrutps will have more than 8 CPUs.  If they
don't, then the should function, but being optimal isn't
what the administrator had in mind.  With multiple cores
being added to systems, this becomes even more true
over time.

>>>So what would be a path to get there from here?
>>>- Fix the irq set_affinity code so that it makes the changes to
>>>  irqs when the interrupt is actually disabled.
>>>  Calling desc->handler->disable, desc->handler->enable
>>>  does not work immediately so the current code is racy.
>>>  Which shows up very badly if you attempt to change the irq vector,
>>>  and may cause rare problems today
>>>
>>>- Modify the MSI code to allocate irqs and not vectors.
>>>  So we don't have two paths through the code, for no good reason.
>>
>> Modify the MSI code to allocate "something to do
>> with a specific interrupt".  Hmmm, but it does this already:-)
>>
>> Seriously, aren't the interrupt vectors, or perhaps the
>> vector/cpu pair, the fundamental resource being allocated here?
>
>So there are two pieces.  struct irq_desc that describes
>an interrupt source, and the vector that catches the interrupt
>source and tells the cpu about it.  
> 
>To be able to manage an irq source we need to enumerate and give it a
>struct irq_desc entry in the global irq_desc array.  One of the
>operations we perform after that point is to allocate it a vector so
>we can catch that irq source.  But other operations such as reporting,
>and the assignment of cpu affinity also happen based on the entry
>in the irq_desc array.
>
>Except for architecture implementations of irq routing the vectors
>that we catch irqs with should be completely invisible.  The ideal
>architecture would not even have a separate concept and except for
>the msi code linux has not exported the concept of vectors outside
>of the architecture implementations.

Agreed.

>> what about per/irq kstats?
>> would you keep stats on a per-vector basis
>> and translate back to irqs for /proc/interrupts?
>
>No.  What we have on a per irq basis is what makes sense
>and I would leave that untouched.

Agreed.

>> what about when the same vector is independently
>> used by multiple CPUS?
>
>Accounting by irq makes sense.

Agreed.

>>>- Remove the current irq compression.
>>
>> maybe this can be moved up in the to-do list?
>
>Not much.  
>- The current set_affinity code has a real bug.
>- The MSI mismatch is worse and fixing it reduces testing later. 
>- We need to raise the number of irqs we can support.  Either
>  by vastly increasing the number of stubs, whose number looks
>  hard to control at compile time, or by turning this into
>  a data problem.
>
>>>- Move irq vector allocation/deallocation into request_irq, 
>>>and free_irq.
>>>
>>>- Make vector_irq per_cpu.
>>>
>>>- Modify assign_irq to allocate different vectors on different cpus,
>>>  to fail if we cannot find a free irq vector somewhere in the
>>>  irqs cpu set, and call assign_irq from set_affinity so 
>when we change
>>>  cpus we can allocate a different irq.
>>>
>>>I have proof of concept level patches for everything thing 
>>>except making MSI actually allocate irqs, but that should be straight
>> forward.
>>
>> except that irqs should be allocating vectors,
>> the fundamental currency here, and MSI should
>> be allocating the same, no?
>
>Not quite.  The MSI code should first allocate
>an irq_desc entry get an IRQ number.

You're right.  I agree.

>Then when the driver calls request_irq the architecture specific
>code should arrange for __do_IRQ to be called with the appropriate
>irq number when the interrupt happens.  The architecture specific
>code needs to do this by setting the MSI address and the MSI data to
>an architecture specific values, that on x86 will involve the a
>vector.
>

>It is reassuring to see that the way I want to make the code work tends
>to be they way you already think the code works.  I can't be too far
>off in reducing the complexity.

Yeah, thanks for pointing out that the vectors are assigned at
IOAPIC init-time.  I hadn't noticed that, and it needs to be addressed.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-05-02  7:41 ` Brown, Len
  (?)
@ 2006-05-02  7:46 ` Andi Kleen
  -1 siblings, 0 replies; 67+ messages in thread
From: Andi Kleen @ 2006-05-02  7:46 UTC (permalink / raw)
  To: Brown, Len
  Cc: Eric W. Biederman, Protasevich, Natalie, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

On Tuesday 02 May 2006 09:41, Brown, Len wrote:

> You are right.  This code is wrong.
> It makes absolutely no sense to reserve vectors in advance
> for every RTE in the IOAPIC when we don't even know if they
> are going to be used.
> 
> This is clearly a holdover from the early IOAPIC/MPS days
> when we were talking about 4 to 8 non-legacy RTEs.

Yes I agree. A lot of the IO-APIC code could probably
need some renovation.
 
> This is where the big system vector shortage problem
> should be addressed.

If we go to per CPU IDTs it will be much less pressing, but
still a good idea.

-Andi

P.S.: There seems to be a lot of confusion about all this.
Maybe it would make sense to do a write up defining all the terms
and stick it into Documentation/* ? 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-05-02  7:41 ` Brown, Len
@ 2006-05-02  8:33   ` Eric W. Biederman
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2006-05-02  8:33 UTC (permalink / raw)
  To: Brown, Len
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

"Brown, Len" <len.brown@intel.com> writes:

>>IRQ is an interrupt request, from a device.
>>Currently they can come in over a pin, or over a packet.
>>IRQs at the source can be shared.
>
> Well stated.
>
>>For an IRQ number it is just an enumeration of interrupt sources.
>>Ideally in a stable manner so that the assigned number does not
>>depend on compile options, or which version of the kernel you
>>are running.
>
> While desirable, the "stable" part is not guaranteed.
> On an ACPI system with PCI Interrupt Link devices,
> request_irq() goes into the pci-enable-device path,
> which ends up asking ACPI what interrupt pin this device uses.
> But the pin is programmable, so we pick one from the list
> of possible, program the router accordingly, and this is what
> gets returned to request_irq().  ACPI PCI Interrupt Link Devices,
> of course, are just a wrapper for what x86 legacy calls PIRQ routers.
>
> In practice, if you boot the same or similar kernel on the
> same configuration, you get 'stable', but if something changes
> such as enabling/disabling a device or loading an additional device
> driver
> or otherwise tinkering with probe order, then all bets are off,
> since the sequence of selecting and
> balancing possible device interrupt lines between available IRQs has
> changed.

Agreed in the general case there is nothing we can do.  But
with a reasonable degree of stability things will stay put
long enough for people to reason about them.

Although I would actually expect us to use the normal twiddle of
irq lines on a bus.  But if we can have less contention that is certainly
better.

>>The global irq_desc array is the core of any definition.
>>That is how linux knows about irqs.  Beyond that we ideally
>>have stable numbers across reboots, and recompiles, and hardware
>>addition and removal.  Stable numbers are not really possible
>>but we can come quite close.
>
> Okay, I'm good with irq_desc, Linux's generic IRQ definition,
> being the center of the universe.  I'm also good with the index
> into irq_desc[] being the "IRQ number" show in /proc/interrupts.
> I guess I was thinking from a HW-specific point of view this
> afternoon...

I had to stare at the code for a while to see this as well.

> I don't like vector numbers being exposed in /proc/interrupts
> on x86.  On an ACPI-enabled system, I think that they should be
> the GSI.  This is consistent with the 0-15 legacy definitions
> being pin numbers (which I think we must keep), and consistent
> with the 16-n numbers being IOAPIC pin numbers (+offset),
> (which I think we should have kept).

Agreed.

> Of course, to do this simply, we'd have to select the irq_desc[i]
> with i = gsi, and not i = vector.

Exactly.

>>> We should never have had a problem with un-connected interrupt lines
>>> consuming vectors, as the vectors are handed out at run-time
>>> only when interrupts are requested by drivers.
>>
>>Incorrect.  By being requested by drivers I assume you mean by
>>request_irq.  assign_irq_vector is called long before request_irq
>>and in fact there is currently not a call from request_irq to
>>assign_irq_vector.  Look where assign_irq_vector is called in io_apic.c
>
> You are right.  This code is wrong.
> It makes absolutely no sense to reserve vectors in advance
> for every RTE in the IOAPIC when we don't even know if they
> are going to be used.
>
> This is clearly a holdover from the early IOAPIC/MPS days
> when we were talking about 4 to 8 non-legacy RTEs.
>
> This is where the big system vector shortage problem
> should be addressed.

Yes.  Natalie mentioned that there are some IBM systems
that still run out of processor vectors with our current
irq collapsing.

>>I think there is a legitimate case for legacy edge triggered
>>interrupts to request a vector sooner, as there are so many races in
>>the enable/disable paths. 
>
> I don't follow you on this part.

For edge triggered legacy irqs coming off the i8259 it probably makes
sense to assign them vectors early like we do currently.

I believe although I am not certain that with edge triggered irqs
actually disabling them makes it easy to miss edges.

>>
>>> The problem with this workaround is going to be choose a policy
>>> of where to direct what, and how to move things if interrupts
>>> become un-balanced.
>>
>>The directing problem already exists.  In general an I/O apic can
>>only direct an irq at a specific cpu, and linux already supports
>>cpusets on which an irq may be delivered.
>>
>>But yes on systems with 8 or fewer cpus the I/O apics can do the
>>directing themselves and it is likely we still want to handle that
>>case.
>
> I don't think it is important to optimize for <= 8 CPUs
> having hundreds of unique interrupts.  Systems with huge
> numbers of interrutps will have more than 8 CPUs.  If they
> don't, then the should function, but being optimal isn't
> what the administrator had in mind.  With multiple cores
> being added to systems, this becomes even more true
> over time.

Sorry. To be clear for systems with <= 8 CPUs we need to allocate
a common vector for all 8 cpus at once if we want to use
flat mode and this makes a version of assign_irq_vector that can work
assign per cpu vectors more complicated, since it has to handle both
cases.  Hmm, should assign_irq_vector become a ioapic method?

I don't think we want to loose the optimization of letting the
hardware select select the cpu on small systems.

>>It is reassuring to see that the way I want to make the code work tends
>>to be they way you already think the code works.  I can't be too far
>>off in reducing the complexity.
>
> Yeah, thanks for pointing out that the vectors are assigned at
> IOAPIC init-time.  I hadn't noticed that, and it needs to be addressed.

Exactly.  Once that is fixed setting gsi == irq should be trivial.
Then we only have to worry about big systems that actually use
more than 244 IRQs.  Which is where the per cpu vector assignment
comes in.

Eric



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-02  8:33   ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2006-05-02  8:33 UTC (permalink / raw)
  To: Brown, Len
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

"Brown, Len" <len.brown@intel.com> writes:

>>IRQ is an interrupt request, from a device.
>>Currently they can come in over a pin, or over a packet.
>>IRQs at the source can be shared.
>
> Well stated.
>
>>For an IRQ number it is just an enumeration of interrupt sources.
>>Ideally in a stable manner so that the assigned number does not
>>depend on compile options, or which version of the kernel you
>>are running.
>
> While desirable, the "stable" part is not guaranteed.
> On an ACPI system with PCI Interrupt Link devices,
> request_irq() goes into the pci-enable-device path,
> which ends up asking ACPI what interrupt pin this device uses.
> But the pin is programmable, so we pick one from the list
> of possible, program the router accordingly, and this is what
> gets returned to request_irq().  ACPI PCI Interrupt Link Devices,
> of course, are just a wrapper for what x86 legacy calls PIRQ routers.
>
> In practice, if you boot the same or similar kernel on the
> same configuration, you get 'stable', but if something changes
> such as enabling/disabling a device or loading an additional device
> driver
> or otherwise tinkering with probe order, then all bets are off,
> since the sequence of selecting and
> balancing possible device interrupt lines between available IRQs has
> changed.

Agreed in the general case there is nothing we can do.  But
with a reasonable degree of stability things will stay put
long enough for people to reason about them.

Although I would actually expect us to use the normal twiddle of
irq lines on a bus.  But if we can have less contention that is certainly
better.

>>The global irq_desc array is the core of any definition.
>>That is how linux knows about irqs.  Beyond that we ideally
>>have stable numbers across reboots, and recompiles, and hardware
>>addition and removal.  Stable numbers are not really possible
>>but we can come quite close.
>
> Okay, I'm good with irq_desc, Linux's generic IRQ definition,
> being the center of the universe.  I'm also good with the index
> into irq_desc[] being the "IRQ number" show in /proc/interrupts.
> I guess I was thinking from a HW-specific point of view this
> afternoon...

I had to stare at the code for a while to see this as well.

> I don't like vector numbers being exposed in /proc/interrupts
> on x86.  On an ACPI-enabled system, I think that they should be
> the GSI.  This is consistent with the 0-15 legacy definitions
> being pin numbers (which I think we must keep), and consistent
> with the 16-n numbers being IOAPIC pin numbers (+offset),
> (which I think we should have kept).

Agreed.

> Of course, to do this simply, we'd have to select the irq_desc[i]
> with i = gsi, and not i = vector.

Exactly.

>>> We should never have had a problem with un-connected interrupt lines
>>> consuming vectors, as the vectors are handed out at run-time
>>> only when interrupts are requested by drivers.
>>
>>Incorrect.  By being requested by drivers I assume you mean by
>>request_irq.  assign_irq_vector is called long before request_irq
>>and in fact there is currently not a call from request_irq to
>>assign_irq_vector.  Look where assign_irq_vector is called in io_apic.c
>
> You are right.  This code is wrong.
> It makes absolutely no sense to reserve vectors in advance
> for every RTE in the IOAPIC when we don't even know if they
> are going to be used.
>
> This is clearly a holdover from the early IOAPIC/MPS days
> when we were talking about 4 to 8 non-legacy RTEs.
>
> This is where the big system vector shortage problem
> should be addressed.

Yes.  Natalie mentioned that there are some IBM systems
that still run out of processor vectors with our current
irq collapsing.

>>I think there is a legitimate case for legacy edge triggered
>>interrupts to request a vector sooner, as there are so many races in
>>the enable/disable paths. 
>
> I don't follow you on this part.

For edge triggered legacy irqs coming off the i8259 it probably makes
sense to assign them vectors early like we do currently.

I believe although I am not certain that with edge triggered irqs
actually disabling them makes it easy to miss edges.

>>
>>> The problem with this workaround is going to be choose a policy
>>> of where to direct what, and how to move things if interrupts
>>> become un-balanced.
>>
>>The directing problem already exists.  In general an I/O apic can
>>only direct an irq at a specific cpu, and linux already supports
>>cpusets on which an irq may be delivered.
>>
>>But yes on systems with 8 or fewer cpus the I/O apics can do the
>>directing themselves and it is likely we still want to handle that
>>case.
>
> I don't think it is important to optimize for <= 8 CPUs
> having hundreds of unique interrupts.  Systems with huge
> numbers of interrutps will have more than 8 CPUs.  If they
> don't, then the should function, but being optimal isn't
> what the administrator had in mind.  With multiple cores
> being added to systems, this becomes even more true
> over time.

Sorry. To be clear for systems with <= 8 CPUs we need to allocate
a common vector for all 8 cpus at once if we want to use
flat mode and this makes a version of assign_irq_vector that can work
assign per cpu vectors more complicated, since it has to handle both
cases.  Hmm, should assign_irq_vector become a ioapic method?

I don't think we want to loose the optimization of letting the
hardware select select the cpu on small systems.

>>It is reassuring to see that the way I want to make the code work tends
>>to be they way you already think the code works.  I can't be too far
>>off in reducing the complexity.
>
> Yeah, thanks for pointing out that the vectors are assigned at
> IOAPIC init-time.  I hadn't noticed that, and it needs to be addressed.

Exactly.  Once that is fixed setting gsi == irq should be trivial.
Then we only have to worry about big systems that actually use
more than 244 IRQs.  Which is where the per cpu vector assignment
comes in.

Eric



^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-01 23:21 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-01 23:21 UTC (permalink / raw)
  To: Eric W. Biederman, Protasevich, Natalie
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

>The original IRQ (on x86) was simply the enumeration of the input
>pins on the pair of i8259 interrupt controllers.

The IOAPIC was originally supported by MPS before ACPI.
MPS used IRQ to describe the ISA inputs to IOAPICs too,
and PIRQ to describe the PCI inputs.
Linux (wisely, IMHO) simply used IRQ to refer to an interrupt pin,
no matter where it is.

>The ACPI global system interrrupts is the enumeration of the input
>pins on the ioapics in the system.
>
>Therefore the GSI number is the IRQ number, by definition.

IRQ, as the term is now used in Linux, means "something to do
with a specific interrupt".  So yes, a GSI is an IRQ,
so is an interrupt vector, so is an interrupt handler,
so is an MSI, an interrupt controller pin,
and any number of various structures and handlers
where these items have "something to do with a specific interrupt".

If you have a more precise definition of IRQ in Linux,
by all means lets hear it; and lets compare the code to the definition
and see if we can make them match.

>However GSI is not a complete enumeration because it does not list
>MSI interrupt sources, and in general can not list MSI interrupt
>sources.

I agree, the term GSI doesn't comprehend MSIs, nor should it.

The ACPI spec allows the platform to tell the OS that it should
not enable MSI on a particular box.

The ACPI spec allows the OS to tell the platform that it supports MSI.

But, the specification and operation of MSI is completely outside of
ACPI,
as it should be with a standard native hardware interface.

> Further the term GSI is ACPI specific so it really
> doesn't make sense outside of that context while irq does.

I agree that GSI is a specific term with a single definition,
and it should have a single use, based on that definition,
in a context where that definition applies.

Re: irq making sense
Depends on the defintion, I suppose.  I'm still looking for one
that matches how Linux uses the term.

>So the question is how do we solve the big system problem.

Big systems are working today.  The question is how to keep
them working without overly complicating small systems.

>Problem 1.
>  We have more GSIs than interrupt vectors on a cpu, so a simple
>  1-1 mapping will not work.
>  However as Natalie's patch showed many of the GSIs are not even
>  connected so if we only allocate vectors to GSIs that are 
>  use we should not have a problem.

Which patch does this refer to?
We should never have had a problem with un-connected interrupt lines
consuming vectors, as the vectors are handed out at run-time
only when interrupts are requested by drivers.

>Problem 2.
>  Some systems have more than 224 GSIs that are actually 
>  connected to devices.
>  There are three possible ways to handle this case. 
>  - Fail after we run out of vectors.
>  - Share a vector.
>
>  - Allow vectors of different cpus to handle different irqs.
>    The is the most elegant and scalable, and Natalie's suggestion.

So here we allow the same vector to be used by different IOAPICS,
or IOAPIC pins, but have them it directed to different CPUs
who have per-cpu tables to vector to different devices?

A practical workaround?  Certainly.
An elegant solution?  No, that would require better hardware;-)

The problem with this workaround is going to be choose a policy
of where to direct what, and how to move things if interrupts
become un-balanced.

Further, the implementation should cost nothing on systems
that do not need it.

>So what would be a path to get there from here?
>- Fix the irq set_affinity code so that it makes the changes to
>  irqs when the interrupt is actually disabled.
>  Calling desc->handler->disable, desc->handler->enable
>  does not work immediately so the current code is racy.
>  Which shows up very badly if you attempt to change the irq vector,
>  and may cause rare problems today
>
>- Modify the MSI code to allocate irqs and not vectors.
>  So we don't have two paths through the code, for no good reason.

Modify the MSI code to allocate "something to do
with a specific interrupt".  Hmmm, but it does this already:-)

Seriously, aren't the interrupt vectors, or perhaps the
vector/cpu pair, the fundamental resource being allocated here?

>- Modify do_IRQ to get passed an interrupt vector# from the
>  interrupt vector instead of an irq number, and then lookup
>  the irq number in vector_irq.  This means we don't need
>  a code stub per irq, and allows us to handle more irqs
>  by simply increasing NR_IRQS.

isn't the vector number already on the stack from
ENTRY(interrupt)
	pushl $vector-256

what about per/irq kstats?
would you keep stats on a per-vector basis
and translate back to irqs for /proc/interrupts?
what about when the same vector is independently
used by multiple CPUS?

>- Remove the current irq compression.

maybe this can be moved up in the to-do list?

>- Move irq vector allocation/deallocation into request_irq, 
>and free_irq.
>
>- Make vector_irq per_cpu.
>
>- Modify assign_irq to allocate different vectors on different cpus,
>  to fail if we cannot find a free irq vector somewhere in the
>  irqs cpu set, and call assign_irq from set_affinity so when we change
>  cpus we can allocate a different irq.
>
>I have proof of concept level patches for everything thing 
>except making MSI actually allocate irqs, but that should be straight
forward.

except that irqs should be allocating vectors,
the fundamental currency here, and MSI should
be allocating the same, no?

>Does anyone know if we record the maximum GSI number? That looks like
>my sticking point for being able to write a simple irq allocator.

mpparse.c: mp_ioapic_routing[last].gsi_end

cheers,
-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-01 23:21 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-05-01 23:21 UTC (permalink / raw)
  To: Eric W. Biederman, Protasevich, Natalie
  Cc: Andi Kleen, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

>The original IRQ (on x86) was simply the enumeration of the input
>pins on the pair of i8259 interrupt controllers.

The IOAPIC was originally supported by MPS before ACPI.
MPS used IRQ to describe the ISA inputs to IOAPICs too,
and PIRQ to describe the PCI inputs.
Linux (wisely, IMHO) simply used IRQ to refer to an interrupt pin,
no matter where it is.

>The ACPI global system interrrupts is the enumeration of the input
>pins on the ioapics in the system.
>
>Therefore the GSI number is the IRQ number, by definition.

IRQ, as the term is now used in Linux, means "something to do
with a specific interrupt".  So yes, a GSI is an IRQ,
so is an interrupt vector, so is an interrupt handler,
so is an MSI, an interrupt controller pin,
and any number of various structures and handlers
where these items have "something to do with a specific interrupt".

If you have a more precise definition of IRQ in Linux,
by all means lets hear it; and lets compare the code to the definition
and see if we can make them match.

>However GSI is not a complete enumeration because it does not list
>MSI interrupt sources, and in general can not list MSI interrupt
>sources.

I agree, the term GSI doesn't comprehend MSIs, nor should it.

The ACPI spec allows the platform to tell the OS that it should
not enable MSI on a particular box.

The ACPI spec allows the OS to tell the platform that it supports MSI.

But, the specification and operation of MSI is completely outside of
ACPI,
as it should be with a standard native hardware interface.

> Further the term GSI is ACPI specific so it really
> doesn't make sense outside of that context while irq does.

I agree that GSI is a specific term with a single definition,
and it should have a single use, based on that definition,
in a context where that definition applies.

Re: irq making sense
Depends on the defintion, I suppose.  I'm still looking for one
that matches how Linux uses the term.

>So the question is how do we solve the big system problem.

Big systems are working today.  The question is how to keep
them working without overly complicating small systems.

>Problem 1.
>  We have more GSIs than interrupt vectors on a cpu, so a simple
>  1-1 mapping will not work.
>  However as Natalie's patch showed many of the GSIs are not even
>  connected so if we only allocate vectors to GSIs that are 
>  use we should not have a problem.

Which patch does this refer to?
We should never have had a problem with un-connected interrupt lines
consuming vectors, as the vectors are handed out at run-time
only when interrupts are requested by drivers.

>Problem 2.
>  Some systems have more than 224 GSIs that are actually 
>  connected to devices.
>  There are three possible ways to handle this case. 
>  - Fail after we run out of vectors.
>  - Share a vector.
>
>  - Allow vectors of different cpus to handle different irqs.
>    The is the most elegant and scalable, and Natalie's suggestion.

So here we allow the same vector to be used by different IOAPICS,
or IOAPIC pins, but have them it directed to different CPUs
who have per-cpu tables to vector to different devices?

A practical workaround?  Certainly.
An elegant solution?  No, that would require better hardware;-)

The problem with this workaround is going to be choose a policy
of where to direct what, and how to move things if interrupts
become un-balanced.

Further, the implementation should cost nothing on systems
that do not need it.

>So what would be a path to get there from here?
>- Fix the irq set_affinity code so that it makes the changes to
>  irqs when the interrupt is actually disabled.
>  Calling desc->handler->disable, desc->handler->enable
>  does not work immediately so the current code is racy.
>  Which shows up very badly if you attempt to change the irq vector,
>  and may cause rare problems today
>
>- Modify the MSI code to allocate irqs and not vectors.
>  So we don't have two paths through the code, for no good reason.

Modify the MSI code to allocate "something to do
with a specific interrupt".  Hmmm, but it does this already:-)

Seriously, aren't the interrupt vectors, or perhaps the
vector/cpu pair, the fundamental resource being allocated here?

>- Modify do_IRQ to get passed an interrupt vector# from the
>  interrupt vector instead of an irq number, and then lookup
>  the irq number in vector_irq.  This means we don't need
>  a code stub per irq, and allows us to handle more irqs
>  by simply increasing NR_IRQS.

isn't the vector number already on the stack from
ENTRY(interrupt)
	pushl $vector-256

what about per/irq kstats?
would you keep stats on a per-vector basis
and translate back to irqs for /proc/interrupts?
what about when the same vector is independently
used by multiple CPUS?

>- Remove the current irq compression.

maybe this can be moved up in the to-do list?

>- Move irq vector allocation/deallocation into request_irq, 
>and free_irq.
>
>- Make vector_irq per_cpu.
>
>- Modify assign_irq to allocate different vectors on different cpus,
>  to fail if we cannot find a free irq vector somewhere in the
>  irqs cpu set, and call assign_irq from set_affinity so when we change
>  cpus we can allocate a different irq.
>
>I have proof of concept level patches for everything thing 
>except making MSI actually allocate irqs, but that should be straight
forward.

except that irqs should be allocating vectors,
the fundamental currency here, and MSI should
be allocating the same, no?

>Does anyone know if we record the maximum GSI number? That looks like
>my sticking point for being able to write a simple irq allocator.

mpparse.c: mp_ioapic_routing[last].gsi_end

cheers,
-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-05-01 23:21 ` Brown, Len
  (?)
@ 2006-05-02  6:14 ` Andi Kleen
  2006-05-02  6:57   ` Eric W. Biederman
  -1 siblings, 1 reply; 67+ messages in thread
From: Andi Kleen @ 2006-05-02  6:14 UTC (permalink / raw)
  To: Brown, Len
  Cc: Eric W. Biederman, Protasevich, Natalie, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi


> >- Modify do_IRQ to get passed an interrupt vector# from the
> >  interrupt vector instead of an irq number, and then lookup
> >  the irq number in vector_irq.  This means we don't need
> >  a code stub per irq, and allows us to handle more irqs
> >  by simply increasing NR_IRQS.
> 
> isn't the vector number already on the stack from
> ENTRY(interrupt)
> 	pushl $vector-256

Yes - and interrupts/vectors are currently always identical. If we go
to per CPU IDTs I suspect the stubs will just need to be generated
at runtime and start passing interrupts.

-Andi

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-05-02  6:14 ` Andi Kleen
@ 2006-05-02  6:57   ` Eric W. Biederman
  2006-05-02  7:11     ` Andi Kleen
  0 siblings, 1 reply; 67+ messages in thread
From: Eric W. Biederman @ 2006-05-02  6:57 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Brown, Len, Protasevich, Natalie, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

Andi Kleen <ak@suse.de> writes:

>> >- Modify do_IRQ to get passed an interrupt vector# from the
>> >  interrupt vector instead of an irq number, and then lookup
>> >  the irq number in vector_irq.  This means we don't need
>> >  a code stub per irq, and allows us to handle more irqs
>> >  by simply increasing NR_IRQS.
>> 
>> isn't the vector number already on the stack from
>> ENTRY(interrupt)
>> 	pushl $vector-256
>
> Yes - and interrupts/vectors are currently always identical. 

No.  At best there is a fixed offset.  They can't be
identical because the first 32 vectors are reserved,
for processor exceptions.  

Beyond that the kernel would not need the vector_irq and irq_vector
arrays if they were always identical, or even if they were one to one.

If you look at assign_irq_vector you will see that by default we
allocate every 8th vector.  Looking at the comment in
init_IO_APIC_traps() this seems to be because we want to avoid
apic bugs with multiple interrupts of the same priority.
Although why we skip 8 instead of 16 is beyond me.

> If we go
> to per CPU IDTs I suspect the stubs will just need to be generated
> at runtime and start passing interrupts.

If we can generate the stubs at run time that will remove my
biggest problem with them, that we can't easily make the number
of stubs track NR_IRQS.  Not needing an extra table lookup is
certainly desirable, and probably worth the extra 4-8 bytes that a stub
is larger that a per cpu table entry.

Although now that I think about it, using some assembler macros
instead of cpp macros could probably solve the problem more easily
than generating the stubs at runtime.  I think the worst case is
256 cpus * 32 irqs per cpu * 10 bytes per stub = 80K.

At 8 cpus we are about where we are now.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-05-02  6:57   ` Eric W. Biederman
@ 2006-05-02  7:11     ` Andi Kleen
  2006-05-02  7:39       ` Eric W. Biederman
  0 siblings, 1 reply; 67+ messages in thread
From: Andi Kleen @ 2006-05-02  7:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Brown, Len, Protasevich, Natalie, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

On Tuesday 02 May 2006 08:57, Eric W. Biederman wrote:
> Andi Kleen <ak@suse.de> writes:
> 
> >> >- Modify do_IRQ to get passed an interrupt vector# from the
> >> >  interrupt vector instead of an irq number, and then lookup
> >> >  the irq number in vector_irq.  This means we don't need
> >> >  a code stub per irq, and allows us to handle more irqs
> >> >  by simply increasing NR_IRQS.
> >> 
> >> isn't the vector number already on the stack from
> >> ENTRY(interrupt)
> >> 	pushl $vector-256
> >
> > Yes - and interrupts/vectors are currently always identical. 
> 
> No.  At best there is a fixed offset.  They can't be
> identical because the first 32 vectors are reserved,
> for processor exceptions.  
> 
> Beyond that the kernel would not need the vector_irq and irq_vector
> arrays if they were always identical, or even if they were one to one.


Yes I should have said it's a fixed offset. Sorry for the confusion.
Just no mapping table needed.

> 
> If you look at assign_irq_vector you will see that by default we
> allocate every 8th vector.  Looking at the comment in
> init_IO_APIC_traps() this seems to be because we want to avoid
> apic bugs with multiple interrupts of the same priority.
> Although why we skip 8 instead of 16 is beyond me.

Hmm - i guess that's old APIC bugs. Might be worth revisiting
on 64bit.


> 
> Although now that I think about it, using some assembler macros
> instead of cpp macros could probably solve the problem more easily
> than generating the stubs at runtime.  I think the worst case is
> 256 cpus * 32 irqs per cpu 

32 irqs? It's (255-32) 

> * 10 bytes per stub = 80K. 

My calculations gave >200k which is definitely too much.

-Andi


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-05-02  7:11     ` Andi Kleen
@ 2006-05-02  7:39       ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2006-05-02  7:39 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Brown, Len, Protasevich, Natalie, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

Andi Kleen <ak@suse.de> writes:

> On Tuesday 02 May 2006 08:57, Eric W. Biederman wrote:
>> Andi Kleen <ak@suse.de> writes:
>> 
>> No.  At best there is a fixed offset.  They can't be
>> identical because the first 32 vectors are reserved,
>> for processor exceptions.  
>> 
>> Beyond that the kernel would not need the vector_irq and irq_vector
>> arrays if they were always identical, or even if they were one to one.
>
>
> Yes I should have said it's a fixed offset. Sorry for the confusion.
> Just no mapping table needed.

As far as I can see no mapping table is needed simply because
we place a stub with the irq number of the irq we want to service
at the idt entry of the vector we want to service.

For the old ISA irqs it is certainly a fixed offset.

>> If you look at assign_irq_vector you will see that by default we
>> allocate every 8th vector.  Looking at the comment in
>> init_IO_APIC_traps() this seems to be because we want to avoid
>> apic bugs with multiple interrupts of the same priority.
>> Although why we skip 8 instead of 16 is beyond me.
>
> Hmm - i guess that's old APIC bugs. Might be worth revisiting
> on 64bit.

Certainly worth a look.  I just know that is what the code currently
does. 

>> Although now that I think about it, using some assembler macros
>> instead of cpp macros could probably solve the problem more easily
>> than generating the stubs at runtime.  I think the worst case is
>> 256 cpus * 32 irqs per cpu 
>
> 32 irqs? It's (255-32) 

If you look at NR_IRQ_VECTORS.  32 is currently our estimate on the number
of IRQs sources per cpu.  I think a sane formula for NR_IRQS once
the static limits are removed is something like:
#define NR_IRQS (NR_VECTORS + (NR_CPUS*32))

Which starts out a little high to account for systems that are over endowed
with I/O and sticks with our current fairly high estimate of 32 IRQs per cpu.

Although that 32 may actually come from our the skip 8 logic.

>> * 10 bytes per stub = 80K. 
>
> My calculations gave >200k which is definitely too much.

With a per cpu lookup table we get 4 bytes per entry and 256 entries
per cpu = 1K per cpu.  Which ultimately uses 256K, on a 256 cpu
system.  Which is nothing for a machine with 256GB or so of RAM.
And that could really be be 2 bytes per entry.

I really don't care how we generate the stubs, or if we generate
stubs so long as all I need to do is change NR_IRQS recompile and
it works.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-05-01 23:21 ` Brown, Len
@ 2006-05-02  6:24   ` Eric W. Biederman
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2006-05-02  6:24 UTC (permalink / raw)
  To: Brown, Len
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

"Brown, Len" <len.brown@intel.com> writes:

>>The original IRQ (on x86) was simply the enumeration of the input
>>pins on the pair of i8259 interrupt controllers.
>
> The IOAPIC was originally supported by MPS before ACPI.
> MPS used IRQ to describe the ISA inputs to IOAPICs too,
> and PIRQ to describe the PCI inputs.
> Linux (wisely, IMHO) simply used IRQ to refer to an interrupt pin,
> no matter where it is.
>
>>The ACPI global system interrrupts is the enumeration of the input
>>pins on the ioapics in the system.
>>
>>Therefore the GSI number is the IRQ number, by definition.
>
> IRQ, as the term is now used in Linux, means "something to do
> with a specific interrupt".  So yes, a GSI is an IRQ,
> so is an interrupt vector, so is an interrupt handler,
> so is an MSI, an interrupt controller pin,
> and any number of various structures and handlers
> where these items have "something to do with a specific interrupt".
>
> If you have a more precise definition of IRQ in Linux,
> by all means lets hear it; and lets compare the code to the definition
> and see if we can make them match.

IRQ is an interrupt request, from a device.
Currently they can come in over a pin, or over a packet.
IRQs at the source can be shared.

For an IRQ number it is just an enumeration of interrupt sources.
Ideally in a stable manner so that the assigned number does not
depend on compile options, or which version of the kernel you
are running.

>>However GSI is not a complete enumeration because it does not list
>>MSI interrupt sources, and in general can not list MSI interrupt
>>sources.
>
> I agree, the term GSI doesn't comprehend MSIs, nor should it.
>
> The ACPI spec allows the platform to tell the OS that it should
> not enable MSI on a particular box.
>
> The ACPI spec allows the OS to tell the platform that it supports MSI.
>
> But, the specification and operation of MSI is completely outside of
> ACPI,
> as it should be with a standard native hardware interface.
>
>> Further the term GSI is ACPI specific so it really
>> doesn't make sense outside of that context while irq does.
>
> I agree that GSI is a specific term with a single definition,
> and it should have a single use, based on that definition,
> in a context where that definition applies.
>
> Re: irq making sense
> Depends on the defintion, I suppose.  I'm still looking for one
> that matches how Linux uses the term.

The global irq_desc array is the core of any definition.
That is how linux knows about irqs.  Beyond that we ideally
have stable numbers across reboots, and recompiles, and hardware
addition and removal.  Stable numbers are not really possible
but we can come quite close.

>>So the question is how do we solve the big system problem.
>
> Big systems are working today.  The question is how to keep
> them working without overly complicating small systems.

Agreed.  A better way to have put is how do we solve the problems
of big systems in a simple way.

>>Problem 1.
>>  We have more GSIs than interrupt vectors on a cpu, so a simple
>>  1-1 mapping will not work.
>>  However as Natalie's patch showed many of the GSIs are not even
>>  connected so if we only allocate vectors to GSIs that are 
>>  use we should not have a problem.
>
> Which patch does this refer to?

Sorry to be clear the irq compression code, that being fixed yet
again started this thread.

> We should never have had a problem with un-connected interrupt lines
> consuming vectors, as the vectors are handed out at run-time
> only when interrupts are requested by drivers.

Incorrect.  By being requested by drivers I assume you mean by
request_irq.  assign_irq_vector is called long before request_irq
and in fact there is currently not a call from request_irq to
assign_irq_vector.  Look where assign_irq_vector is called in io_apic.c

I think there is a legitimate case for legacy edge triggered
interrupts to request a vector sooner, as there are so many races in
the enable/disable paths. 

>>Problem 2.
>>  Some systems have more than 224 GSIs that are actually 
>>  connected to devices.
>>  There are three possible ways to handle this case. 
>>  - Fail after we run out of vectors.
>>  - Share a vector.
>>
>>  - Allow vectors of different cpus to handle different irqs.
>>    The is the most elegant and scalable, and Natalie's suggestion.
>
> So here we allow the same vector to be used by different IOAPICS,
> or IOAPIC pins, but have them it directed to different CPUs
> who have per-cpu tables to vector to different devices?
>
> A practical workaround?  Certainly.
> An elegant solution?  No, that would require better hardware;-)

Agreed. But we lost elegant at the point of on demand assigning of
vectors to irqs.

> The problem with this workaround is going to be choose a policy
> of where to direct what, and how to move things if interrupts
> become un-balanced.

The directing problem already exists.  In general an I/O apic can
only direct an irq at a specific cpu, and linux already supports
cpusets on which an irq may be delivered.

But yes on systems with 8 or fewer cpus the I/O apics can do the
directing themselves and it is likely we still want to handle that
case.

> Further, the implementation should cost nothing on systems
> that do not need it.

I think the entire cost can be left in the vector selection,
but certainly something to scrutinized the patch for.

>>So what would be a path to get there from here?
>>- Fix the irq set_affinity code so that it makes the changes to
>>  irqs when the interrupt is actually disabled.
>>  Calling desc->handler->disable, desc->handler->enable
>>  does not work immediately so the current code is racy.
>>  Which shows up very badly if you attempt to change the irq vector,
>>  and may cause rare problems today
>>
>>- Modify the MSI code to allocate irqs and not vectors.
>>  So we don't have two paths through the code, for no good reason.
>
> Modify the MSI code to allocate "something to do
> with a specific interrupt".  Hmmm, but it does this already:-)
>
> Seriously, aren't the interrupt vectors, or perhaps the
> vector/cpu pair, the fundamental resource being allocated here?

So there are two pieces.  struct irq_desc that describes
an interrupt source, and the vector that catches the interrupt
source and tells the cpu about it.  

To be able to manage an irq source we need to enumerate and give it a
struct irq_desc entry in the global irq_desc array.  One of the
operations we perform after that point is to allocate it a vector so
we can catch that irq source.  But other operations such as reporting,
and the assignment of cpu affinity also happen based on the entry
in the irq_desc array.

Except for architecture implementations of irq routing the vectors
that we catch irqs with should be completely invisible.  The ideal
architecture would not even have a separate concept and except for
the msi code linux has not exported the concept of vectors outside
of the architecture implementations.

>>- Modify do_IRQ to get passed an interrupt vector# from the
>>  interrupt vector instead of an irq number, and then lookup
>>  the irq number in vector_irq.  This means we don't need
>>  a code stub per irq, and allows us to handle more irqs
>>  by simply increasing NR_IRQS.
>
> isn't the vector number already on the stack from
> ENTRY(interrupt)
> 	pushl $vector-256

No.  Currently we pass in the irq number.  The vector
number is forgotten.  So this would require a
vector_irq[vector#] lookup in do_IRQ.

Either that or we allocate a lot more stubs.
Basically there is a simple trade off between code size
and data size in how we do this.  I think less code and
more data is the way to go but I could trivially be convinced
otherwise.

> what about per/irq kstats?
> would you keep stats on a per-vector basis
> and translate back to irqs for /proc/interrupts?

No.  What we have on a per irq basis is what makes sense
and I would leave that untouched.

> what about when the same vector is independently
> used by multiple CPUS?

Accounting by irq makes sense.

>>- Remove the current irq compression.
>
> maybe this can be moved up in the to-do list?

Not much.  
- The current set_affinity code has a real bug.
- The MSI mismatch is worse and fixing it reduces testing later. 
- We need to raise the number of irqs we can support.  Either
  by vastly increasing the number of stubs, whose number looks
  hard to control at compile time, or by turning this into
  a data problem.

>>- Move irq vector allocation/deallocation into request_irq, 
>>and free_irq.
>>
>>- Make vector_irq per_cpu.
>>
>>- Modify assign_irq to allocate different vectors on different cpus,
>>  to fail if we cannot find a free irq vector somewhere in the
>>  irqs cpu set, and call assign_irq from set_affinity so when we change
>>  cpus we can allocate a different irq.
>>
>>I have proof of concept level patches for everything thing 
>>except making MSI actually allocate irqs, but that should be straight
> forward.
>
> except that irqs should be allocating vectors,
> the fundamental currency here, and MSI should
> be allocating the same, no?

Not quite.  The MSI code should first allocate
an irq_desc entry get an IRQ number.

Then when the driver calls request_irq the architecture specific
code should arrange for __do_IRQ to be called with the appropriate
irq number when the interrupt happens.  The architecture specific
code needs to do this by setting the MSI address and the MSI data to
an architecture specific values, that on x86 will involve the a
vector.

>>Does anyone know if we record the maximum GSI number? That looks like
>>my sticking point for being able to write a simple irq allocator.
>
> mpparse.c: mp_ioapic_routing[last].gsi_end

Thanks.  I found that a bit before I saw this email.
It looks like I have all of the pieces now to see if I have the time. :)

It is reassuring to see that the way I want to make the code work tends
to be they way you already think the code works.  I can't be too far
off in reducing the complexity.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-05-02  6:24   ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2006-05-02  6:24 UTC (permalink / raw)
  To: Brown, Len
  Cc: Protasevich, Natalie, Andi Kleen, sergio, Kimball Murray,
	linux-kernel, akpm, kmurray, linux-acpi

"Brown, Len" <len.brown@intel.com> writes:

>>The original IRQ (on x86) was simply the enumeration of the input
>>pins on the pair of i8259 interrupt controllers.
>
> The IOAPIC was originally supported by MPS before ACPI.
> MPS used IRQ to describe the ISA inputs to IOAPICs too,
> and PIRQ to describe the PCI inputs.
> Linux (wisely, IMHO) simply used IRQ to refer to an interrupt pin,
> no matter where it is.
>
>>The ACPI global system interrrupts is the enumeration of the input
>>pins on the ioapics in the system.
>>
>>Therefore the GSI number is the IRQ number, by definition.
>
> IRQ, as the term is now used in Linux, means "something to do
> with a specific interrupt".  So yes, a GSI is an IRQ,
> so is an interrupt vector, so is an interrupt handler,
> so is an MSI, an interrupt controller pin,
> and any number of various structures and handlers
> where these items have "something to do with a specific interrupt".
>
> If you have a more precise definition of IRQ in Linux,
> by all means lets hear it; and lets compare the code to the definition
> and see if we can make them match.

IRQ is an interrupt request, from a device.
Currently they can come in over a pin, or over a packet.
IRQs at the source can be shared.

For an IRQ number it is just an enumeration of interrupt sources.
Ideally in a stable manner so that the assigned number does not
depend on compile options, or which version of the kernel you
are running.

>>However GSI is not a complete enumeration because it does not list
>>MSI interrupt sources, and in general can not list MSI interrupt
>>sources.
>
> I agree, the term GSI doesn't comprehend MSIs, nor should it.
>
> The ACPI spec allows the platform to tell the OS that it should
> not enable MSI on a particular box.
>
> The ACPI spec allows the OS to tell the platform that it supports MSI.
>
> But, the specification and operation of MSI is completely outside of
> ACPI,
> as it should be with a standard native hardware interface.
>
>> Further the term GSI is ACPI specific so it really
>> doesn't make sense outside of that context while irq does.
>
> I agree that GSI is a specific term with a single definition,
> and it should have a single use, based on that definition,
> in a context where that definition applies.
>
> Re: irq making sense
> Depends on the defintion, I suppose.  I'm still looking for one
> that matches how Linux uses the term.

The global irq_desc array is the core of any definition.
That is how linux knows about irqs.  Beyond that we ideally
have stable numbers across reboots, and recompiles, and hardware
addition and removal.  Stable numbers are not really possible
but we can come quite close.

>>So the question is how do we solve the big system problem.
>
> Big systems are working today.  The question is how to keep
> them working without overly complicating small systems.

Agreed.  A better way to have put is how do we solve the problems
of big systems in a simple way.

>>Problem 1.
>>  We have more GSIs than interrupt vectors on a cpu, so a simple
>>  1-1 mapping will not work.
>>  However as Natalie's patch showed many of the GSIs are not even
>>  connected so if we only allocate vectors to GSIs that are 
>>  use we should not have a problem.
>
> Which patch does this refer to?

Sorry to be clear the irq compression code, that being fixed yet
again started this thread.

> We should never have had a problem with un-connected interrupt lines
> consuming vectors, as the vectors are handed out at run-time
> only when interrupts are requested by drivers.

Incorrect.  By being requested by drivers I assume you mean by
request_irq.  assign_irq_vector is called long before request_irq
and in fact there is currently not a call from request_irq to
assign_irq_vector.  Look where assign_irq_vector is called in io_apic.c

I think there is a legitimate case for legacy edge triggered
interrupts to request a vector sooner, as there are so many races in
the enable/disable paths. 

>>Problem 2.
>>  Some systems have more than 224 GSIs that are actually 
>>  connected to devices.
>>  There are three possible ways to handle this case. 
>>  - Fail after we run out of vectors.
>>  - Share a vector.
>>
>>  - Allow vectors of different cpus to handle different irqs.
>>    The is the most elegant and scalable, and Natalie's suggestion.
>
> So here we allow the same vector to be used by different IOAPICS,
> or IOAPIC pins, but have them it directed to different CPUs
> who have per-cpu tables to vector to different devices?
>
> A practical workaround?  Certainly.
> An elegant solution?  No, that would require better hardware;-)

Agreed. But we lost elegant at the point of on demand assigning of
vectors to irqs.

> The problem with this workaround is going to be choose a policy
> of where to direct what, and how to move things if interrupts
> become un-balanced.

The directing problem already exists.  In general an I/O apic can
only direct an irq at a specific cpu, and linux already supports
cpusets on which an irq may be delivered.

But yes on systems with 8 or fewer cpus the I/O apics can do the
directing themselves and it is likely we still want to handle that
case.

> Further, the implementation should cost nothing on systems
> that do not need it.

I think the entire cost can be left in the vector selection,
but certainly something to scrutinized the patch for.

>>So what would be a path to get there from here?
>>- Fix the irq set_affinity code so that it makes the changes to
>>  irqs when the interrupt is actually disabled.
>>  Calling desc->handler->disable, desc->handler->enable
>>  does not work immediately so the current code is racy.
>>  Which shows up very badly if you attempt to change the irq vector,
>>  and may cause rare problems today
>>
>>- Modify the MSI code to allocate irqs and not vectors.
>>  So we don't have two paths through the code, for no good reason.
>
> Modify the MSI code to allocate "something to do
> with a specific interrupt".  Hmmm, but it does this already:-)
>
> Seriously, aren't the interrupt vectors, or perhaps the
> vector/cpu pair, the fundamental resource being allocated here?

So there are two pieces.  struct irq_desc that describes
an interrupt source, and the vector that catches the interrupt
source and tells the cpu about it.  

To be able to manage an irq source we need to enumerate and give it a
struct irq_desc entry in the global irq_desc array.  One of the
operations we perform after that point is to allocate it a vector so
we can catch that irq source.  But other operations such as reporting,
and the assignment of cpu affinity also happen based on the entry
in the irq_desc array.

Except for architecture implementations of irq routing the vectors
that we catch irqs with should be completely invisible.  The ideal
architecture would not even have a separate concept and except for
the msi code linux has not exported the concept of vectors outside
of the architecture implementations.

>>- Modify do_IRQ to get passed an interrupt vector# from the
>>  interrupt vector instead of an irq number, and then lookup
>>  the irq number in vector_irq.  This means we don't need
>>  a code stub per irq, and allows us to handle more irqs
>>  by simply increasing NR_IRQS.
>
> isn't the vector number already on the stack from
> ENTRY(interrupt)
> 	pushl $vector-256

No.  Currently we pass in the irq number.  The vector
number is forgotten.  So this would require a
vector_irq[vector#] lookup in do_IRQ.

Either that or we allocate a lot more stubs.
Basically there is a simple trade off between code size
and data size in how we do this.  I think less code and
more data is the way to go but I could trivially be convinced
otherwise.

> what about per/irq kstats?
> would you keep stats on a per-vector basis
> and translate back to irqs for /proc/interrupts?

No.  What we have on a per irq basis is what makes sense
and I would leave that untouched.

> what about when the same vector is independently
> used by multiple CPUS?

Accounting by irq makes sense.

>>- Remove the current irq compression.
>
> maybe this can be moved up in the to-do list?

Not much.  
- The current set_affinity code has a real bug.
- The MSI mismatch is worse and fixing it reduces testing later. 
- We need to raise the number of irqs we can support.  Either
  by vastly increasing the number of stubs, whose number looks
  hard to control at compile time, or by turning this into
  a data problem.

>>- Move irq vector allocation/deallocation into request_irq, 
>>and free_irq.
>>
>>- Make vector_irq per_cpu.
>>
>>- Modify assign_irq to allocate different vectors on different cpus,
>>  to fail if we cannot find a free irq vector somewhere in the
>>  irqs cpu set, and call assign_irq from set_affinity so when we change
>>  cpus we can allocate a different irq.
>>
>>I have proof of concept level patches for everything thing 
>>except making MSI actually allocate irqs, but that should be straight
> forward.
>
> except that irqs should be allocating vectors,
> the fundamental currency here, and MSI should
> be allocating the same, no?

Not quite.  The MSI code should first allocate
an irq_desc entry get an IRQ number.

Then when the driver calls request_irq the architecture specific
code should arrange for __do_IRQ to be called with the appropriate
irq number when the interrupt happens.  The architecture specific
code needs to do this by setting the MSI address and the MSI data to
an architecture specific values, that on x86 will involve the a
vector.

>>Does anyone know if we record the maximum GSI number? That looks like
>>my sticking point for being able to write a simple irq allocator.
>
> mpparse.c: mp_ioapic_routing[last].gsi_end

Thanks.  I found that a bit before I saw this email.
It looks like I have all of the pieces now to see if I have the time. :)

It is reassuring to see that the way I want to make the code work tends
to be they way you already think the code works.  I can't be too far
off in reducing the complexity.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-27 20:36 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-04-27 20:36 UTC (permalink / raw)
  To: Brown, Len, Andi Kleen
  Cc: sergio, Kimball Murray, linux-kernel, akpm, kmurray, linux-acpi

> >But I guess using GSI/vector internally only would be fine.
> 
> The last time I tried to name a variable "gsi" instead of "irq",
> Linus launched into a tirade that "GSI" doesn't mean anything to him,
> or anybody else that googles it.  On the other hand "IRQ" means
> something
> to everybody, and if you google it you find all kinds of interesting
> interrupt-related things.
> 
> My point was that "IRQ" means so many "interrupt related" things to
> different people in different contexts, that it is effectively
> meaningless.
> 
> But Linus was not swayed.
> 

Oh Len, let's call this thing IRQ why not ;) I kind of agree that this
is more popular and well-known term, like an old trade mark. I just see
all those layers of code right now to map those to GSIs, pins, whatever
it is, that can be replaced with... well, much smaller layers of code :)
and maybe less "assumpti-ous" too.

--Natalie 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-27 20:36 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-04-27 20:36 UTC (permalink / raw)
  To: Brown, Len, Andi Kleen
  Cc: sergio, Kimball Murray, linux-kernel, akpm, kmurray, linux-acpi

> >But I guess using GSI/vector internally only would be fine.
> 
> The last time I tried to name a variable "gsi" instead of "irq",
> Linus launched into a tirade that "GSI" doesn't mean anything to him,
> or anybody else that googles it.  On the other hand "IRQ" means
> something
> to everybody, and if you google it you find all kinds of interesting
> interrupt-related things.
> 
> My point was that "IRQ" means so many "interrupt related" things to
> different people in different contexts, that it is effectively
> meaningless.
> 
> But Linus was not swayed.
> 

Oh Len, let's call this thing IRQ why not ;) I kind of agree that this
is more popular and well-known term, like an old trade mark. I just see
all those layers of code right now to map those to GSIs, pins, whatever
it is, that can be replaced with... well, much smaller layers of code :)
and maybe less "assumpti-ous" too.

--Natalie 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-04-27 20:36 ` Protasevich, Natalie
@ 2006-04-30 23:17   ` Eric W. Biederman
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2006-04-30 23:17 UTC (permalink / raw)
  To: Protasevich, Natalie
  Cc: Brown, Len, Andi Kleen, sergio, Kimball Murray, linux-kernel,
	akpm, kmurray, linux-acpi

"Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com> writes:

>> >But I guess using GSI/vector internally only would be fine.
>> 
>> The last time I tried to name a variable "gsi" instead of "irq",
>> Linus launched into a tirade that "GSI" doesn't mean anything to him,
>> or anybody else that googles it.  On the other hand "IRQ" means
>> something
>> to everybody, and if you google it you find all kinds of interesting
>> interrupt-related things.
>> 
>> My point was that "IRQ" means so many "interrupt related" things to
>> different people in different contexts, that it is effectively
>> meaningless.
>> 
>> But Linus was not swayed.
>> 
>
> Oh Len, let's call this thing IRQ why not ;) I kind of agree that this
> is more popular and well-known term, like an old trade mark. I just see
> all those layers of code right now to map those to GSIs, pins, whatever
> it is, that can be replaced with... well, much smaller layers of code :)
> and maybe less "assumpti-ous" too.

The original IRQ (on x86) was simply the enumeration of the input
pins on the pair of i8259 interrupt controllers.

The ACPI global system interrrupts is the enumeration of the input
pins on the ioapics in the system.

Therefore the GSI number is the IRQ number, by definition.

However GSI is not a complete enumeration because it does not list
MSI interrupt sources, and in general can not list MSI interrupt
sources.  Further the term GSI is ACPI specific so it really
doesn't make sense outside of that context while irq does.

So the question is how do we solve the big system problem.

Problem 1.
  We have more GSIs than interrupt vectors on a cpu, so a simple
  1-1 mapping will not work.
  However as Natalie's patch showed many of the GSIs are not even
  connected so if we only allocate vectors to GSIs that are use we should
  not have a problem.

Problem 2.
  Some systems have more than 224 GSIs that are actually connected to devices.
  There are three possible ways to handle this case. 
  - Fail after we run out of vectors.
  - Share a vector.
  - Allow vectors of different cpus to handle different irqs.
    The is the most elegant and scalable, and Natalie's suggestion.

So what would be a path to get there from here?
- Fix the irq set_affinity code so that it makes the changes to
  irqs when the interrupt is actually disabled.
  Calling desc->handler->disable, desc->handler->enable
  does not work immediately so the current code is racy.
  Which shows up very badly if you attempt to change the irq vector,
  and may cause rare problems today

- Modify the MSI code to allocate irqs and not vectors.
  So we don't have two paths through the code, for no good reason.

- Modify do_IRQ to get passed an interrupt vector# from the
  interrupt vector instead of an irq number, and then lookup
  the irq number in vector_irq.  This means we don't need
  a code stub per irq, and allows us to handle more irqs
  by simply increasing NR_IRQS.

- Remove the current irq compression.

- Move irq vector allocation/deallocation into request_irq, and free_irq.

- Make vector_irq per_cpu.

- Modify assign_irq to allocate different vectors on different cpus,
  to fail if we cannot find a free irq vector somewhere in the
  irqs cpu set, and call assign_irq from set_affinity so when we change
  cpus we can allocate a different irq.

I have proof of concept level patches for everything thing except making
MSI actually allocate irqs, but that should be straight forward.

Does anyone know if we record the maximum GSI number? That looks like
my sticking point for being able to write a simple irq allocator.

If I have a couple more free minutes I will see if I can post some
preliminary patches.  This is issue play for me so I have limited time
to work on it.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-30 23:17   ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2006-04-30 23:17 UTC (permalink / raw)
  To: Protasevich, Natalie
  Cc: Brown, Len, Andi Kleen, sergio, Kimball Murray, linux-kernel,
	akpm, kmurray, linux-acpi

"Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com> writes:

>> >But I guess using GSI/vector internally only would be fine.
>> 
>> The last time I tried to name a variable "gsi" instead of "irq",
>> Linus launched into a tirade that "GSI" doesn't mean anything to him,
>> or anybody else that googles it.  On the other hand "IRQ" means
>> something
>> to everybody, and if you google it you find all kinds of interesting
>> interrupt-related things.
>> 
>> My point was that "IRQ" means so many "interrupt related" things to
>> different people in different contexts, that it is effectively
>> meaningless.
>> 
>> But Linus was not swayed.
>> 
>
> Oh Len, let's call this thing IRQ why not ;) I kind of agree that this
> is more popular and well-known term, like an old trade mark. I just see
> all those layers of code right now to map those to GSIs, pins, whatever
> it is, that can be replaced with... well, much smaller layers of code :)
> and maybe less "assumpti-ous" too.

The original IRQ (on x86) was simply the enumeration of the input
pins on the pair of i8259 interrupt controllers.

The ACPI global system interrrupts is the enumeration of the input
pins on the ioapics in the system.

Therefore the GSI number is the IRQ number, by definition.

However GSI is not a complete enumeration because it does not list
MSI interrupt sources, and in general can not list MSI interrupt
sources.  Further the term GSI is ACPI specific so it really
doesn't make sense outside of that context while irq does.

So the question is how do we solve the big system problem.

Problem 1.
  We have more GSIs than interrupt vectors on a cpu, so a simple
  1-1 mapping will not work.
  However as Natalie's patch showed many of the GSIs are not even
  connected so if we only allocate vectors to GSIs that are use we should
  not have a problem.

Problem 2.
  Some systems have more than 224 GSIs that are actually connected to devices.
  There are three possible ways to handle this case. 
  - Fail after we run out of vectors.
  - Share a vector.
  - Allow vectors of different cpus to handle different irqs.
    The is the most elegant and scalable, and Natalie's suggestion.

So what would be a path to get there from here?
- Fix the irq set_affinity code so that it makes the changes to
  irqs when the interrupt is actually disabled.
  Calling desc->handler->disable, desc->handler->enable
  does not work immediately so the current code is racy.
  Which shows up very badly if you attempt to change the irq vector,
  and may cause rare problems today

- Modify the MSI code to allocate irqs and not vectors.
  So we don't have two paths through the code, for no good reason.

- Modify do_IRQ to get passed an interrupt vector# from the
  interrupt vector instead of an irq number, and then lookup
  the irq number in vector_irq.  This means we don't need
  a code stub per irq, and allows us to handle more irqs
  by simply increasing NR_IRQS.

- Remove the current irq compression.

- Move irq vector allocation/deallocation into request_irq, and free_irq.

- Make vector_irq per_cpu.

- Modify assign_irq to allocate different vectors on different cpus,
  to fail if we cannot find a free irq vector somewhere in the
  irqs cpu set, and call assign_irq from set_affinity so when we change
  cpus we can allocate a different irq.

I have proof of concept level patches for everything thing except making
MSI actually allocate irqs, but that should be straight forward.

Does anyone know if we record the maximum GSI number? That looks like
my sticking point for being able to write a simple irq allocator.

If I have a couple more free minutes I will see if I can post some
preliminary patches.  This is issue play for me so I have limited time
to work on it.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-27 19:26 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-04-27 19:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Brown, Len, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

> > Len, maybe it sounds dramatic and/or extreme, but how about getting 
> > rid of IRQs and just having GSI-vector pair.
> > I intuitively think that would be possible (not that I have all the 
> > details lined up :) And this would probably take away confusing IRQ 
> > abstraction out once and for all? I think something like 
> that is done 
> > in ia64.
> 
> x86 users are attached to their interrupt numbers I think 
> back from the bad old days with only 16 interrupts and 
> interrupt sharing didn't work. We might have a revolt in the 
> user base if /proc/interrupts didn't display them anymore @)
> 
> But I guess using GSI/vector internally only would be fine.
> 
Oh I completely agree there are probably few strings attached that have
to be mimicked and kept (especially those 16), but I think that can be
done.
It feels like not a small change to me, but would probably be
worthwhile. It is such a boring thing trying to fit new chipsets and
system dimensions into old inflexible format...

--Natalie

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-27 19:26 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-04-27 19:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Brown, Len, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

> > Len, maybe it sounds dramatic and/or extreme, but how about getting 
> > rid of IRQs and just having GSI-vector pair.
> > I intuitively think that would be possible (not that I have all the 
> > details lined up :) And this would probably take away confusing IRQ 
> > abstraction out once and for all? I think something like 
> that is done 
> > in ia64.
> 
> x86 users are attached to their interrupt numbers I think 
> back from the bad old days with only 16 interrupts and 
> interrupt sharing didn't work. We might have a revolt in the 
> user base if /proc/interrupts didn't display them anymore @)
> 
> But I guess using GSI/vector internally only would be fine.
> 
Oh I completely agree there are probably few strings attached that have
to be mimicked and kept (especially those 16), but I think that can be
done.
It feels like not a small change to me, but would probably be
worthwhile. It is such a boring thing trying to fit new chipsets and
system dimensions into old inflexible format...

--Natalie

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-27 19:10 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-04-27 19:10 UTC (permalink / raw)
  To: Brown, Len, sergio
  Cc: Kimball Murray, linux-kernel, akpm, ak, kmurray, linux-acpi

> 
> >There are probably better ways to control 224 possible IRQs by their 
> >total number instead of their range, and per-cpu IDTs are the better 
> >answer to the IRQ shortage altogether. But just going back 
> to the way 
> >it was wouldn't be right I think.
> >We were able to run 2 generations of
> >systems only because we had this compression, other big systems 
> >benefited from it as well.
> 
> I don't propose reverting the IRQ re-name patch and breaking 
> the big iron without replacing it with something else that works.

Len, maybe it sounds dramatic and/or extreme, but how about getting rid
of IRQs and just having GSI-vector pair.
I intuitively think that would be possible (not that I have all the
details lined up :)
And this would probably take away confusing IRQ abstraction out once and
for all? I think something like that is done in ia64.

--Natalie 
> 
> My point is that the re-name patch has added unnecessary 
> maintenance complexity to the 99.9% of systems that it runs 
> on.  We pay that price in several ways, including 
> mis-understandings about what devices are on what irqs, and 
> mis-understandings about how the code is supposed to work.
> 
> -Len
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-27 19:10 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-04-27 19:10 UTC (permalink / raw)
  To: Brown, Len, sergio
  Cc: Kimball Murray, linux-kernel, akpm, ak, kmurray, linux-acpi

> 
> >There are probably better ways to control 224 possible IRQs by their 
> >total number instead of their range, and per-cpu IDTs are the better 
> >answer to the IRQ shortage altogether. But just going back 
> to the way 
> >it was wouldn't be right I think.
> >We were able to run 2 generations of
> >systems only because we had this compression, other big systems 
> >benefited from it as well.
> 
> I don't propose reverting the IRQ re-name patch and breaking 
> the big iron without replacing it with something else that works.

Len, maybe it sounds dramatic and/or extreme, but how about getting rid
of IRQs and just having GSI-vector pair.
I intuitively think that would be possible (not that I have all the
details lined up :)
And this would probably take away confusing IRQ abstraction out once and
for all? I think something like that is done in ia64.

--Natalie 
> 
> My point is that the re-name patch has added unnecessary 
> maintenance complexity to the 99.9% of systems that it runs 
> on.  We pay that price in several ways, including 
> mis-understandings about what devices are on what irqs, and 
> mis-understandings about how the code is supposed to work.
> 
> -Len
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-04-27 19:10 ` Protasevich, Natalie
  (?)
@ 2006-04-27 19:13 ` Andi Kleen
  -1 siblings, 0 replies; 67+ messages in thread
From: Andi Kleen @ 2006-04-27 19:13 UTC (permalink / raw)
  To: Protasevich, Natalie
  Cc: Brown, Len, sergio, Kimball Murray, linux-kernel, akpm, kmurray,
	linux-acpi

On Thursday 27 April 2006 21:10, Protasevich, Natalie wrote:
> > 
> > >There are probably better ways to control 224 possible IRQs by their 
> > >total number instead of their range, and per-cpu IDTs are the better 
> > >answer to the IRQ shortage altogether. But just going back 
> > to the way 
> > >it was wouldn't be right I think.
> > >We were able to run 2 generations of
> > >systems only because we had this compression, other big systems 
> > >benefited from it as well.
> > 
> > I don't propose reverting the IRQ re-name patch and breaking 
> > the big iron without replacing it with something else that works.
> 
> Len, maybe it sounds dramatic and/or extreme, but how about getting rid
> of IRQs and just having GSI-vector pair.
> I intuitively think that would be possible (not that I have all the
> details lined up :)
> And this would probably take away confusing IRQ abstraction out once and
> for all? I think something like that is done in ia64.

x86 users are attached to their interrupt numbers I think back from the bad old
days with only 16 interrupts and interrupt sharing didn't work. We might have a revolt
in the user base if /proc/interrupts didn't display them anymore @)

But I guess using GSI/vector internally only would be fine.

-Andi


^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-27 18:13 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-04-27 18:13 UTC (permalink / raw)
  To: Protasevich, Natalie, sergio
  Cc: Kimball Murray, linux-kernel, akpm, ak, kmurray, linux-acpi

>There are probably better ways to control 224 possible IRQs by their
>total number instead of their range, and per-cpu IDTs are the better
>answer to the IRQ shortage altogether. But just going back to 
>the way it was wouldn't be right I think.
>We were able to run 2 generations of
>systems only because we had this compression, other big systems
>benefited from it as well.

I don't propose reverting the IRQ re-name patch and breaking the
big iron without replacing it with something else that works.

My point is that the re-name patch has added unnecessary maintenance
complexity to the 99.9% of systems that it runs on.  We pay that price
in several ways, including mis-understandings about what devices
are on what irqs, and mis-understandings about how the code is
supposed to work.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-27 18:13 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-04-27 18:13 UTC (permalink / raw)
  To: Protasevich, Natalie, sergio
  Cc: Kimball Murray, linux-kernel, akpm, ak, kmurray, linux-acpi

>There are probably better ways to control 224 possible IRQs by their
>total number instead of their range, and per-cpu IDTs are the better
>answer to the IRQ shortage altogether. But just going back to 
>the way it was wouldn't be right I think.
>We were able to run 2 generations of
>systems only because we had this compression, other big systems
>benefited from it as well.

I don't propose reverting the IRQ re-name patch and breaking the
big iron without replacing it with something else that works.

My point is that the re-name patch has added unnecessary maintenance
complexity to the 99.9% of systems that it runs on.  We pay that price
in several ways, including mis-understandings about what devices
are on what irqs, and mis-understandings about how the code is
supposed to work.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-04-27 18:13 ` Brown, Len
  (?)
@ 2006-04-27 18:16 ` Andi Kleen
  -1 siblings, 0 replies; 67+ messages in thread
From: Andi Kleen @ 2006-04-27 18:16 UTC (permalink / raw)
  To: Brown, Len
  Cc: Protasevich, Natalie, sergio, Kimball Murray, linux-kernel, akpm,
	kmurray, linux-acpi

On Thursday 27 April 2006 20:13, Brown, Len wrote:
> 
> >There are probably better ways to control 224 possible IRQs by their
> >total number instead of their range, and per-cpu IDTs are the better
> >answer to the IRQ shortage altogether. But just going back to 
> >the way it was wouldn't be right I think.
> >We were able to run 2 generations of
> >systems only because we had this compression, other big systems
> >benefited from it as well.
> 
> I don't propose reverting the IRQ re-name patch and breaking the
> big iron 

It would break VIA, not the big iron.  The big iron is just broken
by not applying the new patch.

> without replacing it with something else that works. 

Sure a lot of users would be unhappy if VIA didn't work anymore.
 
> My point is that the re-name patch has added unnecessary maintenance
> complexity to the 99.9% of systems that it runs on.  We pay that price
> in several ways, including mis-understandings about what devices
> are on what irqs, and mis-understandings about how the code is
> supposed to work.

Undoubtedly it would be cleaner to not have such hacks, but do you have a 
better proposal to make VIA work?

-Andi

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-26 14:00 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-04-26 14:00 UTC (permalink / raw)
  To: sergio, Brown, Len
  Cc: Kimball Murray, linux-kernel, akpm, ak, kmurray, linux-acpi

> I think, it is about time, not thinking via quirks as 
> workarounds, because all pcis (on via) are quirked, some are 
> quirked twice.
> And we should think in programmer interrupts of via chipset, 
> in specific function for this propose, for me, doesn't make 
> sense every time VIA put other ID out, we have to add quirks 
> to this ID , as this was an exception. 
> 
> Thanks, 

VIA's numerous pci quirks are not related to this patch. They only hit
one problem with it having only 4 bits encoding their interrupt.
 
> On Tue, 2006-04-25 at 15:53 -0400, Brown, Len wrote:
> > I'd rather see the original irq-renaming patch and its subsequent 
> > multiple via workaround patches reverted than to further complicate 
> > what is becoming a fragile mess.
> > 
> > -Len

There are probably better ways to control 224 possible IRQs by their
total number instead of their range, and per-cpu IDTs are the better
answer to the IRQ shortage altogether. But just going back to the way it
was wouldn't be right I think. We were able to run 2 generations of
systems only because we had this compression, other big systems
benefited from it as well.
Thanks,
--Natalie

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-26 14:00 ` Protasevich, Natalie
  0 siblings, 0 replies; 67+ messages in thread
From: Protasevich, Natalie @ 2006-04-26 14:00 UTC (permalink / raw)
  To: sergio, Brown, Len
  Cc: Kimball Murray, linux-kernel, akpm, ak, kmurray, linux-acpi

> I think, it is about time, not thinking via quirks as 
> workarounds, because all pcis (on via) are quirked, some are 
> quirked twice.
> And we should think in programmer interrupts of via chipset, 
> in specific function for this propose, for me, doesn't make 
> sense every time VIA put other ID out, we have to add quirks 
> to this ID , as this was an exception. 
> 
> Thanks, 

VIA's numerous pci quirks are not related to this patch. They only hit
one problem with it having only 4 bits encoding their interrupt.
 
> On Tue, 2006-04-25 at 15:53 -0400, Brown, Len wrote:
> > I'd rather see the original irq-renaming patch and its subsequent 
> > multiple via workaround patches reverted than to further complicate 
> > what is becoming a fragile mess.
> > 
> > -Len

There are probably better ways to control 224 possible IRQs by their
total number instead of their range, and per-cpu IDTs are the better
answer to the IRQ shortage altogether. But just going back to the way it
was wouldn't be right I think. We were able to run 2 generations of
systems only because we had this compression, other big systems
benefited from it as well.
Thanks,
--Natalie

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-25 19:53 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-04-25 19:53 UTC (permalink / raw)
  To: Kimball Murray, linux-kernel
  Cc: akpm, ak, kmurray, natalie.protasevich, linux-acpi

I'd rather see the original irq-renaming patch
and its subsequent multiple via workaround patches
reverted than to further complicate what is becoming
a fragile mess.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-25 19:53 ` Brown, Len
  0 siblings, 0 replies; 67+ messages in thread
From: Brown, Len @ 2006-04-25 19:53 UTC (permalink / raw)
  To: Kimball Murray, linux-kernel
  Cc: akpm, ak, kmurray, natalie.protasevich, linux-acpi

I'd rather see the original irq-renaming patch
and its subsequent multiple via workaround patches
reverted than to further complicate what is becoming
a fragile mess.

-Len

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-04-25 19:53 ` Brown, Len
  (?)
@ 2006-04-26 12:58 ` Sergio Monteiro Basto
  -1 siblings, 0 replies; 67+ messages in thread
From: Sergio Monteiro Basto @ 2006-04-26 12:58 UTC (permalink / raw)
  To: Brown, Len
  Cc: Kimball Murray, linux-kernel, akpm, ak, kmurray,
	natalie.protasevich, linux-acpi

[-- Attachment #1: Type: text/plain, Size: 842 bytes --]

I think, it is about time, not thinking via quirks as workarounds,
because all pcis (on via) are quirked, some are quirked twice.
And we should think in programmer interrupts of via chipset, in specific
function for this propose, for me, doesn't make sense every time VIA put
other ID out, we have to add quirks to this ID , as this was an
exception. 

Thanks, 


On Tue, 2006-04-25 at 15:53 -0400, Brown, Len wrote:
> I'd rather see the original irq-renaming patch
> and its subsequent multiple via workaround patches
> reverted than to further complicate what is becoming
> a fragile mess.
> 
> -Len
> -
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Sérgio M.B.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 2166 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-04-25 19:53 ` Brown, Len
  (?)
  (?)
@ 2006-04-26 13:17 ` Andi Kleen
  2006-04-26 13:56   ` Kimball Murray
  -1 siblings, 1 reply; 67+ messages in thread
From: Andi Kleen @ 2006-04-26 13:17 UTC (permalink / raw)
  To: Brown, Len
  Cc: Kimball Murray, linux-kernel, akpm, kmurray, natalie.protasevich,
	linux-acpi

On Tuesday 25 April 2006 21:53, Brown, Len wrote:
> I'd rather see the original irq-renaming patch
> and its subsequent multiple via workaround patches
> reverted than to further complicate what is becoming
> a fragile mess.

Sorry rechecking - i already got the patch now. You want me to drop it again?

I guess we could drop it all, but VIA must still work afterwards.
How would we do that?

-Andi

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-04-26 13:17 ` Andi Kleen
@ 2006-04-26 13:56   ` Kimball Murray
  2006-04-26 14:01     ` Kimball Murray
  0 siblings, 1 reply; 67+ messages in thread
From: Kimball Murray @ 2006-04-26 13:56 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Brown, Len, linux-kernel, akpm, kmurray, natalie.protasevich, linux-acpi

Hi Andi,



On 4/26/06, Andi Kleen <ak@suse.de> wrote:
> On Tuesday 25 April 2006 21:53, Brown, Len wrote:
> > I'd rather see the original irq-renaming patch
> > and its subsequent multiple via workaround patches
> > reverted than to further complicate what is becoming
> > a fragile mess.
>
> Sorry rechecking - i already got the patch now. You want me to drop it again?
>
> I guess we could drop it all, but VIA must still work afterwards.
> How would we do that?
>
> -Andi
>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-04-26 13:56   ` Kimball Murray
@ 2006-04-26 14:01     ` Kimball Murray
  0 siblings, 0 replies; 67+ messages in thread
From: Kimball Murray @ 2006-04-26 14:01 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Brown, Len, linux-kernel, akpm, kmurray, natalie.protasevich, linux-acpi

Oops, previous message got sent before I had typed anything!

Andi, I just wanted to be clear that my patch is not a VIA workaround,
it is a VIA workaround workaround.  So please don't remove my patch
while leaving in the original VIA workaround.  That will break our
platform, and possibly others.

I don't know if there's an easy way to have both the VIA workaround
(Natalie's original patch) and the VIA workaround workaround (my
patch) in a more unified construct.

I believe our platform would work fine with the removal of my patch
_and_ the VIA patch.  But, as you say, what about VIA?

-kimball

On 4/26/06, Kimball Murray <kimball.murray@gmail.com> wrote:
> Hi Andi,
>
>
>
> On 4/26/06, Andi Kleen <ak@suse.de> wrote:
> > On Tuesday 25 April 2006 21:53, Brown, Len wrote:
> > > I'd rather see the original irq-renaming patch
> > > and its subsequent multiple via workaround patches
> > > reverted than to further complicate what is becoming
> > > a fragile mess.
> >
> > Sorry rechecking - i already got the patch now. You want me to drop it again?
> >
> > I guess we could drop it all, but VIA must still work afterwards.
> > How would we do that?
> >
> > -Andi
> >
>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-25 16:06 ` Kimball Murray
  0 siblings, 0 replies; 67+ messages in thread
From: Kimball Murray @ 2006-04-25 16:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kimball Murray, akpm, ak, kmurray, natalie.protasevich,
	len.brown, linux-acpi

Per Andi Kleen's request, I have made slight changes to the patch I
submitted last week.  Look for the "snip" further down if you don't
want to re-read the original post.  Thanks!

------- Begin original post -----------------------

Hello!

I'm working on an x86_64 platfrom, where I've hit a problem that appears to
be caused by a patch that went into 2.6.13 some time ago.  (git details are
below).  Basically, that patch introduced a work-around for a VIA chipset
limitation (only 4 bits to store a PCI interrupt).  And that work-around
causes an unfortunate pin collision on our ioapic.

Our system uses an ACPI Interrupt Source Override to inform the OS that the
8254 timer (IRQ0) is on pin 1 of the ioapic.  On that same ioapic, pin 0
handles an interrupt from a PCI device.  The work-around for the VIA chipset
now causes pin 0 to get IRQ0 on our platform, which the timer also claims.
The sad result is both pins 0 and 1 drive IRQ0, but pins 0 and 1 have
different triggering characterists (and polarity), so time learches forward
in an IRQ0 interrupt storm.

I brought this up with Natalie, who provided the VIA patch.  What we came up
with is a means of keeping the VIA work-around, but at the same time, avoiding
the IRQ0 collision.  What I'm doing here is creating a flag in io_apic.c
(timer_uses_ioapic_pin_0), which is checked in mpparse.c.  If the timer is
not using pin 0, and we find a PCI-type interrupt for gsi 0, then I bump
the gsi up to the current value of pci_irq.  Otherwise, it behaves as before.

For reference, here are the git details for the patch that works around the
VIA issue.

--------- original git patch (2.6.13) --------------------

[PATCH] x86: avoid wasting IRQs patch update

The patch addresses a problem with ACPI SCI interrupt entry, which gets
re-used, and the IRQ is assigned to another unrelated device.  The patch
corrects the code such that SCI IRQ is skipped and duplicate entry is
avoided.  Second issue came up with VIA chipset, the problem was caused by
original patch assigning IRQs starting 16 and up.  The VIA chipset uses
4-bit IRQ register for internal interrupt routing, and therefore cannot
handle IRQ numbers assigned to its devices.  The patch corrects this
problem by allowing PCI IRQs below 16.

Signed-off by: Natalie Protasevich <Natalie.Protasevich@unisys.com>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

---
commit e1afc3f522ed088405fc8932110d338330db82bb
tree 944b79bef5f73bfe1ea7fc5e89cb9e36562d0929
parent 80625942094b114d85811e5ff1fbc9e06dabe0ff
author Natalie.Protasevich@unisys.com <Natalie.Protasevich@unisys.com> 
Fri, 29 Jul 2005 14:03:32 -0700
committer Linus Torvalds <torvalds@g5.osdl.org> Fri, 29 Jul 2005 15:01:13 
-0700

 arch/i386/kernel/mpparse.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/i386/kernel/mpparse.c b/arch/i386/kernel/mpparse.c
index af917f6..ce838ab 100644
--- a/arch/i386/kernel/mpparse.c
+++ b/arch/i386/kernel/mpparse.c
@@ -1116,7 +1116,15 @@ int mp_register_gsi (u32 gsi, int edge_l
                 */
                int irq = gsi;
                if (gsi < MAX_GSI_NUM) {
-                       gsi = pci_irq++;
+                       if (gsi > 15)
+                               gsi = pci_irq++;
+#ifdef CONFIG_ACPI_BUS
+                       /*
+                        * Don't assign IRQ used by ACPI SCI
+                        */
+                       if (gsi == acpi_fadt.sci_int)
+                               gsi = pci_irq++;
+#endif
                        gsi_to_irq[irq] = gsi;
                } else {
                        printk(KERN_ERR "GSI %u is too high\n", gsi);

--------- end of original git patch (2.6.13) ----------


------- End original post -----------------------

And finally, here is my updated patch for both i386 and x86_64,
generated from Linus' git tree. It incorporates Andi's suggsetions.

----------------- snip -----------------------------------
diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index f8f132a..d70f2ad 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -2238,6 +2238,8 @@ static inline void unlock_ExtINT_logic(v
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
+int timer_uses_ioapic_pin_0;
+
 /*
  * This code may look a bit paranoid, but it's supposed to cooperate with
  * a wide range of boards and BIOS bugs.  Fortunately only the timer IRQ
@@ -2274,6 +2276,9 @@ static inline void check_timer(void)
 	pin2  = ioapic_i8259.pin;
 	apic2 = ioapic_i8259.apic;
 
+	if (pin1 == 0)
+		timer_uses_ioapic_pin_0 = 1;
+
 	printk(KERN_INFO "..TIMER: vector=0x%02X apic1=%d pin1=%d apic2=%d pin2=%d\n",
 		vector, apic1, pin1, apic2, pin2);
 
diff --git a/arch/i386/kernel/mpparse.c b/arch/i386/kernel/mpparse.c
index 34d21e2..6b1392d 100644
--- a/arch/i386/kernel/mpparse.c
+++ b/arch/i386/kernel/mpparse.c
@@ -1130,7 +1130,17 @@ int mp_register_gsi (u32 gsi, int trigge
 		 */
 		int irq = gsi;
 		if (gsi < MAX_GSI_NUM) {
-			if (gsi > 15)
+			/*
+			 * Retain the VIA chipset work-around (gsi > 15), but
+			 * avoid a problem where the 8254 timer (IRQ0) is setup
+			 * via an override (so it's not on pin 0 of the ioapic),
+			 * and at the same time, the pin 0 interrupt is a PCI
+			 * type.  The gsi > 15 test could cause these two pins
+			 * to be shared as IRQ0, and they are not shareable.
+			 * So test for this condition, and if necessary, avoid
+			 * the pin collision.
+			 */
+			if (gsi > 15 || (gsi == 0 && !timer_uses_ioapic_pin_0))
 				gsi = pci_irq++;
 			/*
 			 * Don't assign IRQ used by ACPI SCI
diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 77b4c60..0de3ea9 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -1777,6 +1777,8 @@ static inline void unlock_ExtINT_logic(v
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
+int timer_uses_ioapic_pin_0;
+
 /*
  * This code may look a bit paranoid, but it's supposed to cooperate with
  * a wide range of boards and BIOS bugs.  Fortunately only the timer IRQ
@@ -1814,6 +1816,9 @@ static inline void check_timer(void)
 	pin2  = ioapic_i8259.pin;
 	apic2 = ioapic_i8259.apic;
 
+	if (pin1 == 0)
+		timer_uses_ioapic_pin_0 = 1;
+
 	apic_printk(APIC_VERBOSE,KERN_INFO "..TIMER: vector=0x%02X apic1=%d pin1=%d apic2=%d pin2=%d\n",
 		vector, apic1, pin1, apic2, pin2);
 
diff --git a/arch/x86_64/kernel/mpparse.c b/arch/x86_64/kernel/mpparse.c
index b17cf3e..083da7e 100644
--- a/arch/x86_64/kernel/mpparse.c
+++ b/arch/x86_64/kernel/mpparse.c
@@ -968,7 +968,17 @@ int mp_register_gsi(u32 gsi, int trigger
 		 */
 		int irq = gsi;
 		if (gsi < MAX_GSI_NUM) {
-			if (gsi > 15)
+			/*
+			 * Retain the VIA chipset work-around (gsi > 15), but
+			 * avoid a problem where the 8254 timer (IRQ0) is setup
+			 * via an override (so it's not on pin 0 of the ioapic),
+			 * and at the same time, the pin 0 interrupt is a PCI
+			 * type.  The gsi > 15 test could cause these two pins
+			 * to be shared as IRQ0, and they are not shareable.
+			 * So test for this condition, and if necessary, avoid
+			 * the pin collision.
+			 */
+			if (gsi > 15 || (gsi == 0 && !timer_uses_ioapic_pin_0))
 				gsi = pci_irq++;
 			/*
 			 * Don't assign IRQ used by ACPI SCI
diff --git a/include/asm-i386/io_apic.h b/include/asm-i386/io_apic.h
index 51c4e5f..d92e253 100644
--- a/include/asm-i386/io_apic.h
+++ b/include/asm-i386/io_apic.h
@@ -200,6 +200,7 @@ extern int io_apic_get_unique_id (int io
 extern int io_apic_get_version (int ioapic);
 extern int io_apic_get_redir_entries (int ioapic);
 extern int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int active_high_low);
+extern int timer_uses_ioapic_pin_0;
 #endif /* CONFIG_ACPI */
 
 extern int (*ioapic_renumber_irq)(int ioapic, int irq);
diff --git a/include/asm-x86_64/io_apic.h b/include/asm-x86_64/io_apic.h
index ee1bc69..52484e8 100644
--- a/include/asm-x86_64/io_apic.h
+++ b/include/asm-x86_64/io_apic.h
@@ -205,6 +205,7 @@ extern int skip_ioapic_setup;
 extern int io_apic_get_version (int ioapic);
 extern int io_apic_get_redir_entries (int ioapic);
 extern int io_apic_set_pci_routing (int ioapic, int pin, int irq, int, int);
+extern int timer_uses_ioapic_pin_0;
 #endif
 
 extern int sis_apic_bug; /* dummy */ 

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
@ 2006-04-25 16:06 ` Kimball Murray
  0 siblings, 0 replies; 67+ messages in thread
From: Kimball Murray @ 2006-04-25 16:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kimball Murray, akpm, ak, kmurray, natalie.protasevich,
	len.brown, linux-acpi

Per Andi Kleen's request, I have made slight changes to the patch I
submitted last week.  Look for the "snip" further down if you don't
want to re-read the original post.  Thanks!

------- Begin original post -----------------------

Hello!

I'm working on an x86_64 platfrom, where I've hit a problem that appears to
be caused by a patch that went into 2.6.13 some time ago.  (git details are
below).  Basically, that patch introduced a work-around for a VIA chipset
limitation (only 4 bits to store a PCI interrupt).  And that work-around
causes an unfortunate pin collision on our ioapic.

Our system uses an ACPI Interrupt Source Override to inform the OS that the
8254 timer (IRQ0) is on pin 1 of the ioapic.  On that same ioapic, pin 0
handles an interrupt from a PCI device.  The work-around for the VIA chipset
now causes pin 0 to get IRQ0 on our platform, which the timer also claims.
The sad result is both pins 0 and 1 drive IRQ0, but pins 0 and 1 have
different triggering characterists (and polarity), so time learches forward
in an IRQ0 interrupt storm.

I brought this up with Natalie, who provided the VIA patch.  What we came up
with is a means of keeping the VIA work-around, but at the same time, avoiding
the IRQ0 collision.  What I'm doing here is creating a flag in io_apic.c
(timer_uses_ioapic_pin_0), which is checked in mpparse.c.  If the timer is
not using pin 0, and we find a PCI-type interrupt for gsi 0, then I bump
the gsi up to the current value of pci_irq.  Otherwise, it behaves as before.

For reference, here are the git details for the patch that works around the
VIA issue.

--------- original git patch (2.6.13) --------------------

[PATCH] x86: avoid wasting IRQs patch update

The patch addresses a problem with ACPI SCI interrupt entry, which gets
re-used, and the IRQ is assigned to another unrelated device.  The patch
corrects the code such that SCI IRQ is skipped and duplicate entry is
avoided.  Second issue came up with VIA chipset, the problem was caused by
original patch assigning IRQs starting 16 and up.  The VIA chipset uses
4-bit IRQ register for internal interrupt routing, and therefore cannot
handle IRQ numbers assigned to its devices.  The patch corrects this
problem by allowing PCI IRQs below 16.

Signed-off by: Natalie Protasevich <Natalie.Protasevich@unisys.com>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

---
commit e1afc3f522ed088405fc8932110d338330db82bb
tree 944b79bef5f73bfe1ea7fc5e89cb9e36562d0929
parent 80625942094b114d85811e5ff1fbc9e06dabe0ff
author Natalie.Protasevich@unisys.com <Natalie.Protasevich@unisys.com> 
Fri, 29 Jul 2005 14:03:32 -0700
committer Linus Torvalds <torvalds@g5.osdl.org> Fri, 29 Jul 2005 15:01:13 
-0700

 arch/i386/kernel/mpparse.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/i386/kernel/mpparse.c b/arch/i386/kernel/mpparse.c
index af917f6..ce838ab 100644
--- a/arch/i386/kernel/mpparse.c
+++ b/arch/i386/kernel/mpparse.c
@@ -1116,7 +1116,15 @@ int mp_register_gsi (u32 gsi, int edge_l
                 */
                int irq = gsi;
                if (gsi < MAX_GSI_NUM) {
-                       gsi = pci_irq++;
+                       if (gsi > 15)
+                               gsi = pci_irq++;
+#ifdef CONFIG_ACPI_BUS
+                       /*
+                        * Don't assign IRQ used by ACPI SCI
+                        */
+                       if (gsi == acpi_fadt.sci_int)
+                               gsi = pci_irq++;
+#endif
                        gsi_to_irq[irq] = gsi;
                } else {
                        printk(KERN_ERR "GSI %u is too high\n", gsi);

--------- end of original git patch (2.6.13) ----------


------- End original post -----------------------

And finally, here is my updated patch for both i386 and x86_64,
generated from Linus' git tree. It incorporates Andi's suggsetions.

----------------- snip -----------------------------------
diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index f8f132a..d70f2ad 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -2238,6 +2238,8 @@ static inline void unlock_ExtINT_logic(v
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
+int timer_uses_ioapic_pin_0;
+
 /*
  * This code may look a bit paranoid, but it's supposed to cooperate with
  * a wide range of boards and BIOS bugs.  Fortunately only the timer IRQ
@@ -2274,6 +2276,9 @@ static inline void check_timer(void)
 	pin2  = ioapic_i8259.pin;
 	apic2 = ioapic_i8259.apic;
 
+	if (pin1 == 0)
+		timer_uses_ioapic_pin_0 = 1;
+
 	printk(KERN_INFO "..TIMER: vector=0x%02X apic1=%d pin1=%d apic2=%d pin2=%d\n",
 		vector, apic1, pin1, apic2, pin2);
 
diff --git a/arch/i386/kernel/mpparse.c b/arch/i386/kernel/mpparse.c
index 34d21e2..6b1392d 100644
--- a/arch/i386/kernel/mpparse.c
+++ b/arch/i386/kernel/mpparse.c
@@ -1130,7 +1130,17 @@ int mp_register_gsi (u32 gsi, int trigge
 		 */
 		int irq = gsi;
 		if (gsi < MAX_GSI_NUM) {
-			if (gsi > 15)
+			/*
+			 * Retain the VIA chipset work-around (gsi > 15), but
+			 * avoid a problem where the 8254 timer (IRQ0) is setup
+			 * via an override (so it's not on pin 0 of the ioapic),
+			 * and at the same time, the pin 0 interrupt is a PCI
+			 * type.  The gsi > 15 test could cause these two pins
+			 * to be shared as IRQ0, and they are not shareable.
+			 * So test for this condition, and if necessary, avoid
+			 * the pin collision.
+			 */
+			if (gsi > 15 || (gsi == 0 && !timer_uses_ioapic_pin_0))
 				gsi = pci_irq++;
 			/*
 			 * Don't assign IRQ used by ACPI SCI
diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 77b4c60..0de3ea9 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -1777,6 +1777,8 @@ static inline void unlock_ExtINT_logic(v
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
+int timer_uses_ioapic_pin_0;
+
 /*
  * This code may look a bit paranoid, but it's supposed to cooperate with
  * a wide range of boards and BIOS bugs.  Fortunately only the timer IRQ
@@ -1814,6 +1816,9 @@ static inline void check_timer(void)
 	pin2  = ioapic_i8259.pin;
 	apic2 = ioapic_i8259.apic;
 
+	if (pin1 == 0)
+		timer_uses_ioapic_pin_0 = 1;
+
 	apic_printk(APIC_VERBOSE,KERN_INFO "..TIMER: vector=0x%02X apic1=%d pin1=%d apic2=%d pin2=%d\n",
 		vector, apic1, pin1, apic2, pin2);
 
diff --git a/arch/x86_64/kernel/mpparse.c b/arch/x86_64/kernel/mpparse.c
index b17cf3e..083da7e 100644
--- a/arch/x86_64/kernel/mpparse.c
+++ b/arch/x86_64/kernel/mpparse.c
@@ -968,7 +968,17 @@ int mp_register_gsi(u32 gsi, int trigger
 		 */
 		int irq = gsi;
 		if (gsi < MAX_GSI_NUM) {
-			if (gsi > 15)
+			/*
+			 * Retain the VIA chipset work-around (gsi > 15), but
+			 * avoid a problem where the 8254 timer (IRQ0) is setup
+			 * via an override (so it's not on pin 0 of the ioapic),
+			 * and at the same time, the pin 0 interrupt is a PCI
+			 * type.  The gsi > 15 test could cause these two pins
+			 * to be shared as IRQ0, and they are not shareable.
+			 * So test for this condition, and if necessary, avoid
+			 * the pin collision.
+			 */
+			if (gsi > 15 || (gsi == 0 && !timer_uses_ioapic_pin_0))
 				gsi = pci_irq++;
 			/*
 			 * Don't assign IRQ used by ACPI SCI
diff --git a/include/asm-i386/io_apic.h b/include/asm-i386/io_apic.h
index 51c4e5f..d92e253 100644
--- a/include/asm-i386/io_apic.h
+++ b/include/asm-i386/io_apic.h
@@ -200,6 +200,7 @@ extern int io_apic_get_unique_id (int io
 extern int io_apic_get_version (int ioapic);
 extern int io_apic_get_redir_entries (int ioapic);
 extern int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int active_high_low);
+extern int timer_uses_ioapic_pin_0;
 #endif /* CONFIG_ACPI */
 
 extern int (*ioapic_renumber_irq)(int ioapic, int irq);
diff --git a/include/asm-x86_64/io_apic.h b/include/asm-x86_64/io_apic.h
index ee1bc69..52484e8 100644
--- a/include/asm-x86_64/io_apic.h
+++ b/include/asm-x86_64/io_apic.h
@@ -205,6 +205,7 @@ extern int skip_ioapic_setup;
 extern int io_apic_get_version (int ioapic);
 extern int io_apic_get_redir_entries (int ioapic);
 extern int io_apic_set_pci_routing (int ioapic, int pin, int irq, int, int);
+extern int timer_uses_ioapic_pin_0;
 #endif
 
 extern int sis_apic_bug; /* dummy */ 

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
  2006-04-25 16:06 ` Kimball Murray
  (?)
@ 2006-04-26 11:27 ` Andi Kleen
  -1 siblings, 0 replies; 67+ messages in thread
From: Andi Kleen @ 2006-04-26 11:27 UTC (permalink / raw)
  To: Kimball Murray
  Cc: linux-kernel, akpm, kmurray, natalie.protasevich, len.brown, linux-acpi

On Tuesday 25 April 2006 18:06, Kimball Murray wrote:
> Per Andi Kleen's request, I have made slight changes to the patch I
> submitted last week.  Look for the "snip" further down if you don't
> want to re-read the original post.  Thanks!

Added to my tree thanks.

But next time please don't paste other patches in the introduction.
It makes the patch merging scripts unhappy. Instead just resend the patch 
with the full initial description again. I fixed it up.

Thanks,

-Ando

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2006-05-09  5:14 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-04-27 19:32 [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision Brown, Len
2006-04-27 19:32 ` Brown, Len
2006-04-28  8:06 ` hello Teo
  -- strict thread matches above, loose matches on Subject: below --
2006-05-09  5:14 [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision Protasevich, Natalie
2006-05-09  5:14 ` Protasevich, Natalie
2006-05-09  4:25 Brown, Len
2006-05-09  4:25 ` Brown, Len
2006-05-09  3:10 Protasevich, Natalie
2006-05-09  3:10 ` Protasevich, Natalie
2006-05-08 21:51 Brown, Len
2006-05-08 21:51 ` Brown, Len
2006-05-08 18:37 Protasevich, Natalie
2006-05-08 18:37 ` Protasevich, Natalie
2006-05-06  6:42 Protasevich, Natalie
2006-05-06  6:42 ` Protasevich, Natalie
2006-05-06  6:18 Brown, Len
2006-05-06  6:18 ` Brown, Len
2006-05-04 16:42 Protasevich, Natalie
2006-05-04 16:42 ` Protasevich, Natalie
2006-05-04 16:04 Brown, Len
2006-05-04 16:04 ` Brown, Len
2006-05-04 22:41 ` Eric W. Biederman
2006-05-04 22:41   ` Eric W. Biederman
2006-05-04 15:33 Brown, Len
2006-05-04 15:33 ` Brown, Len
2006-05-04  5:07 Brown, Len
2006-05-04  5:07 ` Brown, Len
2006-05-04 15:31 ` Eric W. Biederman
2006-05-04 15:31   ` Eric W. Biederman
2006-05-02 23:52 Protasevich, Natalie
2006-05-02 23:52 ` Protasevich, Natalie
2006-05-02  7:41 Brown, Len
2006-05-02  7:41 ` Brown, Len
2006-05-02  7:46 ` Andi Kleen
2006-05-02  8:33 ` Eric W. Biederman
2006-05-02  8:33   ` Eric W. Biederman
2006-05-01 23:21 Brown, Len
2006-05-01 23:21 ` Brown, Len
2006-05-02  6:14 ` Andi Kleen
2006-05-02  6:57   ` Eric W. Biederman
2006-05-02  7:11     ` Andi Kleen
2006-05-02  7:39       ` Eric W. Biederman
2006-05-02  6:24 ` Eric W. Biederman
2006-05-02  6:24   ` Eric W. Biederman
2006-04-27 20:36 Protasevich, Natalie
2006-04-27 20:36 ` Protasevich, Natalie
2006-04-30 23:17 ` Eric W. Biederman
2006-04-30 23:17   ` Eric W. Biederman
2006-04-27 19:26 Protasevich, Natalie
2006-04-27 19:26 ` Protasevich, Natalie
2006-04-27 19:10 Protasevich, Natalie
2006-04-27 19:10 ` Protasevich, Natalie
2006-04-27 19:13 ` Andi Kleen
2006-04-27 18:13 Brown, Len
2006-04-27 18:13 ` Brown, Len
2006-04-27 18:16 ` Andi Kleen
2006-04-26 14:00 Protasevich, Natalie
2006-04-26 14:00 ` Protasevich, Natalie
2006-04-25 19:53 Brown, Len
2006-04-25 19:53 ` Brown, Len
2006-04-26 12:58 ` Sergio Monteiro Basto
2006-04-26 13:17 ` Andi Kleen
2006-04-26 13:56   ` Kimball Murray
2006-04-26 14:01     ` Kimball Murray
2006-04-25 16:06 Kimball Murray
2006-04-25 16:06 ` Kimball Murray
2006-04-26 11:27 ` Andi Kleen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.