All of lore.kernel.org
 help / color / mirror / Atom feed
* RE: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
@ 2007-02-05 19:29 Lu, Yinghai
  2007-02-05 19:59 ` Eric W. Biederman
  0 siblings, 1 reply; 26+ messages in thread
From: Lu, Yinghai @ 2007-02-05 19:29 UTC (permalink / raw)
  To: Andi Kleen, Eric W. Biederman
  Cc: Andrew Morton, linux-kernel, Luigi Genoni, Ingo Molnar,
	Natalie Protasevich

Eric,

How about let apic_hangle_pending_vector take the irq too?
I wonder if there any chance that you have two IRQ_MOVE_PENDING? 


BYW, Andi, Do you want to try "reuse vector when do the irq-balance
between CPUS"?

If So, I will update the patch that you dropped some months ago.

YH



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
  2007-02-05 19:29 [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration Lu, Yinghai
@ 2007-02-05 19:59 ` Eric W. Biederman
  0 siblings, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-05 19:59 UTC (permalink / raw)
  To: Lu, Yinghai
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Luigi Genoni,
	Ingo Molnar, Natalie Protasevich

"Lu, Yinghai" <yinghai.lu@amd.com> writes:

> Eric,
>
> How about let apic_hangle_pending_vector take the irq too?

We can't compute the vector by reading the hardware registers after
we have acknowledged the irq.

I hope that was the answer you were looking for I'm not quite certain
what you mean by take.

> I wonder if there any chance that you have two IRQ_MOVE_PENDING?

I'm not quite certain what you are asking on the face of it there
is one bit per irq so it is impossible.

I don't think we can get into any trouble we don't let a vector be
reused on a cpu until we know it isn't in irr.  So even if things
come in highly out of order the should still function.

I also have seen irqs migrate on nearly every irq received.

Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-08 20:19                   ` Eric W. Biederman
@ 2007-02-09  6:40                     ` Eric W. Biederman
  0 siblings, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-09  6:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-kernel, Lu, Yinghai, Luigi Genoni,
	Natalie Protasevich, Andi Kleen

>
> The version I would up testing is below, and it doesn't work.
> I still get "No irq handler for vector" warnings as well as
> a couple of complaints from lock/irq debugging.    The debugging
> doesn't worry me.  The fact that I don't have a good way to ensure
> I have no more irqs in flight does.
>
> So unless someone can find a sure way to drain the irqs in flight,
> I can't migrate an irq from process context, and looking at irr and
> handling a pending irq appears required.  '

Bah.  I had not taken into account that the local apic despite
being tightly coupled with the cpu is for programming purposes
an asynchronous device.  If I want to give it time to react to something
I need to read from it.

The routine below actually works. 

My remaining practical question is can this been done cleanly.

Ingo's lock debugging dislikes this routine.  
By using raw_local_irq_enable I have avoided all but a message on
the irq return path, I haven't quite worked out where.

But at least this version feels like it could be done better
(less inline? different helpers?) someone.

For interrupts coming through a sane interrupt controller moving
this into process context would certainly simplify things.  For edge
triggered interrupts coming through an io_apic I'm not at all certain
what makes sense.

When the routine below is used to ack an edge triggered interrupt
it runs before the edge triggered interrupt handler so losing an
edge shouldn't happen (we haven't acknowledged the hardware yet)
and even if we do the device driver gets to run at least once.

So doing migration in the irq handler still looks like the best
solution even if it is ugly.  As long as the little bit of
stack overhead isn't a problem I think enabling interrupts to
clear out any pending irqs certainly looks simpler. 

In another vein.  I went and looked through all of Intel's and
AMD's public errata that I could find and there weren't any associated
with irr or isr, so I think my previous version of the code is still
sane, and not likely to break.

I can improve it a little by getting the vector as:
"vector = ~ get_irq_regs()->orig_rax;" instead of reading
ISR.  That still leaves reading the pending bit in ISR and the
other funny tricks.

I'm conflicted between the two approaches a little because playing
games with enabling interrupts in an interrupt handler seems to
have some weird corner cases.

static void ack_apic(unsigned int irq)
{
#if defined(CONFIG_GENERIC_PENDING_IRQ) || defined(CONFIG_IRQBALANCE)
	struct irq_desc *desc;
	desc = irq_desc + irq;
	if (likely(!(desc->status & IRQ_MOVE_PENDING)))
		goto simple;

	if (hardirq_count() != HARDIRQ_OFFSET)
		goto simple;

	desc->chip->mask(irq);
	ack_APIC_irq();

	/* Ensure all of the irq handlers for this irq have completed
	 * before we migrate it.
	 */
	spin_unlock(&desc->lock);
	raw_local_irq_enable();
	apic_read(APIC_ID);
	raw_local_irq_disable();
	spin_lock(&desc->lock);

	move_masked_irq(irq);
	desc->chip->unmask(irq);
	return;
simple:
#endif
	ack_APIC_irq();
}


BUG: at /home/eric/projects/linux/linux-2.6-devel/kernel/lockdep.c:1860 trace_hardirqs_on()

Call Trace:
 <IRQ>  [<ffffffff8048562f>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff80290401>] generic_delete_inode+0x0/0x13e
 [<ffffffff8020a0fc>] restore_args+0x0/0x30
 [<ffffffff80290401>] generic_delete_inode+0x0/0x13e
 [<ffffffff8021648d>] ack_apic+0x63/0x99
 [<ffffffff80216485>] ack_apic+0x5b/0x99
 [<ffffffff8025881e>] handle_fasteoi_irq+0xc1/0xd1
 [<ffffffff80290401>] generic_delete_inode+0x0/0x13e
 [<ffffffff8020c0de>] do_IRQ+0x89/0xf3
 [<ffffffff80208ce8>] default_idle+0x35/0x51
 [<ffffffff80208cb3>] default_idle+0x0/0x51
 [<ffffffff8020a0a6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80290401>] generic_delete_inode+0x0/0x13e
 [<ffffffff80208cb3>] default_idle+0x0/0x51
 [<ffffffff80208ce8>] default_idle+0x35/0x51
 [<ffffffff80208cea>] default_idle+0x37/0x51
 [<ffffffff80208ce8>] default_idle+0x35/0x51
 [<ffffffff80208d5a>] cpu_idle+0x56/0x75
 [<ffffffff808b9a69>] start_secondary+0x481/0x490


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-08 11:48                 ` Eric W. Biederman
@ 2007-02-08 20:19                   ` Eric W. Biederman
  2007-02-09  6:40                     ` Eric W. Biederman
  0 siblings, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-08 20:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ingo Molnar, Andrew Morton, linux-kernel, Lu, Yinghai,
	Luigi Genoni, Natalie Protasevich, Andi Kleen

ebiederm@xmission.com (Eric W. Biederman) writes:

> Ingo Molnar <mingo@elte.hu> writes:
>
>> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>>> Ingo would it be reasonable to get a wait queue so I can wait for an 
>>> irq that needs the delayed disable action to actually become masked?
>>
>> that might make sense, but what will do the wakeup - incidental IRQ 
>> arriving on the new CPU? Isnt that a bit risky - maybe the device wont 
>> generate IRQs for a really long time.
>
> I still need to test this, but I believe I have found a simpler
> way to avoid irr problems during migration, and I believe the code
> works equally well with either edge or level triggered interrupts.
>
> The idea is this: Instead of trying test for and handle when irr
> occurs, simply enable local interrupts after disabling and
> acknowledging the irq so that anything pending will be processed,
> before we perform the migration operation.
>
> I don't think the edge case cares about the mask/ack order but
> but masking before acking appears important for the level triggered
> case, so we might as well use that order for both.
>
> Does this look like a sane way to handle this?

The version I would up testing is below, and it doesn't work.
I still get "No irq handler for vector" warnings as well as
a couple of complaints from lock/irq debugging.    The debugging
doesn't worry me.  The fact that I don't have a good way to ensure
I have no more irqs in flight does.

So unless someone can find a sure way to drain the irqs in flight,
I can't migrate an irq from process context, and looking at irr and
handling a pending irq appears required.  '

Eric

static void ack_apic(unsigned int irq)
{
#if defined(CONFIG_GENERIC_PENDING_IRQ) || defined(CONFIG_IRQBALANCE)
	if (unlikely((irq_desc[irq].status & IRQ_MOVE_PENDING) && 
		     (hardirq_count() == HARDIRQ_OFFSET))) {
		struct irq_desc *desc = irq_desc + irq;
		desc->chip->mask(irq);
		ack_APIC_irq();

		/* Ensure all of the irq handlers for this irq have completed
		 * before we migrate it.
		 */
		raw_local_irq_enable();
		cpu_relax();
		raw_local_irq_disable();
		synchronize_irq(irq);

		move_masked_irq(irq);
		desc->chip->unmask(irq);
		return;
	}
#endif
	ack_APIC_irq();
}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-06 22:25               ` Ingo Molnar
  2007-02-07  2:33                 ` Eric W. Biederman
@ 2007-02-08 11:48                 ` Eric W. Biederman
  2007-02-08 20:19                   ` Eric W. Biederman
  1 sibling, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-08 11:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-kernel, Lu, Yinghai, Luigi Genoni,
	Natalie Protasevich, Andi Kleen

Ingo Molnar <mingo@elte.hu> writes:

> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> Ingo would it be reasonable to get a wait queue so I can wait for an 
>> irq that needs the delayed disable action to actually become masked?
>
> that might make sense, but what will do the wakeup - incidental IRQ 
> arriving on the new CPU? Isnt that a bit risky - maybe the device wont 
> generate IRQs for a really long time.

I still need to test this, but I believe I have found a simpler
way to avoid irr problems during migration, and I believe the code
works equally well with either edge or level triggered interrupts.

The idea is this: Instead of trying test for and handle when irr
occurs, simply enable local interrupts after disabling and
acknowledging the irq so that anything pending will be processed,
before we perform the migration operation.

I don't think the edge case cares about the mask/ack order but
but masking before acking appears important for the level triggered
case, so we might as well use that order for both.

Does this look like a sane way to handle this?

static void ack_apic(unsigned int irq)
{
	struct irq_desc *desc = irq_desc + irq;
	int do_unmask_irq = 0;
	if (unlikely((irq_desc[irq].status & IRQ_MOVE_PENDING) && !in_irq()) {
		do_unmask_irq = 1;
		desc->chip->mask(irq);
	}
	ack_APIC_irq();
	if (unlikely(do_unmask_irq)) {
		/* Don't let pending irqs accumulate */
		local_irq_enable();
                syncrhonize_irq(irq);
		move_masked_irq(irq);
		local_irq_disable();
		desc->chip->unmask(irq);
	}
}

Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-06 22:25               ` Ingo Molnar
@ 2007-02-07  2:33                 ` Eric W. Biederman
  2007-02-08 11:48                 ` Eric W. Biederman
  1 sibling, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-07  2:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-kernel, Lu, Yinghai, Luigi Genoni,
	Natalie Protasevich, Andi Kleen

Ingo Molnar <mingo@elte.hu> writes:

> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> Ingo would it be reasonable to get a wait queue so I can wait for an 
>> irq that needs the delayed disable action to actually become masked?
>
> that might make sense, but what will do the wakeup - incidental IRQ 
> arriving on the new CPU? 

That is what I was thinking.

> Isnt that a bit risky - maybe the device wont 
> generate IRQs for a really long time.

Well this is in a user space context called from user space and it
exactly matches the semantics we have now.  If we make it an
interruptible sleep the user space process shouldn't block.

I guess the other thing to do is do it in a non-block fashion
and just call schedule_work from the interrupt context when the
irq is disabled.  For i386 with it's in kernel irq scheduler
that might be better.

I think the nasty case is probably what do we do when it is
the timer interrupt we are dealing with.

Hmm.  I think I should look up what the rules are for
calling local_irq_enable when in interrupt context.  That
might be another way to satisfy this problem.

If local irqs are enabled I don't have to worry about the irr
register.

You've got me brainstorming now.

Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-06 22:16             ` Eric W. Biederman
@ 2007-02-06 22:25               ` Ingo Molnar
  2007-02-07  2:33                 ` Eric W. Biederman
  2007-02-08 11:48                 ` Eric W. Biederman
  0 siblings, 2 replies; 26+ messages in thread
From: Ingo Molnar @ 2007-02-06 22:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, linux-kernel, Lu, Yinghai, Luigi Genoni,
	Natalie Protasevich, Andi Kleen


* Eric W. Biederman <ebiederm@xmission.com> wrote:

> Ingo would it be reasonable to get a wait queue so I can wait for an 
> irq that needs the delayed disable action to actually become masked?

that might make sense, but what will do the wakeup - incidental IRQ 
arriving on the new CPU? Isnt that a bit risky - maybe the device wont 
generate IRQs for a really long time.

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-06  7:36           ` Ingo Molnar
  2007-02-06  8:57             ` Eric W. Biederman
       [not found]             ` <200702061012.25910.luigi.genoni@pirelli.com>
@ 2007-02-06 22:16             ` Eric W. Biederman
  2007-02-06 22:25               ` Ingo Molnar
  2 siblings, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-06 22:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-kernel, Lu, Yinghai, Luigi Genoni,
	Natalie Protasevich, Andi Kleen


Ingo would it be reasonable to get a wait queue so I can wait for an
irq that needs the delayed disable action to actually become masked?

There are a lot of practical reasons while I cannot reprogram an
unmasked irq.  However if we can wait it should be able to get all of
the convoluted irq migration logic out of interrupt context, and into
process context.

If know interrupts have been enabled on the cpus where an irq could be
delivered after the irq has been masked, I know that the irq is not
currently in progress. 

Therefore I know the irq will not be in progress again until the irq
is unmasked.

Once I know the irq will not be received again it is safe to modify
the irq handling data structures and the interrupt controller.

I think the only generic piece missing from this equation is a wait
queue so I can wait for the irq to become masked.  Say a generic
sleeping masq_irq() call?

Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
       [not found]             ` <200702061012.25910.luigi.genoni@pirelli.com>
@ 2007-02-06 22:05               ` Eric W. Biederman
  0 siblings, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-06 22:05 UTC (permalink / raw)
  To: Luigi Genoni
  Cc: Ingo Molnar, Andrew Morton, linux-kernel, Lu, Yinghai,
	Natalie Protasevich, Andi Kleen

"Luigi Genoni" <luigi.genoni@pirelli.com> writes:

> btw, I tested in on ( CPU and no way to reproduce the bug, (no messages 
> appeared and load average was quite as high as exspected). 
> Going to test an a 16 CPU (the same who triggered the bug at the beginning) 
> immediatelly when it will be free.

Thanks.  Hopefully that will confirm I have properly tracked this bug.

Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-06  7:36           ` Ingo Molnar
@ 2007-02-06  8:57             ` Eric W. Biederman
       [not found]             ` <200702061012.25910.luigi.genoni@pirelli.com>
  2007-02-06 22:16             ` Eric W. Biederman
  2 siblings, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-06  8:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-kernel, Lu, Yinghai, Luigi Genoni,
	Natalie Protasevich, Andi Kleen


Ingo thanks for the review.

Ingo Molnar <mingo@elte.hu> writes:

> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> When making the interrupt vectors per cpu I failed to handle a case 
>> during irq migration.  If the same interrupt comes in while we are 
>> servicing the irq but before we migrate it the pending bit in the 
>> local apic IRR register will be set for that irq.
>
> hm. I think this needs more work. For example this part of the fix looks 
> quite ugly to me:

I'm not at all certain I can make an ugly reality look beautiful.

>> +static unsigned apic_in_service_vector(void)
>> +{
>> +	unsigned isr, vector;
>> +	/* Figure out which vector we are servicing */
>> + for (vector = FIRST_EXTERNAL_VECTOR; vector < FIRST_SYSTEM_VECTOR; vector +=
> 32) {
>> +		isr = apic_read(APIC_ISR + ((vector/32) * 0x10));
>> +		if (isr)
>> +			break;
>> +	}
>> +	/* Find the low bits of the vector we are servicing */
>> +	vector += __ffs(isr);
>> +	return vector;
>
> so we read the hardware to figure out what the hell we are doing. The 
> problem is - why doesnt the kernel know at this point what it is doing? 
> It knows the CPU and it should know all the vector numbers. It also has 
> an irq number.

Yes.  And by adding a percpu global I can do this.  If figured since this
should be a rare case it would be equally simple to read this from
the hardware, as it already has the information stored in roughly the
form I would need to store it in.  If there errata in this area and I
am likely to hit them then it is probably a good idea to look at a
different implementation.


>> +	/* If the irq we are servicing has moved and is not pending
>> +	 * free it's vector.
>> +	 */
>> +	irr = apic_read(APIC_IRR + ((vector/32) * 0x10));
>
> the IRR is quite fragile. It's a rarely used hardware field and it has 
> erratums attached to it. Again, it seems like this function too tries to 
> recover information it /should/ already have.

If I am servicing an interrupt and that same interrupt comes in again
before I acknowledge it how /should/ I know that?

The only way I know to find that information is to ask the hardware.

>> @@ -1400,19 +1439,24 @@ static int ioapic_retrigger_irq(unsigned int irq)
>>  
>>  static void ack_apic_edge(unsigned int irq)
>>  {
>> -	move_native_irq(irq);
>> +	if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) {
>> +		move_native_irq(irq);
>> +		apic_handle_pending_vector(apic_in_service_vector());
>> +	}
>>  	ack_APIC_irq();
>
> this looks a bit ugly and a bit fragile. We had a simple 
> 'move_native_irq()' call for IRQ balancing, which is straightforward, 
> but now we've got this complex condition open coded.

Well the condition is testing a single bit so I don't think it is that
complex.  Maybe taking it out of line will help or maybe that will obscure
things.  I'm inclined to believe hiding the irq migration logic will
obscure things and make it harder to debug.

Now part of the reason I did it this way is I have at least 3
set_affinity implementations and this issue really has nothing to do
with the external interrupt controller but everything to do with the
cpu local interrupt controller, so this did not seem like something
that was reasonably buried in set_affinity.

> If then this should be done in some sort of helper - but even then, the 
> whole approach looks a bit erroneous.
>
> To me the cleanest way to migrate an IRQ between two different vectors 
> on two different CPUs would be to first mask the IRQ source in the PIC, 
> then to do the migration /atomically/ (no intermediary state), and then 
> unmask. If the PIC loses edges while masked then that's a problem of the 
> irq chip implementation of the PIC: its ->set_affinity() method should 
> refuse to migrate edge-triggered IRQs if it can lose edges while 
> unmasked!

Ingo I believe what you have described is essentially what we are
doing before my patches, or what we were doing in even older versions
that had other races and problems. 

To some extent I have inherited the current design that mostly works.  The
only known reliable way to block and edge triggered irq is to be servicing it.
The practical problem there is that when we sit on an irq the irq can come
in again and queue up in irr.  Which means that once we have updated the
data structures acked the irq and returned the irq will come in again
in the old location because the io_apic has a 1 deep queue for each
irq.  Of course for the irq controllers that can be masked safely your
proposed change to disable irq is unfortunate.

You appear to be lobbying for disabling the irq asynchronously to all
of the irq reception activity.  That would seem to me to require
running on the cpu where the irq is currently programmed to be
delivered disabling local interrupts, as well as disabling the irq in
the interrupt controller before reprogramming it.  For the irqs that
applies to there does seem to be some merit in that. 

Of course the irq will queue in irr so it can be delivered when the
irqs are enabled again and therefore we have to be very careful about
data structure changes and not removing them if we have the specified
irq pending on the current cpu.  So I don't see how this changes the
solution domain in any meaningful way, except by reducing the set
of interrupt controllers we can support with it, because it requires
masking.

The core question:  How do I know when all interrupts messages that
have sent have been serviced and acknowledged?

If you have the guarantee that the irq is disabled and queuing in the
interrupt controller and all interrupt messages that have been sent
have been serviced and acknowledged.   Then the reprogramming problem
is simple.  Otherwise we have to cope with interrupts in the cpu local
queue.

Dropping support for the more difficult cases when we know how to
migrate them and they are in fact some of our most common interrupts
like the timer interrupt seems an irresponsible thing to do.  Now
if you can make the case that we cannot migrate them safely that
is another issue.

I'm open to suggestions and with my simple patch we at least won't
hang the kernel when this happens.

Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-03  0:35         ` [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration Eric W. Biederman
  2007-02-03  1:05           ` Andrew Morton
  2007-02-03 10:01           ` Andi Kleen
@ 2007-02-06  7:36           ` Ingo Molnar
  2007-02-06  8:57             ` Eric W. Biederman
                               ` (2 more replies)
  2 siblings, 3 replies; 26+ messages in thread
From: Ingo Molnar @ 2007-02-06  7:36 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, linux-kernel, Lu, Yinghai, Luigi Genoni,
	Natalie Protasevich, Andi Kleen


* Eric W. Biederman <ebiederm@xmission.com> wrote:

> When making the interrupt vectors per cpu I failed to handle a case 
> during irq migration.  If the same interrupt comes in while we are 
> servicing the irq but before we migrate it the pending bit in the 
> local apic IRR register will be set for that irq.

hm. I think this needs more work. For example this part of the fix looks 
quite ugly to me:

> +static unsigned apic_in_service_vector(void)
> +{
> +	unsigned isr, vector;
> +	/* Figure out which vector we are servicing */
> +	for (vector = FIRST_EXTERNAL_VECTOR; vector < FIRST_SYSTEM_VECTOR; vector += 32) {
> +		isr = apic_read(APIC_ISR + ((vector/32) * 0x10));
> +		if (isr)
> +			break;
> +	}
> +	/* Find the low bits of the vector we are servicing */
> +	vector += __ffs(isr);
> +	return vector;

so we read the hardware to figure out what the hell we are doing. The 
problem is - why doesnt the kernel know at this point what it is doing? 
It knows the CPU and it should know all the vector numbers. It also has 
an irq number.

> +	/* If the irq we are servicing has moved and is not pending
> +	 * free it's vector.
> +	 */
> +	irr = apic_read(APIC_IRR + ((vector/32) * 0x10));

the IRR is quite fragile. It's a rarely used hardware field and it has 
erratums attached to it. Again, it seems like this function too tries to 
recover information it /should/ already have.

> @@ -1400,19 +1439,24 @@ static int ioapic_retrigger_irq(unsigned int irq)
>  
>  static void ack_apic_edge(unsigned int irq)
>  {
> -	move_native_irq(irq);
> +	if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) {
> +		move_native_irq(irq);
> +		apic_handle_pending_vector(apic_in_service_vector());
> +	}
>  	ack_APIC_irq();

this looks a bit ugly and a bit fragile. We had a simple 
'move_native_irq()' call for IRQ balancing, which is straightforward, 
but now we've got this complex condition open coded.

If then this should be done in some sort of helper - but even then, the 
whole approach looks a bit erroneous.

To me the cleanest way to migrate an IRQ between two different vectors 
on two different CPUs would be to first mask the IRQ source in the PIC, 
then to do the migration /atomically/ (no intermediary state), and then 
unmask. If the PIC loses edges while masked then that's a problem of the 
irq chip implementation of the PIC: its ->set_affinity() method should 
refuse to migrate edge-triggered IRQs if it can lose edges while 
unmasked!

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
@ 2007-02-05 23:33 Lu, Yinghai
  0 siblings, 0 replies; 26+ messages in thread
From: Lu, Yinghai @ 2007-02-05 23:33 UTC (permalink / raw)
  To: ebiederm
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Luigi Genoni,
	Ingo Molnar, Natalie Protasevich

-----Original Message-----
From: ebiederm@xmission.com [mailto:ebiederm@xmission.com] 
Sent: Monday, February 05, 2007 3:03 PM
>Nope. irq routines are a stack.  if apic_in_service_vector could return
>the wrong value.  ack_APIC_irq() which use the same information would
>acknowledge the wrong irq.  If there was actually any danger of
>mis-computing that information I would just pass it from the interrupt
>service routine stash it in a per cpu variable and then read it out.
>But the apic already has registers doing that, so I was lazy and used
>what was available.  It should be the common case that we need that
>information.

OK.

I wonder if current kernel support different cpu handle irq request for
different device at the same time.

YH



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
  2007-02-05 20:54 Lu, Yinghai
@ 2007-02-05 23:03 ` Eric W. Biederman
  0 siblings, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-05 23:03 UTC (permalink / raw)
  To: Lu, Yinghai
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Luigi Genoni,
	Ingo Molnar, Natalie Protasevich

"Lu, Yinghai" <yinghai.lu@amd.com> writes:

> -----Original Message-----
> From: ebiederm@xmission.com [mailto:ebiederm@xmission.com] 
> Sent: Monday, February 05, 2007 12:37 PM
>
>
>>The only corner case I can see that might potentially happen is
>>"apic_in_service_vector() != irq_vector[irq]" and if that is the case
>>we don't want to migrate, because the precondition that we are in the
>>irq handler servicing the expected irq isn't true.
>
> Reuse vector could help in that case.

A little but you are still on the wrong cpu, and that corner case might
even be worth looking at on i386.  I haven't assessed what might go wrong
yet but the current strategy of migrating irqs is based in the interrupt
service routine being where new instances of that irq will be
delivered.

What I do know is all of this is a very narrow window, and insanely
hard to trigger with consequences that shouldn't hang the machine.
The only reason we are even getting a reasonable number of irqs being
in service and in the irr is because we are acking a level triggered
vector used by two irqs and then the pending irq retriggers.  Avoiding
that case would certainly be an efficiency improvement.

Basically I think it takes the alignment of several 1 in a million
chances before the code I posted will have problems with this
theoretical case.  

> In another case, if two irq are migrated from one cpu to another cpu.
> ack_apic_edge for irq2 could use get apci_in_servier_vector for irq1,
> and handle that to clear irr for irq1. instead of irq2.

Nope. irq routines are a stack.  if apic_in_service_vector could return
the wrong value.  ack_APIC_irq() which use the same information would
acknowledge the wrong irq.  If there was actually any danger of
mis-computing that information I would just pass it from the interrupt
service routine stash it in a per cpu variable and then read it out.
But the apic already has registers doing that, so I was lazy and used
what was available.  It should be the common case that we need that
information.


Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
@ 2007-02-05 20:54 Lu, Yinghai
  2007-02-05 23:03 ` Eric W. Biederman
  0 siblings, 1 reply; 26+ messages in thread
From: Lu, Yinghai @ 2007-02-05 20:54 UTC (permalink / raw)
  To: ebiederm
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Luigi Genoni,
	Ingo Molnar, Natalie Protasevich

-----Original Message-----
From: ebiederm@xmission.com [mailto:ebiederm@xmission.com] 
Sent: Monday, February 05, 2007 12:37 PM


>The only corner case I can see that might potentially happen is
>"apic_in_service_vector() != irq_vector[irq]" and if that is the case
>we don't want to migrate, because the precondition that we are in the
>irq handler servicing the expected irq isn't true.

Reuse vector could help in that case.

In another case, if two irq are migrated from one cpu to another cpu.
ack_apic_edge for irq2 could use get apci_in_servier_vector for irq1,
and handle that to clear irr for irq1. instead of irq2.

YH





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
  2007-02-05 20:11 Lu, Yinghai
@ 2007-02-05 20:37 ` Eric W. Biederman
  0 siblings, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-05 20:37 UTC (permalink / raw)
  To: Lu, Yinghai
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Luigi Genoni,
	Ingo Molnar, Natalie Protasevich

"Lu, Yinghai" <yinghai.lu@amd.com> writes:

> From: ebiederm@xmission.com [mailto:ebiederm@xmission.com] 
>>> How about let apic_hangle_pending_vector take the irq too?
>
>>We can't compute the vector by reading the hardware registers after
>>we have acknowledged the irq.
>
>>I hope that was the answer you were looking for I'm not quite certain
>>what you mean by take.
>
> I mean 

I don't see the point.  We would have to miscompute the vector that
we are servicing or improperly fill in vector_irq for that to be useful.

The only corner case I can see that might potentially happen is
"apic_in_service_vector() != irq_vector[irq]" and if that is the case
we don't want to migrate, because the precondition that we are in the
irq handler servicing the expected irq isn't true.

Of course someone would have to set IRQ_MOVE_PENDING pretty fast for
that case to hit, it's a tiny hole probably worth closing though.

Eric


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.
@ 2007-02-05 20:11 Lu, Yinghai
  2007-02-05 20:37 ` Eric W. Biederman
  0 siblings, 1 reply; 26+ messages in thread
From: Lu, Yinghai @ 2007-02-05 20:11 UTC (permalink / raw)
  To: ebiederm
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Luigi Genoni,
	Ingo Molnar, Natalie Protasevich

From: ebiederm@xmission.com [mailto:ebiederm@xmission.com] 
>> How about let apic_hangle_pending_vector take the irq too?

>We can't compute the vector by reading the hardware registers after
>we have acknowledged the irq.

>I hope that was the answer you were looking for I'm not quite certain
>what you mean by take.

I mean 


+static void apic_handle_pending_vector(unsigned vector)
+{
+	unsigned irr;
+	int irq;
+
+	irq = __get_cpu_var(vector_irq)[vector];
+	if (irq >= 0)
+		return;
+
+	/* If the irq we are servicing has moved and is not pending
+	 * free it's vector.
+	 */

==>

+static void apic_handle_pending_vector(unsigned vector, unsigned irqx)
+{
+	unsigned irr;
+	int irq;
+
+	irq = __get_cpu_var(vector_irq)[vector];
+	if ( (-irq) != irqx)
+		return;
+ 
+
+	/* If the irq we are servicing has moved and is not pending
+	 * free it's vector.
+	 */




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-03  7:55               ` Eric W. Biederman
@ 2007-02-03 14:31                 ` l.genoni
  0 siblings, 0 replies; 26+ messages in thread
From: l.genoni @ 2007-02-03 14:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Arjan van de Ven, Andrew Morton, linux-kernel, Lu, Yinghai,
	Luigi Genoni, Ingo Molnar, Natalie Protasevich, Andi Kleen,
	l.genoni

As I reported when I tested this patch, it works, but I could see an 
abnormally high load averay while triggering the error message. anyway, it 
is better to have an high load averag three or four times higher than what 
you would expect then a crash/reboot. isn't it? :)

Luigi Genoni

p.s.
will test the other definitive patch on montday on both 8 and 16 CPU 
system.

On Sat, 3 Feb 2007, Eric W. Biederman wrote:

> Date: Sat, 03 Feb 2007 00:55:11 -0700
> From: Eric W. Biederman <ebiederm@xmission.com>
> To: Arjan van de Ven <arjan@infradead.org>
> Cc: Andrew Morton <akpm@osdl.org>, linux-kernel@vger.kernel.org,
>     "Lu, Yinghai" <yinghai.lu@amd.com>,
>     Luigi Genoni <luigi.genoni@pirelli.com>, Ingo Molnar <mingo@elte.hu>,
>     Natalie Protasevich <protasnb@gmail.com>, Andi Kleen <ak@suse.de>
> Subject: Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq
>     migration.
> Resent-Date: Sat, 03 Feb 2007 09:05:10 +0100
> Resent-From: <l.genoni@sns.it>
> 
> Arjan van de Ven <arjan@infradead.org> writes:
>
>>>> Once the migration operation is complete we know we will receive
>>>> no more interrupts on this vector so the irq pending state for
>>>> this irq will no longer be updated.  If the irq is not pending and
>>>> we are in the intermediate state we immediately free the vector,
>>>> otherwise in we free the vector in do_IRQ when the pending irq
>>>> arrives.
>>>
>>> So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
>>> I assume it doesn't affect many people?
>>
>> I got a few reports of this; irqbalance may trigger this kernel bug it
>> seems... I would suggest to consider this for 2.6.20 since it's a
>> hard-hang case
>
>
> Yes.  The bug I fixed will not happen if you don't migrate irqs.
>
> At the very least we want the patch below (already in -mm)
> that makes it not a hard hang case.
>
> Subject: [PATCH] x86_64:  Survive having no irq mapping for a vector
>
> Occasionally the kernel has bugs that result in no irq being
> found for a given cpu vector.  If we acknowledge the irq
> the system has a good chance of continuing even though we dropped
> an missed an irq message.  If we continue to simply print a
> message and drop and not acknowledge the irq the system is
> likely to become non-responsive shortly there after.
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
> arch/x86_64/kernel/irq.c |   11 ++++++++---
> 1 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
> index 0c06af6..648055a 100644
> --- a/arch/x86_64/kernel/irq.c
> +++ b/arch/x86_64/kernel/irq.c
> @@ -120,9 +120,14 @@ asmlinkage unsigned int do_IRQ(struct pt_regs *regs)
>
> 	if (likely(irq < NR_IRQS))
> 		generic_handle_irq(irq);
> -	else if (printk_ratelimit())
> -		printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
> -			__func__, smp_processor_id(), vector);
> +	else {
> +		if (!disable_apic)
> +			ack_APIC_irq();
> +
> +		if (printk_ratelimit())
> +			printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
> +				__func__, smp_processor_id(), vector);
> +	}
>
> 	irq_exit();
>
> -- 
> 1.4.4.1.g278f
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-03 10:22             ` Eric W. Biederman
@ 2007-02-03 10:26               ` Andi Kleen
  0 siblings, 0 replies; 26+ messages in thread
From: Andi Kleen @ 2007-02-03 10:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, linux-kernel, Lu, Yinghai, Luigi Genoni,
	Ingo Molnar, Natalie Protasevich

On Saturday 03 February 2007 11:22, Eric W. Biederman wrote:
> Andi Kleen <ak@suse.de> writes:
> 
> >> Once the migration operation is complete we know we will receive
> >> no more interrupts on this vector so the irq pending state for
> >> this irq will no longer be updated.  If the irq is not pending and
> >> we are in the intermediate state we immediately free the vector,
> >> otherwise in we free the vector in do_IRQ when the pending irq
> >> arrives.
> >
> > Ok for me, although the magic numbers are a little nasty.
> 
> You must be talking about (vector/32) *0x10.

No I meant the -1s

-Andi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-03 10:01           ` Andi Kleen
@ 2007-02-03 10:22             ` Eric W. Biederman
  2007-02-03 10:26               ` Andi Kleen
  0 siblings, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-03 10:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, linux-kernel, Lu, Yinghai, Luigi Genoni,
	Ingo Molnar, Natalie Protasevich

Andi Kleen <ak@suse.de> writes:

>> Once the migration operation is complete we know we will receive
>> no more interrupts on this vector so the irq pending state for
>> this irq will no longer be updated.  If the irq is not pending and
>> we are in the intermediate state we immediately free the vector,
>> otherwise in we free the vector in do_IRQ when the pending irq
>> arrives.
>
> Ok for me, although the magic numbers are a little nasty.

You must be talking about (vector/32) *0x10.
The 32 is the number of bits and 0x10 the gap between apic
registers.  I couldn't think of a better form.    If someone
can think of a better way it probably warrants a cleanup patch
at some point.

> What about i386?

i386 does not handle this case but since it is still globally
allocating all of it's vectors and never changes it's vectors during
migration it is totally harmless when an irq comes in on a cpu other
than the one we are expecting it on.

Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-03  0:35         ` [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration Eric W. Biederman
  2007-02-03  1:05           ` Andrew Morton
@ 2007-02-03 10:01           ` Andi Kleen
  2007-02-03 10:22             ` Eric W. Biederman
  2007-02-06  7:36           ` Ingo Molnar
  2 siblings, 1 reply; 26+ messages in thread
From: Andi Kleen @ 2007-02-03 10:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, linux-kernel, Lu, Yinghai, Luigi Genoni,
	Ingo Molnar, Natalie Protasevich


> Once the migration operation is complete we know we will receive
> no more interrupts on this vector so the irq pending state for
> this irq will no longer be updated.  If the irq is not pending and
> we are in the intermediate state we immediately free the vector,
> otherwise in we free the vector in do_IRQ when the pending irq
> arrives.

Ok for me, although the magic numbers are a little nasty.

What about i386?

-Andi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-03  7:32             ` Arjan van de Ven
@ 2007-02-03  7:55               ` Eric W. Biederman
  2007-02-03 14:31                 ` l.genoni
  0 siblings, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-03  7:55 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, linux-kernel, Lu, Yinghai, Luigi Genoni,
	Ingo Molnar, Natalie Protasevich, Andi Kleen

Arjan van de Ven <arjan@infradead.org> writes:

>> > Once the migration operation is complete we know we will receive
>> > no more interrupts on this vector so the irq pending state for
>> > this irq will no longer be updated.  If the irq is not pending and
>> > we are in the intermediate state we immediately free the vector,
>> > otherwise in we free the vector in do_IRQ when the pending irq
>> > arrives.
>> 
>> So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
>> I assume it doesn't affect many people?
>
> I got a few reports of this; irqbalance may trigger this kernel bug it
> seems... I would suggest to consider this for 2.6.20 since it's a
> hard-hang case


Yes.  The bug I fixed will not happen if you don't migrate irqs.

At the very least we want the patch below (already in -mm)
that makes it not a hard hang case.

Subject: [PATCH] x86_64:  Survive having no irq mapping for a vector

Occasionally the kernel has bugs that result in no irq being
found for a given cpu vector.  If we acknowledge the irq
the system has a good chance of continuing even though we dropped
an missed an irq message.  If we continue to simply print a
message and drop and not acknowledge the irq the system is
likely to become non-responsive shortly there after.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/x86_64/kernel/irq.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 0c06af6..648055a 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -120,9 +120,14 @@ asmlinkage unsigned int do_IRQ(struct pt_regs *regs)
 
 	if (likely(irq < NR_IRQS))
 		generic_handle_irq(irq);
-	else if (printk_ratelimit())
-		printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
-			__func__, smp_processor_id(), vector);
+	else {
+		if (!disable_apic)
+			ack_APIC_irq();
+
+		if (printk_ratelimit())
+			printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
+				__func__, smp_processor_id(), vector);
+	}
 
 	irq_exit();
 
-- 
1.4.4.1.g278f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-03  1:05           ` Andrew Morton
  2007-02-03  1:39             ` Eric W. Biederman
@ 2007-02-03  7:32             ` Arjan van de Ven
  2007-02-03  7:55               ` Eric W. Biederman
  1 sibling, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2007-02-03  7:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, linux-kernel, Lu, Yinghai, Luigi Genoni,
	Ingo Molnar, Natalie Protasevich, Andi Kleen


> > Once the migration operation is complete we know we will receive
> > no more interrupts on this vector so the irq pending state for
> > this irq will no longer be updated.  If the irq is not pending and
> > we are in the intermediate state we immediately free the vector,
> > otherwise in we free the vector in do_IRQ when the pending irq
> > arrives.
> 
> So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
> I assume it doesn't affect many people?

I got a few reports of this; irqbalance may trigger this kernel bug it
seems... I would suggest to consider this for 2.6.20 since it's a
hard-hang case


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-03  1:39             ` Eric W. Biederman
@ 2007-02-03  2:01               ` Andrew Morton
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2007-02-03  2:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Lu, Yinghai, Luigi Genoni, Ingo Molnar,
	Natalie Protasevich, Andi Kleen

On Fri, 02 Feb 2007 18:39:15 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Andrew Morton <akpm@osdl.org> writes:
> 
> > So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
> > I assume it doesn't affect many people?
> 
> If it's not to late, and this patch isn't too scary.
> 
> It's a really rare set of circumstances that trigger it, but the
> possibility of being hit is pretty widespread, anything with
> more than one cpu, and more then one irq could see this.
> 
> The easiest way to trigger this is to have two level triggered irqs on
> two different cpus using the same vector.  In that case if one acks
> it's irq while the other irq is migrating to a different cpu 2.6.19
> get completely confused and stop handling interrupts properly.
> 
> With my previous bug fix (not to drop the ack when we are confused)
> the machine will stay up, and that is obviously correct and can't
> affect anything else so is probably a candidate for the stable tree.
> 
> With this fix everything just works.
> 
> I don't know how often a legitimate case of the exact same irq
> going off twice in a row is, but that is a possibility as well
> especially with edge triggered interrupts.
> 
> Setting up the test scenario was a pain, but by extremely limiting
> my choice of vectors I was able to confirm I survived several hundred
> of these events with in a couple of minutes no problem.
> 

OK, thanks.  Let's await Andi's feedback.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-03  1:05           ` Andrew Morton
@ 2007-02-03  1:39             ` Eric W. Biederman
  2007-02-03  2:01               ` Andrew Morton
  2007-02-03  7:32             ` Arjan van de Ven
  1 sibling, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-03  1:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Lu, Yinghai, Luigi Genoni, Ingo Molnar,
	Natalie Protasevich, Andi Kleen

Andrew Morton <akpm@osdl.org> writes:

> So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
> I assume it doesn't affect many people?

If it's not to late, and this patch isn't too scary.

It's a really rare set of circumstances that trigger it, but the
possibility of being hit is pretty widespread, anything with
more than one cpu, and more then one irq could see this.

The easiest way to trigger this is to have two level triggered irqs on
two different cpus using the same vector.  In that case if one acks
it's irq while the other irq is migrating to a different cpu 2.6.19
get completely confused and stop handling interrupts properly.

With my previous bug fix (not to drop the ack when we are confused)
the machine will stay up, and that is obviously correct and can't
affect anything else so is probably a candidate for the stable tree.

With this fix everything just works.

I don't know how often a legitimate case of the exact same irq
going off twice in a row is, but that is a possibility as well
especially with edge triggered interrupts.

Setting up the test scenario was a pain, but by extremely limiting
my choice of vectors I was able to confirm I survived several hundred
of these events with in a couple of minutes no problem.

Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-03  0:35         ` [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration Eric W. Biederman
@ 2007-02-03  1:05           ` Andrew Morton
  2007-02-03  1:39             ` Eric W. Biederman
  2007-02-03  7:32             ` Arjan van de Ven
  2007-02-03 10:01           ` Andi Kleen
  2007-02-06  7:36           ` Ingo Molnar
  2 siblings, 2 replies; 26+ messages in thread
From: Andrew Morton @ 2007-02-03  1:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Lu, Yinghai, Luigi Genoni, Ingo Molnar,
	Natalie Protasevich, Andi Kleen

On Fri, 02 Feb 2007 17:35:31 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> When making the interrupt vectors per cpu I failed to handle a case
> during irq migration.  If the same interrupt comes in while we are
> servicing the irq but before we migrate it the pending bit in the
> local apic IRR register will be set for that irq.
> 
> After migrating the irq to another cpu and or vector the data
> structures will no longer be setup to handle this pending irq.  Then as
> soon as we return from servicing the irq we just migrated we will get
> a nasty: "No irq handler for vector" message. 
> 
> Since we do not disable irqs for edge triggered interrupts except
> in the smallest possible window during migration I cannot avoid
> migrating an irq in the unlikely event that it becomes pending.
> This is because by the time the irq could no longer become pending
> I have already updated all of my data structures.
> 
> Therefore this patch introduces an intermediate state that
> exists soley on the cpu where we are handling the irq during
> migration.  The irq number is changed to negative in the
> vector_irq data structure.
> 
> Once the migration operation is complete we know we will receive
> no more interrupts on this vector so the irq pending state for
> this irq will no longer be updated.  If the irq is not pending and
> we are in the intermediate state we immediately free the vector,
> otherwise in we free the vector in do_IRQ when the pending irq
> arrives.

So is this a for-2.6.20 thing?  The bug was present in 2.6.19, so
I assume it doesn't affect many people?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
  2007-02-03  0:31       ` [PATCH 1/2] x86_64 irq: Simplfy __assign_irq_vector Eric W. Biederman
@ 2007-02-03  0:35         ` Eric W. Biederman
  2007-02-03  1:05           ` Andrew Morton
                             ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Eric W. Biederman @ 2007-02-03  0:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Lu, Yinghai, Luigi Genoni, Ingo Molnar,
	Natalie Protasevich, Andi Kleen


When making the interrupt vectors per cpu I failed to handle a case
during irq migration.  If the same interrupt comes in while we are
servicing the irq but before we migrate it the pending bit in the
local apic IRR register will be set for that irq.

After migrating the irq to another cpu and or vector the data
structures will no longer be setup to handle this pending irq.  Then as
soon as we return from servicing the irq we just migrated we will get
a nasty: "No irq handler for vector" message. 

Since we do not disable irqs for edge triggered interrupts except
in the smallest possible window during migration I cannot avoid
migrating an irq in the unlikely event that it becomes pending.
This is because by the time the irq could no longer become pending
I have already updated all of my data structures.

Therefore this patch introduces an intermediate state that
exists soley on the cpu where we are handling the irq during
migration.  The irq number is changed to negative in the
vector_irq data structure.

Once the migration operation is complete we know we will receive
no more interrupts on this vector so the irq pending state for
this irq will no longer be updated.  If the irq is not pending and
we are in the intermediate state we immediately free the vector,
otherwise in we free the vector in do_IRQ when the pending irq
arrives.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/x86_64/kernel/io_apic.c |   56 ++++++++++++++++++++++++++++++++++++++---
 arch/x86_64/kernel/irq.c     |   19 +++++++++++++-
 2 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index d1fcd4b..ffcb5f6 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -730,9 +730,17 @@ next:
 		/* Found one! */
 		current_vector = vector;
 		current_offset = offset;
-		for_each_cpu_mask(old_cpu, old_mask)
-			per_cpu(vector_irq, old_cpu)[old_vector] = -1;
-		for_each_cpu_mask(new_cpu, new_mask)
+		for_each_cpu_mask(old_cpu, old_mask) {
+			int free = -1;
+			/* When migrating we need to preserve the old
+			 * vector until we have processed all of the
+			 * pending irqs.
+			 */
+			if (old_cpu == smp_processor_id())
+				free = -irq;
+			per_cpu(vector_irq, old_cpu)[old_vector] = free;
+		} 
+ 		for_each_cpu_mask(new_cpu, new_mask)
 			per_cpu(vector_irq, new_cpu)[vector] = irq;
 		irq_vector[irq] = vector;
 		irq_domain[irq] = domain;
@@ -1389,6 +1397,37 @@ static int ioapic_retrigger_irq(unsigned int irq)
 	return 1;
 }
 
+static unsigned apic_in_service_vector(void)
+{
+	unsigned isr, vector;
+	/* Figure out which vector we are servicing */
+	for (vector = FIRST_EXTERNAL_VECTOR; vector < FIRST_SYSTEM_VECTOR; vector += 32) {
+		isr = apic_read(APIC_ISR + ((vector/32) * 0x10));
+		if (isr)
+			break;
+	}
+	/* Find the low bits of the vector we are servicing */
+	vector += __ffs(isr);
+	return vector;
+}
+
+static void apic_handle_pending_vector(unsigned vector)
+{
+	unsigned irr;
+	int irq;
+
+	irq = __get_cpu_var(vector_irq)[vector];
+	if (irq >= 0)
+		return;
+
+	/* If the irq we are servicing has moved and is not pending
+	 * free it's vector.
+	 */
+	irr = apic_read(APIC_IRR + ((vector/32) * 0x10));
+	if (!(irr & (1 << (vector % 32))))
+		__get_cpu_var(vector_irq)[vector] = -1;
+}
+
 /*
  * Level and edge triggered IO-APIC interrupts need different handling,
  * so we use two separate IRQ descriptors. Edge triggered IRQs can be
@@ -1400,19 +1439,24 @@ static int ioapic_retrigger_irq(unsigned int irq)
 
 static void ack_apic_edge(unsigned int irq)
 {
-	move_native_irq(irq);
+	if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) {
+		move_native_irq(irq);
+		apic_handle_pending_vector(apic_in_service_vector());
+	}
 	ack_APIC_irq();
 }
 
 static void ack_apic_level(unsigned int irq)
 {
 	int do_unmask_irq = 0;
+	unsigned int vector = 0;
 
 #if defined(CONFIG_GENERIC_PENDING_IRQ) || defined(CONFIG_IRQBALANCE)
 	/* If we are moving the irq we need to mask it */
 	if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) {
 		do_unmask_irq = 1;
 		mask_IO_APIC_irq(irq);
+		vector = apic_in_service_vector();
 	}
 #endif
 
@@ -1424,8 +1468,10 @@ static void ack_apic_level(unsigned int irq)
 
 	/* Now we can move and renable the irq */
 	move_masked_irq(irq);
-	if (unlikely(do_unmask_irq))
+	if (unlikely(do_unmask_irq)) {
+		apic_handle_pending_vector(vector);
 		unmask_IO_APIC_irq(irq);
+	}
 }
 
 static struct irq_chip ioapic_chip __read_mostly = {
diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 648055a..0ff5fbc 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -97,6 +97,23 @@ skip:
 	return 0;
 }
 
+static inline unsigned int irq_from_vector(unsigned int vector)
+{
+	int irq;
+	irq = __get_cpu_var(vector_irq)[vector];
+
+	/* If we changed vectors during migration and we had a pending
+	 * irq, we left the irq allocated on this cpu.  Now that the
+	 * pending irq has arrived get the irq number and free this
+	 * vector. 
+	 */
+	if (irq < -1) {
+		__get_cpu_var(vector_irq)[vector] = -1;
+		irq = -irq;
+	}
+	return irq;
+}
+
 /*
  * do_IRQ handles all normal device IRQ's (the special
  * SMP cross-CPU interrupts have their own specific
@@ -112,7 +129,7 @@ asmlinkage unsigned int do_IRQ(struct pt_regs *regs)
 
 	exit_idle();
 	irq_enter();
-	irq = __get_cpu_var(vector_irq)[vector];
+	irq = irq_from_vector(vector);
 
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
 	stack_overflow_check(regs);
-- 
1.4.4.1.g278f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2007-02-09  6:41 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-05 19:29 [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration Lu, Yinghai
2007-02-05 19:59 ` Eric W. Biederman
  -- strict thread matches above, loose matches on Subject: below --
2007-02-05 23:33 Lu, Yinghai
2007-02-05 20:54 Lu, Yinghai
2007-02-05 23:03 ` Eric W. Biederman
2007-02-05 20:11 Lu, Yinghai
2007-02-05 20:37 ` Eric W. Biederman
     [not found] <200701221116.13154.luigi.genoni@pirelli.com>
     [not found] ` <200702021848.55921.luigi.genoni@pirelli.com>
2007-02-02 18:02   ` System crash after "No irq handler for vector" linux 2.6.19 Eric W. Biederman
     [not found]     ` <200702021905.39922.luigi.genoni@pirelli.com>
2007-02-03  0:31       ` [PATCH 1/2] x86_64 irq: Simplfy __assign_irq_vector Eric W. Biederman
2007-02-03  0:35         ` [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration Eric W. Biederman
2007-02-03  1:05           ` Andrew Morton
2007-02-03  1:39             ` Eric W. Biederman
2007-02-03  2:01               ` Andrew Morton
2007-02-03  7:32             ` Arjan van de Ven
2007-02-03  7:55               ` Eric W. Biederman
2007-02-03 14:31                 ` l.genoni
2007-02-03 10:01           ` Andi Kleen
2007-02-03 10:22             ` Eric W. Biederman
2007-02-03 10:26               ` Andi Kleen
2007-02-06  7:36           ` Ingo Molnar
2007-02-06  8:57             ` Eric W. Biederman
     [not found]             ` <200702061012.25910.luigi.genoni@pirelli.com>
2007-02-06 22:05               ` Eric W. Biederman
2007-02-06 22:16             ` Eric W. Biederman
2007-02-06 22:25               ` Ingo Molnar
2007-02-07  2:33                 ` Eric W. Biederman
2007-02-08 11:48                 ` Eric W. Biederman
2007-02-08 20:19                   ` Eric W. Biederman
2007-02-09  6:40                     ` Eric W. Biederman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.