All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, "Lu, Yinghai" <yinghai.lu@amd.com>,
	Luigi Genoni <luigi.genoni@pirelli.com>,
	Natalie Protasevich <protasnb@gmail.com>, Andi Kleen <ak@suse.de>
Subject: Re: [PATCH 2/2] x86_64 irq:  Handle irqs pending in IRR during irq migration.
Date: Tue, 06 Feb 2007 01:57:42 -0700	[thread overview]
Message-ID: <m1zm7ru0hl.fsf@ebiederm.dsl.xmission.com> (raw)
In-Reply-To: <20070206073616.GA15016@elte.hu> (Ingo Molnar's message of "Tue, 6 Feb 2007 08:36:16 +0100")


Ingo thanks for the review.

Ingo Molnar <mingo@elte.hu> writes:

> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> When making the interrupt vectors per cpu I failed to handle a case 
>> during irq migration.  If the same interrupt comes in while we are 
>> servicing the irq but before we migrate it the pending bit in the 
>> local apic IRR register will be set for that irq.
>
> hm. I think this needs more work. For example this part of the fix looks 
> quite ugly to me:

I'm not at all certain I can make an ugly reality look beautiful.

>> +static unsigned apic_in_service_vector(void)
>> +{
>> +	unsigned isr, vector;
>> +	/* Figure out which vector we are servicing */
>> + for (vector = FIRST_EXTERNAL_VECTOR; vector < FIRST_SYSTEM_VECTOR; vector +=
> 32) {
>> +		isr = apic_read(APIC_ISR + ((vector/32) * 0x10));
>> +		if (isr)
>> +			break;
>> +	}
>> +	/* Find the low bits of the vector we are servicing */
>> +	vector += __ffs(isr);
>> +	return vector;
>
> so we read the hardware to figure out what the hell we are doing. The 
> problem is - why doesnt the kernel know at this point what it is doing? 
> It knows the CPU and it should know all the vector numbers. It also has 
> an irq number.

Yes.  And by adding a percpu global I can do this.  If figured since this
should be a rare case it would be equally simple to read this from
the hardware, as it already has the information stored in roughly the
form I would need to store it in.  If there errata in this area and I
am likely to hit them then it is probably a good idea to look at a
different implementation.


>> +	/* If the irq we are servicing has moved and is not pending
>> +	 * free it's vector.
>> +	 */
>> +	irr = apic_read(APIC_IRR + ((vector/32) * 0x10));
>
> the IRR is quite fragile. It's a rarely used hardware field and it has 
> erratums attached to it. Again, it seems like this function too tries to 
> recover information it /should/ already have.

If I am servicing an interrupt and that same interrupt comes in again
before I acknowledge it how /should/ I know that?

The only way I know to find that information is to ask the hardware.

>> @@ -1400,19 +1439,24 @@ static int ioapic_retrigger_irq(unsigned int irq)
>>  
>>  static void ack_apic_edge(unsigned int irq)
>>  {
>> -	move_native_irq(irq);
>> +	if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) {
>> +		move_native_irq(irq);
>> +		apic_handle_pending_vector(apic_in_service_vector());
>> +	}
>>  	ack_APIC_irq();
>
> this looks a bit ugly and a bit fragile. We had a simple 
> 'move_native_irq()' call for IRQ balancing, which is straightforward, 
> but now we've got this complex condition open coded.

Well the condition is testing a single bit so I don't think it is that
complex.  Maybe taking it out of line will help or maybe that will obscure
things.  I'm inclined to believe hiding the irq migration logic will
obscure things and make it harder to debug.

Now part of the reason I did it this way is I have at least 3
set_affinity implementations and this issue really has nothing to do
with the external interrupt controller but everything to do with the
cpu local interrupt controller, so this did not seem like something
that was reasonably buried in set_affinity.

> If then this should be done in some sort of helper - but even then, the 
> whole approach looks a bit erroneous.
>
> To me the cleanest way to migrate an IRQ between two different vectors 
> on two different CPUs would be to first mask the IRQ source in the PIC, 
> then to do the migration /atomically/ (no intermediary state), and then 
> unmask. If the PIC loses edges while masked then that's a problem of the 
> irq chip implementation of the PIC: its ->set_affinity() method should 
> refuse to migrate edge-triggered IRQs if it can lose edges while 
> unmasked!

Ingo I believe what you have described is essentially what we are
doing before my patches, or what we were doing in even older versions
that had other races and problems. 

To some extent I have inherited the current design that mostly works.  The
only known reliable way to block and edge triggered irq is to be servicing it.
The practical problem there is that when we sit on an irq the irq can come
in again and queue up in irr.  Which means that once we have updated the
data structures acked the irq and returned the irq will come in again
in the old location because the io_apic has a 1 deep queue for each
irq.  Of course for the irq controllers that can be masked safely your
proposed change to disable irq is unfortunate.

You appear to be lobbying for disabling the irq asynchronously to all
of the irq reception activity.  That would seem to me to require
running on the cpu where the irq is currently programmed to be
delivered disabling local interrupts, as well as disabling the irq in
the interrupt controller before reprogramming it.  For the irqs that
applies to there does seem to be some merit in that. 

Of course the irq will queue in irr so it can be delivered when the
irqs are enabled again and therefore we have to be very careful about
data structure changes and not removing them if we have the specified
irq pending on the current cpu.  So I don't see how this changes the
solution domain in any meaningful way, except by reducing the set
of interrupt controllers we can support with it, because it requires
masking.

The core question:  How do I know when all interrupts messages that
have sent have been serviced and acknowledged?

If you have the guarantee that the irq is disabled and queuing in the
interrupt controller and all interrupt messages that have been sent
have been serviced and acknowledged.   Then the reprogramming problem
is simple.  Otherwise we have to cope with interrupts in the cpu local
queue.

Dropping support for the more difficult cases when we know how to
migrate them and they are in fact some of our most common interrupts
like the timer interrupt seems an irresponsible thing to do.  Now
if you can make the case that we cannot migrate them safely that
is another issue.

I'm open to suggestions and with my simple patch we at least won't
hang the kernel when this happens.

Eric

  reply	other threads:[~2007-02-06  8:58 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200701221116.13154.luigi.genoni@pirelli.com>
2007-01-22 17:14 ` System crash after "No irq handler for vector" linux 2.6.19 Eric W. Biederman
     [not found]   ` <200701231051.32945.luigi.genoni@pirelli.com>
2007-01-23 12:18     ` Eric W. Biederman
     [not found]       ` <Pine.LNX.4.64.0701232052330.32111@baldios.it.pirelli.com>
2007-01-31  8:39         ` Eric W. Biederman
     [not found]           ` <200701311549.22512.luigi.genoni@pirelli.com>
2007-02-01  5:56             ` [PATCH] x86_64: Survive having no irq mapping for a vector Eric W. Biederman
2007-02-01  5:59             ` System crash after "No irq handler for vector" linux 2.6.19 Eric W. Biederman
2007-02-01  7:20             ` Eric W. Biederman
     [not found]               ` <200702021848.55921.luigi.genoni@pirelli.com>
2007-02-02 18:02                 ` Eric W. Biederman
     [not found]                   ` <200702021905.39922.luigi.genoni@pirelli.com>
2007-02-02 18:32                     ` Eric W. Biederman
2007-02-03  0:31                     ` [PATCH 1/2] x86_64 irq: Simplfy __assign_irq_vector Eric W. Biederman
2007-02-03  0:35                       ` [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration Eric W. Biederman
2007-02-03  1:05                         ` Andrew Morton
2007-02-03  1:39                           ` Eric W. Biederman
2007-02-03  2:01                             ` Andrew Morton
2007-02-03  7:32                           ` Arjan van de Ven
2007-02-03  7:55                             ` Eric W. Biederman
2007-02-03 14:31                               ` l.genoni
2007-02-03 10:01                         ` Andi Kleen
2007-02-03 10:22                           ` Eric W. Biederman
2007-02-03 10:26                             ` Andi Kleen
2007-02-06  7:36                         ` Ingo Molnar
2007-02-06  8:57                           ` Eric W. Biederman [this message]
     [not found]                           ` <200702061012.25910.luigi.genoni@pirelli.com>
2007-02-06 22:05                             ` Eric W. Biederman
2007-02-06 22:16                           ` Eric W. Biederman
2007-02-06 22:25                             ` Ingo Molnar
2007-02-07  2:33                               ` Eric W. Biederman
2007-02-08 11:48                               ` Eric W. Biederman
2007-02-08 20:19                                 ` Eric W. Biederman
2007-02-09  6:40                                   ` Eric W. Biederman
2007-02-10 23:52                                     ` What are the real ioapic rte programming constraints? Eric W. Biederman
2007-02-11  5:57                                       ` Zwane Mwaikambo
2007-02-11 10:20                                         ` Eric W. Biederman
2007-02-11 16:16                                           ` Zwane Mwaikambo
2007-02-11 22:01                                             ` Eric W. Biederman
2007-02-12  1:05                                               ` Zwane Mwaikambo
2007-02-12  4:51                                                 ` Eric W. Biederman
2007-02-23 10:51                                                   ` Conclusions from my investigation about ioapic programming Eric W. Biederman
2007-02-23 11:10                                                     ` [PATCH 0/14] x86_64 irq related fixes and cleanups Eric W. Biederman
2007-02-23 11:11                                                       ` [PATCH 01/14] x86_64 irq: Simplfy __assign_irq_vector Eric W. Biederman
2007-02-23 11:13                                                         ` [PATCH 02/14] irq: Remove set_native_irq_info Eric W. Biederman
2007-02-23 11:15                                                           ` [PATCH 03/14] x86_64 irq: Kill declaration of removed array, interrupt Eric W. Biederman
2007-02-23 11:16                                                             ` [PATCH 04/14] x86_64 irq: Remove the unused vector parameter from ioapic_register_intr Eric W. Biederman
2007-02-23 11:19                                                               ` [PATCH 05/14] x86_64 irq: Refactor setup_IO_APIC_irq Eric W. Biederman
2007-02-23 11:20                                                                 ` [PATCH 06/14] x86_64 irq: Simplfiy the set_affinity logic Eric W. Biederman
2007-02-23 11:23                                                                   ` [PATCH 07/14] x86_64 irq: In __DO_ACTION perform the FINAL action for every entry Eric W. Biederman
2007-02-23 11:26                                                                     ` [PATCH 08/14] x86_64 irq: Use NR_IRQS not NR_IRQ_VECTORS Eric W. Biederman
2007-02-23 11:32                                                                       ` [PATCH 09/14] x86_64 irq: Begin consolidating per_irq data in structures Eric W. Biederman
2007-02-23 11:35                                                                         ` [PATCH 10/14] x86_64 irq: Simplify assign_irq_vector's arguments Eric W. Biederman
2007-02-23 11:36                                                                           ` [PATCH 11/14] x86_64 irq: Remove unnecessary irq 0 setup Eric W. Biederman
2007-02-23 11:38                                                                             ` [PATCH 12/14] x86_64 irq: Add constants for the reserved IRQ vectors Eric W. Biederman
2007-02-23 11:40                                                                               ` [PATCH 13/14] x86_64 irq: Safely cleanup an irq after moving it Eric W. Biederman
2007-02-25 11:53                                                                                 ` Mika Penttilä
2007-02-25 12:09                                                                                   ` Eric W. Biederman
2007-02-23 11:46                                                                               ` [PATCH 14/14] genirq: Mask irqs when migrating them Eric W. Biederman
2007-02-23 12:01                                                                                 ` [PATCH] x86_64 irq: Document what works and why on ioapics Eric W. Biederman
2007-02-24  2:06                                                                                 ` [PATCH 14/14] genirq: Mask irqs when migrating them Siddha, Suresh B
2007-02-27 20:26                                                                                   ` Andrew Morton
2007-02-27 20:41                                                                                     ` Eric W. Biederman
2007-02-25 10:43                                                                               ` [PATCH 12/14] x86_64 irq: Add constants for the reserved IRQ vectors Pavel Machek
2007-02-25 11:15                                                                                 ` Eric W. Biederman
2007-02-25 19:48                                                                                   ` Pavel Machek
2007-02-25 21:01                                                                                     ` Eric W. Biederman
2007-02-25 21:13                                                                                       ` Pavel Machek
2007-02-23 16:48                                                     ` Conclusions from my investigation about ioapic programming Jeff V. Merkey
2007-02-23 18:10                                                       ` Eric W. Biederman
2007-02-23 17:48                                                         ` Jeff V. Merkey
2007-02-24  4:05                                                           ` Eric W. Biederman
2007-02-24  5:44                                                             ` Jeffrey V. Merkey
2007-02-23 17:48                                                         ` Jeff V. Merkey
     [not found]                                           ` <32209efe0702111212j77f5011xe2430cb13c13686@mail.gmail.com>
2007-02-11 21:36                                             ` What are the real ioapic rte programming constraints? Eric W. Biederman
2007-02-03  9:50                       ` [PATCH 1/2] x86_64 irq: Simplfy __assign_irq_vector Andi Kleen
2007-02-03  0:40                     ` System crash after "No irq handler for vector" linux 2.6.19 Eric W. Biederman
2007-02-05 19:29 [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration Lu, Yinghai
2007-02-05 19:59 ` Eric W. Biederman
2007-02-05 20:11 Lu, Yinghai
2007-02-05 20:37 ` Eric W. Biederman
2007-02-05 20:54 Lu, Yinghai
2007-02-05 23:03 ` Eric W. Biederman
2007-02-05 23:33 Lu, Yinghai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1zm7ru0hl.fsf@ebiederm.dsl.xmission.com \
    --to=ebiederm@xmission.com \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luigi.genoni@pirelli.com \
    --cc=mingo@elte.hu \
    --cc=protasnb@gmail.com \
    --cc=yinghai.lu@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.