* [GIT PULL] Fix lost interrupt race in Xen event channels
@ 2010-08-24 21:35 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 27+ messages in thread
From: Jeremy Fitzhardinge @ 2010-08-24 21:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tom Kopec, Daniel Stodden, Xen-devel, Stable Kernel,
	Linux Kernel Mailing List

 Hi Linus,

This pair of patches fixes a long-standing bug whose most noticeable
symptom was that the Xen blkfront driver would hang up very occasionally
(sometimes never, sometimes after weeks or months of uptime).

We worked out that the root cause was that it was incorrectly treating
Xen events as level- rather than edge-triggered interrupts, which works
fine unless you're handling one interrupt, the interrupt gets migrated
to another cpu, and it is then re-raised.  That loses the interrupt,
because the edge signalling the second interrupt is never recorded.
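The race can be sketched with a toy model (plain C; all names below are invented for illustration and are not the kernel's actual structures): under a level-style flow, an edge that arrives while the handler is marked in-progress on another CPU leaves no trace, while the edge-style flow records it as PENDING and reruns the handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state, loosely modeled on the flow-handler flags. */
struct toy_irq {
    bool inprogress;   /* handler currently running on some CPU */
    bool pending;      /* edge seen while running (edge flow only) */
    int  handled;      /* events the handler actually consumed */
};

/* Level-style flow: a second edge during INPROGRESS is simply dropped. */
static void raise_level(struct toy_irq *irq)
{
    if (irq->inprogress)
        return;                    /* nothing records this edge */
    irq->handled++;
}

/* Edge-style flow: a second edge sets PENDING in the descriptor... */
static void raise_edge(struct toy_irq *irq)
{
    if (irq->inprogress) {
        irq->pending = true;
        return;
    }
    irq->handled++;
}

/* ...and the original invocation loops until PENDING stays clear. */
static void finish_edge(struct toy_irq *irq)
{
    while (irq->pending) {
        irq->pending = false;
        irq->handled++;
    }
    irq->inprogress = false;
}
```

This is only the shape of the race, not the real code paths; the actual fix is the switch to handle_edge_irq in the diff below.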

The other patch changes IPI and VIRQ event sources to use
handle_percpu_irq, because treating them as level-triggered is also
wrong: they're inherently percpu events, much like LAPIC vectors.

I'd like to get this fix into the current kernel and into stable sooner
rather than later.

Thanks,
    J

The following changes since commit 76be97c1fc945db08aae1f1b746012662d643e97:

  Linux 2.6.36-rc2 (2010-08-22 17:43:29 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git upstream/core

Jeremy Fitzhardinge (2):
      xen: use percpu interrupts for IPIs and VIRQs
      xen: handle events as edge-triggered

 drivers/xen/events.c |   21 ++++++++++++++++-----
 1 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 72f91bf..13365ba 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -112,6 +112,7 @@ static inline unsigned long *cpu_evtchn_mask(int cpu)
 #define VALID_EVTCHN(chn)	((chn) != 0)
 
 static struct irq_chip xen_dynamic_chip;
+static struct irq_chip xen_percpu_chip;
 
 /* Constructor for packed IRQ information. */
 static struct irq_info mk_unbound_info(void)
@@ -377,7 +378,7 @@ int bind_evtchn_to_irq(unsigned int evtchn)
 		irq = find_unbound_irq();
 
 		set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
-					      handle_level_irq, "event");
+					      handle_edge_irq, "event");
 
 		evtchn_to_irq[evtchn] = irq;
 		irq_info[irq] = mk_evtchn_info(evtchn);
@@ -403,8 +404,8 @@ static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu)
 		if (irq < 0)
 			goto out;
 
-		set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
-					      handle_level_irq, "ipi");
+		set_irq_chip_and_handler_name(irq, &xen_percpu_chip,
+					      handle_percpu_irq, "ipi");
 
 		bind_ipi.vcpu = cpu;
 		if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_ipi,
@@ -444,8 +445,8 @@ static int bind_virq_to_irq(unsigned int virq, unsigned int cpu)
 
 		irq = find_unbound_irq();
 
-		set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
-					      handle_level_irq, "virq");
+		set_irq_chip_and_handler_name(irq, &xen_percpu_chip,
+					      handle_percpu_irq, "virq");
 
 		evtchn_to_irq[evtchn] = irq;
 		irq_info[irq] = mk_virq_info(evtchn, virq);
@@ -964,6 +965,16 @@ static struct irq_chip xen_dynamic_chip __read_mostly = {
 	.retrigger	= retrigger_dynirq,
 };
 
+static struct irq_chip xen_percpu_chip __read_mostly = {
+	.name		= "xen-percpu",
+
+	.disable	= disable_dynirq,
+	.mask		= disable_dynirq,
+	.unmask		= enable_dynirq,
+
+	.ack		= ack_dynirq,
+};
+
 int xen_set_callback_via(uint64_t via)
 {
 	struct xen_hvm_param a;



^ permalink raw reply related	[flat|nested] 27+ messages in thread


* Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
  2010-08-24 21:35 ` Jeremy Fitzhardinge
@ 2010-08-25  7:52 ` Jan Beulich
  2010-08-25 10:04   ` Daniel Stodden
                     ` (2 more replies)
  -1 siblings, 3 replies; 27+ messages in thread
From: Jan Beulich @ 2010-08-25  7:52 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Tom Kopec, Daniel Stodden, Stable Kernel, Linus Torvalds,
	Xen-devel, Linux Kernel Mailing List

 >>> On 24.08.10 at 23:35, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> We worked out the root cause was that it was incorrectly treating Xen
> events as level rather than edge triggered interrupts, which works fine
> unless you're handling one interrupt, the interrupt gets migrated to
> another cpu and then re-raised.  This ends up losing the interrupt
> because the edge-triggering of the second interrupt is lost.

While this description seems plausible at first glance, it
doesn't match up with unmask_evtchn() already taking care of
exactly this case. Or are you implicitly saying that this code is
broken in some way (if so, how, and shouldn't it then be that
code that needs fixing, or removing if you want to stay with the
edge handling)?

I do however agree that using handle_level_irq() is problematic
(see http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01178.html),
but as I said there, I think using the fasteoi logic is preferable. No
matter whether you use edge or level, the ->end() method will
never be called (whereas fasteoi calls ->eoi(), which would
just need to be vectored to the same function as ->end()).
With end_pirq() never called, you can't let Xen know of
bad PIRQs (so that it can disable them instead of continuing
to call the [now shortcut] handler in the owning domain).

> The other change changes IPI and VIRQ event sources to use
> handle_percpu_irq, because treating them as level is also wrong, and
> they're actually inherently percpu events, much like LAPIC vectors.

This doesn't seem right for the general VIRQ case: global ones
should not be barred from migrating between CPUs. Since in your
model the requestor has to pass IRQF_PERCPU anyway,
shouldn't you make the selection of the handler dependent
upon this flag?

Jan



* Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen  event channels
  2010-08-25  7:52 ` [Xen-devel] " Jan Beulich
@ 2010-08-25 10:04   ` Daniel Stodden
  2010-08-25 11:24     ` Jan Beulich
  2010-08-25 17:54     ` Jeremy Fitzhardinge
  2010-09-03 18:46   ` Using handle_fasteoi_irq for pirqs Jeremy Fitzhardinge
  2 siblings, 1 reply; 27+ messages in thread
From: Daniel Stodden @ 2010-08-25 10:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jeremy Fitzhardinge, Tom Kopec, Stable Kernel, Linus Torvalds,
	Xen-devel, Linux Kernel Mailing List

On Wed, 2010-08-25 at 03:52 -0400, Jan Beulich wrote:
> >>> On 24.08.10 at 23:35, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > We worked out the root cause was that it was incorrectly treating Xen
> > events as level rather than edge triggered interrupts, which works fine
> > unless you're handling one interrupt, the interrupt gets migrated to
> > another cpu and then re-raised.  This ends up losing the interrupt
> > because the edge-triggering of the second interrupt is lost.
> 
> While this description would seem plausible at the first glance, it
> doesn't match up with unmask_evtchn() already taking care of
> exactly this case. Or are you implicitly saying that this code is
> broken in some way (if so, how, and shouldn't it then be that
> code that needs fixing, or removing if you want to stay with the
> edge handling)?

Not broken, but a different problem. The unmask 'resend' only catches
an edge lost while the event was still masked. But the level irq flow
doesn't save PENDING state. In the Xen event migration case the edge
isn't lost as such; rather, the upcall drops the invocation when the
handler is found in-progress on the previous cpu.
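The distinction can be sketched in illustrative C (toy_raise/toy_unmask are invented stand-ins, not the real unmask_evtchn()): the unmask-time resend only fires for an edge logged while the channel was masked; an edge consumed by the upcall and then dropped because the handler was in-progress elsewhere never reaches that path.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_chan {
    bool masked;       /* channel masked by the guest */
    bool pending;      /* Xen-side event flag */
    bool inprogress;   /* guest handler running on the previous CPU */
    int  delivered;    /* handler invocations */
};

/* Xen raises the event: logged if masked, upcalled otherwise. */
static void toy_raise(struct toy_chan *c)
{
    c->pending = true;
    if (c->masked)
        return;                   /* caught later by the unmask resend */
    c->pending = false;           /* the upcall consumes the flag... */
    if (c->inprogress)
        return;                   /* ...then the level flow drops it */
    c->delivered++;
}

/* The unmask-time resend: only helps the masked case above. */
static void toy_unmask(struct toy_chan *c)
{
    c->masked = false;
    if (c->pending) {
        c->pending = false;
        c->delivered++;
    }
}
```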

> I do however agree that using handle_level_irq() is problematic
> (see http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01178.html),
> but as said there I think using the fasteoi logic is preferable. No
> matter whether using edge or level, the ->end() method will
> never be called (whereas fasteoi calls ->eoi(), which would
> just need to be vectored to the same function as ->end()).
> Without end_pirq() ever called, you can't let Xen know of
> bad PIRQs (so that it can disable them instead of continuing
> to call the [now shortcut] handler in the owning domain).

Not an opinion, just confused: Isn't all that dealt with in
chip->disable?

Daniel



* Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
  2010-08-25 10:04   ` Daniel Stodden
@ 2010-08-25 11:24     ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2010-08-25 11:24 UTC (permalink / raw)
  To: Daniel Stodden
  Cc: Tom Kopec, Jeremy Fitzhardinge, Stable Kernel, Linus Torvalds,
	Xen-devel, Linux Kernel Mailing List

>>> On 25.08.10 at 12:04, Daniel Stodden <daniel.stodden@citrix.com> wrote:
> On Wed, 2010-08-25 at 03:52 -0400, Jan Beulich wrote:
>> >>> On 24.08.10 at 23:35, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> > We worked out the root cause was that it was incorrectly treating Xen
>> > events as level rather than edge triggered interrupts, which works fine
>> > unless you're handling one interrupt, the interrupt gets migrated to
>> > another cpu and then re-raised.  This ends up losing the interrupt
>> > because the edge-triggering of the second interrupt is lost.
>> 
>> While this description would seem plausible at the first glance, it
>> doesn't match up with unmask_evtchn() already taking care of
>> exactly this case. Or are you implicitly saying that this code is
>> broken in some way (if so, how, and shouldn't it then be that
>> code that needs fixing, or removing if you want to stay with the
>> edge handling)?
> 
> Not broken, but a different problem. The unmask 'resend' only catches
> the edge lost if the event was raised while it was still masked. But
> level irq doesn't have to save PENDING state. In the Xen event migration
> case the edge isn't lost, but the upcall will drop the invocation when
> the handler is found inprogress on the previous cpu.

Hmm, indeed. But that problem must have existed in all post-2.6.18
kernels then... And that shouldn't be a problem with fasteoi, as that
one calls ->eoi() even when INPROGRESS was set (unlike level,
which calls unmask only when it wasn't set).

>> I do however agree that using handle_level_irq() is problematic
>> (see 
> http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01178.html),
>> but as said there I think using the fasteoi logic is preferable. No
>> matter whether using edge or level, the ->end() method will
>> never be called (whereas fasteoi calls ->eoi(), which would
>> just need to be vectored to the same function as ->end()).
>> Without end_pirq() ever called, you can't let Xen know of
>> bad PIRQs (so that it can disable them instead of continuing
>> to call the [now shortcut] handler in the owning domain).
> 
> Not an opinion, just confused: Isn't all that dealt with in
> chip->disable?

With disable_pirq() being empty (at least in the branches I
looked at)?

Jan



* Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen  event channels
  2010-08-25  7:52 ` [Xen-devel] " Jan Beulich
@ 2010-08-25 17:54     ` Jeremy Fitzhardinge
  2010-09-03 18:46   ` Using handle_fasteoi_irq for pirqs Jeremy Fitzhardinge
  2 siblings, 0 replies; 27+ messages in thread
From: Jeremy Fitzhardinge @ 2010-08-25 17:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tom Kopec, Daniel Stodden, Stable Kernel, Linus Torvalds,
	Xen-devel, Linux Kernel Mailing List

 On 08/25/2010 12:52 AM, Jan Beulich wrote:
>  >>> On 24.08.10 at 23:35, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> We worked out the root cause was that it was incorrectly treating Xen
>> events as level rather than edge triggered interrupts, which works fine
>> unless you're handling one interrupt, the interrupt gets migrated to
>> another cpu and then re-raised.  This ends up losing the interrupt
>> because the edge-triggering of the second interrupt is lost.
> While this description would seem plausible at the first glance, it
> doesn't match up with unmask_evtchn() already taking care of
> exactly this case. Or are you implicitly saying that this code is
> broken in some way (if so, how, and shouldn't it then be that
> code that needs fixing, or removing if you want to stay with the
> edge handling)?
>
> I do however agree that using handle_level_irq() is problematic
> (see http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01178.html),
> but as said there I think using the fasteoi logic is preferable. No
> matter whether using edge or level, the ->end() method will
> never be called (whereas fasteoi calls ->eoi(), which would
> just need to be vectored to the same function as ->end()).
> Without end_pirq() ever called, you can't let Xen know of
> bad PIRQs (so that it can disable them instead of continuing
> to call the [now shortcut] handler in the owning domain).

Note that this patch is specifically for upstream Xen, which doesn't
have any pirq support in it at present.

However,  I did consider using fasteoi, but I couldn't see how to make
it work.  The problem is that it only does a single call into the
irq_chip for EOI after calling the interrupt handler, but there is no
call beforehand to ack the interrupt (which means clear the event flag
in our case).  This leads to a race where an event can be lost after the
interrupt handler has returned, but before the event flag has been
cleared (because Xen won't set pending or call the upcall function if
the event is already set).  I guess I could pre-clear the event in the
upcall function, but I'm not sure that's any better.
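The window Jeremy describes can be modeled in a few lines (toy C; all names are invented): Xen only generates an upcall on a 0→1 transition of the event flag, so an event set after the handler returns but before the ->eoi()-time clear simply vanishes.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_port {
    bool evtchn_pending;   /* Xen-side event flag */
    int  upcalls;          /* upcall deliveries into the guest */
};

/* Xen side: an upcall is only generated when the flag goes 0 -> 1. */
static void xen_raise(struct toy_port *p)
{
    if (!p->evtchn_pending) {
        p->evtchn_pending = true;
        p->upcalls++;
    }
}

/* fasteoi-style guest with no early ack: the flag is only cleared at
 * ->eoi() time, after the handler has already run. */
static void guest_fasteoi_no_ack(struct toy_port *p, bool event_mid_handler)
{
    /* interrupt handler runs here, flag still set */
    if (event_mid_handler)
        xen_raise(p);            /* flag already 1: no new upcall */
    p->evtchn_pending = false;   /* eoi: wipes the second event too */
}
```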

In the dom0 kernels, I followed the example of the IOAPIC irq_chip
implementation, which does the hardware EOI in the ack call at the start
of handle_edge_irq, and did the EOI hypercall (when necessary) there.  I
welcome a reviewer's eye on this, though.

>> The other change changes IPI and VIRQ event sources to use
>> handle_percpu_irq, because treating them as level is also wrong, and
>> they're actually inherently percpu events, much like LAPIC vectors.
> This doesn't seem right for the general VIRQ case: global ones
> should not be disallowed migration between CPUs. Since in your
> model the requestor has to pass IRQF_PERCPU anyway,
> shouldn't you make the selection of the handler dependent
> upon this flag?

I was thinking specifically of the timer, debug and console virqs.  The
last is the only one which could conceivably be non-percpu, but in
practice I think it would be a bad idea to put it on anything other than
cpu0.  What other global virqs are there?  Nothing that's high-frequency
enough to be worth migrating?

But yes, if necessary, I could make it depend on the IRQF_PERCPU flag.

    J



* Re: [GIT PULL] Fix lost interrupt race in Xen event channels
  2010-08-25 17:54     ` Jeremy Fitzhardinge
@ 2010-08-26  6:46     ` Jan Beulich
  2010-08-26 16:32       ` Jeremy Fitzhardinge
  -1 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2010-08-26  6:46 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Xen-devel, Tom Kopec, Daniel Stodden

 >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Note that this patch is specifically for upstream Xen, which doesn't
> have any pirq support in it at present.

I understand that, but saw that you had parallel changes to the
pirq handling in your Dom0 tree.

> However,  I did consider using fasteoi, but I couldn't see how to make
> it work.  The problem is that it only does a single call into the
> irq_chip for EOI after calling the interrupt handler, but there is no
> call beforehand to ack the interrupt (which means clear the event flag
> in our case).  This leads to a race where an event can be lost after the
> interrupt handler has returned, but before the event flag has been
> cleared (because Xen won't set pending or call the upcall function if
> the event is already set).  I guess I could pre-clear the event in the
> upcall function, but I'm not sure that's any better.

That's precisely what we're doing.

> In the dom0 kernels, I followed the example of the IOAPIC irq_chip
> implementation, which does the hardware EOI in the ack call at the start
> of handle_edge_irq, can did the EOI hypercall (when necessary) there.  I
> welcome a reviewer's eye on this though.

This I hadn't actually noticed so far.

That doesn't look right, at least in combination with ->mask() being
wired to disable_pirq(), which is empty (and btw., if the latter was
right, you should also wire ->mask_ack() to disable_pirq() to avoid
a pointless indirect call).

But even with ->mask() actually masking the IRQ, I'm not certain
this is right. At the very least it'll make Xen see a potential
second instance of the same IRQ much earlier than you will
really be able to handle it. This is particularly bad for
level-triggered ones, as Xen will see them again right after
you pass it the EOI notification - as a result there'll be twice
as many interrupts seen by Xen on the respective lines.

The native I/O APIC can validly do this as ->ack() only gets
called for edge triggered interrupts (which is why ->eoi() is
wired to ack_apic_level()).

> I was thinking specifically of the timer, debug and console virqs.  The
> last is the only one which could conceivably be non-percpu, but in
> practice I think it would be a bad idea to put it on anything other than
> cpu0.  What other global virqs are there?  Nothing that's high-frequency
> enough to be worth migrating?

Once supported in your tree, oprofile could have high interrupt
rates, and the trace buffer ones might too. Admittedly both are
debugging aids, but I don't think it'd be nice for them to influence
performance more than necessary.

Jan


* Re: [GIT PULL] Fix lost interrupt race in Xen event channels
  2010-08-26  6:46     ` Jan Beulich
@ 2010-08-26 16:32       ` Jeremy Fitzhardinge
  2010-08-27  8:56         ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Jeremy Fitzhardinge @ 2010-08-26 16:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Tom Kopec, Daniel Stodden

 On 08/25/2010 11:46 PM, Jan Beulich wrote:
>  >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> Note that this patch is specifically for upstream Xen, which doesn't
>> have any pirq support in it at present.
> I understand that, but saw that you had paralleling changes to the
> pirq handling in your Dom0 tree.
>
>> However,  I did consider using fasteoi, but I couldn't see how to make
>> it work.  The problem is that it only does a single call into the
>> irq_chip for EOI after calling the interrupt handler, but there is no
>> call beforehand to ack the interrupt (which means clear the event flag
>> in our case).  This leads to a race where an event can be lost after the
>> interrupt handler has returned, but before the event flag has been
>> cleared (because Xen won't set pending or call the upcall function if
>> the event is already set).  I guess I could pre-clear the event in the
>> upcall function, but I'm not sure that's any better.
> That's precisely what we're doing.

You mean pre-clearing the event?  OK.

But aren't you still subject to the bug the switch to handle_edge_irq fixed?

With handle_fasteoi_irq:

cpu A			cpu B
get event
set INPROGRESS
call action
   :
   :
<migrate event channel to B>
   :			get event
   :			INPROGRESS set? -> EOI, return
   :
action returns
clear INPROGRESS
EOI


The event arriving on B is lost, and there's no record made of it ever
having arrived.  How do you avoid this?

With handle_edge_irq, the second event will set PENDING in the irq_desc,
and a loop will keep running on cpu A until PENDING is clear, so nothing
is ever lost.

>> In the dom0 kernels, I followed the example of the IOAPIC irq_chip
>> implementation, which does the hardware EOI in the ack call at the start
>> of handle_edge_irq, can did the EOI hypercall (when necessary) there.  I
>> welcome a reviewer's eye on this though.
> This I didn't actually notice so far.
>
> That doesn't look right, at least in combination with ->mask() being
> wired to disable_pirq(), which is empty (and btw., if the latter was
> right, you should also wire ->mask_ack() to disable_pirq() to avoid
> a pointless indirect call).
>
> But even with ->mask() actually masking the IRQ I'm not certain
> this is right. At the very least it'll make Xen see a potential
> second instance of the same IRQ much earlier than you will
> really be able to handle it. This is particularly bad for level
> triggered ones, as Xen will see them right again after you
> passed it the EOI notification - as a result there'll be twice as
> many interrupts seen by Xen on the respective lines.
>
> The native I/O APIC can validly do this as ->ack() only gets
> called for edge triggered interrupts (which is why ->eoi() is
> wired to ack_apic_level()).

Yes.  The code as-is works OK, but I haven't checked whether Xen is
taking many spurious interrupts.  It probably helps that my test machine
has MSI for almost everything.

But does that mean the pirq code needs to have different ack/eoi
behaviour depending on the triggering of the ioapic interrupt?


>> I was thinking specifically of the timer, debug and console virqs.  The
>> last is the only one which could conceivably be non-percpu, but in
>> practice I think it would be a bad idea to put it on anything other than
>> cpu0.  What other global virqs are there?  Nothing that's high-frequency
>> enough to be worth migrating?
> Once supported in your tree, oprofile could have high interrupt
> rates, and the trace buffer ones might too. Admittedly both are
> debugging aids, but I don't think it'd be nice for them to influence
> performance more than necessary.

VIRQ_XENOPROF is percpu as well.  VIRQ_TBUF is global, but it has to
happen on *some* vcpu - but yes, I guess it would be useful to put it on
another vcpu so that it doesn't affect all the stuff tied to vcpu 0. 
(At the moment it would be tied to whatever vcpu set up the virq, so you
could control initial placement that way...)

    J


* Re: [GIT PULL] Fix lost interrupt race in Xen event channels
  2010-08-26 16:32       ` Jeremy Fitzhardinge
@ 2010-08-27  8:56         ` Jan Beulich
  2010-08-27 20:43           ` Daniel Stodden
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2010-08-27  8:56 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Xen-devel, Tom Kopec, Daniel Stodden

 >>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 08/25/2010 11:46 PM, Jan Beulich wrote:
>>  >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>> Note that this patch is specifically for upstream Xen, which doesn't
>>> have any pirq support in it at present.
>> I understand that, but saw that you had paralleling changes to the
>> pirq handling in your Dom0 tree.
>>
>>> However,  I did consider using fasteoi, but I couldn't see how to make
>>> it work.  The problem is that it only does a single call into the
>>> irq_chip for EOI after calling the interrupt handler, but there is no
>>> call beforehand to ack the interrupt (which means clear the event flag
>>> in our case).  This leads to a race where an event can be lost after the
>>> interrupt handler has returned, but before the event flag has been
>>> cleared (because Xen won't set pending or call the upcall function if
>>> the event is already set).  I guess I could pre-clear the event in the
>>> upcall function, but I'm not sure that's any better.
>> That's precisely what we're doing.
> 
> You mean pre-clearing the event?  OK.
> 
> But aren't you still subject to the bug the switch to handle_edge_irq fixed?
> 
> With handle_fasteoi_irq:
> 
> cpu A			cpu B
> get event

mask and clear event

> set INPROGRESS
> call action
>    :
>    :
> <migrate event channel to B>
>    :			get event

Cannot happen, the event is masked (i.e. all that would happen is
that the event occurrence would be logged in evtchn_pending).

>    :			INPROGRESS set? -> EOI, return
>    :
> action returns
> clear INPROGRESS
> EOI

unmask event, checking for whether the event got re-bound (and
doing the unmask through a hypercall if necessary), thus re-raising
the event in any case
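Jan's flow can be sketched in toy C (invented names, not the real code): masking and clearing before the handler runs closes the window, because an event arriving mid-handler is merely logged, and the unmask at EOI time re-runs the handler for anything logged in between.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_port2 {
    bool masked;
    bool evtchn_pending;   /* Xen-side event flag */
    int  handled;          /* handler invocations */
};

/* Xen side: while masked, the event is merely logged. */
static void xen_raise2(struct toy_port2 *p)
{
    p->evtchn_pending = true;
}

/* Mask-early flow: ack (mask + clear) before the handler runs, then
 * resend on unmask if the flag was set again in between. */
static void guest_mask_early(struct toy_port2 *p, bool event_mid_handler)
{
    p->masked = true;
    p->evtchn_pending = false;   /* ack before the handler */
    p->handled++;                /* handler runs */
    if (event_mid_handler)
        xen_raise2(p);           /* logged, delivery deferred */
    p->masked = false;           /* eoi: unmask... */
    while (p->evtchn_pending) {  /* ...and re-run anything logged */
        p->evtchn_pending = false;
        p->handled++;
    }
}
```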

> The event arriving on B is lost, and there's no record made of it ever
> having arrived.  How do you avoid this?
> 
> With handle_edge_irq, the second event will set PENDING in the irq_desc,
> and a loop will keep running on cpu A until PENDING is clear, so nothing
> is ever lost.

Actually, considering that you mask and unmask just like we do, I
cannot even see why you would have run into above scenario
with handle_level_irq(). Where is the window that I'm missing?
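
[The replay behaviour of handle_edge_irq that Jeremy describes above can be sketched as a small single-threaded C simulation. This is a hypothetical model, not kernel code; all names are invented. An event that arrives while the handler is already running sets a PENDING flag, and the loop keeps re-running the action until PENDING stays clear, so the second edge is not lost:]

```c
#include <assert.h>
#include <stdbool.h>

static bool in_progress, pending;
static int actions_run;

/* Models delivery of one edge: if the handler is already running
 * (conceptually on another cpu), just record PENDING and return,
 * mirroring what handle_edge_irq does with the irq_desc state. */
static void deliver_event(void)
{
    if (in_progress) {
        pending = true;        /* second edge remembered, not dropped */
        return;
    }
    in_progress = true;
    do {
        pending = false;
        actions_run++;         /* run the ISR action */
        if (actions_run == 1)
            deliver_event();   /* event re-raised mid-handler */
    } while (pending);         /* replay loop: nothing is lost */
    in_progress = false;
}

static int run_edge_sim(void)
{
    in_progress = pending = false;
    actions_run = 0;
    deliver_event();
    return actions_run;        /* both edges end up serviced */
}
```

[Both the original event and the one re-raised mid-handler get serviced, which is the property the switch to handle_edge_irq was relying on.]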

>>> In the dom0 kernels, I followed the example of the IOAPIC irq_chip
>>> implementation, which does the hardware EOI in the ack call at the start
>>> of handle_edge_irq, and did the EOI hypercall (when necessary) there.  I
>>> welcome a reviewer's eye on this though.
>> This I didn't actually notice so far.
>>
>> That doesn't look right, at least in combination with ->mask() being
>> wired to disable_pirq(), which is empty (and btw., if the latter was
>> right, you should also wire ->mask_ack() to disable_pirq() to avoid
>> a pointless indirect call).
>>
>> But even with ->mask() actually masking the IRQ I'm not certain
>> this is right. At the very least it'll make Xen see a potential
>> second instance of the same IRQ much earlier than you will
>> really be able to handle it. This is particularly bad for level
>> triggered ones, as Xen will see them right again after you
>> passed it the EOI notification - as a result there'll be twice as
>> many interrupts seen by Xen on the respective lines.
>>
>> The native I/O APIC can validly do this as ->ack() only gets
>> called for edge triggered interrupts (which is why ->eoi() is
>> wired to ack_apic_level()).
> 
> Yes.  The code as-is works OK, but I haven't checked to see if Xen it
> taking many spurious interrupts.  It probably helps that my test machine
> has MSI for almost everything.
> 
> But does that mean the pirq code needs to have different ack/eoi
> behaviour depending on the triggering of the ioapic interrupt?

If you want to continue to use handle_edge_irq(), I think you will.
With handle_fasteoi_irq(), you would leverage Xen's handling of
edge/level, and wouldn't need to make any distinction.

Jan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] Fix lost interrupt race in Xen  event channels
  2010-08-27  8:56         ` Jan Beulich
@ 2010-08-27 20:43           ` Daniel Stodden
  2010-08-27 21:49             ` Daniel Stodden
                               ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Daniel Stodden @ 2010-08-27 20:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jeremy Fitzhardinge, Xen-devel, Tom Kopec

On Fri, 2010-08-27 at 04:56 -0400, Jan Beulich wrote:
> >>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > On 08/25/2010 11:46 PM, Jan Beulich wrote:
> >>  >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> >>> Note that this patch is specifically for upstream Xen, which doesn't
> >>> have any pirq support in it at present.
> >> I understand that, but saw that you had paralleling changes to the
> >> pirq handling in your Dom0 tree.
> >>
> >>> However,  I did consider using fasteoi, but I couldn't see how to make
> >>> it work.  The problem is that it only does a single call into the
> >>> irq_chip for EOI after calling the interrupt handler, but there is no
> >>> call beforehand to ack the interrupt (which means clear the event flag
> >>> in our case).  This leads to a race where an event can be lost after the
> >>> interrupt handler has returned, but before the event flag has been
> >>> cleared (because Xen won't set pending or call the upcall function if
> >>> the event is already set).  I guess I could pre-clear the event in the
> >>> upcall function, but I'm not sure that's any better.
> >> That's precisely what we're doing.
> > 
> > You mean pre-clearing the event?  OK.
> > 
> > But aren't you still subject to the bug the switch to handle_edge_irq fixed?
> > 
> > With handle_fasteoi_irq:
> > 
> > cpu A			cpu B
> > get event
> 
> mask and clear event

Argh. Right, I guess that's my fault, I was the one who came up with the
PENDING theory, but indeed I failed to see the event masking bits.

However, please read on.

> > set INPROGRESS
> > call action
> >    :
> >    :
> > <migrate event channel to B>
> >    :			get event
> 
> Cannot happen, event is masked (i.e. all that would happen is
> that the event occurrence would be logged in evtchn_pending).
> 
> >    :			INPROGRESS set? -> EOI, return
> >    :
> > action returns
> > clear INPROGRESS
> > EOI
> 
> unmask event, checking for whether the event got re-bound (and
> doing the unmask through a hypercall if necessary), thus re-raising
> the event in any case

Yes. I agree. So let's come up with a new theory. Right now I'm still
looking at xen/next. Correct me if I'm mistaken:

mask_ack_pirq will:
 1. chip->mask
 2. chip->ack

Where chip->ack will:
 1. move_native_irq
 2. clear_evtchn.

Now if you look into move_native_irq, it will:
 1. chip->mask (gratuitous)
 2. move
 3. chip->unmask (aiiiiiie).

That explains why edge_irq still fixed the problem.

The real question is whether that's the kind of fix we wanted, then.
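
[The broken ordering Daniel lists can be modelled in a few lines of C. This is a hypothetical simulation, not the actual evtchn code; the function names are invented. The point is that move_native_irq()'s internal unmask happens before clear_evtchn(), so an event firing in that window gets wiped:]

```c
#include <assert.h>
#include <stdbool.h>

static bool masked;
static int evtchn_pending;

static void raise_event(void)
{
    evtchn_pending = 1;       /* Xen logs the event whether masked or not */
}

/* The sequence as described: mask, then ack -> move_native_irq
 * (mask, move, unmask), then clear_evtchn. */
static int ack_with_early_unmask(void)
{
    masked = true;            /* mask_ack_pirq: chip->mask() */
    masked = true;            /* move_native_irq: gratuitous mask */
    /* ... irq migrated ... */
    masked = false;           /* move_native_irq: unmask -- too early! */
    raise_event();            /* new event fires in the window */
    evtchn_pending = 0;       /* clear_evtchn() then wipes it */
    return evtchn_pending;    /* 0: the event was lost */
}

/* The safe ordering: clear and migrate while still masked,
 * unmask last, so a new event stays logged in evtchn_pending. */
static int ack_with_masked_move(void)
{
    masked = true;            /* mask */
    evtchn_pending = 0;       /* clear_evtchn() first */
    /* move_masked_irq(): migrate while still masked */
    raise_event();            /* new event fires: logged, not lost */
    masked = false;           /* unmask last; pending bit survives */
    return evtchn_pending;    /* 1: event still there to service */
}
```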

Cheers,
Daniel


* Re: [GIT PULL] Fix lost interrupt race in Xen  event channels
  2010-08-27 20:43           ` Daniel Stodden
@ 2010-08-27 21:49             ` Daniel Stodden
  2010-08-27 23:43             ` Jeremy Fitzhardinge
  2010-08-30  8:03             ` Jan Beulich
  2 siblings, 0 replies; 27+ messages in thread
From: Daniel Stodden @ 2010-08-27 21:49 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jeremy Fitzhardinge, Xen-devel, Tom Kopec

On Fri, 2010-08-27 at 13:43 -0700, Daniel Stodden wrote:
> On Fri, 2010-08-27 at 04:56 -0400, Jan Beulich wrote:
> > >>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > > On 08/25/2010 11:46 PM, Jan Beulich wrote:
> > >>  >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > >>> Note that this patch is specifically for upstream Xen, which doesn't
> > >>> have any pirq support in it at present.
> > >> I understand that, but saw that you had paralleling changes to the
> > >> pirq handling in your Dom0 tree.
> > >>
> > >>> However,  I did consider using fasteoi, but I couldn't see how to make
> > >>> it work.  The problem is that it only does a single call into the
> > >>> irq_chip for EOI after calling the interrupt handler, but there is no
> > >>> call beforehand to ack the interrupt (which means clear the event flag
> > >>> in our case).  This leads to a race where an event can be lost after the
> > >>> interrupt handler has returned, but before the event flag has been
> > >>> cleared (because Xen won't set pending or call the upcall function if
> > >>> the event is already set).  I guess I could pre-clear the event in the
> > >>> upcall function, but I'm not sure that's any better.
> > >> That's precisely what we're doing.
> > > 
> > > You mean pre-clearing the event?  OK.
> > > 
> > > But aren't you still subject to the bug the switch to handle_edge_irq fixed?
> > > 
> > > With handle_fasteoi_irq:
> > > 
> > > cpu A			cpu B
> > > get event
> > 
> > mask and clear event
> 
> Argh. Right, I guess that's my fault, I was the one who came up with the
> PENDING theory, but indeed I failed to see the event masking bits.
> 
> However, please read on.
> 
> > > set INPROGRESS
> > > call action
> > >    :
> > >    :
> > > <migrate event channel to B>
> > >    :			get event
> > 
> > Cannot happen, event is masked (i.e. all that would happen is
> > that the event occurrence would be logged in evtchn_pending).
> > 
> > >    :			INPROGRESS set? -> EOI, return
> > >    :
> > > action returns
> > > clear INPROGRESS
> > > EOI
> > 
> > unmask event, checking for whether the event got re-bound (and
> > doing the unmask through a hypercall if necessary), thus re-raising
> > the event in any case
> 
> Yes. I agree. So let's come up with a new theory. Right now I'm still
> looking at xen/next. Correct me if I'm mistaken:
> 
> mask_ack_pirq will:
>  1. chip->mask
>  2. chip->ack
> 
> Where chip->ack will:
>  1. move_native_irq
>  2. clear_evtchn.
> 
> Now if you look into move_native_irq, it will:
>  1. chip->mask (gratuitous)
>  2. move
>  3. chip->unmask (aiiiiiie).
> 
> That explains why edge_irq still fixed the problem.
> 
> The real question is whether that's the kind of fix we wanted, then.

XCP has, presumably older, mask_ack() and ack() handlers in
core/evtchn.c. Those
1. move
2. mask
3. ack

and therefore don't have that problem. So maybe this was caused by some
pvops specific patch a while ago?

Cheers,
Daniel


* Re: [GIT PULL] Fix lost interrupt race in Xen  event channels
  2010-08-27 20:43           ` Daniel Stodden
  2010-08-27 21:49             ` Daniel Stodden
@ 2010-08-27 23:43             ` Jeremy Fitzhardinge
  2010-08-30  8:03             ` Jan Beulich
  2 siblings, 0 replies; 27+ messages in thread
From: Jeremy Fitzhardinge @ 2010-08-27 23:43 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: Xen-devel, Tom Kopec, Jan Beulich

 On 08/27/2010 01:43 PM, Daniel Stodden wrote:
> On Fri, 2010-08-27 at 04:56 -0400, Jan Beulich wrote:
>>>>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>> On 08/25/2010 11:46 PM, Jan Beulich wrote:
>>>>  >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>>>> Note that this patch is specifically for upstream Xen, which doesn't
>>>>> have any pirq support in it at present.
>>>> I understand that, but saw that you had paralleling changes to the
>>>> pirq handling in your Dom0 tree.
>>>>
>>>>> However,  I did consider using fasteoi, but I couldn't see how to make
>>>>> it work.  The problem is that it only does a single call into the
>>>>> irq_chip for EOI after calling the interrupt handler, but there is no
>>>>> call beforehand to ack the interrupt (which means clear the event flag
>>>>> in our case).  This leads to a race where an event can be lost after the
>>>>> interrupt handler has returned, but before the event flag has been
>>>>> cleared (because Xen won't set pending or call the upcall function if
>>>>> the event is already set).  I guess I could pre-clear the event in the
>>>>> upcall function, but I'm not sure that's any better.
>>>> That's precisely what we're doing.
>>> You mean pre-clearing the event?  OK.
>>>
>>> But aren't you still subject to the bug the switch to handle_edge_irq fixed?
>>>
>>> With handle_fasteoi_irq:
>>>
>>> cpu A			cpu B
>>> get event
>> mask and clear event
> Argh. Right, I guess that's my fault, I was the one who came up with the
> PENDING theory, but indeed I failed to see the event masking bits.
>
> However, please read on.
>
>>> set INPROGRESS
>>> call action
>>>    :
>>>    :
>>> <migrate event channel to B>
>>>    :			get event
>> Cannot happen, event is masked (i.e. all that would happen is
>> that the event occurrence would be logged in evtchn_pending).
>>
>>>    :			INPROGRESS set? -> EOI, return
>>>    :
>>> action returns
>>> clear INPROGRESS
>>> EOI
>> unmask event, checking for whether the event got re-bound (and
>> doing the unmask through a hypercall if necessary), thus re-raising
>> the event in any case
> Yes. I agree. So let's come up with a new theory. Right now I'm still
> looking at xen/next. Correct me if I'm mistaken:
>
> mask_ack_pirq will:
>  1. chip->mask
>  2. chip->ack
>
> Where chip->ack will:
>  1. move_native_irq
>  2. clear_evtchn.
>
> Now if you look into move_native_irq, it will:
>  1. chip->mask (gratuitous)
>  2. move
>  3. chip->unmask (aiiiiiie).
>
> That explains why edge_irq still fixed the problem.

Good point.  I guess the simplest fix in that case would have been to
use move_masked_irq()...

The current fix is not wrong, so we can leave it as-is upstream for now.

But I think I will try Jan's idea about masking/clearing in the event
upcall then using fasteoi as the handler.

    J


* Re: [GIT PULL] Fix lost interrupt race in Xen event channels
  2010-08-27 20:43           ` Daniel Stodden
  2010-08-27 21:49             ` Daniel Stodden
  2010-08-27 23:43             ` Jeremy Fitzhardinge
@ 2010-08-30  8:03             ` Jan Beulich
  2010-08-30  8:43               ` Jan Beulich
  2010-08-30 16:36               ` Jeremy Fitzhardinge
  2 siblings, 2 replies; 27+ messages in thread
From: Jan Beulich @ 2010-08-30  8:03 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: Jeremy Fitzhardinge, Xen-devel, Tom Kopec

>>> On 27.08.10 at 22:43, Daniel Stodden <daniel.stodden@citrix.com> wrote:
> So let's come up with a new theory. Right now I'm still
> looking at xen/next. Correct me if I'm mistaken:
> 
> mask_ack_pirq will:
>  1. chip->mask
>  2. chip->ack
> 
> Where chip->ack will:
>  1. move_native_irq
>  2. clear_evtchn.
> 
> Now if you look into move_native_irq, it will:
>  1. chip->mask (gratuitous)
>  2. move
>  3. chip->unmask (aiiiiiie).
> 
> That explains why edge_irq still fixed the problem.

This amounts to it having been wrong to break up ->mask() and ->ack() -
when using the combined ->mask_ack() (like in XCP etc) it would have
been pretty obvious that move_native_irq() cannot go between
mask() and ack().

For us, using fasteoi, move_native_irq() sits in ->eoi(), before
un-masking. One could, as Jeremy suggests, call move_masked_irq()
here, but I didn't want to duplicate the IRQ_DISABLED check done
in move_native_irq(), mainly to not depend on following potential
future changes (additions) to the set of conditions checked there.

It would be helpful if the function returned whether it actually went
through the mask/unmask pair, as I'm not sure the double
unmasking really is a good idea, especially in the PIRQ case (for
the moment I'm considering putting an already-unmasked check
into both ->eoi() handlers).

Jan


* Re: [GIT PULL] Fix lost interrupt race in Xen event channels
  2010-08-30  8:03             ` Jan Beulich
@ 2010-08-30  8:43               ` Jan Beulich
  2010-08-30  8:48                 ` Keir Fraser
  2010-08-30 16:36               ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2010-08-30  8:43 UTC (permalink / raw)
  To: Daniel Stodden, Keir Fraser; +Cc: Jeremy Fitzhardinge, Xen-devel, Tom Kopec

>>> On 30.08.10 at 10:03, "Jan Beulich" <JBeulich@novell.com> wrote:
> It would be helpful if the function returned whether it actually went
> through the mask/unmask pair, as I'm not sure the double
> unmasking really is a good idea, especially in the PIRQ case (for
> the moment I'm considering putting an already-unmasked check
> into both ->eoi() handlers).

Actually, it seems to me that this check really (also) belongs into
unmask_evtchn(). Keir (I think you wrote this code originally), is
there a reason (other than the implied assumption that it won't
get called on an already unmasked event channel) the function
uses sync_clear_bit() rather than sync_test_and_clear_bit(),
doing the other actions only if the bit wasn't already clear, just
like Xen itself does for the respective hypercall implementation?

Thanks, Jan


* Re: [GIT PULL] Fix lost interrupt race in Xen  event channels
  2010-08-30  8:43               ` Jan Beulich
@ 2010-08-30  8:48                 ` Keir Fraser
  2010-08-30  9:06                   ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Keir Fraser @ 2010-08-30  8:48 UTC (permalink / raw)
  To: Jan Beulich, Daniel Stodden; +Cc: Jeremy Fitzhardinge, Xen-devel, Tom Kopec

On 30/08/2010 09:43, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> On 30.08.10 at 10:03, "Jan Beulich" <JBeulich@novell.com> wrote:
>> It would be helpful if the function returned whether it actually went
>> through the mask/unmask pair, as I'm not sure the double
>> unmasking really is a good idea, especially in the PIRQ case (for
>> the moment I'm considering putting an already-unmasked check
>> into both ->eoi() handlers).
> 
> Actually, it seems to me that this check really (also) belongs into
> unmask_evtchn(). Keir (I think you wrote this code originally), is
> there a reason (other than the implied assumption that it won't
> get called on an already unmasked event channel) the function
> uses sync_clear_bit() rather than sync_test_and_clear_bit(),
> doing the other actions only if the bit wasn't already clear, just
> like Xen itself does for the respective hypercall implementation?

Well, in 2.6.18 it was at least very unlikely that unmask_evtchn() would be
called on an already-unmasked port. And the implementation of
unmask-evtchn() is safe in the case that it is called for an
already-unmasked port -- it just does a bit more work than necessary in that
case.

 -- Keir


* Re: [GIT PULL] Fix lost interrupt race in Xen event channels
  2010-08-30  8:48                 ` Keir Fraser
@ 2010-08-30  9:06                   ` Jan Beulich
  2010-08-30  9:15                     ` Keir Fraser
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2010-08-30  9:06 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Jeremy Fitzhardinge, Xen-devel, Tom Kopec, Daniel Stodden

>>> On 30.08.10 at 10:48, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> Well, in 2.6.18 it was at least very unlikely that unmask_evtchn() would be
> called on an already-unmasked port. And the implementation of
> unmask-evtchn() is safe in the case that it is called for an
> already-unmasked port -- it just does a bit more work than necessary in that
> case.

... plus possibly causes an unnecessary upcall.

However, do you also think that pirq_unmask_and_notify() is safe
to be called twice? I would think the double EOI potentially sent to
Xen could lead to an interrupt getting ack-ed that didn't even get
started to be serviced yet. And this, afaict, can happen in 2.6.18
as well (ack_pirq() -> move_native_irq() -> disable_pirq()/
enable_pirq() -> pirq_unmask_and_notify() followed by end_pirq()
-> pirq_unmask_and_notify()). Here, however, you couldn't even
use the mask bit to detect the situation, since the masking only
happens after already having called move_native_irq() (i.e. the
event channel will be masked when you get into
pirq_unmask_and_notify() the second time).

Jan


* Re: [GIT PULL] Fix lost interrupt race in Xen   event channels
  2010-08-30  9:06                   ` Jan Beulich
@ 2010-08-30  9:15                     ` Keir Fraser
  2010-08-30  9:22                       ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Keir Fraser @ 2010-08-30  9:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jeremy Fitzhardinge, Xen-devel, Tom Kopec, Daniel Stodden

On 30/08/2010 10:06, "Jan Beulich" <JBeulich@novell.com> wrote:

> However, do you also think that pirq_unmask_and_notify() is safe
> to be called twice? I would think the double EOI potentially sent to
> Xen could lead to an interrupt getting ack-ed that didn't even get
> started to be serviced yet.

Erm, well if this is a race that happens only occasionally, does it matter?
Worst case you get another interrupt straight away. Only a problem if it
happens often enough to cause a performance issue or even livelock interrupt
storm.

The obvious fix would be for the kernel to privately keep track of which
event channels or pirqs are masked and/or disabled (e.g., with two bitflags
per port). Then have the evtchn_mask flag be the OR of the two. If this is
actually a real problem at all. I doubt move_native_irq() should be doing
work very often when it is called.
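
[Keir's two-flag scheme can be sketched as follows. This is a hypothetical model, not real Xen or Linux code; the struct and function names are invented. Each port keeps a private "masked" and "disabled" flag, the shared evtchn_mask bit is their OR, and a redundant second unmask is detected and skipped:]

```c
#include <assert.h>
#include <stdbool.h>

struct port_state {
    bool masked;      /* transient mask, e.g. across migration */
    bool disabled;    /* longer-term disable */
};

/* The bit Xen would see is the OR of the two private flags. */
static bool evtchn_mask_bit(const struct port_state *p)
{
    return p->masked || p->disabled;
}

/* Returns true only when this call actually transitions the port
 * to unmasked; a second, redundant unmask returns false and could
 * skip the hypercall/EOI entirely. */
static bool unmask_port(struct port_state *p)
{
    if (!p->masked)
        return false;          /* already unmasked: nothing to do */
    p->masked = false;
    return !evtchn_mask_bit(p); /* still held down if disabled */
}
```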

 -- Keir

> And this, afaict, can happen in 2.6.18
> as well (ack_pirq() -> move_native_irq() -> disable_pirq()/
> enable_pirq() -> pirq_unmask_and_notify() followed by end_pirq()
> -> pirq_unmask_and_notify()). Here, however, you couldn't even
> use the mask bit to detect the situation, since the masking only
> happens after already having called move_native_irq() (i.e. the
> event channel will be masked when you get into
> pirq_unmask_and_notify() the second time).


* Re: [GIT PULL] Fix lost interrupt race in Xen event channels
  2010-08-30  9:15                     ` Keir Fraser
@ 2010-08-30  9:22                       ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2010-08-30  9:22 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Jeremy Fitzhardinge, Xen-devel, Tom Kopec, Daniel Stodden

>>> On 30.08.10 at 11:15, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 30/08/2010 10:06, "Jan Beulich" <JBeulich@novell.com> wrote:
> 
>> However, do you also think that pirq_unmask_and_notify() is safe
>> to be called twice? I would think the double EOI potentially sent to
>> Xen could lead to an interrupt getting ack-ed that didn't even get
>> started to be serviced yet.
> 
> Erm, well if this is a race that happens only occasionally, does it matter?
> Worst case you get another interrupt straight away. Only a problem if it
> happens often enough to cause a performance issue or even livelock interrupt
> storm.

I wasn't worried about the level case, but rather the edge one. I
forgot, however, that edge ones are ACKTYPE_NONE in Xen, so
indeed there shouldn't be a problem unless IRQs get moved
around at a very high rate (which clearly they shouldn't).

Jan


* Re: [GIT PULL] Fix lost interrupt race in Xen  event channels
  2010-08-30  8:03             ` Jan Beulich
  2010-08-30  8:43               ` Jan Beulich
@ 2010-08-30 16:36               ` Jeremy Fitzhardinge
  2010-08-31  6:38                 ` Jan Beulich
  1 sibling, 1 reply; 27+ messages in thread
From: Jeremy Fitzhardinge @ 2010-08-30 16:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Tom Kopec, Daniel Stodden

 On 08/30/2010 01:03 AM, Jan Beulich wrote:
> This amounts to it having been wrong to break up ->mask() and ->ack() -
> when using the combined ->mask_ack() (like in XCP etc) it would have
> been pretty obvious that move_native_irq() cannot go between
> mask() and ack().

Probably, but I just viewed the mask_ack() as an optimisation that
didn't need to be addressed on the first pass.  I converted the basic
event channels to fasteoi and it seems to work OK.  I'll look at pirq today.

> For us, using fasteoi, move_native_irq() sits in ->eoi(), before
> un-masking. One could, as Jeremy suggests, call move_masked_irq()
> here, but I didn't want to duplicate the IRQ_DISABLED check done
> in move_native_irq(), mainly to not depend on following potential
> future changes (additions) to the set of conditions checked there.

Is there actually a problem with moving a IRQ_DISABLED interrupt?  If
so, shouldn't that IRQ_DISABLED check also be in move_masked_irq()?

    J


* Re: [GIT PULL] Fix lost interrupt race in Xen  event channels
  2010-08-30 16:36               ` Jeremy Fitzhardinge
@ 2010-08-31  6:38                 ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2010-08-31  6:38 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Xen-devel, Tom Kopec, Daniel Stodden

 >>> On 30.08.10 at 18:36, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 08/30/2010 01:03 AM, Jan Beulich wrote:
>> For us, using fasteoi, move_native_irq() sits in ->eoi(), before
>> un-masking. One could, as Jeremy suggests, call move_masked_irq()
>> here, but I didn't want to duplicate the IRQ_DISABLED check done
>> in move_native_irq(), mainly to not depend on following potential
>> future changes (additions) to the set of conditions checked there.
> 
> Is there actually a problem with moving a IRQ_DISABLED interrupt?  If
> so, shouldn't that IRQ_DISABLED check also be in move_masked_irq()?

I don't know, all I do know is that it initially wasn't that way, but got
changed to this at some point. Maybe it's more like being pointless
to move a disabled interrupt?

Jan


* Using handle_fasteoi_irq for pirqs
  2010-08-25  7:52 ` [Xen-devel] " Jan Beulich
  2010-08-25 10:04   ` Daniel Stodden
  2010-08-25 17:54     ` Jeremy Fitzhardinge
@ 2010-09-03 18:46   ` Jeremy Fitzhardinge
  2010-09-06  7:58     ` Jan Beulich
  2 siblings, 1 reply; 27+ messages in thread
From: Jeremy Fitzhardinge @ 2010-09-03 18:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, Xen-devel, Tom Kopec, Daniel Stodden

 On 08/25/2010 12:52 AM, Jan Beulich wrote:
> I do however agree that using handle_level_irq() is problematic
> (see http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01178.html),
> but as said there I think using the fasteoi logic is preferable.

I've been looking at this again.

For non-pirq interrupts, fasteoi seems like a solid win.  It looks like
an overall simplification and I haven't seen any problems.

However, I've had more trouble extending this to pirq.  My first attempt
appeared to work in emulation, but when I run it on real hardware, msi
interrupts are not getting through.  If I boot with "pci=nomsi" then it
sometimes works, but it often crashes Xen (see separate mail).

Part of the problem is that I'm not really sure what the various
irq_chip functions are really supposed to do, and the documentation is
awful.

.startup and .shutdown I understand, and I think they're being called
when we expect them to be (ie, when the driver registers an irq for the
first time).

Using .startup/.shutdown for enable/disable seems very heavyweight.  Do
we really want to be rebinding the pirq each time?  Isn't unmask/masking
the event channel sufficient?


At the moment my xen_evtchn_do_upcall() is masking and clearing the
event channel before calling into generic_handle_irq_desc(), which will
call handle_fasteoi_irq fairly directly.  That runs straight through and
the pirq_chip's eoi just does an EOI on the pirq if Xen says it needs one.

But apparently this isn't enough.  Is there anything else I should be doing?
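
[The mask-and-clear-before-dispatch ordering described here can be sketched as a toy single-event, single-threaded C model (hypothetical; not the real xen_evtchn_do_upcall, names invented). Because the channel stays masked while the fasteoi handler runs, a re-raised event is only logged in evtchn_pending and gets re-delivered when the eoi unmasks:]

```c
#include <assert.h>
#include <stdbool.h>

static bool masked, evtchn_pending;
static int serviced;

static void upcall(void);

/* Xen side: always log the event; deliver an upcall only if unmasked. */
static void raise_event(void)
{
    evtchn_pending = true;
    if (!masked)
        upcall();
}

/* Guest side: the ordering described in the mail. */
static void upcall(void)
{
    masked = true;             /* mask ... */
    evtchn_pending = false;    /* ... and clear, before dispatch */
    serviced++;                /* generic_handle_irq_desc -> fasteoi */
    if (serviced == 1)
        raise_event();         /* device fires again mid-handler */
    masked = false;            /* ->eoi(): unmask */
    if (evtchn_pending)
        upcall();              /* logged event is re-delivered */
}

static int run_upcall_sim(void)
{
    masked = evtchn_pending = false;
    serviced = 0;
    raise_event();
    return serviced;           /* the mid-handler event is not lost */
}
```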

(I just implemented PHYSDEVOP_pirq_eoi_gmfn method of getting the pirq
eoi flags, but I haven't tested it yet.  I'm also not really sure what
the advantage of it is.)

Thanks,
    J


* Re: Using handle_fasteoi_irq for pirqs
  2010-09-03 18:46   ` Using handle_fasteoi_irq for pirqs Jeremy Fitzhardinge
@ 2010-09-06  7:58     ` Jan Beulich
  2010-09-07  1:53       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2010-09-06  7:58 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Konrad Rzeszutek Wilk, Xen-devel, Tom Kopec, Daniel Stodden

 >>> On 03.09.10 at 20:46, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Using .startup/.shutdown for enable/disable seems very heavyweight.  Do
> we really want to be rebinding the pirq each time?  Isn't unmask/masking
> the event channel sufficient?

Depends - the original (2.6.18) implementation only makes enable_pirq()
a conditional startup (and really startup_pirq() is conditional too), while
disable_pirq() does nothing at all. While forward porting, considering
the contexts in which ->disable() gets called (namely note_interrupt())
and after initially having had no ->enable()/->disable() methods at all
(for default_enable() calling ->unmask() anyway, and default_disable()
being a no-op as much as disable_pirq() was), I got to the conclusion
that it might be better to do a full shutdown/startup, since it isn't
known whether a disable is permanent or just temporary.

Now part of the question whether this is actually a good choice is
why default_disable() doesn't mask the affected IRQ - likely because
IRQ_DISABLED is checked for and handled accordingly in all non-
trivial flow handlers.

The other aspect is that with the (original) switch to use
handle_level_irq() we noticed at some point that the disabling of
bad IRQs (where e.g. storms are noticed) didn't work anymore,
due to that logic sitting in ->end(), which doesn't usually get
called at all (nor does any other method except ->unmask() for
the level case). Right now I don't really remember whether making
->disable() an alias of shutdown wasn't just a (failed iirc) attempt
at getting Xen to know of the need to shut down such a bad IRQ.
After the switch to fasteoi this logic should now be working again
independent of what ->disable() does, so I will have to consider
to un-alias disable_pirq() and shutdown_pirq() again.

> At the moment my xen_evtchn_do_upcall() is masking and clearing the
> event channel before calling into generic_handle_irq_desc(), which will
> call handle_fasteoi_irq fairly directly.  That runs straight through and
>> the pirq_chip's eoi just does an EOI on the pirq if Xen says it needs one.
> 
> But apparently this isn't enough.  Is there anything else I should be doing?

I can't see anything, and our kernel also doesn't.

> (I just implemented PHYSDEVOP_pirq_eoi_gmfn method of getting the pirq
> eoi flags, but I haven't tested it yet.  I'm also not really sure what
> the advantage of it is.)

This is for you to avoid the EOI hypercall if it would be a no-op in
Xen anyway.

Jan


* Re: Using handle_fasteoi_irq for pirqs
  2010-09-06  7:58     ` Jan Beulich
@ 2010-09-07  1:53       ` Jeremy Fitzhardinge
  2010-09-07  6:58         ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Jeremy Fitzhardinge @ 2010-09-07  1:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, Xen-devel, Tom Kopec, Daniel Stodden

 On 09/06/2010 05:58 PM, Jan Beulich wrote:
>  >>> On 03.09.10 at 20:46, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> Using .startup/.shutdown for enable/disable seems very heavyweight.  Do
>> we really want to be rebinding the pirq each time?  Isn't unmask/masking
>> the event channel sufficient?
> Depends - the original (2.6.18) implementation only makes enable_pirq()
> a conditional startup (and really startup_pirq() is conditional too), while
> disable_pirq() does nothing at all. While forward porting, considering
> the contexts in which ->disable() gets called (namely note_interrupt())
> and after initially having had no ->enable()/->disable() methods at all
> (for default_enable() calling ->unmask() anyway, and default_disable()
> being a no-op as much as disable_pirq() was), I got to the conclusion
> that it might be better to do a full shutdown/startup, since it isn't
> known whether a disable is permanent or just temporary.
>
> Now part of the question whether this is actually a good choice is
> why default_disable() doesn't mask the affected IRQ - likely because
> IRQ_DISABLED is checked for and handled accordingly in all non-
> trivial flow handlers.
>
> The other aspect is that with the (original) switch to use
> handle_level_irq() we noticed at some point that the disabling of
> bad IRQs (where e.g. storms are noticed) didn't work anymore,
> due to that logic sitting in ->end(), which doesn't usually get
> called at all (nor does any other method except ->unmask() for
> the level case). Right now I don't really remember whether making
> ->disable() an alias of shutdown wasn't just a (failed iirc) attempt
> at getting Xen to know of the need to shut down such a bad IRQ.
> After the switch to fasteoi this logic should now be working again
> independent of what ->disable() does, so I will have to consider
> to un-alias disable_pirq() and shutdown_pirq() again.

At the moment, I'm using this:

static struct irq_chip xen_pirq_chip __read_mostly = {
	.name		= "xen-pirq",

	.startup	= startup_pirq,
	.shutdown	= shutdown_pirq,

	.enable		= pirq_eoi,
	.unmask		= unmask_irq,

	.disable	= mask_irq,
	.mask		= mask_irq,

	.eoi		= ack_pirq,
	.end		= end_pirq,

	.set_affinity	= set_affinity_irq,

	.retrigger	= retrigger_irq,
};


which seems to work OK now.  The "enable=pirq_eoi" is essentially the
same as "enable=startup_pirq", because your startup_pirq just does an
EOI if the evtchn is already valid (and EOI always ends up unmasking too).

ack_pirq and pirq_eoi are almost the same, except that the former also
calls move_masked_irq().

>> At the moment my xen_evtchn_do_upcall() is masking and clearing the
>> event channel before calling into generic_handle_irq_desc(), which will
>> call handle_fasteoi_irq fairly directly.  That runs straight through and
>> the pirq_chip's eoi just does an EOI on the pirq if Xen says it needs one.
>>
>> But apparently this isn't enough.  Is there anything else I should be doing?
> I can't see anything, and our kernel also doesn't.

Where's the source to your kernel?  The one I looked at most recently
was using handle_level_irq for everything.

But anyway, I found my bug; I'd overlooked where MSI interrupts were
being set up, and they were still using handle_edge_irq; changing them
to fasteoi seems to have done the trick.

>> (I just implemented PHYSDEVOP_pirq_eoi_gmfn method of getting the pirq
>> eoi flags, but I haven't tested it yet.  I'm also not really sure what
>> the advantage of it is.)
> This is for you to avoid the EOI hypercall if it would be a no-op in
> Xen anyway.

Yes, but there's also PHYSDEVOP_irq_status_query call.  Does the "needs
EOI" actually change from moment to moment in a way where having a
shared page makes sense?

    J

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Using handle_fasteoi_irq for pirqs
  2010-09-07  1:53       ` Jeremy Fitzhardinge
@ 2010-09-07  6:58         ` Jan Beulich
  2010-09-07  8:02           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2010-09-07  6:58 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Konrad Rzeszutek Wilk, Xen-devel, Tom Kopec, Daniel Stodden

 >>> On 07.09.10 at 03:53, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 09/06/2010 05:58 PM, Jan Beulich wrote:
>>  >>> On 03.09.10 at 20:46, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Where's the source to your kernel?  The one I looked at most recently
> was using handle_level_irq for everything.

Indeed, I did the switch in the master (and SLE11 SP1) kernels only
relatively recently (and only the master one is already visible to the
outside) - look at either ftp://ftp.suse.com/pub/projects/kernel/kotd/master/
or (un-expanded tree of patches) http://gitorious.org/opensuse/kernel-source/trees/master.

>>> (I just implemented PHYSDEVOP_pirq_eoi_gmfn method of getting the pirq
>>> eoi flags, but I haven't tested it yet.  I'm also not really sure what
>>> the advantage of it is.)
>> This is for you to avoid the EOI hypercall if it would be a no-op in
>> Xen anyway.
> 
> Yes, but there's also PHYSDEVOP_irq_status_query call.  Does the "needs
> EOI" actually change from moment to moment in a way where having a
> shared page makes sense?

No, it doesn't - using this function you can of course maintain the
bitmap on your own (which we also fall back to if PHYSDEVOP_pirq_eoi_gmfn
is not supported by the hypervisor). The actual advantage of using
PHYSDEVOP_pirq_eoi_gmfn is that it implies an unmask of the
corresponding event channel - see
http://xenbits.xensource.com/xen-unstable.hg?rev/c820bf73a914.

Also, regarding your earlier question on .disable - I just ran across
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/51b2b0d0921c,
which makes me think that .enable indeed should remain an alias of
.startup, but I also think that .disable should nevertheless do the
masking (i.e. be an alias of .mask).

Jan


* Re: Using handle_fasteoi_irq for pirqs
  2010-09-07  6:58         ` Jan Beulich
@ 2010-09-07  8:02           ` Jeremy Fitzhardinge
  2010-09-07  8:58             ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Jeremy Fitzhardinge @ 2010-09-07  8:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, Xen-devel, Tom Kopec, Daniel Stodden

 On 09/07/2010 04:58 PM, Jan Beulich wrote:
>> Yes, but there's also PHYSDEVOP_irq_status_query call.  Does the "needs
>> EOI" actually change from moment to moment in a way where having a
>> shared page makes sense?
> No, it doesn't - using this function you can of course maintain the
> bitmap on your own (which we also fall back to if PHYSDEVOP_pirq_eoi_gmfn
> is not supported by the hypervisor). The actual advantage of using
> PHYSDEVOP_pirq_eoi_gmfn is that it implies an unmask of the
> corresponding event channel - see
> http://xenbits.xensource.com/xen-unstable.hg?rev/c820bf73a914.

Yes, though it seems like an odd way of implementing it.  The shared
memory region for EOI flags looks like overkill when all it saves is one
hypercall on pirq setup, and making that also have the side-effect of
changing an existing hypercall's behaviour is surprising.  Too late now,
I suppose, but it would seem that just adding a new PHYSDEVOP_eoi_unmask
hypercall would have been a simpler approach.  (But unmask typically
doesn't need a hypercall anyway, unless you've managed to get into a
cross-cpu unmask situation, so it doesn't seem like a big saving?)

Also, in the restore path, the code does a BUG if it fails to call
PHYSDEVOP_pirq_eoi_gmfn again.  Couldn't it just clear
pirq_eoi_does_unmask (and re-initialize all the needs-eoi flags using
PHYSDEVOP_irq_status_query) and carry on the old way?

But the comment in PHYSDEVOP_irq_status_query says:

        /*
         * Even edge-triggered or message-based IRQs can need masking from
         * time to time. If the guest is not dynamically checking for this
         * via the new pirq_eoi_map mechanism, it must conservatively always
         * execute the EOI hypercall. In practice, this only really makes a
         * difference for maskable MSI sources, and if those are supported
         * then dom0 is probably modern anyway.
         */

which suggests that the "needs EOI" status for each pirq can change
after the pirq has been initialized (or if not, why isn't using
PHYSDEVOP_irq_status_query sufficient?).  OTOH, if it can change
spontaneously, then it all seems a bit racy...

IOW, what does "from time to time" mean?

> Also, regarding your earlier question on .disable - I just ran across
> http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/51b2b0d0921c,
> which makes me think that .enable indeed should remain an alias of
> .startup, but I also think that .disable should nevertheless do the
> masking (i.e. be an alias of .mask)

That particular change has the strange asymmetry of making .enable do a
startup (which only does eoi if the event channel is still valid),
.disable a no-op, and .end shuts the pirq down iff the irq is pending
and disabled.  I see how this works in the context of the old __do_IRQ
stuff in 2.6.18 though.

What's the specific deadlock mentioned in the commit comment?  Is that
still an issue with Xen's auto-EOI-after-timeout behaviour?

Thanks,
    J


* Re: Using handle_fasteoi_irq for pirqs
  2010-09-07  8:02           ` Jeremy Fitzhardinge
@ 2010-09-07  8:58             ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2010-09-07  8:58 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Konrad Rzeszutek Wilk, Xen-devel, Keir Fraser, Tom Kopec, Daniel Stodden

 >>> On 07.09.10 at 10:02, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 09/07/2010 04:58 PM, Jan Beulich wrote:
> Also, in the restore path, the code does a BUG if it fails to call
> PHYSDEVOP_pirq_eoi_gmfn again.  Couldn't it just clear
> pirq_eoi_does_unmask (and re-initialize all the needs-eoi flags using
> PHYSDEVOP_irq_status_query) and carry on the old way?

That would seem possible, yes.

> But the comment in PHYSDEVOP_irq_status_query says:
> 
>         /*
>          * Even edge-triggered or message-based IRQs can need masking from
>          * time to time. If the guest is not dynamically checking for this
>          * via the new pirq_eoi_map mechanism, it must conservatively always
>          * execute the EOI hypercall. In practice, this only really makes a
>          * difference for maskable MSI sources, and if those are supported
>          * then dom0 is probably modern anyway.
>          */
> 
> which suggests that the "needs EOI" status for each pirq can change
> after the pirq has been initialized (or if not, why isn't using
> PHYSDEVOP_irq_status_query sufficient?).  OTOH, if it can change
> spontaneously, then it all seems a bit racy...
> 
> IOW, what does "from time to time" mean?

Just look at the places where set_pirq_eoi() gets called in
xen/arch/x86/irq.c: The flag (obviously) always gets set for IRQs
that aren't ACKTYPE_NONE, but there is an additional case in
__do_IRQ_guest() where Xen wants the notification to re-enable
the interrupt after disabling it due to all possible handler domains
having it pending already.

And you see that Xen would force you to always execute the EOI
notification hypercall without the shared page
(PHYSDEVOP_irq_status_query setting XENIRQSTAT_needs_eoi
unconditionally), so using it you save a hypercall on each interrupt
that doesn't need an EOI (namely MSI-X and maskable MSI; I don't
think edge triggered ones are very important), and if you (as usual)
don't need to unmask via hypercall, you don't need any hypercall
at all in the IRQ exit path in these cases.

>> Also, regarding your earlier question on .disable - I just ran across
>> http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/51b2b0d0921c,
>> which makes me think that .enable indeed should remain an alias of
>> .startup, but I also think that .disable should nevertheless do the
>> masking (i.e. be an alias of .mask)
> 
> That particular change has the strange asymmetry of making .enable do a
> startup (which only does eoi if the event channel is still valid),
> .disable a no-op, and .end shuts the pirq down iff the irq is pending
> and disabled.  I see how this works in the context of the old __do_IRQ
> stuff in 2.6.18 though.

And it should similarly be fine with handle_fasteoi_irq() afaict,
provided .end and .eoi reference the same function.

The asymmetry was part of what made me change .disable to
alias .shutdown.

> What's the specific deadlock mentioned in the commit comment?  Is that
> still an issue with Xen's auto-EOI-after-timeout behaviour?

Honestly, I don't know. Keir?

Jan


end of thread, other threads:[~2010-09-07  8:58 UTC | newest]

Thread overview: 27+ messages
2010-08-24 21:35 [GIT PULL] Fix lost interrupt race in Xen event channels Jeremy Fitzhardinge
2010-08-24 21:35 ` Jeremy Fitzhardinge
2010-08-25  7:52 ` [Xen-devel] " Jan Beulich
2010-08-25 10:04   ` Daniel Stodden
2010-08-25 11:24     ` Jan Beulich
2010-08-25 17:54   ` Jeremy Fitzhardinge
2010-08-25 17:54     ` Jeremy Fitzhardinge
2010-08-26  6:46     ` Jan Beulich
2010-08-26 16:32       ` Jeremy Fitzhardinge
2010-08-27  8:56         ` Jan Beulich
2010-08-27 20:43           ` Daniel Stodden
2010-08-27 21:49             ` Daniel Stodden
2010-08-27 23:43             ` Jeremy Fitzhardinge
2010-08-30  8:03             ` Jan Beulich
2010-08-30  8:43               ` Jan Beulich
2010-08-30  8:48                 ` Keir Fraser
2010-08-30  9:06                   ` Jan Beulich
2010-08-30  9:15                     ` Keir Fraser
2010-08-30  9:22                       ` Jan Beulich
2010-08-30 16:36               ` Jeremy Fitzhardinge
2010-08-31  6:38                 ` Jan Beulich
2010-09-03 18:46   ` Using handle_fasteoi_irq for pirqs Jeremy Fitzhardinge
2010-09-06  7:58     ` Jan Beulich
2010-09-07  1:53       ` Jeremy Fitzhardinge
2010-09-07  6:58         ` Jan Beulich
2010-09-07  8:02           ` Jeremy Fitzhardinge
2010-09-07  8:58             ` Jan Beulich
