From: Peter Xu <peterx@redhat.com>
To: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Jan Kiszka <jan.kiszka@web.de>,
	qemu-devel@nongnu.org, imammedo@redhat.com, rth@twiddle.net,
	ehabkost@redhat.com, jasowang@redhat.com, marcel@redhat.com,
	mst@redhat.com, pbonzini@redhat.com, alex.williamson@redhat.com,
	wexu@redhat.com
Subject: Re: [Qemu-devel] [PATCH v4 00/16] IOMMU: Enable interrupt remapping for Intel IOMMU
Date: Wed, 27 Apr 2016 15:29:23 +0800
Message-ID: <20160427072923.GG28545@pxdev.xzpeter.org>
In-Reply-To: <20160426141859.GB19789@potion>

On Tue, Apr 26, 2016 at 04:19:00PM +0200, Radim Krčmář wrote:
> 2016-04-26 15:34+0800, Peter Xu:
> > Hi, Jan,
> > 
> > The above issue should be caused by EOI missing of level-triggered
> > interrupts. Before that, I was always using edge-triggered
> > interrupts for test, so didn't encounter this one. Would you please
> > help try below patch? It can be applied directly onto the series,
> > and should solve the issue (it works on my test vm, and I'll take it
> > in v5 as well if it also works for you):
> > 
> > -------------------------
> > 
> > diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> > @@ -281,6 +281,36 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int size)
> > +/*
> > + * This is to satisfy the hack in Linux kernel. One hack of it is to
> > + * simulate clearing the Remote IRR bit of IOAPIC entry using the
> > + * following:
> > + *
> > + * "For IO-APIC's with EOI register, we use that to do an explicit EOI.
> > + * Otherwise, we simulate the EOI message manually by changing the trigger
> > + * mode to edge and then back to level, with RTE being masked during
> > + * this."
> > + *
> > + * (See linux kernel __eoi_ioapic_pin() comment in commit c0205701)
> > + *
> > + * This is based on the assumption that, Remote IRR bit will be
> > + * cleared by IOAPIC hardware for edge-triggered interrupts (I
> > + * believe that's what the IOAPIC version 0x1X hardware does).
> 
> I thought that Linux doesn't use explicit "EOI" to IO-APIC, but relies
> on EOI broadcast from LAPIC -- does that change with IR?

IIUC, ioapic_ack_level() is the function that handles EOI when IR is
disabled, and the EOI broadcast happens at:

	ack_APIC_irq();

After that, however, there are a few more lines:

	/*
	 * Tail end of clearing remote IRR bit (either by delivering the EOI
	 * message via io-apic EOI register write or simulating it using
	 * mask+edge followed by unnask+level logic) manually when the
	 * level triggered interrupt is seen as the edge triggered interrupt
	 * at the cpu.
	 */
	if (!(v & (1 << (i & 0x1f)))) {
		atomic_inc(&irq_mis_count);
		eoi_ioapic_pin(cfg->vector, irq_data->chip_data);
	}

My understanding of the above is this: first, we do the EOI
broadcast. However, if we find that a level-triggered interrupt was
treated as an edge-triggered one at the CPU (which is exactly what I
have encountered, see below), we do one more explicit EOI in
eoi_ioapic_pin(), which plays the mask+edge / unmask+level trick for
IOAPICs with version 0x1X.
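
To make the "(v & (1 << (i & 0x1f)))" test concrete, here is a minimal
sketch of how a vector maps to a TMR register and a bit inside it. It
assumes the usual local APIC layout (TMR at offset 0x180, one 32-bit
register per 32 vectors, registers 0x10 bytes apart). The toy_* names
are made up for illustration and are not kernel or QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed local APIC TMR base offset (TMR spans 0x180..0x1F0). */
#define TOY_APIC_TMR_BASE 0x180u

/* Offset of the 32-bit TMR register that holds the bit for `vector`. */
static inline uint32_t toy_tmr_reg_offset(uint8_t vector)
{
    /* (vector / 32) * 0x10, written the same way as the kernel does. */
    return TOY_APIC_TMR_BASE + ((uint32_t)(vector & ~0x1fu) >> 1);
}

/* Mask of the bit for `vector` inside that register. */
static inline uint32_t toy_tmr_bit_mask(uint8_t vector)
{
    return UINT32_C(1) << (vector & 0x1f);
}

/*
 * True when the TMR bit is clear, i.e. the CPU saw the interrupt as
 * edge-triggered. This is the condition under which the extra explicit
 * EOI is issued in the snippet quoted above.
 */
static inline bool toy_level_seen_as_edge(uint32_t tmr_reg_value,
                                          uint8_t vector)
{
    return (tmr_reg_value & toy_tmr_bit_mask(vector)) == 0;
}
```

For vector 0x31, for example, this picks the register at offset 0x190
and bit 17 within it.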

For the IR-enabled case, we just do both without checking (see
ioapic_ir_ack_level()).

So that's why I think this should not happen if either way works. In
other words, without this patch, both the "EOI broadcast" and the
"explicit EOI (hacky version)" fail in the IR case. This patch fixes
the latter; I am still looking for the reason the former fails.

> 
> > + *                                                             So
> > + * if we are emulating it, we'd better do it the same here, so that
> > + * the guest kernel hack will work as well on QEMU.
> 
> Totally.
> 
> > + * Without this, level-triggered interrupts in IR mode might fail to
> > + * work correctly.
> 
> (I don't really understand why it worked before.)

Yes, actually what I want to try is to take a machine with real IOMMU
hardware, plug an e1000 (real hardware, I mean) into it, and see
whether the current Linux kernel IOMMU driver copes well with
level-triggered devices (I suppose this scenario is rare, since
level-triggered interrupts are mostly legacy IIUC).

> 
> > + */
> > +static inline void
> > +ioapic_fix_edge_remote_irr(uint64_t *entry)
> > +{
> > +    if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
> > +        /* Level triggered interrupts, make sure remote IRR is zero */
> > +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
> 
> (You can just unconditionally zero it, edge doesn't care.)

Ah! I made a mistake. I suppose what I really want is:

+    if (!(*entry & IOAPIC_LVT_TRIGGER_MODE)) {
+        /* Edge-triggered interrupts, make sure remote IRR is zero */
+        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
+    }

Both versions should do the trick, but I will use this new one in
v5.
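
For reference, a self-contained sketch of the corrected helper. The bit
positions follow the redirection table entry layout of the 82093AA
datasheet (bit 14 Remote IRR, bit 15 trigger mode, bit 16 mask). The
TOY_* macros and the function name are stand-ins chosen so the snippet
compiles on its own; they are not the QEMU definitions.

```c
#include <stdint.h>

/* Redirection table entry bits (assumed per the 82093AA layout). */
#define TOY_RTE_REMOTE_IRR   (UINT64_C(1) << 14)
#define TOY_RTE_TRIGGER_MODE (UINT64_C(1) << 15)  /* 1 = level, 0 = edge */
#define TOY_RTE_MASKED       (UINT64_C(1) << 16)

/*
 * Remote IRR is only meaningful for level-triggered entries, so drop
 * it whenever the entry is currently programmed as edge-triggered.
 * A level-triggered entry is left untouched.
 */
static inline void toy_fix_edge_remote_irr(uint64_t *entry)
{
    if (!(*entry & TOY_RTE_TRIGGER_MODE)) {
        /* Edge-triggered: make sure Remote IRR is zero. */
        *entry &= ~TOY_RTE_REMOTE_IRR;
    }
}
```

With this version, the masked, edge-triggered entry written in the
first half of the guest's trick loses its stale Remote IRR, while a
level-triggered entry that still has Remote IRR set keeps it.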

> 
> > +    }
> > +}
> > +
> > @@ -314,6 +344,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
> >                      s->ioredtbl[index] &= ~0xffffffffULL;
> >                      s->ioredtbl[index] |= val;
> >                  }
> > +                ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
> 
> I think this can be done only in the else branch of (s->ioregsel & 1).

Yes, I can move it there, but that relies on a hidden assumption (or
rather, a fact...) that these magic bits live in entry bits 31-0, and
readers who do not know that might be confused.  IMHO, for better
readability, I would still prefer to keep it here (it says "we need to
make sure the entry satisfies some rule, without needing to know what
the rule is"). If you still insist, I'm happy to take your advice
though. :)

> 
> (If the guest kernel does level->edge->level, then remote_irr probably
>  should be cleared only on edge->level transition and not on
>  level->level, but I haven't seen that in the spec ...)

Agreed. That's what my diff above is trying to fix. Thanks for pointing it out.

> 
> >                  ioapic_service(s);
> > ------------------------
> > 
> > I am still looking into guest part codes. Although the above patch
> > should solve the issue, there are still issues in guest codes when
> > IR is enabled:
> > 
> > - mismatched "vector" in IOAPIC entry and IRTE entry (this is
> >   required in vt-d spec 5.1.5.1, and required to correctly deliver
> >   EOI broadcast I guess). See intel_irq_remapping_prepare_irte():
> 
> "required" is a way of saying that the opposite is undefined.
> No need to think about it in IOMMU.

Why? Without the correct vector information, the IOAPIC cannot know
which entry's Remote IRR bit to clear (please check
ioapic_eoi_broadcast()).
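
To illustrate the concern, here is a toy model of an EOI broadcast that
matches entries by vector. It is only a sketch under assumptions: the
vector sits in bits 7-0 of the entry, Remote IRR is bit 14, and there
are 24 pins. The toy_* and TOY_* names are hypothetical and this is not
the QEMU ioapic_eoi_broadcast() code.

```c
#include <stdint.h>

#define TOY_IOAPIC_NUM_PINS 24
#define TOY_RTE_VECTOR_MASK UINT64_C(0xff)
#define TOY_RTE_REMOTE_IRR  (UINT64_C(1) << 14)

/*
 * Clear Remote IRR on every entry that has it set and whose vector
 * field equals the EOI'd vector. Returns how many entries were cleared.
 */
static int toy_ioapic_eoi_broadcast(uint64_t rte[TOY_IOAPIC_NUM_PINS],
                                    uint8_t vector)
{
    int cleared = 0;

    for (int n = 0; n < TOY_IOAPIC_NUM_PINS; n++) {
        if ((rte[n] & TOY_RTE_REMOTE_IRR) &&
            (rte[n] & TOY_RTE_VECTOR_MASK) == vector) {
            rte[n] &= ~TOY_RTE_REMOTE_IRR;
            cleared++;
        }
    }
    return cleared;
}
```

In this model, if the entry holds vector 0x31 but the EOI arrives for
0x41 (say, because the IRTE carries a different vector), nothing
matches and Remote IRR stays set.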

> 
> > - I encountered that level-triggered entries in IOAPIC is marked as
> >   edge-triggered interrupt in APIC (which is strange)...
> 
> What/where do you mean?
> (The only difference I know of is that level triggered vectors in LAPIC
>  have their respective TMR bit set while edge do not.)

Exactly. Here is what I mean:

static void apic_eoi(APICCommonState *s)
{
    int isrv;
    isrv = get_highest_priority_int(s->isr);
    if (isrv < 0)
        return;
    apic_reset_bit(s->isr, isrv);
    if (!(s->spurious_vec & APIC_SV_DIRECTED_IO) && apic_get_bit(s->tmr, isrv)) {
        ioapic_eoi_broadcast(isrv);
    }
    apic_sync_vapic(s, SYNC_FROM_VAPIC | SYNC_TO_VAPIC);
    apic_update_irq(s);
}

The APIC notifies the IOAPIC only if the TMR bit for the
corresponding vector is set (the "apic_get_bit(s->tmr, isrv)" check),
that is, only if the interrupt is level-triggered according to the
APIC registers. What I have traced is that the EOI broadcast is
missing because this bit is clear in the APIC TMR when it should be
set. I need some more tests to double-check this, though, in case I
made a mistake.
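
The gating described above can be reproduced with a small standalone
model. This is a sketch, not the QEMU code: ToyApic and the toy_*
helpers are invented for illustration, and bit 12 of the spurious
vector register is assumed to be the broadcast-suppression bit that
APIC_SV_DIRECTED_IO stands for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: SVR bit 12 suppresses the EOI broadcast. */
#define TOY_SV_DIRECTED_IO (UINT32_C(1) << 12)

typedef struct {
    uint32_t isr[8];        /* in-service bits, 256 vectors */
    uint32_t tmr[8];        /* trigger mode bits, 1 = level */
    uint32_t spurious_vec;
} ToyApic;

static inline bool toy_get_bit(const uint32_t *tab, int vec)
{
    return (tab[vec >> 5] >> (vec & 0x1f)) & 1;
}

static inline void toy_set_bit(uint32_t *tab, int vec)
{
    tab[vec >> 5] |= UINT32_C(1) << (vec & 0x1f);
}

static inline void toy_reset_bit(uint32_t *tab, int vec)
{
    tab[vec >> 5] &= ~(UINT32_C(1) << (vec & 0x1f));
}

/* Highest vector with its ISR bit set, or -1 if none is in service. */
static int toy_highest_isr(const ToyApic *s)
{
    for (int vec = 255; vec >= 0; vec--) {
        if (toy_get_bit(s->isr, vec)) {
            return vec;
        }
    }
    return -1;
}

/*
 * Model of the EOI path: returns the vector that would be broadcast to
 * the IOAPIC, or -1 when no broadcast happens (nothing in service,
 * broadcast suppressed, or TMR bit clear).
 */
static int toy_apic_eoi(ToyApic *s)
{
    int isrv = toy_highest_isr(s);

    if (isrv < 0) {
        return -1;
    }
    toy_reset_bit(s->isr, isrv);
    if (!(s->spurious_vec & TOY_SV_DIRECTED_IO) &&
        toy_get_bit(s->tmr, isrv)) {
        return isrv;
    }
    return -1;
}
```

With the ISR bit set but the TMR bit clear, toy_apic_eoi() returns -1,
which corresponds to the missing broadcast traced above.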

(P.S. Actually I saw some similar comments in the kernel code nearby;
please check the long comments in ioapic_ack_level().  Not sure
whether these are related.)

Thanks!

-- peterx
