From: "Radim Krčmář" <rkrcmar@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Jan Kiszka <jan.kiszka@web.de>,
	qemu-devel@nongnu.org, imammedo@redhat.com, rth@twiddle.net,
	ehabkost@redhat.com, jasowang@redhat.com, marcel@redhat.com,
	mst@redhat.com, pbonzini@redhat.com, alex.williamson@redhat.com,
	wexu@redhat.com
Subject: Re: [Qemu-devel] [PATCH v4 00/16] IOMMU: Enable interrupt remapping for Intel IOMMU
Date: Wed, 27 Apr 2016 16:31:13 +0200
Message-ID: <20160427143109.GA6496@potion>
In-Reply-To: <20160427072923.GG28545@pxdev.xzpeter.org>

2016-04-27 15:29+0800, Peter Xu:
> On Tue, Apr 26, 2016 at 04:19:00PM +0200, Radim Krčmář wrote:
>> 2016-04-26 15:34+0800, Peter Xu:
>> > diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
>> > @@ -281,6 +281,36 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int size)
>> > +/*
>> > + * This is to satisfy the hack in Linux kernel. One hack of it is to
>> > + * simulate clearing the Remote IRR bit of IOAPIC entry using the
>> > + * following:
>> > + *
>> > + * "For IO-APIC's with EOI register, we use that to do an explicit EOI.
>> > + * Otherwise, we simulate the EOI message manually by changing the trigger
>> > + * mode to edge and then back to level, with RTE being masked during
>> > + * this."
>> > + *
>> > + * (See linux kernel __eoi_ioapic_pin() comment in commit c0205701)
>> > + *
>> > + * This is based on the assumption that, Remote IRR bit will be
>> > + * cleared by IOAPIC hardware for edge-triggered interrupts (I
>> > + * believe that's what the IOAPIC version 0x1X hardware does).
>> 
>> I thought that Linux doesn't use explicit "EOI" to IO-APIC, but relies
>> on EOI broadcast from LAPIC -- does that change with IR?
> 
> IIUC, ioapic_ack_level() should be the one to handle EOI when IR is
> disabled. And, the EOI broadcast should be happening at:
> 
> 	ack_APIC_irq();
> 
> While, after that, we can see some more lines:
> 
> 	/*
> 	 * Tail end of clearing remote IRR bit (either by delivering the EOI
> 	 * message via io-apic EOI register write or simulating it using
> 	 * mask+edge followed by unnask+level logic) manually when the
> 	 * level triggered interrupt is seen as the edge triggered interrupt
> 	 * at the cpu.
> 	 */
> 	if (!(v & (1 << (i & 0x1f)))) {
> 		atomic_inc(&irq_mis_count);
> 		eoi_ioapic_pin(cfg->vector, irq_data->chip_data);
> 	}
> 
> My understanding of the above is: first of all, we do the EOI
> broadcast. However, if we find that a level-triggered interrupt
> was treated as an edge-triggered one (that is exactly what I have
> encountered below), we do one more explicit EOI in
> eoi_ioapic_pin(), in which we play the edge-mask/level-unmask
> trick for an IOAPIC with version 0x1X.

Indeed, thanks for the explanation.
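
(For readers following along, a minimal sketch of the indexing behind the
quoted "v & (1 << (i & 0x1f))" check.  It assumes the xAPIC register
layout, where TMR starts at offset 0x180 and each 32-bit register covers
32 vectors and sits 0x10 bytes after the previous one.  The helper names
are made up for illustration and are not kernel functions.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: xAPIC MMIO offset of the first TMR register. */
#define TOY_APIC_TMR_BASE 0x180u

/* Offset of the TMR register that holds the bit for this vector.
 * (vector & ~0x1f) is 32 * k, and halving it gives the 0x10 * k stride. */
static inline uint32_t toy_tmr_reg_offset(uint8_t vector)
{
    return TOY_APIC_TMR_BASE + ((vector & ~0x1fu) >> 1);
}

/* Mask of this vector's bit inside that 32-bit register. */
static inline uint32_t toy_tmr_bit_mask(uint8_t vector)
{
    return UINT32_C(1) << (vector & 0x1f);
}

/* True when the LAPIC saw the vector as edge (TMR bit clear), which is
 * the case where the quoted code falls back to an explicit EOI. */
static inline bool toy_seen_as_edge(uint32_t tmr_reg_value, uint8_t vector)
{
    return !(tmr_reg_value & toy_tmr_bit_mask(vector));
}

/* Usage: vector 0x31 lives in the second TMR register, bit 17. */
void toy_tmr_example(void)
{
    assert(toy_tmr_reg_offset(0x31) == 0x190u);
    assert(toy_seen_as_edge(0, 0x31));
}
```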

> For IR enabled case, we just do both without checking (see
> ioapic_ir_ack_level()).

(IR with an IO-APIC below version 0x20 probably does not exist in the wild.
 I don't see any reason why the interaction would misbehave, though.)

>> > + */
>> > +static inline void
>> > +ioapic_fix_edge_remote_irr(uint64_t *entry)
>> > +{
>> > +    if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
>> > +        /* Level triggered interrupts, make sure remote IRR is zero */
>> > +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
>> 
>> (You can just unconditionally zero it, edge doesn't care.)
> 
> Ah! I made a mistake. I suppose what I really want is:
> 
> +    if (!(*entry & IOAPIC_LVT_TRIGGER_MODE)) {
> +        /* Edge-triggered interrupts, make sure remote IRR is zero */
> +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
> +    }
> 
> Though both should help do the trick, I should be using this new
> one in v5.

(You'd need to look at the old value of the entry for this to work.)

>> > @@ -314,6 +344,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
>> >                      s->ioredtbl[index] &= ~0xffffffffULL;
>> >                      s->ioredtbl[index] |= val;
>> >                  }
>> > +                ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
>> 
>> I think this can be done only in the else branch of (s->ioregsel & 1).
> 
> Yes. I can move it there, but there will be hidden assumption (or
> say, truth...) that these magic bits are inside entry bits 31-0, and
> people might be confused if we do not know that.  IMHO, for better
> readability of code, I would still prefer to put it here (it means
> "we need to make sure the entry satisfy some kind of rule, but we do
> not need to know further about what the rule is"). If you still
> insist, I'd like to take your advice though. :)

I don't insist.  If you clear it only on the edge->level transition, then
the two placements also behave the same.
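
(A minimal sketch of the edge->level idea, not the actual v5 code.  It
assumes the 82093AA entry layout with Remote IRR at bit 14, trigger mode
at bit 15 and mask at bit 16, and it assumes the guest writes back the
entry it read, Remote IRR included.  All names are hypothetical.)

```c
#include <assert.h>
#include <stdint.h>

/* Assumed redirection table entry bits (82093AA layout). */
#define TOY_RTE_REMOTE_IRR   (UINT64_C(1) << 14)
#define TOY_RTE_TRIGGER_MODE (UINT64_C(1) << 15)  /* 1 = level, 0 = edge */
#define TOY_RTE_MASKED       (UINT64_C(1) << 16)

/* Apply a guest write: the new value replaces the entry, and Remote IRR
 * is cleared only when the trigger mode goes from edge to level.  This
 * needs the old entry, since the new one alone cannot show a transition. */
static inline uint64_t toy_rte_write(uint64_t old_entry, uint64_t new_entry)
{
    int was_edge = !(old_entry & TOY_RTE_TRIGGER_MODE);
    int is_level = !!(new_entry & TOY_RTE_TRIGGER_MODE);

    if (was_edge && is_level) {
        new_entry &= ~TOY_RTE_REMOTE_IRR;
    }
    return new_entry;
}

/* The guest's simulated EOI: mask + edge, then unmask + level. */
uint64_t toy_simulate_guest_eoi(uint64_t entry)
{
    /* Step 1: level -> edge, no transition of interest, IRR is kept. */
    entry = toy_rte_write(entry,
                          (entry | TOY_RTE_MASKED) & ~TOY_RTE_TRIGGER_MODE);
    /* Step 2: edge -> level, Remote IRR gets cleared here. */
    entry = toy_rte_write(entry,
                          (entry & ~TOY_RTE_MASKED) | TOY_RTE_TRIGGER_MODE);
    return entry;
}

/* Usage: a stuck level entry for vector 0x31 ends up with IRR clear. */
void toy_rte_example(void)
{
    uint64_t stuck = 0x31 | TOY_RTE_TRIGGER_MODE | TOY_RTE_REMOTE_IRR;
    assert(toy_simulate_guest_eoi(stuck) == (0x31 | TOY_RTE_TRIGGER_MODE));
}
```

A write that leaves the trigger mode alone, such as one to the upper half
of the entry, is a no-op for this helper, which is why its placement does
not matter.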

>> > I am still looking into guest part codes. Although the above patch
>> > should solve the issue, there are still issues in guest codes when
>> > IR is enabled:
>> > 
>> > - mismatched "vector" in IOAPIC entry and IRTE entry (this is
>> >   required in vt-d spec 5.1.5.1, and required to correctly deliver
>> >   EOI broadcast I guess). See intel_irq_remapping_prepare_irte():
>> 
>> "required" is a way of saying that the opposite is undefined.
>> No need to think about it in IOMMU.
> 
> Why? Without correct vector information, IOAPIC will not be able to
> know which entry to clear the Remote IRR bit (please check
> ioapic_eoi_broadcast())?

The IOAPIC won't get a correct EOI, and Intel declared that an OS bug,
because there was no good action that the hardware could take.  (We have
a lot more freedom, but I think that partially fixing something that
doesn't work on real hardware is a wasted effort.)

Or did you mean that a mismatched vector is a possible source of the bug
being fixed?  (I originally dismissed it, because real hardware works.)
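
(To make the concern concrete, a toy model of a vector-matched EOI
broadcast.  This is not QEMU's ioapic_eoi_broadcast(); it only assumes
that the broadcast carries the vector the CPU serviced and that an entry
is matched on its own vector field, bits 7-0, with Remote IRR at bit 14.
All names are hypothetical.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RTE_VECTOR_MASK UINT64_C(0xff)
#define TOY_RTE_REMOTE_IRR  (UINT64_C(1) << 14)
#define TOY_RTE_LEVEL       (UINT64_C(1) << 15)

/* Clear Remote IRR on every entry whose vector field equals the
 * broadcast vector.  Returns how many entries were cleared. */
static int toy_eoi_broadcast(uint64_t *rte, size_t n, uint8_t vector)
{
    int cleared = 0;

    for (size_t i = 0; i < n; i++) {
        if ((rte[i] & TOY_RTE_REMOTE_IRR) &&
            (rte[i] & TOY_RTE_VECTOR_MASK) == vector) {
            rte[i] &= ~TOY_RTE_REMOTE_IRR;
            cleared++;
        }
    }
    return cleared;
}

/* Usage: entry 1 carries vector 0x41, but the interrupt is assumed to
 * have been delivered (via the IRTE) as vector 0x51, so the broadcast
 * for 0x51 finds nothing and Remote IRR stays set. */
void toy_vector_mismatch_example(void)
{
    uint64_t rte[2] = {
        0x31 | TOY_RTE_LEVEL | TOY_RTE_REMOTE_IRR,
        0x41 | TOY_RTE_LEVEL | TOY_RTE_REMOTE_IRR,
    };

    assert(toy_eoi_broadcast(rte, 2, 0x31) == 1);
    assert(toy_eoi_broadcast(rte, 2, 0x51) == 0);
    assert(rte[1] & TOY_RTE_REMOTE_IRR);
}
```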

>> > - I encountered that level-triggered entries in IOAPIC is marked as
>> >   edge-triggered interrupt in APIC (which is strange)...
>> 
>> What/where do you mean?
>> (The only difference I know of is that level triggered vectors in LAPIC
>>  have their respective TMR bit set while edge do not.)
> 
> Exactly. Here is what I mean:
> 
> static void apic_eoi(APICCommonState *s)
> {
>     int isrv;
>     isrv = get_highest_priority_int(s->isr);
>     if (isrv < 0)
>         return;
>     apic_reset_bit(s->isr, isrv);
>     if (!(s->spurious_vec & APIC_SV_DIRECTED_IO) && apic_get_bit(s->tmr, isrv)) {
>         ioapic_eoi_broadcast(isrv);
>     }
>     apic_sync_vapic(s, SYNC_FROM_VAPIC | SYNC_TO_VAPIC);
>     apic_update_irq(s);
> }
> 
> APIC will notify IOAPIC only if the corresponding vector in TMR bit
> is set (in "apic_get_bit(s->tmr, isrv)", or say, it's a
> level-triggered interrupt in APIC registers). What I have traced is
> that, the EOI broadcast is missing because this bit is cleared in
> APIC TMR while it should be set. I need some more tests to double
> confirm this though, in case I made any mistake.

(There are two "legal" situations where TMR can be 0 while the IOAPIC sets
 remote IRR: if edge and level interrupts are assigned to the same vector,
 and if the IOAPIC entry is level while IR and the OS use edge.  Both would
 bug on real hardware too ...)

Does QEMU bug with TCG?

> (P.S. Actually I saw some similar comments in the kernel code,
> please check the long comments in ioapic_ack_level().  Not sure
> whether these are related.)

I hope we didn't emulate the hardware bug. :)
