Date: Wed, 27 Apr 2016 16:31:13 +0200
From: Radim Krčmář
Message-ID: <20160427143109.GA6496@potion>
References: <1461055122-32378-1-git-send-email-peterx@redhat.com> <571DA823.1030003@web.de> <20160425071806.GF3261@pxdev.xzpeter.org> <571DC61C.9020006@web.de> <20160426073426.GD28545@pxdev.xzpeter.org> <20160426141859.GB19789@potion> <20160427072923.GG28545@pxdev.xzpeter.org>
In-Reply-To: <20160427072923.GG28545@pxdev.xzpeter.org>
Subject: Re: [Qemu-devel] [PATCH v4 00/16] IOMMU: Enable interrupt remapping for Intel IOMMU
To: Peter Xu
Cc: Jan Kiszka, qemu-devel@nongnu.org, imammedo@redhat.com, rth@twiddle.net, ehabkost@redhat.com, jasowang@redhat.com, marcel@redhat.com, mst@redhat.com, pbonzini@redhat.com, alex.williamson@redhat.com, wexu@redhat.com

2016-04-27 15:29+0800, Peter Xu:
> On Tue, Apr 26, 2016 at 04:19:00PM +0200, Radim Krčmář wrote:
>> 2016-04-26 15:34+0800, Peter Xu:
>> > diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
>> > @@ -281,6 +281,36 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int size)
>> > +/*
>> > + * This is to satisfy the hack in Linux kernel.
>> > + * One hack of it is to
>> > + * simulate clearing the Remote IRR bit of IOAPIC entry using the
>> > + * following:
>> > + *
>> > + * "For IO-APIC's with EOI register, we use that to do an explicit EOI.
>> > + * Otherwise, we simulate the EOI message manually by changing the trigger
>> > + * mode to edge and then back to level, with RTE being masked during
>> > + * this."
>> > + *
>> > + * (See linux kernel __eoi_ioapic_pin() comment in commit c0205701)
>> > + *
>> > + * This is based on the assumption that, Remote IRR bit will be
>> > + * cleared by IOAPIC hardware for edge-triggered interrupts (I
>> > + * believe that's what the IOAPIC version 0x1X hardware does).
>>
>> I thought that Linux doesn't use explicit "EOI" to IO-APIC, but relies
>> on EOI broadcast from LAPIC -- does that change with IR?
>
> IIUC, ioapic_ack_level() should be the one to handle EOI when IR is
> disabled. And, the EOI broadcast should be happening at:
>
>     ack_APIC_irq();
>
> While, after that, we can see some more lines:
>
>     /*
>      * Tail end of clearing remote IRR bit (either by delivering the EOI
>      * message via io-apic EOI register write or simulating it using
>      * mask+edge followed by unnask+level logic) manually when the
>      * level triggered interrupt is seen as the edge triggered interrupt
>      * at the cpu.
>      */
>     if (!(v & (1 << (i & 0x1f)))) {
>         atomic_inc(&irq_mis_count);
>         eoi_ioapic_pin(cfg->vector, irq_data->chip_data);
>     }
>
> What I understand the above is that: first of all, we will do EOI
> broadcast. However, if we found that one level-triggered interrupt
> is treated as edge-triggered interrupt (that is exactly what I have
> encountered below), we will do one more explicit EOI in
> eoi_ioapic_pin(), in which we played the edge-mask/level-unmask
> trick for IOAPIC with version 0x1X.

Indeed, thanks for the explanation.

> For IR enabled case, we just do both without checking (see
> ioapic_ir_ack_level()).
(IR with IO-APIC below version 0x20 probably does not exist in the wild.
 I don't find any reason why the interaction would bug, though.)

>> > + */
>> > +static inline void
>> > +ioapic_fix_edge_remote_irr(uint64_t *entry)
>> > +{
>> > +    if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
>> > +        /* Level triggered interrupts, make sure remote IRR is zero */
>> > +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
>>
>> (You can just unconditionally zero it, edge doesn't care.)
>
> Ah! I made a mistake. I suppose what I really want is:
>
> +    if (!(*entry & IOAPIC_LVT_TRIGGER_MODE)) {
> +        /* Edge-triggered interrupts, make sure remote IRR is zero */
> +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
> +    }
>
> Though both should help do the trick, I should be using this new
> one in v5.

(You'd need to look at the old value for this to work.)

>> > @@ -314,6 +344,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
>> >              s->ioredtbl[index] &= ~0xffffffffULL;
>> >              s->ioredtbl[index] |= val;
>> >          }
>> > +        ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
>>
>> I think this can be done only in the else branch of (s->ioregsel & 1).
>
> Yes. I can move it there, but there will be hidden assumption (or
> say, truth...) that these magic bits are inside entry bits 31-0, and
> people might be confused if we do not know that. IMHO, for better
> readability of code, I would still prefer to put it here (it means
> "we need to make sure the entry satisfy some kind of rule, but we do
> not need to know further about what the rule is"). If you still
> insist, I'd like to take your advice though. :)

I don't. If you clear it only on edge->level transition, then those two
also behave the same.

>> > I am still looking into guest part codes.
>> > Although the above patch
>> > should solve the issue, there are still issues in guest codes when
>> > IR is enabled:
>> >
>> > - mismatched "vector" in IOAPIC entry and IRTE entry (this is
>> >   required in vt-d spec 5.1.5.1, and required to correctly deliver
>> >   EOI broadcast I guess). See intel_irq_remapping_prepare_irte():
>>
>> "required" is a way of saying that the opposite is undefined.
>> No need to think about it in IOMMU.
>
> Why? Without correct vector information, IOAPIC will not be able to
> know which entry to clear the Remote IRR bit (please check
> ioapic_eoi_broadcast())?

IOAPIC won't get correct EOI and Intel made it into an OS bug, because
there was no good action that the hardware could take.
(We have a lot more freedom, but I think that partially fixing something
 that doesn't work on real hardware is a wasted effort.)

Or did you mean that mismatched vector is a possible source of the fixed
bug? (I originally dismissed it, because real hardware works.)

>> > - I encountered that level-triggered entries in IOAPIC is marked as
>> >   edge-triggered interrupt in APIC (which is strange)...
>>
>> What/where do you mean?
>> (The only difference I know of is that level triggered vectors in LAPIC
>> have their respective TMR bit set while edge do not.)
>
> Exactly. Here is what I mean:
>
> static void apic_eoi(APICCommonState *s)
> {
>     int isrv;
>     isrv = get_highest_priority_int(s->isr);
>     if (isrv < 0)
>         return;
>     apic_reset_bit(s->isr, isrv);
>     if (!(s->spurious_vec & APIC_SV_DIRECTED_IO) && apic_get_bit(s->tmr, isrv)) {
>         ioapic_eoi_broadcast(isrv);
>     }
>     apic_sync_vapic(s, SYNC_FROM_VAPIC | SYNC_TO_VAPIC);
>     apic_update_irq(s);
> }
>
> APIC will notify IOAPIC only if the corresponding vector in TMR bit
> is set (in "apic_get_bit(s->tmr, isrv)", or say, it's a
> level-triggered interrupt in APIC registers).
> What I have traced is
> that, the EOI broadcast is missing because this bit is cleared in
> APIC TMR while it should be set. I need some more tests to double
> confirm this though, in case I made any mistake.

(There are two "legal" situations where TMR can be 0 and IOAPIC sets
 remote IRR -- if edge and level interrupts are assigned to the same
 vector and if IOAPIC is level while IR and OS edge, both would bug on
 real hardware too ...)

Does QEMU bug with TCG?

> (P.S. Actually I saw some similiar comments in kernel codes around,
> please check the long comments in ioapic_ack_level(). Not sure
> whether these are related.)

I hope we didn't emulate the hardware bug. :)
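
For reference, the sequence discussed in this thread can be sketched as a
toy model. This is not QEMU or Linux code: every toy_* name and RTE_* macro
is made up for illustration, the bit positions follow the IOAPIC redirection
entry layout (Remote IRR bit 14, trigger mode bit 15, mask bit 16), and
treating Remote IRR as read-only for guest writes is an assumption of the
sketch. It shows a level interrupt whose EOI broadcast is skipped because
the LAPIC TMR bit is clear, and the guest then recovering with the
mask+edge / unmask+level trick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the IOAPIC layout. */
#define RTE_VECTOR_MASK  0xffULL
#define RTE_REMOTE_IRR   (1ULL << 14)
#define RTE_TRIGGER_LVL  (1ULL << 15)
#define RTE_MASKED       (1ULL << 16)

struct toy_ioapic { uint64_t rte; };      /* a single redirection entry */
struct toy_lapic  { bool tmr[256]; };     /* one TMR bit per vector */

/* Guest write to the low 32 bits of the entry. Assumption: Remote IRR is
 * not guest-writable, so the old value is carried over, and it is dropped
 * whenever the entry ends up edge-triggered. */
static void toy_ioapic_write_low(struct toy_ioapic *io, uint32_t val)
{
    uint64_t old = io->rte;
    uint64_t rte = (old & ~0xffffffffULL) | val;

    rte = (rte & ~RTE_REMOTE_IRR) | (old & RTE_REMOTE_IRR);
    if (!(rte & RTE_TRIGGER_LVL)) {
        rte &= ~RTE_REMOTE_IRR;
    }
    io->rte = rte;
}

/* Delivering an unmasked level-triggered interrupt sets Remote IRR. */
static void toy_ioapic_deliver(struct toy_ioapic *io)
{
    if ((io->rte & RTE_TRIGGER_LVL) && !(io->rte & RTE_MASKED)) {
        io->rte |= RTE_REMOTE_IRR;
    }
}

/* EOI broadcast: clear Remote IRR if the entry uses this vector. */
static void toy_ioapic_eoi_broadcast(struct toy_ioapic *io, uint8_t vector)
{
    if ((io->rte & RTE_VECTOR_MASK) == vector) {
        io->rte &= ~RTE_REMOTE_IRR;
    }
}

/* LAPIC EOI: broadcast only when the TMR bit says "level".
 * Returns true if a broadcast was sent. */
static bool toy_lapic_eoi(const struct toy_lapic *lapic,
                          struct toy_ioapic *io, uint8_t vector)
{
    if (!lapic->tmr[vector]) {
        return false;
    }
    toy_ioapic_eoi_broadcast(io, vector);
    return true;
}

/* Guest fallback: mask + edge, then restore the original low bits. */
static void toy_guest_simulated_eoi(struct toy_ioapic *io)
{
    uint32_t low = (uint32_t)io->rte;

    toy_ioapic_write_low(io, (uint32_t)((low | RTE_MASKED) & ~RTE_TRIGGER_LVL));
    assert(!(io->rte & RTE_REMOTE_IRR));  /* the edge write dropped it */
    toy_ioapic_write_low(io, low);
}
```

With tmr[vector] left at 0, toy_lapic_eoi() returns false and Remote IRR
stays set until toy_guest_simulated_eoi() runs. With tmr[vector] set, the
broadcast alone clears it.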