From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Jan Beulich"
Subject: Re: Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock
Date: Mon, 15 Aug 2011 11:02:19 +0100
Message-ID: <4E490ACB020000780005142A@nat28.tlf.novell.com>
References: <20037.10841.995717.397090@mariner.uk.xensource.com>
 <4E454C880200007800051000@nat28.tlf.novell.com>
 <20110812140901.GC11708@ocelot.phlegethon.org>
 <4E4559440200007800051062@nat28.tlf.novell.com>
 <20110815092608.GD11708@ocelot.phlegethon.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
In-Reply-To: <20110815092608.GD11708@ocelot.phlegethon.org>
To: Tim Deegan
Cc: xen-devel@lists.xensource.com, "Xen.org security team"
List-Id: xen-devel@lists.xenproject.org

>>> On 15.08.11 at 11:26, Tim Deegan wrote:
> At 15:48 +0100 on 12 Aug (1313164084), Jan Beulich wrote:
>> >>> On 12.08.11 at 16:09, Tim Deegan wrote:
>> > At 14:53 +0100 on 12 Aug (1313160824), Jan Beulich wrote:
>> >> > This issue is resolved in changeset 23762:537ed3b74b3f of
>> >> > xen-unstable.hg, and 23112:84e3706df07a of xen-4.1-testing.hg.
>> >>
>> >> Do you really think this helps much? Direct control of the device means
>> >> it could also (perhaps on a second vCPU) constantly re-enable the bus
>> >> mastering bit.
>> >
>> > That path goes through qemu/pciback, so at least lets Xen schedule the
>> > dom0 tools.
>>
>> Are you sure? If (as said) the guest uses a second vCPU for doing the
>> config space accesses, I can't see how this would save the pCPU the
>> fault storm is occurring on.
>
> Hmmm. Yes, I see what you mean. What was your concern about
> memory-mapped config registers? That PCIback would need to be involved
> somehow?
Yes, unless we want to get into the business of intercepting Dom0's
writes to mmcfg space.

>> > The particular failure that this patch fixes was locking up
>> > cpu0 so hard that it couldn't even service softirqs, and the NMI
>> > watchdog rebooted the machine.
>>
>> Hmm, that would point at a flaw in the interrupt exit path, on which
>> softirqs shouldn't be ignored.
>
> Are you suggesting that we should handle softirqs before re-enabling
> interrupts? That sounds perilous.

Ah, okay, I was assuming execution would get back into the guest at
least, but you're saying the interrupts hit right after the sti. Indeed,
in that case there's not much else we can do.

Or maybe we could: How about moving the whole fault handling into a
softirq, and making the low level handler just raise that one? Provided
this isn't a performance-critical operation (and it really can't be,
given that you now basically knock the offending device in the face
when one happens), having to iterate through all IOMMUs shouldn't be
that bad.

Jan