From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Kay, Allen M" <allen.m.kay@intel.com>
Subject: RE: Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock
Date: Tue, 20 Sep 2011 17:07:37 -0700
Message-ID: <987664A83D2D224EAE907B061CE93D5301EDED333E@orsmsx505.amr.corp.intel.com>
References: <20037.10841.995717.397090@mariner.uk.xensource.com>
	<4E454C880200007800051000@nat28.tlf.novell.com>
	<20110812140901.GC11708@ocelot.phlegethon.org>
	<4E4559440200007800051062@nat28.tlf.novell.com>
	<20110815092608.GD11708@ocelot.phlegethon.org>
	<4E4A32650200007800051651@nat28.tlf.novell.com>
	<20110816150621.GM11708@ocelot.phlegethon.org>
	<4E4AAFF402000078000518CF@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <4E4AAFF402000078000518CF@nat28.tlf.novell.com>
Content-Language: en-US
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Jan Beulich <JBeulich@novell.com>, Tim Deegan <tim@xen.org>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>, "Dugger,
	Donald D" <donald.d.dugger@intel.com>, Xen.org, security team <security@xen.org>
List-Id: xen-devel@lists.xenproject.org

Catching up on an old thread ...

If I understand correctly, the proposal is to check for VT-d faults in the
do_softirq() handler.  If so, we probably don't need to enable the VT-d MSI
interrupt at all when iommu_debug is not set, and can handle VT-d faults
purely by polling.

This sounds fine to me as long as we still turn on the VT-d MSI interrupt for
the iommu_debug case.
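
To make the idea concrete, here is a rough sketch of the polling model I have
in mind. It is a standalone toy model, not Xen code: struct iommu_model,
iommu_model_init(), iommu_model_report_fault() and iommu_model_poll_faults()
are all hypothetical names made up for illustration, and the fixed-size log is
an assumption.

```c
/* Illustrative model only: none of these names are Xen's actual API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAULT_LOG_SLOTS 8   /* assumed log size, chosen for the example */

struct fault_record {
    unsigned int bdf;       /* bus/device/function of the faulting device */
    bool valid;             /* slot holds an unprocessed fault */
};

struct iommu_model {
    struct fault_record log[FAULT_LOG_SLOTS];
    bool msi_enabled;       /* fault interrupt unmasked (debug case only) */
    unsigned int handled;   /* total faults drained by polling */
};

/* Enable the fault interrupt only when debugging; otherwise rely on polling. */
static void iommu_model_init(struct iommu_model *m, bool iommu_debug)
{
    assert(m != NULL);
    for (size_t i = 0; i < FAULT_LOG_SLOTS; i++)
        m->log[i].valid = false;
    m->msi_enabled = iommu_debug;
    m->handled = 0;
}

/* Hardware side: record a fault in the first free slot, if there is one. */
static bool iommu_model_report_fault(struct iommu_model *m, unsigned int bdf)
{
    assert(m != NULL);
    for (size_t i = 0; i < FAULT_LOG_SLOTS; i++) {
        if (!m->log[i].valid) {
            m->log[i].bdf = bdf;
            m->log[i].valid = true;
            return true;
        }
    }
    return false;           /* log full: the fault is not recorded */
}

/* Called from the softirq path: drain all valid records, return the count. */
static unsigned int iommu_model_poll_faults(struct iommu_model *m)
{
    unsigned int n = 0;

    assert(m != NULL);
    for (size_t i = 0; i < FAULT_LOG_SLOTS; i++) {
        if (m->log[i].valid) {
            m->log[i].valid = false;
            n++;
        }
    }
    m->handled += n;
    return n;
}
```

With iommu_debug false, iommu_model_init() leaves msi_enabled clear and faults
are only noticed when iommu_model_poll_faults() runs; with iommu_debug true the
interrupt stays on, which is the case I would like to keep.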

Allen

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Jan Beulich
Sent: Tuesday, August 16, 2011 8:59 AM
To: Tim Deegan
Cc: xen-devel@lists.xensource.com; Xen.org security team
Subject: Re: [Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock

>>> On 16.08.11 at 17:06, Tim Deegan <tim@xen.org> wrote:
> At 08:03 +0100 on 16 Aug (1313481813), Jan Beulich wrote:
>> >>> On 15.08.11 at 11:26, Tim Deegan <tim@xen.org> wrote:
>> > At 15:48 +0100 on 12 Aug (1313164084), Jan Beulich wrote:
>> >> >>> On 12.08.11 at 16:09, Tim Deegan <tim@xen.org> wrote:
>> >> > At 14:53 +0100 on 12 Aug (1313160824), Jan Beulich wrote:
>> >> >> > This issue is resolved in changeset 23762:537ed3b74b3f of
>> >> >> > xen-unstable.hg, and 23112:84e3706df07a of xen-4.1-testing.hg.
>> >> >>
>> >> >> Do you really think this helps much? Direct control of the device means
>> >> >> it could also (perhaps on a second vCPU) constantly re-enable the bus
>> >> >> mastering bit.
>> >> >
>> >> > That path goes through qemu/pciback, so at least lets Xen schedule the
>> >> > dom0 tools.
>> >>
>> >> Are you sure? If (as said) the guest uses a second vCPU for doing the
>> >> config space accesses, I can't see how this would save the pCPU the
>> >> fault storm is occurring on.
>> >
>> > Hmmm.  Yes, I see what you mean.
>>
>> Actually, a second vCPU may not even be needed: Since the "fault"
>> really is an external interrupt, if that one gets handled on a pCPU other
>> than the one the guest's vCPU is running on, it could execute such a
>> loop even in that case.
>>
>> As to yesterday's softirq-based handling thoughts - perhaps the clearing
>> of the bus master bit on the device should still be done in the actual IRQ
>> handler, while the processing of the fault records could be moved out to
>> a softirq.
>
> Hmmm.  I like the idea of using a softirq but in fact by the time we've
> figured out which BDF to silence we've pretty much done handling the
> fault.

Ugly, but yes, indeed.

> Reading the VTd docs it looks like we can just ack the IOMMU fault
> interrupt and it won't send any more until we clear the log, so we can
> leave the whole business to a softirq.  Delaying that might cause the
> log to overflow, but that's not necessarily the end of the world.
> Looks like we can do the same on AMD by disabling interrupt generation
> in the main handler and reenabling it in the softirq.
>
> Is there any situation where we really care terribly about the IOfault
> logs overflowing?

As long as older entries don't get overwritten, I don't think that's
going to be problematic, all the more so since we basically shut off the
offending device(s).
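
For illustration, the split discussed above could be modelled roughly as
below. This is a toy sketch under stated assumptions, not Xen or VT-d code:
struct fault_log, hw_log_fault(), fault_irq_handler() and fault_softirq() are
hypothetical names, and the "no further interrupt until the log is cleared"
behaviour is modelled as described in this thread rather than taken from the
hardware documentation. A full log drops the new fault, so older entries are
never overwritten.

```c
/* Illustrative model only: these names are hypothetical, not Xen's API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOG_SLOTS 4              /* assumed log size for the example */

struct fault_log {
    unsigned int bdf[LOG_SLOTS]; /* recorded faulting devices, oldest first */
    size_t count;                /* number of valid records */
    bool overflow;               /* a fault could not be recorded */
    bool irq_asserted;           /* fault interrupt raised, not yet acked */
    bool softirq_pending;        /* deferred work requested by the top half */
};

/* Hardware side: append a record. When the log is full the NEW fault is
 * dropped, so older entries survive. The interrupt is raised only when the
 * log goes from empty to non-empty. */
static void hw_log_fault(struct fault_log *log, unsigned int bdf)
{
    assert(log != NULL);
    if (log->count == LOG_SLOTS) {
        log->overflow = true;
        return;
    }
    if (log->count == 0)
        log->irq_asserted = true;
    log->bdf[log->count++] = bdf;
}

/* Top half: ack the interrupt and defer everything else to the softirq. */
static void fault_irq_handler(struct fault_log *log)
{
    assert(log != NULL);
    log->irq_asserted = false;
    log->softirq_pending = true;
}

/* Bottom half: copy out every record (standing in for silencing each
 * device), then clear the log so a later fault can raise the interrupt. */
static size_t fault_softirq(struct fault_log *log, unsigned int out[LOG_SLOTS])
{
    size_t n = 0;

    assert(log != NULL && out != NULL);
    if (!log->softirq_pending)
        return 0;
    while (n < log->count) {
        out[n] = log->bdf[n];
        n++;
    }
    log->count = 0;
    log->overflow = false;
    log->softirq_pending = false;
    return n;
}
```

In this model a storm of faults costs one interrupt plus one softirq pass, and
the overflow flag only tells us that some later faults went unrecorded.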

Jan

