Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier

From: Peter Xu <peterx@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
	"Eduardo Habkost" <ehabkost@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	qemu-devel@nongnu.org, "Bandan Das" <bsd@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Richard Henderson" <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
Date: Mon, 12 Aug 2019 23:16:26 +0200
Message-ID: <20190812211626.GA9073@xz-x1>
In-Reply-To: <20190812102453.3c05ab43@x1.home>

On Mon, Aug 12, 2019 at 10:24:53AM -0600, Alex Williamson wrote:
> On Mon, 12 Aug 2019 09:45:27 +0200
> Peter Xu <peterx@redhat.com> wrote:
> 
> > This is a RFC series.
> > 
> > The VT-d code has some defects, one of them is that we cannot detect
> > the misuse of vIOMMU and vfio-pci early enough.
> > 
> > For example, logically this is not allowed:
> > 
> >   -device intel-iommu,caching-mode=off \
> >   -device vfio-pci,host=05:00.0
> 
> Do we require intel-iommu with intremap=on in order to get x2apic for
> large vCPU count guests?  If so, wouldn't it be a valid configuration
> for the user to specify:
> 
>    -device intel-iommu,caching-mode=off,intremap=on \
>    -device vfio-pci,host=05:00.0
> 
> so long as they never have any intention of the guest enabling DMA
> translation?  Would there be any advantage to this config versus
> caching-mode=on?  I suspect the overhead of CM=1 when only using
> interrupt remapping is small to non-existent, but are there other
> reasons for running with CM=0, perhaps guest drivers not supporting it?

AFAIU the main users of the vIOMMU are guest DPDK applications and
nested device assignment.  For these users I would venture a guess
that most of them run Linux guests, whose intel-iommu driver handles
caching mode, so the majority should be safe.

For the minority, I do agree that the question above is valid.  IMHO
the hard part is finding those users and getting them to join the
discussion, so we can learn how many would be affected and how.  One
way to achieve that could be to merge a series like this and wait for
people to complain, if anyone does. :) I'm not sure that's the best
way to go, but I think it could still be a serious option: it could
fix a more severe issue (QEMU quitting unexpectedly), and a series
like this one would also be easy to revert if really necessary (e.g.,
it brings no machine state changes that might cause migration issues).

> 
> I like the idea of being able to nak an incompatible hot-add rather
> than kill the VM, we could narrow that even further to look at not only
> whether caching-mode support is enabled, but also whether translation
> is enabled on the vIOMMU.  Ideally we might disallow the guest from
> enabling translation in such a configuration, but the Linux code is not
> written with the expectation that the hardware can refuse to enable
> translation and there are no capability bits to remove the DMA
> translation capability of the IOMMU.

This is an interesting point, at least to me, but I'm not sure we
should allow that even for emulation.  Imagine a Linux kernel patch
that tolerates failures when enabling DMAR: it would exist only for
QEMU emulation, and I'm not sure upstream would accept it.  After
all, we are emulating hardware, and real hardware always succeeds in
enabling DMAR, AFAICT.  For Windows and other OSes it could be even
harder.  Without support from all of these guests, we would simply
risk hanging guests instead, with the driver busy-waiting for the
DMAR status bit to be set.

> Still, we might want to think
> about which is the better user experience, to have the guest panic when
> DMA_GSTS_TES never becomes set (as it seems Linux would do) or to have
> QEMU exit, or as proposed here, prevent all configurations where this
> might occur.  Thanks,

Agreed.  For now, a stricter rule seems a bit better to me than a
hanging guest, though that is quite subjective.

Thanks!

-- 
Peter Xu