* [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
@ 2019-08-12  7:45 Peter Xu
  2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 1/4] intel_iommu: Sanity check vfio-pci config on machine init done Peter Xu
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Peter Xu @ 2019-08-12  7:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, Alex Williamson,
	peterx, Bandan Das, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

This is an RFC series.

The VT-d code has some defects; one of them is that we cannot detect
the misuse of vIOMMU and vfio-pci early enough.

For example, logically this is not allowed:

  -device intel-iommu,caching-mode=off \
  -device vfio-pci,host=05:00.0

Because the caching mode is required to make vfio-pci devices
functional.
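
(For the record, the working equivalent would be:

  -device intel-iommu,caching-mode=on \
  -device vfio-pci,host=05:00.0

since with caching-mode=on the guest must report its IOMMU mapping
changes to QEMU, which vfio needs in order to shadow the guest DMA
mappings into the host.)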

Previously we did this sanity check in vtd_iommu_notify_flag_changed(),
that is, when the memory regions change their attributes.  However
that's too late in most cases!  The memory region layouts will only
change after the IOMMU is enabled, which in most cases happens while
the guest OS boots.  So when the configuration is wrong, we will only
bail out while the guest boots rather than simply telling the user
before QEMU starts.
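
To illustrate, with the config above the failure today only shows up
once the guest OS turns on DMA remapping, roughly like this (output
abbreviated):

  $ qemu-system-x86_64 ... \
      -device intel-iommu,caching-mode=off \
      -device vfio-pci,host=05:00.0
  (guest boots fine, until the guest OS enables the IOMMU:)
  qemu-system-x86_64: We need to set caching-mode=on for intel-iommu
  to enable device assignment with IOMMU protection.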

The same problem happens on device hotplug, say, when we have this:

  -device intel-iommu,caching-mode=off

Then we do something like:

  (HMP) device_add vfio-pci,host=05:00.0,bus=pcie.1

If the vIOMMU is enabled in the guest at that time, the QEMU process
will simply quit due to this hotplug event.  This is a bit insane...
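
Roughly, the whole VM dies on the spot (transcript paraphrased):

  (HMP) device_add vfio-pci,host=05:00.0,bus=pcie.1
  We need to set caching-mode=on for intel-iommu to enable device
  assignment with IOMMU protection.
  (QEMU process exits)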

This series tries to solve the above two problems by introducing two
sanity checks at these places:

  - machine done
  - hotplug device

This is a bit awkward, but I hope it is better than before.  There are
of course other solutions, like hard-coding the check into vfio-pci,
but I find that even less pretty.  I couldn't think of a better way to
do this; if there is one, please kindly shout out.

Please have a look to see whether this would be acceptable, thanks.

Peter Xu (4):
  intel_iommu: Sanity check vfio-pci config on machine init done
  qdev/machine: Introduce hotplug_allowed hook
  pc/q35: Disallow vfio-pci hotplug without VT-d caching mode
  intel_iommu: Remove the caching-mode check during flag change

 hw/core/qdev.c         | 17 +++++++++++++++++
 hw/i386/intel_iommu.c  | 40 ++++++++++++++++++++++++++++++++++------
 hw/i386/pc.c           | 21 +++++++++++++++++++++
 include/hw/boards.h    |  9 +++++++++
 include/hw/qdev-core.h |  1 +
 qdev-monitor.c         |  7 +++++++
 6 files changed, 89 insertions(+), 6 deletions(-)

-- 
2.21.0




* [Qemu-devel] [PATCH RFC 1/4] intel_iommu: Sanity check vfio-pci config on machine init done
  2019-08-12  7:45 [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier Peter Xu
@ 2019-08-12  7:45 ` Peter Xu
  2019-09-16  7:11   ` Auger Eric
  2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 2/4] qdev/machine: Introduce hotplug_allowed hook Peter Xu
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2019-08-12  7:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, Alex Williamson,
	peterx, Bandan Das, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Previously this check only happened when the IOMMU was enabled in the
guest.  That was always too late, because the IOMMU is normally only
enabled while the guest OS boots.  It means that we could bail out and
exit directly during guest OS boot if the device configuration is not
supported.  Or, if the guest didn't enable the vIOMMU at all, the user
could use the guest normally, but as soon as the guest OS was
reconfigured to enable the vIOMMU and rebooted, the user would see the
panic right after the reset, when the next boot starts.

Let's make this failure happen even earlier, so that we force the user
to use caching-mode for vfio-pci devices when using the vIOMMU.  That
way the user at least won't get surprised during execution of the
guest, which seems a bit nicer.

This will affect users who didn't enable the vIOMMU in the guest OS
but were using vfio-pci together with the vtd device in the past.
However I hope they are not the majority, because not enabling the
vIOMMU with the device attached is actually meaningless.

We still keep the old assertion for safety, because the hotplug path
could still reach it so far.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index de86f53b4e..642dd595ed 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -61,6 +61,13 @@
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
 
+static void vtd_panic_require_caching_mode(void)
+{
+    error_report("We need to set caching-mode=on for intel-iommu to enable "
+                 "device assignment with IOMMU protection.");
+    exit(1);
+}
+
 static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
                             uint64_t wmask, uint64_t w1cmask)
 {
@@ -2926,9 +2933,7 @@ static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
     IntelIOMMUState *s = vtd_as->iommu_state;
 
     if (!s->caching_mode && new & IOMMU_NOTIFIER_MAP) {
-        error_report("We need to set caching-mode=on for intel-iommu to enable "
-                     "device assignment with IOMMU protection.");
-        exit(1);
+        vtd_panic_require_caching_mode();
     }
 
     /* Update per-address-space notifier flags */
@@ -3696,6 +3701,32 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
     return true;
 }
 
+static int vtd_machine_done_notify_one(Object *child, void *unused)
+{
+    IntelIOMMUState *iommu = INTEL_IOMMU_DEVICE(x86_iommu_get_default());
+
+    /*
+     * The check is hard-coded here because vfio-pci is the only
+     * special case so far.  Let's be more elegant in the future when
+     * we can, but for now there seems to be no better way.
+     */
+    if (object_dynamic_cast(child, "vfio-pci") && !iommu->caching_mode) {
+        vtd_panic_require_caching_mode();
+    }
+
+    return 0;
+}
+
+static void vtd_machine_done_hook(Notifier *notifier, void *unused)
+{
+    object_child_foreach_recursive(object_get_root(),
+                                   vtd_machine_done_notify_one, NULL);
+}
+
+static Notifier vtd_machine_done_notify = {
+    .notify = vtd_machine_done_hook,
+};
+
 static void vtd_realize(DeviceState *dev, Error **errp)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
@@ -3741,6 +3772,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
     pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
     /* Pseudo address space under root PCI bus. */
     pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
+    qemu_add_machine_init_done_notifier(&vtd_machine_done_notify);
 }
 
 static void vtd_class_init(ObjectClass *klass, void *data)
-- 
2.21.0




* [Qemu-devel] [PATCH RFC 2/4] qdev/machine: Introduce hotplug_allowed hook
  2019-08-12  7:45 [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier Peter Xu
  2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 1/4] intel_iommu: Sanity check vfio-pci config on machine init done Peter Xu
@ 2019-08-12  7:45 ` Peter Xu
  2019-09-16  7:23   ` Auger Eric
  2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 3/4] pc/q35: Disallow vfio-pci hotplug without VT-d caching mode Peter Xu
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2019-08-12  7:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, Alex Williamson,
	peterx, Bandan Das, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Introduce this new per-machine hook to give any machine class a chance
to run a sanity check on the to-be-hotplugged device.  This will be
used on x86 to detect some illegal device configurations, e.g.,
possible conflicts between vfio-pci and the x86 vIOMMU.
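
For illustration, a machine type opts in from its class_init, like
below (the foo_* names are made up; the next patch does exactly this
for the PC machines):

  static void foo_machine_class_init(ObjectClass *oc, void *data)
  {
      MachineClass *mc = MACHINE_CLASS(oc);

      /* Called on each device_add; must set errp when returning false */
      mc->hotplug_allowed = foo_hotplug_allowed;
  }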

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/core/qdev.c         | 17 +++++++++++++++++
 include/hw/boards.h    |  9 +++++++++
 include/hw/qdev-core.h |  1 +
 qdev-monitor.c         |  7 +++++++
 4 files changed, 34 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 94ebc0a4a1..d792b43c37 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -236,6 +236,23 @@ HotplugHandler *qdev_get_machine_hotplug_handler(DeviceState *dev)
     return NULL;
 }
 
+bool qdev_hotplug_allowed(DeviceState *dev, Error **errp)
+{
+    MachineState *machine;
+    MachineClass *mc;
+    Object *m_obj = qdev_get_machine();
+
+    if (object_dynamic_cast(m_obj, TYPE_MACHINE)) {
+        machine = MACHINE(m_obj);
+        mc = MACHINE_GET_CLASS(machine);
+        if (mc->hotplug_allowed) {
+            return mc->hotplug_allowed(machine, dev, errp);
+        }
+    }
+
+    return true;
+}
+
 HotplugHandler *qdev_get_bus_hotplug_handler(DeviceState *dev)
 {
     if (dev->parent_bus) {
diff --git a/include/hw/boards.h b/include/hw/boards.h
index a71d1a53a5..1cf63be45d 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -166,6 +166,13 @@ typedef struct {
  *    The function pointer to hook different machine specific functions for
  *    parsing "smp-opts" from QemuOpts to MachineState::CpuTopology and more
  *    machine specific topology fields, such as smp_dies for PCMachine.
+ * @hotplug_allowed:
+ *    If the hook is provided, then it'll be called for each device
+ *    hotplug to check whether the hotplug is allowed.  Return true
+ *    to allow the hotplug, or false to reject it.  When false is
+ *    returned, an error must be set to show the reason for the
+ *    rejection.  If the hook is not provided, all hotplugs will be
+ *    allowed.
  */
 struct MachineClass {
     /*< private >*/
@@ -223,6 +230,8 @@ struct MachineClass {
 
     HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
                                            DeviceState *dev);
+    bool (*hotplug_allowed)(MachineState *state, DeviceState *dev,
+                            Error **errp);
     CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
                                                          unsigned cpu_index);
     const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 136df7774c..88e7ec4b60 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -284,6 +284,7 @@ void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
                                  int required_for_version);
 HotplugHandler *qdev_get_bus_hotplug_handler(DeviceState *dev);
 HotplugHandler *qdev_get_machine_hotplug_handler(DeviceState *dev);
+bool qdev_hotplug_allowed(DeviceState *dev, Error **errp);
 /**
  * qdev_get_hotplug_handler: Get handler responsible for device wiring
  *
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 58222c2211..6c80602771 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -614,6 +614,13 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
     /* create device */
     dev = DEVICE(object_new(driver));
 
+    /* Check whether the hotplug is allowed by the machine */
+    if (qdev_hotplug && !qdev_hotplug_allowed(dev, &err)) {
+        /* Error must be set in the machine hook */
+        assert(err);
+        goto err_del_dev;
+    }
+
     if (bus) {
         qdev_set_parent_bus(dev, bus);
     } else if (qdev_hotplug && !qdev_get_machine_hotplug_handler(dev)) {
-- 
2.21.0




* [Qemu-devel] [PATCH RFC 3/4] pc/q35: Disallow vfio-pci hotplug without VT-d caching mode
  2019-08-12  7:45 [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier Peter Xu
  2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 1/4] intel_iommu: Sanity check vfio-pci config on machine init done Peter Xu
  2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 2/4] qdev/machine: Introduce hotplug_allowed hook Peter Xu
@ 2019-08-12  7:45 ` Peter Xu
  2019-09-16  7:23   ` Auger Eric
  2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 4/4] intel_iommu: Remove the caching-mode check during flag change Peter Xu
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2019-08-12  7:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, Alex Williamson,
	peterx, Bandan Das, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Previously QEMU would bail out when trying to hotplug a vfio-pci
device with the below configuration:

  -device intel-iommu,caching-mode=off

With this patch we instead return an error message to the user via
QMP/HMP, and the VM will continue to work after the failed hotplug:

  (qemu) device_add vfio-pci,bus=root.3,host=05:00.0,id=vfio1
  Error: Device assignment is not allowed without enabling caching-mode=on for Intel IOMMU.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/pc.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 549c437050..4ea00c7bd2 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2905,6 +2905,26 @@ static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
     }
 }
 
+
+static bool pc_hotplug_allowed(MachineState *ms, DeviceState *dev, Error **errp)
+{
+    X86IOMMUState *iommu = x86_iommu_get_default();
+    IntelIOMMUState *intel_iommu;
+
+    if (iommu &&
+        object_dynamic_cast((Object *)iommu, TYPE_INTEL_IOMMU_DEVICE) &&
+        object_dynamic_cast((Object *)dev, "vfio-pci")) {
+        intel_iommu = INTEL_IOMMU_DEVICE(iommu);
+        if (!intel_iommu->caching_mode) {
+            error_setg(errp, "Device assignment is not allowed without "
+                       "enabling caching-mode=on for Intel IOMMU.");
+            return false;
+        }
+    }
+
+    return true;
+}
+
 static void pc_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -2929,6 +2949,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     pcmc->pvh_enabled = true;
     assert(!mc->get_hotplug_handler);
     mc->get_hotplug_handler = pc_get_hotplug_handler;
+    mc->hotplug_allowed = pc_hotplug_allowed;
     mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
     mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
     mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
-- 
2.21.0




* [Qemu-devel] [PATCH RFC 4/4] intel_iommu: Remove the caching-mode check during flag change
  2019-08-12  7:45 [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier Peter Xu
                   ` (2 preceding siblings ...)
  2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 3/4] pc/q35: Disallow vfio-pci hotplug without VT-d caching mode Peter Xu
@ 2019-08-12  7:45 ` Peter Xu
  2019-09-16  7:24   ` Auger Eric
  2019-08-12 16:24 ` [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier Alex Williamson
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2019-08-12  7:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, Alex Williamson,
	peterx, Bandan Das, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

That's never a good place to stop the QEMU process...  Now that we
have both the machine-done sanity check and the hotplug check, we can
safely remove this one to avoid it.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 642dd595ed..93b26b633b 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2932,10 +2932,6 @@ static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
     VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
     IntelIOMMUState *s = vtd_as->iommu_state;
 
-    if (!s->caching_mode && new & IOMMU_NOTIFIER_MAP) {
-        vtd_panic_require_caching_mode();
-    }
-
     /* Update per-address-space notifier flags */
     vtd_as->notifier_flags = new;
 
-- 
2.21.0




* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-12  7:45 [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier Peter Xu
                   ` (3 preceding siblings ...)
  2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 4/4] intel_iommu: Remove the caching-mode check during flag change Peter Xu
@ 2019-08-12 16:24 ` Alex Williamson
  2019-08-12 21:16   ` Peter Xu
  2019-08-13  8:41 ` Jason Wang
  2019-08-20  5:22 ` Peter Xu
  6 siblings, 1 reply; 25+ messages in thread
From: Alex Williamson @ 2019-08-12 16:24 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, qemu-devel,
	Bandan Das, Igor Mammedov, Paolo Bonzini, Richard Henderson

On Mon, 12 Aug 2019 09:45:27 +0200
Peter Xu <peterx@redhat.com> wrote:

> This is an RFC series.
> 
> The VT-d code has some defects; one of them is that we cannot detect
> the misuse of vIOMMU and vfio-pci early enough.
> 
> For example, logically this is not allowed:
> 
>   -device intel-iommu,caching-mode=off \
>   -device vfio-pci,host=05:00.0

Do we require intel-iommu with intremap=on in order to get x2apic for
large vCPU count guests?  If so, wouldn't it be a valid configuration
for the user to specify:

   -device intel-iommu,caching-mode=off,intremap=on \
   -device vfio-pci,host=05:00.0

so long as they never have any intention of the guest enabling DMA
translation?  Would there be any advantage to this config versus
caching-mode=on?  I suspect the overhead of CM=1 when only using
interrupt remapping is small to non-existent, but are there other
reasons for running with CM=0, perhaps guest drivers not supporting it?

I like the idea of being able to nak an incompatible hot-add rather
than kill the VM; we could narrow that even further to look at not only
whether caching-mode support is enabled, but also whether translation
is enabled on the vIOMMU.  Ideally we might disallow the guest from
enabling translation in such a configuration, but the Linux code is not
written with the expectation that the hardware can refuse to enable
translation and there are no capability bits to remove the DMA
translation capability of the IOMMU.  Still, we might want to think
about which is the better user experience, to have the guest panic when
DMA_GSTS_TES never becomes set (as it seems Linux would do) or to have
QEMU exit, or as proposed here, prevent all configurations where this
might occur.  Thanks,

Alex

> [...]




* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-12 16:24 ` [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier Alex Williamson
@ 2019-08-12 21:16   ` Peter Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2019-08-12 21:16 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, qemu-devel,
	Bandan Das, Igor Mammedov, Paolo Bonzini, Richard Henderson

On Mon, Aug 12, 2019 at 10:24:53AM -0600, Alex Williamson wrote:
> On Mon, 12 Aug 2019 09:45:27 +0200
> Peter Xu <peterx@redhat.com> wrote:
> 
> > This is an RFC series.
> > 
> > The VT-d code has some defects; one of them is that we cannot detect
> > the misuse of vIOMMU and vfio-pci early enough.
> > 
> > For example, logically this is not allowed:
> > 
> >   -device intel-iommu,caching-mode=off \
> >   -device vfio-pci,host=05:00.0
> 
> Do we require intel-iommu with intremap=on in order to get x2apic for
> large vCPU count guests?  If so, wouldn't it be a valid configuration
> for the user to specify:
> 
>    -device intel-iommu,caching-mode=off,intremap=on \
>    -device vfio-pci,host=05:00.0
> 
> so long as they never have any intention of the guest enabling DMA
> translation?  Would there be any advantage to this config versus
> caching-mode=on?  I suspect the overhead of CM=1 when only using
> interrupt remapping is small to non-existent, but are there other
> reasons for running with CM=0, perhaps guest drivers not supporting it?

AFAIU the major users of the vIOMMU should be guest DPDK apps and
nested device assignment.  For these users I would be so bold as to
guess they are mostly using Linux, so the majority should be safe.

For the minority, I do agree that the above question is valid.  IMHO
the hard part is to find those users and let them join the discussion;
then we can know how many will be affected and how.  I think one way
to achieve that could be to merge the patchset like this, and then
people will start to complain if there are any. :) I'm not sure
whether that's the best way to go, but I think it could still be a
serious option considering that it could potentially fix a more severe
issue (unexpected QEMU quits), and also that reverting a patchset like
this one would be easy when really necessary (e.g., the patchset does
not bring machine state changes which might cause migration issues, or
so on).

> 
> I like the idea of being able to nak an incompatible hot-add rather
> than kill the VM; we could narrow that even further to look at not only
> whether caching-mode support is enabled, but also whether translation
> is enabled on the vIOMMU.  Ideally we might disallow the guest from
> enabling translation in such a configuration, but the Linux code is not
> written with the expectation that the hardware can refuse to enable
> translation and there are no capability bits to remove the DMA
> translation capability of the IOMMU.

This is an interesting view, at least to me, though... I'm not sure
we should allow that even for emulation.  I'm just imagining such a
patch for the Linux kernel to allow failures on enabling DMAR - it
would be only for QEMU emulation, and I'm not sure whether upstream
would like such a patch.  After all, we are emulating the hardware,
and real hardware will always succeed in enabling DMAR, AFAICT.  For
Windows and other OSes it could be even harder.  Without support for
all of this, we would simply have other risks of hanging guests when
the driver busy-waits for the DMAR status bit to be set.

> Still, we might want to think
> about which is the better user experience, to have the guest panic when
> DMA_GSTS_TES never becomes set (as it seems Linux would do) or to have
> QEMU exit, or as proposed here, prevent all configurations where this
> might occur.  Thanks,

Agreed.  So far, a stricter rule could be a bit better than a hanging
guest to me.  Though that could be very subjective.

Thanks!

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-12  7:45 [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier Peter Xu
                   ` (4 preceding siblings ...)
  2019-08-12 16:24 ` [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier Alex Williamson
@ 2019-08-13  8:41 ` Jason Wang
  2019-08-13 14:04   ` Peter Xu
  2019-08-28 12:59   ` Auger Eric
  2019-08-20  5:22 ` Peter Xu
  6 siblings, 2 replies; 25+ messages in thread
From: Jason Wang @ 2019-08-13  8:41 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Alex Williamson, Bandan Das,
	Igor Mammedov, Paolo Bonzini, Richard Henderson


On 2019/8/12 3:45 PM, Peter Xu wrote:
> [...]

Do we need a generic solution other than an Intel specific one?

Thanks




* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-13  8:41 ` Jason Wang
@ 2019-08-13 14:04   ` Peter Xu
  2019-08-28 12:59   ` Auger Eric
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Xu @ 2019-08-13 14:04 UTC (permalink / raw)
  To: Jason Wang
  Cc: Alex Williamson, Eduardo Habkost, Michael S. Tsirkin, qemu-devel,
	Bandan Das, Daniel P. Berrangé,
	Igor Mammedov, Paolo Bonzini, Richard Henderson

On Tue, Aug 13, 2019 at 04:41:49PM +0800, Jason Wang wrote:
> Do we need a generic solution other than an Intel specific one?

I assume you're asking about ARM, not AMD, right? :)

Yes, I think we should have a generic solution.  Though I'd like to
see whether this idea can be accepted first; then we can expand to ARM
or other IOMMUs.  After all, we've already got some duplication
between at least x86 and arm on the vIOMMU side, so we can actually do
more things like that when the time comes.

Thanks,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-12  7:45 [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier Peter Xu
                   ` (5 preceding siblings ...)
  2019-08-13  8:41 ` Jason Wang
@ 2019-08-20  5:22 ` Peter Xu
  2019-08-20  6:24   ` Paolo Bonzini
  6 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2019-08-20  5:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, Alex Williamson,
	Bandan Das, Igor Mammedov, Paolo Bonzini, Richard Henderson

On Mon, Aug 12, 2019 at 09:45:27AM +0200, Peter Xu wrote:
> [...]

Any more comment on this?

Thanks,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-20  5:22 ` Peter Xu
@ 2019-08-20  6:24   ` Paolo Bonzini
  2019-08-21  5:03     ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2019-08-20  6:24 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, Alex Williamson,
	Bandan Das, Igor Mammedov, Richard Henderson

On 20/08/19 07:22, Peter Xu wrote:
> On Mon, Aug 12, 2019 at 09:45:27AM +0200, Peter Xu wrote:
>> [...]
> 
> Any more comment on this?

No problem from me, but I wouldn't mind if someone else merged it. :)

Paolo



* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-20  6:24   ` Paolo Bonzini
@ 2019-08-21  5:03     ` Peter Xu
  2019-08-21  7:50       ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2019-08-21  5:03 UTC (permalink / raw)
  To: Paolo Bonzini, Michael S. Tsirkin
  Cc: Alex Williamson, Eduardo Habkost, Michael S. Tsirkin, Jason Wang,
	qemu-devel, Bandan Das, Daniel P. Berrangé,
	Igor Mammedov, Richard Henderson

On Tue, Aug 20, 2019 at 08:24:49AM +0200, Paolo Bonzini wrote:
> On 20/08/19 07:22, Peter Xu wrote:
> > On Mon, Aug 12, 2019 at 09:45:27AM +0200, Peter Xu wrote:
> >> [...]
> > 
> > Any more comment on this?
> 
> No problem from me, but I wouldn't mind if someone else merged it. :)

Can I read this as an "acked-by"? :)

Michael, should this be for your tree?  What do you think about the
series?  Please let me know what I need to do to move this forward.  I
can repost a non-rfc series if needed, but it'll be exactly the same
content.

Regards,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-21  5:03     ` Peter Xu
@ 2019-08-21  7:50       ` Paolo Bonzini
  2019-09-16  3:35         ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2019-08-21  7:50 UTC (permalink / raw)
  To: Peter Xu, Michael S. Tsirkin
  Cc: Alex Williamson, Eduardo Habkost, Jason Wang, qemu-devel,
	Bandan Das, Daniel P. Berrangé,
	Igor Mammedov, Richard Henderson

On 21/08/19 07:03, Peter Xu wrote:
> On Tue, Aug 20, 2019 at 08:24:49AM +0200, Paolo Bonzini wrote:
>> On 20/08/19 07:22, Peter Xu wrote:
>>> On Mon, Aug 12, 2019 at 09:45:27AM +0200, Peter Xu wrote:
>>>> [...]
>>>
>>> Any more comment on this?
>>
>> No problem from me, but I wouldn't mind if someone else merged it. :)
> 
> Can I read this as an "acked-by"? :)

Yes, it shouldn't even need Acked-by since there are other maintainers
that handle this part of the tree:

Paolo Bonzini <pbonzini@redhat.com> (maintainer:X86 TCG CPUs)
Richard Henderson <rth@twiddle.net> (maintainer:X86 TCG CPUs)
Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86 TCG CPUs)
"Michael S. Tsirkin" <mst@redhat.com> (supporter:PC)
Marcel Apfelbaum <marcel.apfelbaum@gmail.com> (supporter:PC)

> Michael, should this be for your tree?  What do you think about the
> series?  Please let me know what I need to do to move this forward.  I
> can repost a non-rfc series if needed, but it'll be exactly the same
> content.
> 
> Regards,
> 




* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-13  8:41 ` Jason Wang
  2019-08-13 14:04   ` Peter Xu
@ 2019-08-28 12:59   ` Auger Eric
  2019-08-29  1:18     ` Peter Xu
  1 sibling, 1 reply; 25+ messages in thread
From: Auger Eric @ 2019-08-28 12:59 UTC (permalink / raw)
  To: Jason Wang, Peter Xu, qemu-devel
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Alex Williamson, Bandan Das,
	Paolo Bonzini, Igor Mammedov, Richard Henderson

Hi Peter,

On 8/13/19 10:41 AM, Jason Wang wrote:
> 
> On 2019/8/12 3:45 PM, Peter Xu wrote:
>> [...]
> 
> Do we need a generic solution other than an Intel specific one?

In
[PATCH v4 2/5] memory: Add IOMMU_ATTR_HW_NESTED_PAGING IOMMU memory
region attribute (https://patchwork.kernel.org/patch/11109701/)

[PATCH v4 3/5] hw/vfio/common: Fail on VFIO/HW nested paging detection
(https://patchwork.kernel.org/patch/11109697/)

I proposed to introduce a new IOMMU MR attribute to retrieve whether
the vIOMMU uses HW nested paging to integrate with VFIO.  I wonder
whether this kind of solution would fit your needs too.

Assuming we would rename the attribute (whose name was challenged by
Peter anyway) to something like IOMMU_ATTR_PHYS_MAP_MODE, taking the
possible values NONE, CM and HW_NESTED_PAGING: SMMUv3 would return
HW_NESTED_PAGING, and the Intel IOMMU would return CM if CM is enabled
or NONE otherwise.  Then we could implement the check directly in VFIO
common.c.  That way I don't think you would need the new notifiers,
and this would satisfy both requirements?
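
Roughly, the check in vfio could then look like below (just a sketch;
the attribute name, the value names and the surrounding error handling
are all still to be settled):

      int map_mode = IOMMU_PHYS_MAP_MODE_NONE;

      /* iommu_mr is the IOMMUMemoryRegion of the device's AS */
      memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_PHYS_MAP_MODE,
                                   &map_mode);
      if (map_mode == IOMMU_PHYS_MAP_MODE_NONE) {
          error_setg(errp, "vfio is not compatible with the current "
                     "vIOMMU configuration");
          return -EINVAL;
      }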

Thoughts?

Thanks

Eric

> 
> Thanks
> 
> 



* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-28 12:59   ` Auger Eric
@ 2019-08-29  1:18     ` Peter Xu
  2019-08-29  8:05       ` Auger Eric
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2019-08-29  1:18 UTC (permalink / raw)
  To: Auger Eric
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Bandan Das, Jason Wang,
	qemu-devel, Alex Williamson, Paolo Bonzini, Igor Mammedov,
	Richard Henderson

On Wed, Aug 28, 2019 at 02:59:45PM +0200, Auger Eric wrote:
> Hi Peter,

Hi, Eric,

[...]

> In
> [PATCH v4 2/5] memory: Add IOMMU_ATTR_HW_NESTED_PAGING IOMMU memory
> region attribute (https://patchwork.kernel.org/patch/11109701/)

[1]

> 
> [PATCH v4 3/5] hw/vfio/common: Fail on VFIO/HW nested paging detection
> (https://patchwork.kernel.org/patch/11109697/)
> 
> I proposed to introduce a new IOMMU MR attribute to retrieve whether
> the vIOMMU uses HW nested paging to integrate with VFIO.  I wonder
> whether this kind of solution would fit your needs too.
> 
> Assuming we would rename the attribute (whose name was challenged by
> Peter anyway) to something like IOMMU_ATTR_PHYS_MAP_MODE, taking the
> possible values NONE, CM and HW_NESTED_PAGING: SMMUv3 would return
> HW_NESTED_PAGING, and the Intel IOMMU would return CM if CM is enabled
> or NONE otherwise.  Then we could implement the check directly in VFIO
> common.c.  That way I don't think you would need the new notifiers,
> and this would satisfy both requirements?

IMHO it'll suffer from a similar issue to the one we have now with
flag_changed: at the very beginning of an x86 system boot DMAR is not
yet enabled, and the intel-iommu device works the same way as in its
passthrough mode, so there's no IOMMU memory region at all in the DMA
address spaces of the devices.  Hence even with patch [1] above we
still can't really reach the get_attr() check until DMAR is enabled?

Maybe we can figure out a good way to expose IOMMU attributes rather
than IOMMU memory region attributes and then let vfio pick that up,
but I'm not very sure whether that's clean enough.

Thanks,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-29  1:18     ` Peter Xu
@ 2019-08-29  8:05       ` Auger Eric
  2019-08-29  8:21         ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Auger Eric @ 2019-08-29  8:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: Alex Williamson, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, qemu-devel,
	Bandan Das, Igor Mammedov, Paolo Bonzini, Richard Henderson

Hi Peter,
On 8/29/19 3:18 AM, Peter Xu wrote:
> On Wed, Aug 28, 2019 at 02:59:45PM +0200, Auger Eric wrote:
>> Hi Peter,
> 
> Hi, Eric,
> 
> [...]
> 
>> In
>> [PATCH v4 2/5] memory: Add IOMMU_ATTR_HW_NESTED_PAGING IOMMU memory
>> region attribute (https://patchwork.kernel.org/patch/11109701/)
> 
> [1]
> 
>>
>> [PATCH v4 3/5] hw/vfio/common: Fail on VFIO/HW nested paging detection
>> (https://patchwork.kernel.org/patch/11109697/)
>>
>> I proposed to introduce a new IOMMU MR attribute to retrieve whether
>> the vIOMMU uses HW nested paging to integrate with VFIO.  I wonder
>> whether this kind of solution would fit your needs too.
>>
>> Assuming we would rename the attribute (whose name was challenged by
>> Peter anyway) to something like IOMMU_ATTR_PHYS_MAP_MODE, taking the
>> possible values NONE, CM and HW_NESTED_PAGING: SMMUv3 would return
>> HW_NESTED_PAGING, and the Intel IOMMU would return CM if CM is enabled
>> or NONE otherwise.  Then we could implement the check directly in VFIO
>> common.c.  That way I don't think you would need the new notifiers,
>> and this would satisfy both requirements?
> 
> IMHO it'll suffer from a similar issue to the one we have now with
> flag_changed: at the very beginning of an x86 system boot DMAR is not
> yet enabled, and the intel-iommu device works the same way as in its
> passthrough mode, so there's no IOMMU memory region at all in the DMA
> address spaces of the devices.

Ah OK I did not get this initially. We don't have this issue with SMMUv3
as the IOMMU MR exists from the very beginning and does not depend on
its enablement by the guest. Also it stays there. So the detection can
be made immediately.

> Hence even with patch [1] above we
> still can't really reach the get_attr() check until DMAR is enabled?
> 
> Maybe we can figure out a good way to expose IOMMU attributes rather
> than IOMMU memory region attributes and then let vfio pick that up,
> but I'm not very sure whether that's clean enough.
> 
> Thanks,
> 

Thanks

Eric



* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-29  8:05       ` Auger Eric
@ 2019-08-29  8:21         ` Peter Xu
  2019-08-29  8:46           ` Auger Eric
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2019-08-29  8:21 UTC (permalink / raw)
  To: Auger Eric
  Cc: Alex Williamson, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, qemu-devel,
	Bandan Das, Igor Mammedov, Paolo Bonzini, Richard Henderson

On Thu, Aug 29, 2019 at 10:05:27AM +0200, Auger Eric wrote:
> Hi Peter,

Hi, Eric,

> On 8/29/19 3:18 AM, Peter Xu wrote:
> > On Wed, Aug 28, 2019 at 02:59:45PM +0200, Auger Eric wrote:
> >> Hi Peter,
> > 
> > Hi, Eric,
> > 
> > [...]
> > 
> >> In
> >> [PATCH v4 2/5] memory: Add IOMMU_ATTR_HW_NESTED_PAGING IOMMU memory
> >> region attribute (https://patchwork.kernel.org/patch/11109701/)
> > 
> > [1]
> > 
> >>
> >> [PATCH v4 3/5] hw/vfio/common: Fail on VFIO/HW nested paging detection
> >> (https://patchwork.kernel.org/patch/11109697/)
> >>
> >> I proposed to introduce a new IOMMU MR attribute to retrieve whether
> >> the vIOMMU uses HW nested paging to integrate with VFIO.  I wonder
> >> whether this kind of solution would fit your needs too.
> >>
> >> Assuming we would rename the attribute (whose name was challenged by
> >> Peter anyway) to something like IOMMU_ATTR_PHYS_MAP_MODE, taking the
> >> possible values NONE, CM and HW_NESTED_PAGING: SMMUv3 would return
> >> HW_NESTED_PAGING, and the Intel IOMMU would return CM if CM is enabled
> >> or NONE otherwise.  Then we could implement the check directly in VFIO
> >> common.c.  That way I don't think you would need the new notifiers,
> >> and this would satisfy both requirements?
> > 
> > IMHO it'll suffer from a similar issue to the one we have now with
> > flag_changed: at the very beginning of an x86 system boot DMAR is not
> > yet enabled, and the intel-iommu device works the same way as in its
> > passthrough mode, so there's no IOMMU memory region at all in the DMA
> > address spaces of the devices.
> 
> Ah OK I did not get this initially. We don't have this issue with SMMUv3
> as the IOMMU MR exists from the very beginning and does not depend on
> its enablement by the guest. Also it stays there. So the detection can
> be made immediately.

True.  With that, I'm a bit curious whether ARM should implement
something like Intel's PT mode.  For example, have you tried to run
an ARM guest with both a vSMMU and a vfio-pci inside, but keeping
DMAR disabled?  IIUC in that case there will be no mapping at all for
the assigned device, so would that work?  Or is there any magic for
ARM?

Regards,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-29  8:21         ` Peter Xu
@ 2019-08-29  8:46           ` Auger Eric
  2019-08-29  8:54             ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Auger Eric @ 2019-08-29  8:46 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Bandan Das, Jason Wang,
	qemu-devel, Alex Williamson, Paolo Bonzini, Igor Mammedov,
	Richard Henderson

Hi Peter,

On 8/29/19 10:21 AM, Peter Xu wrote:
> On Thu, Aug 29, 2019 at 10:05:27AM +0200, Auger Eric wrote:
>> Hi Peter,
> 
> Hi, Eric,
> 
>> On 8/29/19 3:18 AM, Peter Xu wrote:
>>> On Wed, Aug 28, 2019 at 02:59:45PM +0200, Auger Eric wrote:
>>>> Hi Peter,
>>>
>>> Hi, Eric,
>>>
>>> [...]
>>>
>>>> [...]
>>>
>>> IMHO it'll suffer from a similar issue to the one we have now with
>>> flag_changed: at the very beginning of an x86 system boot DMAR is not
>>> yet enabled, and the intel-iommu device works the same way as in its
>>> passthrough mode, so there's no IOMMU memory region at all in the DMA
>>> address spaces of the devices.
>>
>> Ah OK I did not get this initially. We don't have this issue with SMMUv3
>> as the IOMMU MR exists from the very beginning and does not depend on
>> its enablement by the guest. Also it stays there. So the detection can
>> be made immediately.
> 
> True.  With that, I'm a bit curious whether ARM should implement
> something like Intel's PT mode.  For example, have you tried to run
> an ARM guest with both a vSMMU and a vfio-pci inside, but keeping
> DMAR disabled?  IIUC in that case there will be no mapping at all for
> the assigned device, so would that work?  Or is there any magic for
> ARM?

If I understand correctly, PT mode is a bypass mode.  With the ARM
SMMUv3 the IOMMU MR translate() function still gets called but
implements a direct mapping.  I understand that on your side you
destroy the IOMMU MR, right?

At the moment, since the SMMUv3/VFIO integration is not ready, I plan
to forbid any usage of VFIO along with SMMUv3, whatever the enable
state.

When HW nested paging gets ready, the stage 1 bypass state will be
propagated to the HW config structure.

Hope I answered your question.

Thanks

Eric
> 
> Regards,
> 



* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-29  8:46           ` Auger Eric
@ 2019-08-29  8:54             ` Peter Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2019-08-29  8:54 UTC (permalink / raw)
  To: Auger Eric
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Bandan Das, Jason Wang,
	qemu-devel, Alex Williamson, Paolo Bonzini, Igor Mammedov,
	Richard Henderson

On Thu, Aug 29, 2019 at 10:46:42AM +0200, Auger Eric wrote:
> If I understand correctly, PT mode is a bypass mode.  With the ARM
> SMMUv3 the IOMMU MR translate() function still gets called but
> implements a direct mapping.  I understand that on your side you
> destroy the IOMMU MR, right?
> 
> At the moment, since the SMMUv3/VFIO integration is not ready, I plan
> to forbid any usage of VFIO along with SMMUv3, whatever the enable
> state.
> 
> When HW nested paging gets ready, the stage 1 bypass state will be
> propagated to the HW config structure.
> 
> Hope I answered your question.

Yes, nested page tables will be fine. :)

Thanks,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
  2019-08-21  7:50       ` Paolo Bonzini
@ 2019-09-16  3:35         ` Peter Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2019-09-16  3:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Eduardo Habkost, Michael S. Tsirkin, Jason Wang,
	qemu-devel, Bandan Das, Daniel P. Berrangé,
	Paolo Bonzini, Igor Mammedov, Richard Henderson

On Wed, Aug 21, 2019 at 09:50:43AM +0200, Paolo Bonzini wrote:
> On 21/08/19 07:03, Peter Xu wrote:
> > On Tue, Aug 20, 2019 at 08:24:49AM +0200, Paolo Bonzini wrote:
> >> On 20/08/19 07:22, Peter Xu wrote:
> >>> On Mon, Aug 12, 2019 at 09:45:27AM +0200, Peter Xu wrote:
> >>>> This is a RFC series.
> >>>>
> >>>> The VT-d code has some defects, one of them is that we cannot detect
> >>>> the misuse of vIOMMU and vfio-pci early enough.
> >>>>
> >>>> For example, logically this is not allowed:
> >>>>
> >>>>   -device intel-iommu,caching-mode=off \
> >>>>   -device vfio-pci,host=05:00.0
> >>>>
> >>>> Because the caching mode is required to make vfio-pci devices
> >>>> functional.
> >>>>
> >>>> Previously we did this sanity check in vtd_iommu_notify_flag_changed()
> >>>> as when the memory regions change their attributes.  However that's
> >>>> too late in most cases!  Because the memory region layouts will only
> >>>> change after IOMMU is enabled, and that's in most cases during the
> >>>> guest OS boots.  So when the configuration is wrong, we will only bail
> >>>> out during the guest boots rather than simply telling the user before
> >>>> QEMU starts.
> >>>>
> >>>> The same problem happens on device hotplug, say, when we have this:
> >>>>
> >>>>   -device intel-iommu,caching-mode=off
> >>>>
> >>>> Then we do something like:
> >>>>
> >>>>   (HMP) device_add vfio-pci,host=05:00.0,bus=pcie.1
> >>>>
> >>>> If at that time the vIOMMU is enabled in the guest then the QEMU
> >>>> process will simply quit directly due to this hotplug event.  This is
> >>>> a bit insane...
> >>>>
> >>>> This series tries to solve above two problems by introducing two
> >>>> sanity checks upon these places separately:
> >>>>
> >>>>   - machine done
> >>>>   - hotplug device
> >>>>
> >>>> This is a bit awkward but I hope this could be better than before.
> >>>> There is of course other solutions like hard-code the check into
> >>>> vfio-pci but I feel it even more unpretty.  I didn't think out any
> >>>> better way to do this, if there is please kindly shout out.
> >>>>
> >>>> Please have a look to see whether this would be acceptable, thanks.
> >>>
> >>> Any more comment on this?
> >>
> >> No problem from me, but I wouldn't mind if someone else merged it. :)
> > 
> > Can I read this as an "acked-by"? :)
> 
> Yes, it shouldn't even need Acked-by since there are other maintainers
> that handle this part of the tree:
> 
> Paolo Bonzini <pbonzini@redhat.com> (maintainer:X86 TCG CPUs)
> Richard Henderson <rth@twiddle.net> (maintainer:X86 TCG CPUs)
> Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86 TCG CPUs)
> "Michael S. Tsirkin" <mst@redhat.com> (supporter:PC)
> Marcel Apfelbaum <marcel.apfelbaum@gmail.com> (supporter:PC)

Michael (or any maintainers listed above):

Do any of you have any further comments on this series?  Would any of
you like to merge it?

Thanks,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH RFC 1/4] intel_iommu: Sanity check vfio-pci config on machine init done
  2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 1/4] intel_iommu: Sanity check vfio-pci config on machine init done Peter Xu
@ 2019-09-16  7:11   ` Auger Eric
  2019-09-16  7:56     ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Auger Eric @ 2019-09-16  7:11 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, Alex Williamson,
	Bandan Das, Paolo Bonzini, Igor Mammedov, Richard Henderson

Hi Peter,
On 8/12/19 9:45 AM, Peter Xu wrote:
> This check was previously only happened when the IOMMU is enabled in
s/happened/happening
> the guest.  It was always too late because the enabling of IOMMU
> normally only happens during the boot of guest OS.  It means that we
> can bail out and exit directly during the guest OS boots if the
> configuration of devices are not supported.  Or, if the guest didn't
> enable vIOMMU at all, then the user can use the guest normally but as
> long as it reconfigure the guest OS to enable the vIOMMU then reboot,
reconfigures, and then reboots
> the user will see the panic right after the reset when the next boot
> starts.
> 
> Let's make this failure even earlier so that we force the user to use
> caching-mode for vfio-pci devices when using the vIOMMU.  So the user
> won't get a surprise during execution of the guest at least, which
> seems a bit nicer.
> 
> This will affect some users who didn't enable the vIOMMU in the guest
> OS but were using vfio-pci together with the vtd device in the past.
> However I hope they are not a majority, because not enabling the
> vIOMMU with the device attached is actually meaningless.
> 
> We still keep the old assertion for safety because the hotplug path
> could still reach it.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/intel_iommu.c | 38 +++++++++++++++++++++++++++++++++++---
>  1 file changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index de86f53b4e..642dd595ed 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -61,6 +61,13 @@
>  static void vtd_address_space_refresh_all(IntelIOMMUState *s);
>  static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
>  
> +static void vtd_panic_require_caching_mode(void)
> +{
> +    error_report("We need to set caching-mode=on for intel-iommu to enable "
> +                 "device assignment with IOMMU protection.");
> +    exit(1);
> +}
> +
>  static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
>                              uint64_t wmask, uint64_t w1cmask)
>  {
> @@ -2926,9 +2933,7 @@ static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>      IntelIOMMUState *s = vtd_as->iommu_state;
>  
>      if (!s->caching_mode && new & IOMMU_NOTIFIER_MAP) {
> -        error_report("We need to set caching-mode=on for intel-iommu to enable "
> -                     "device assignment with IOMMU protection.");
> -        exit(1);
> +        vtd_panic_require_caching_mode();
>      }
>  
>      /* Update per-address-space notifier flags */
> @@ -3696,6 +3701,32 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
>      return true;
>  }
>  
> +static int vtd_machine_done_notify_one(Object *child, void *unused)
> +{
> +    IntelIOMMUState *iommu = INTEL_IOMMU_DEVICE(x86_iommu_get_default());
> +
> +    /*
> +     * We hard-coded here because vfio-pci is the only special case
> +     * here.  Let's be more elegant in the future when we can, but so
> +     * far there seems to be no better way.
> +     */
> +    if (object_dynamic_cast(child, "vfio-pci") && !iommu->caching_mode) {
> +        vtd_panic_require_caching_mode();
> +    }
> +
> +    return 0;
> +}
> +
> +static void vtd_machine_done_hook(Notifier *notifier, void *unused)
> +{
> +    object_child_foreach_recursive(object_get_root(),
> +                                   vtd_machine_done_notify_one, NULL);
> +}
> +
> +static Notifier vtd_machine_done_notify = {
> +    .notify = vtd_machine_done_hook,
> +};
> +
>  static void vtd_realize(DeviceState *dev, Error **errp)
>  {
>      MachineState *ms = MACHINE(qdev_get_machine());
> @@ -3741,6 +3772,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>      pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
>      /* Pseudo address space under root PCI bus. */
>      pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
> +    qemu_add_machine_init_done_notifier(&vtd_machine_done_notify);
This does not compile on master any more. I think sysemu/sysemu.h needs
to be included, as the declaration of qemu_add_machine_init_done_notifier
is otherwise not found.
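
For reference, the likely fix is a one-line include near the top of
hw/i386/intel_iommu.c (a sketch; the exact placement may differ):

  #include "qemu/osdep.h"
  #include "sysemu/sysemu.h"  /* qemu_add_machine_init_done_notifier() */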

Besides
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric
>  }
>  
>  static void vtd_class_init(ObjectClass *klass, void *data)
> 



* Re: [Qemu-devel] [PATCH RFC 3/4] pc/q35: Disallow vfio-pci hotplug without VT-d caching mode
  2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 3/4] pc/q35: Disallow vfio-pci hotplug without VT-d caching mode Peter Xu
@ 2019-09-16  7:23   ` Auger Eric
  0 siblings, 0 replies; 25+ messages in thread
From: Auger Eric @ 2019-09-16  7:23 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, Alex Williamson,
	Bandan Das, Paolo Bonzini, Igor Mammedov, Richard Henderson

Hi Peter,

On 8/12/19 9:45 AM, Peter Xu wrote:
> Instead of bailing out when trying to hotplug a vfio-pci device with
> the configuration below:
> 
>   -device intel-iommu,caching-mode=off
> 
> With this patch we return an error message to the user via QMP/HMP and
> the VM continues to work after the failed hotplug:
> 
>   (qemu) device_add vfio-pci,bus=root.3,host=05:00.0,id=vfio1
>   Error: Device assignment is not allowed without enabling caching-mode=on for Intel IOMMU.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
> ---
>  hw/i386/pc.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 549c437050..4ea00c7bd2 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2905,6 +2905,26 @@ static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
>      }
>  }
>  
> +
> +static bool pc_hotplug_allowed(MachineState *ms, DeviceState *dev, Error **errp)
> +{
> +    X86IOMMUState *iommu = x86_iommu_get_default();
> +    IntelIOMMUState *intel_iommu;
> +
> +    if (iommu &&
> +        object_dynamic_cast((Object *)iommu, TYPE_INTEL_IOMMU_DEVICE) &&
> +        object_dynamic_cast((Object *)dev, "vfio-pci")) {
> +        intel_iommu = INTEL_IOMMU_DEVICE(iommu);
> +        if (!intel_iommu->caching_mode) {
> +            error_setg(errp, "Device assignment is not allowed without "
> +                       "enabling caching-mode=on for Intel IOMMU.");
> +            return false;
> +        }
> +    }
> +
> +    return true;
> +}
> +
>  static void pc_machine_class_init(ObjectClass *oc, void *data)
>  {
>      MachineClass *mc = MACHINE_CLASS(oc);
> @@ -2929,6 +2949,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
>      pcmc->pvh_enabled = true;
>      assert(!mc->get_hotplug_handler);
>      mc->get_hotplug_handler = pc_get_hotplug_handler;
> +    mc->hotplug_allowed = pc_hotplug_allowed;
>      mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
>      mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
>      mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
> 
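
As a usage note, the check only fires when caching-mode is off; with the
mode enabled the same hotplug goes through.  A sketch reusing the commit
message's example bus name and host BDF:

  -device intel-iommu,caching-mode=on
  (qemu) device_add vfio-pci,bus=root.3,host=05:00.0,id=vfio1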



* Re: [Qemu-devel] [PATCH RFC 2/4] qdev/machine: Introduce hotplug_allowed hook
  2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 2/4] qdev/machine: Introduce hotplug_allowed hook Peter Xu
@ 2019-09-16  7:23   ` Auger Eric
  0 siblings, 0 replies; 25+ messages in thread
From: Auger Eric @ 2019-09-16  7:23 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, Alex Williamson,
	Bandan Das, Paolo Bonzini, Igor Mammedov, Richard Henderson

Hi Peter,

On 8/12/19 9:45 AM, Peter Xu wrote:
> Introduce this new per-machine hook to give any machine class a chance
> to run a sanity check on the to-be-hotplugged device.
> This will be used for x86 to try to detect some illegal configuration
> of devices, e.g., possible conflictions between vfio-pci and x86
conflicts
> vIOMMU.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric
> ---
>  hw/core/qdev.c         | 17 +++++++++++++++++
>  include/hw/boards.h    |  9 +++++++++
>  include/hw/qdev-core.h |  1 +
>  qdev-monitor.c         |  7 +++++++
>  4 files changed, 34 insertions(+)
> 
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index 94ebc0a4a1..d792b43c37 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -236,6 +236,23 @@ HotplugHandler *qdev_get_machine_hotplug_handler(DeviceState *dev)
>      return NULL;
>  }
>  
> +bool qdev_hotplug_allowed(DeviceState *dev, Error **errp)
> +{
> +    MachineState *machine;
> +    MachineClass *mc;
> +    Object *m_obj = qdev_get_machine();
> +
> +    if (object_dynamic_cast(m_obj, TYPE_MACHINE)) {
> +        machine = MACHINE(m_obj);
> +        mc = MACHINE_GET_CLASS(machine);
> +        if (mc->hotplug_allowed) {
> +            return mc->hotplug_allowed(machine, dev, errp);
> +        }
> +    }
> +
> +    return true;
> +}
> +
>  HotplugHandler *qdev_get_bus_hotplug_handler(DeviceState *dev)
>  {
>      if (dev->parent_bus) {
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index a71d1a53a5..1cf63be45d 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -166,6 +166,13 @@ typedef struct {
>   *    The function pointer to hook different machine specific functions for
>   *    parsing "smp-opts" from QemuOpts to MachineState::CpuTopology and more
>   *    machine specific topology fields, such as smp_dies for PCMachine.
> + * @hotplug_allowed:
> + *    If the hook is provided, then it'll be called for each device
> + *    hotplug to check whether the device hotplug is allowed.  Return
> + *    true to grant allowance or false to reject the hotplug.  When
> + *    false is returned, an error must be set to show the reason of
> + *    the rejection.  If the hook is not provided, all hotplug will be
> + *    allowed.
>   */
>  struct MachineClass {
>      /*< private >*/
> @@ -223,6 +230,8 @@ struct MachineClass {
>  
>      HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
>                                             DeviceState *dev);
> +    bool (*hotplug_allowed)(MachineState *state, DeviceState *dev,
> +                            Error **errp);
>      CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
>                                                           unsigned cpu_index);
>      const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 136df7774c..88e7ec4b60 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -284,6 +284,7 @@ void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
>                                   int required_for_version);
>  HotplugHandler *qdev_get_bus_hotplug_handler(DeviceState *dev);
>  HotplugHandler *qdev_get_machine_hotplug_handler(DeviceState *dev);
> +bool qdev_hotplug_allowed(DeviceState *dev, Error **errp);
>  /**
>   * qdev_get_hotplug_handler: Get handler responsible for device wiring
>   *
> diff --git a/qdev-monitor.c b/qdev-monitor.c
> index 58222c2211..6c80602771 100644
> --- a/qdev-monitor.c
> +++ b/qdev-monitor.c
> @@ -614,6 +614,13 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
>      /* create device */
>      dev = DEVICE(object_new(driver));
>  
> +    /* Check whether the hotplug is allowed by the machine */
> +    if (qdev_hotplug && !qdev_hotplug_allowed(dev, &err)) {
> +        /* Error must be set in the machine hook */
> +        assert(err);
> +        goto err_del_dev;
> +    }
> +
>      if (bus) {
>          qdev_set_parent_bus(dev, bus);
>      } else if (qdev_hotplug && !qdev_get_machine_hotplug_handler(dev)) {
> 
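
As an aside, this generic hook would also cover the ARM case discussed
earlier in the thread.  A minimal sketch of how the virt machine could
use it to implement the "forbid VFIO with the vSMMUv3 for now" plan
(this wiring is hypothetical and not part of this series):

  /* Sketch; assumes hw/arm/virt.h and qapi/error.h are included. */
  static bool virt_hotplug_allowed(MachineState *ms, DeviceState *dev,
                                   Error **errp)
  {
      VirtMachineState *vms = VIRT_MACHINE(ms);

      /* vSMMUv3/VFIO integration is not ready yet: reject assignment */
      if (vms->iommu == VIRT_IOMMU_SMMUV3 &&
          object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
          error_setg(errp, "Device assignment is not yet supported with "
                     "the virtual SMMUv3");
          return false;
      }

      return true;
  }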



* Re: [Qemu-devel] [PATCH RFC 4/4] intel_iommu: Remove the caching-mode check during flag change
  2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 4/4] intel_iommu: Remove the caching-mode check during flag change Peter Xu
@ 2019-09-16  7:24   ` Auger Eric
  0 siblings, 0 replies; 25+ messages in thread
From: Auger Eric @ 2019-09-16  7:24 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Jason Wang, Alex Williamson,
	Bandan Das, Paolo Bonzini, Igor Mammedov, Richard Henderson

Hi Peter,

On 8/12/19 9:45 AM, Peter Xu wrote:
> That's never a good place to stop the QEMU process...  Now that we have
> both the machine-done sanity check and the hotplug handler, we can
> safely remove this check to avoid that.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric
> ---
>  hw/i386/intel_iommu.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 642dd595ed..93b26b633b 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2932,10 +2932,6 @@ static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>      VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
>      IntelIOMMUState *s = vtd_as->iommu_state;
>  
> -    if (!s->caching_mode && new & IOMMU_NOTIFIER_MAP) {
> -        vtd_panic_require_caching_mode();
> -    }
> -
>      /* Update per-address-space notifier flags */
>      vtd_as->notifier_flags = new;
>  
> 



* Re: [Qemu-devel] [PATCH RFC 1/4] intel_iommu: Sanity check vfio-pci config on machine init done
  2019-09-16  7:11   ` Auger Eric
@ 2019-09-16  7:56     ` Peter Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2019-09-16  7:56 UTC (permalink / raw)
  To: Auger Eric
  Cc: Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Bandan Das, Jason Wang,
	qemu-devel, Alex Williamson, Paolo Bonzini, Igor Mammedov,
	Richard Henderson

On Mon, Sep 16, 2019 at 09:11:50AM +0200, Auger Eric wrote:
> >  static void vtd_realize(DeviceState *dev, Error **errp)
> >  {
> >      MachineState *ms = MACHINE(qdev_get_machine());
> > @@ -3741,6 +3772,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
> >      pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
> >      /* Pseudo address space under root PCI bus. */
> >      pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
> > +    qemu_add_machine_init_done_notifier(&vtd_machine_done_notify);
> This does not compile on master any more. I think sysemu/sysemu.h needs
> to be included, as the declaration of qemu_add_machine_init_done_notifier
> is otherwise not found.

Indeed.  It's probably because we've recently shuffled some header
inclusions around.  I'll repost a new version with the include added.

> 
> Besides
> Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks for the reviews!

-- 
Peter Xu



end of thread, newest: 2019-09-16  7:57 UTC

Thread overview: 25+ messages
2019-08-12  7:45 [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier Peter Xu
2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 1/4] intel_iommu: Sanity check vfio-pci config on machine init done Peter Xu
2019-09-16  7:11   ` Auger Eric
2019-09-16  7:56     ` Peter Xu
2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 2/4] qdev/machine: Introduce hotplug_allowed hook Peter Xu
2019-09-16  7:23   ` Auger Eric
2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 3/4] pc/q35: Disallow vfio-pci hotplug without VT-d caching mode Peter Xu
2019-09-16  7:23   ` Auger Eric
2019-08-12  7:45 ` [Qemu-devel] [PATCH RFC 4/4] intel_iommu: Remove the caching-mode check during flag change Peter Xu
2019-09-16  7:24   ` Auger Eric
2019-08-12 16:24 ` [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier Alex Williamson
2019-08-12 21:16   ` Peter Xu
2019-08-13  8:41 ` Jason Wang
2019-08-13 14:04   ` Peter Xu
2019-08-28 12:59   ` Auger Eric
2019-08-29  1:18     ` Peter Xu
2019-08-29  8:05       ` Auger Eric
2019-08-29  8:21         ` Peter Xu
2019-08-29  8:46           ` Auger Eric
2019-08-29  8:54             ` Peter Xu
2019-08-20  5:22 ` Peter Xu
2019-08-20  6:24   ` Paolo Bonzini
2019-08-21  5:03     ` Peter Xu
2019-08-21  7:50       ` Paolo Bonzini
2019-09-16  3:35         ` Peter Xu
