All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/9] IOMMU: Add support for IOMMU Bypass Feature
@ 2021-07-08 12:55 Wang Xingang
  2021-07-08 12:55 ` [PATCH v5 1/9] hw/pci/pci_host: Allow PCI host to bypass iommu Wang Xingang
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Wang Xingang @ 2021-07-08 12:55 UTC (permalink / raw)
  To: qemu-devel, qemu-arm, eric.auger, shannon.zhaosl, imammedo, mst,
	marcel.apfelbaum, peter.maydell, ehabkost, richard.henderson,
	pbonzini
  Cc: xieyingtai, wangxingang5

From: Xingang Wang <wangxingang5@huawei.com>

Traditionally, there is a global switch to enable/disable vIOMMU. All
devices in the system can only support go through vIOMMU or not, which
is not flexible. We introduce this bypass iommu property to support
coexist of devices go through vIOMMU and devices not. This is useful to
passthrough devices with no-iommu mode and devices go through vIOMMU in
the same virtual machine.

PCI host bridges have a bypass_iommu property. This property is used to
determine whether the devices attached on the PCI host bridge will bypass
virtual iommu. The bypass_iommu property is valid only when there is a
virtual iommu in the system, it is implemented to allow some devices to
bypass vIOMMU. When bypass_iommu property is not set for a host bridge,
the attached devices will go through vIOMMU by default.

The bypass iommu feature support PXB host bridge and default main host
bridge, we add a bypass_iommu property for PXB and default_bus_bypass_iommu
for machine. Note that default_bus_bypass_iommu is available only when
the 'q35' machine type on x86 architecture and the 'virt' machine type
on AArch64. Other machine types do not support bypass iommu for default
root bus.

Here is how to use bypass iommu feature:
 (1) PCI expander bridge
     qemu -device pxb-pcie,bus_nr=0x10,addr=0x1,bypass_iommu=true
 (2) Arm default host bridge
     qemu -machine virt,iommu=smmuv3,default_bus_bypass_iommu=true
 (3) X86 default root bus bypass iommu:
     qemu -machine q35,default_bus_bypass_iommu=true

History:

v4 -> v5:
- Suggested by Igor, add docs/bypass-iommu.txt to describe bypass iommu
  feature in more detail, including motivation, usage and limitations.
- Add some small fixes, simplify the logic when building IORT idmap as
  suggested by Eric.
- https://lore.kernel.org/qemu-devel/1621914605-14724-1-git-send-email-wangxingang5@huawei.com/

v3 -> v4:
- simplify the logic in building the IORT idmap

v2 -> v3:
- rebase on top of v6.0.0-rc4
- Took into account Eric's comments, replace with a bypass_iommu
  proerty 
- When building the IORT idmap, cover the whole RID space

v1 -> v2:
- rebase on top of v6.0.0-rc0
- Fix some issues
- Took into account Eric's comments, and remove the PCI_BUS_IOMMU flag,
  replace it with a property in PCIHostState.
- Add support for x86 iommu option

Xingang Wang (9):
  hw/pci/pci_host: Allow PCI host to bypass iommu
  hw/pxb: Add a bypass iommu property
  hw/arm/virt: Add default_bus_bypass_iommu machine option
  hw/i386: Add a default_bus_bypass_iommu pc machine option
  hw/pci: Add pci_bus_range() to get PCI bus number range
  hw/arm/virt-acpi-build: Add IORT support to bypass SMMUv3
  hw/i386/acpi-build: Add DMAR support to bypass iommu
  hw/i386/acpi-build: Add IVRS support to bypass iommu
  docs: Add documentation for iommu bypass

 docs/bypass-iommu.txt               |  89 ++++++++++++++++++++++
 hw/arm/virt-acpi-build.c            | 114 +++++++++++++++++++++++++---
 hw/arm/virt.c                       |  26 +++++++
 hw/i386/acpi-build.c                |  70 ++++++++++++++++-
 hw/i386/pc.c                        |  20 +++++
 hw/pci-bridge/pci_expander_bridge.c |   3 +
 hw/pci-host/q35.c                   |   2 +
 hw/pci/pci.c                        |  34 ++++++++-
 hw/pci/pci_host.c                   |   1 +
 include/hw/arm/virt.h               |   1 +
 include/hw/i386/pc.h                |   1 +
 include/hw/pci/pci.h                |   2 +
 include/hw/pci/pci_host.h           |   1 +
 13 files changed, 349 insertions(+), 15 deletions(-)
 create mode 100644 docs/bypass-iommu.txt

-- 
2.19.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v5 1/9] hw/pci/pci_host: Allow PCI host to bypass iommu
  2021-07-08 12:55 [PATCH v5 0/9] IOMMU: Add support for IOMMU Bypass Feature Wang Xingang
@ 2021-07-08 12:55 ` Wang Xingang
  2021-07-08 12:55 ` [PATCH v5 2/9] hw/pxb: Add a bypass iommu property Wang Xingang
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Wang Xingang @ 2021-07-08 12:55 UTC (permalink / raw)
  To: qemu-devel, qemu-arm, eric.auger, shannon.zhaosl, imammedo, mst,
	marcel.apfelbaum, peter.maydell, ehabkost, richard.henderson,
	pbonzini
  Cc: xieyingtai, wangxingang5

From: Xingang Wang <wangxingang5@huawei.com>

Add a new bypass_iommu property for PCI host and use it to check
whether devices attached to the PCI root bus will bypass iommu.
In pci_device_iommu_address_space(), check the property and
avoid getting iommu address space for devices bypass iommu.

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 hw/pci/pci.c              | 18 +++++++++++++++++-
 hw/pci/pci_host.c         |  1 +
 include/hw/pci/pci.h      |  1 +
 include/hw/pci/pci_host.h |  1 +
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 377084f1a8..27d588e268 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -416,6 +416,22 @@ const char *pci_root_bus_path(PCIDevice *dev)
     return rootbus->qbus.name;
 }
 
+bool pci_bus_bypass_iommu(PCIBus *bus)
+{
+    PCIBus *rootbus = bus;
+    PCIHostState *host_bridge;
+
+    if (!pci_bus_is_root(bus)) {
+        rootbus = pci_device_root_bus(bus->parent_dev);
+    }
+
+    host_bridge = PCI_HOST_BRIDGE(rootbus->qbus.parent);
+
+    assert(host_bridge->bus == rootbus);
+
+    return host_bridge->bypass_iommu;
+}
+
 static void pci_root_bus_init(PCIBus *bus, DeviceState *parent,
                               MemoryRegion *address_space_mem,
                               MemoryRegion *address_space_io,
@@ -2718,7 +2734,7 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 
         iommu_bus = parent_bus;
     }
-    if (iommu_bus && iommu_bus->iommu_fn) {
+    if (!pci_bus_bypass_iommu(bus) && iommu_bus && iommu_bus->iommu_fn) {
         return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
     }
     return &address_space_memory;
diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
index 8ca5fadcbd..cf02f0d6a5 100644
--- a/hw/pci/pci_host.c
+++ b/hw/pci/pci_host.c
@@ -222,6 +222,7 @@ const VMStateDescription vmstate_pcihost = {
 static Property pci_host_properties_common[] = {
     DEFINE_PROP_BOOL("x-config-reg-migration-enabled", PCIHostState,
                      mig_enabled, true),
+    DEFINE_PROP_BOOL("bypass-iommu", PCIHostState, bypass_iommu, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6be4e0c460..f4d51b672b 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -480,6 +480,7 @@ void pci_for_each_bus(PCIBus *bus,
 
 PCIBus *pci_device_root_bus(const PCIDevice *d);
 const char *pci_root_bus_path(PCIDevice *dev);
+bool pci_bus_bypass_iommu(PCIBus *bus);
 PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn);
 int pci_qdev_find_device(const char *id, PCIDevice **pdev);
 void pci_bus_get_w64_range(PCIBus *bus, Range *range);
diff --git a/include/hw/pci/pci_host.h b/include/hw/pci/pci_host.h
index 52e038c019..c6f4eb4585 100644
--- a/include/hw/pci/pci_host.h
+++ b/include/hw/pci/pci_host.h
@@ -43,6 +43,7 @@ struct PCIHostState {
     uint32_t config_reg;
     bool mig_enabled;
     PCIBus *bus;
+    bool bypass_iommu;
 
     QLIST_ENTRY(PCIHostState) next;
 };
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 2/9] hw/pxb: Add a bypass iommu property
  2021-07-08 12:55 [PATCH v5 0/9] IOMMU: Add support for IOMMU Bypass Feature Wang Xingang
  2021-07-08 12:55 ` [PATCH v5 1/9] hw/pci/pci_host: Allow PCI host to bypass iommu Wang Xingang
@ 2021-07-08 12:55 ` Wang Xingang
  2021-07-08 12:55 ` [PATCH v5 3/9] hw/arm/virt: Add default_bus_bypass_iommu machine option Wang Xingang
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Wang Xingang @ 2021-07-08 12:55 UTC (permalink / raw)
  To: qemu-devel, qemu-arm, eric.auger, shannon.zhaosl, imammedo, mst,
	marcel.apfelbaum, peter.maydell, ehabkost, richard.henderson,
	pbonzini
  Cc: xieyingtai, wangxingang5

From: Xingang Wang <wangxingang5@huawei.com>

Add a bypass_iommu property for pci_expander_bridge, the property
is used to indicate whether pxb root bus will bypass iommu. By
default the bypass_iommu is disabled, and it can be enabled with:
qemu -device pxb-pcie,bus_nr=0x10,addr=0x1,bypass_iommu=true

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 hw/pci-bridge/pci_expander_bridge.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index aedded1064..7112dc3062 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -57,6 +57,7 @@ struct PXBDev {
 
     uint8_t bus_nr;
     uint16_t numa_node;
+    bool bypass_iommu;
 };
 
 static PXBDev *convert_to_pxb(PCIDevice *dev)
@@ -255,6 +256,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
     bus->map_irq = pxb_map_irq_fn;
 
     PCI_HOST_BRIDGE(ds)->bus = bus;
+    PCI_HOST_BRIDGE(ds)->bypass_iommu = pxb->bypass_iommu;
 
     pxb_register_bus(dev, bus, &local_err);
     if (local_err) {
@@ -301,6 +303,7 @@ static Property pxb_dev_properties[] = {
     /* Note: 0 is not a legal PXB bus number. */
     DEFINE_PROP_UINT8("bus_nr", PXBDev, bus_nr, 0),
     DEFINE_PROP_UINT16("numa_node", PXBDev, numa_node, NUMA_NODE_UNASSIGNED),
+    DEFINE_PROP_BOOL("bypass_iommu", PXBDev, bypass_iommu, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 3/9] hw/arm/virt: Add default_bus_bypass_iommu machine option
  2021-07-08 12:55 [PATCH v5 0/9] IOMMU: Add support for IOMMU Bypass Feature Wang Xingang
  2021-07-08 12:55 ` [PATCH v5 1/9] hw/pci/pci_host: Allow PCI host to bypass iommu Wang Xingang
  2021-07-08 12:55 ` [PATCH v5 2/9] hw/pxb: Add a bypass iommu property Wang Xingang
@ 2021-07-08 12:55 ` Wang Xingang
  2021-07-08 12:55 ` [PATCH v5 4/9] hw/i386: Add a default_bus_bypass_iommu pc " Wang Xingang
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Wang Xingang @ 2021-07-08 12:55 UTC (permalink / raw)
  To: qemu-devel, qemu-arm, eric.auger, shannon.zhaosl, imammedo, mst,
	marcel.apfelbaum, peter.maydell, ehabkost, richard.henderson,
	pbonzini
  Cc: xieyingtai, wangxingang5

From: Xingang Wang <wangxingang5@huawei.com>

Add a default_bus_bypass_iommu machine option to enable/disable
bypass_iommu for default root bus. The option is disabled by
default and can be enabled with:
$QEMU -machine virt,iommu=smmuv3,default_bus_bypass_iommu=true

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
---
 hw/arm/virt.c         | 26 ++++++++++++++++++++++++++
 include/hw/arm/virt.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4b96f06014..a2805977b8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1364,6 +1364,7 @@ static void create_pcie(VirtMachineState *vms)
     }
 
     pci = PCI_HOST_BRIDGE(dev);
+    pci->bypass_iommu = vms->default_bus_bypass_iommu;
     vms->bus = pci->bus;
     if (vms->bus) {
         for (i = 0; i < nb_nics; i++) {
@@ -2319,6 +2320,21 @@ static void virt_set_iommu(Object *obj, const char *value, Error **errp)
     }
 }
 
+static bool virt_get_default_bus_bypass_iommu(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return vms->default_bus_bypass_iommu;
+}
+
+static void virt_set_default_bus_bypass_iommu(Object *obj, bool value,
+                                              Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    vms->default_bus_bypass_iommu = value;
+}
+
 static CpuInstanceProperties
 virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 {
@@ -2658,6 +2674,13 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
                                           "Set the IOMMU type. "
                                           "Valid values are none and smmuv3");
 
+    object_class_property_add_bool(oc, "default_bus_bypass_iommu",
+                                   virt_get_default_bus_bypass_iommu,
+                                   virt_set_default_bus_bypass_iommu);
+    object_class_property_set_description(oc, "default_bus_bypass_iommu",
+                                          "Set on/off to enable/disable "
+                                          "bypass_iommu for default root bus");
+
     object_class_property_add_bool(oc, "ras", virt_get_ras,
                                    virt_set_ras);
     object_class_property_set_description(oc, "ras",
@@ -2725,6 +2748,9 @@ static void virt_instance_init(Object *obj)
     /* Default disallows iommu instantiation */
     vms->iommu = VIRT_IOMMU_NONE;
 
+    /* The default root bus is attached to iommu by default */
+    vms->default_bus_bypass_iommu = false;
+
     /* Default disallows RAS instantiation */
     vms->ras = false;
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 921416f918..9661c46699 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -147,6 +147,7 @@ struct VirtMachineState {
     OnOffAuto acpi;
     VirtGICType gic_version;
     VirtIOMMUType iommu;
+    bool default_bus_bypass_iommu;
     VirtMSIControllerType msi_controller;
     uint16_t virtio_iommu_bdf;
     struct arm_boot_info bootinfo;
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 4/9] hw/i386: Add a default_bus_bypass_iommu pc machine option
  2021-07-08 12:55 [PATCH v5 0/9] IOMMU: Add support for IOMMU Bypass Feature Wang Xingang
                   ` (2 preceding siblings ...)
  2021-07-08 12:55 ` [PATCH v5 3/9] hw/arm/virt: Add default_bus_bypass_iommu machine option Wang Xingang
@ 2021-07-08 12:55 ` Wang Xingang
  2021-07-08 12:55 ` [PATCH v5 5/9] hw/pci: Add pci_bus_range() to get PCI bus number range Wang Xingang
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Wang Xingang @ 2021-07-08 12:55 UTC (permalink / raw)
  To: qemu-devel, qemu-arm, eric.auger, shannon.zhaosl, imammedo, mst,
	marcel.apfelbaum, peter.maydell, ehabkost, richard.henderson,
	pbonzini
  Cc: xieyingtai, wangxingang5

From: Xingang Wang <wangxingang5@huawei.com>

Add a default_bus_bypass_iommu pc machine option to enable/disable
bypass_iommu for default root bus. The option is disabled by default
and can be enabled with:
$QEMU -machine q35,default_bus_bypass_iommu=true

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
---
 hw/i386/pc.c         | 20 ++++++++++++++++++++
 hw/pci-host/q35.c    |  2 ++
 include/hw/i386/pc.h |  1 +
 3 files changed, 23 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8e1220db72..257d70f6e9 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1522,6 +1522,21 @@ static void pc_machine_set_hpet(Object *obj, bool value, Error **errp)
     pcms->hpet_enabled = value;
 }
 
+static bool pc_machine_get_default_bus_bypass_iommu(Object *obj, Error **errp)
+{
+    PCMachineState *pcms = PC_MACHINE(obj);
+
+    return pcms->default_bus_bypass_iommu;
+}
+
+static void pc_machine_set_default_bus_bypass_iommu(Object *obj, bool value,
+                                                    Error **errp)
+{
+    PCMachineState *pcms = PC_MACHINE(obj);
+
+    pcms->default_bus_bypass_iommu = value;
+}
+
 static void pc_machine_get_max_ram_below_4g(Object *obj, Visitor *v,
                                             const char *name, void *opaque,
                                             Error **errp)
@@ -1621,6 +1636,7 @@ static void pc_machine_initfn(Object *obj)
 #ifdef CONFIG_HPET
     pcms->hpet_enabled = true;
 #endif
+    pcms->default_bus_bypass_iommu = false;
 
     pc_system_flash_create(pcms);
     pcms->pcspk = isa_new(TYPE_PC_SPEAKER);
@@ -1745,6 +1761,10 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     object_class_property_add_bool(oc, "hpet",
         pc_machine_get_hpet, pc_machine_set_hpet);
 
+    object_class_property_add_bool(oc, "default_bus_bypass_iommu",
+        pc_machine_get_default_bus_bypass_iommu,
+        pc_machine_set_default_bus_bypass_iommu);
+
     object_class_property_add(oc, PC_MACHINE_MAX_FW_SIZE, "size",
         pc_machine_get_max_fw_size, pc_machine_set_max_fw_size,
         NULL, NULL);
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 2eb729dff5..826a05d7f3 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -64,6 +64,8 @@ static void q35_host_realize(DeviceState *dev, Error **errp)
                                 s->mch.address_space_io,
                                 0, TYPE_PCIE_BUS);
     PC_MACHINE(qdev_get_machine())->bus = pci->bus;
+    pci->bypass_iommu =
+        PC_MACHINE(qdev_get_machine())->default_bus_bypass_iommu;
     qdev_realize(DEVICE(&s->mch), BUS(pci->bus), &error_fatal);
 }
 
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 87294f2632..fd741119fa 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -44,6 +44,7 @@ typedef struct PCMachineState {
     bool sata_enabled;
     bool pit_enabled;
     bool hpet_enabled;
+    bool default_bus_bypass_iommu;
     uint64_t max_fw_size;
 
     /* NUMA information: */
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 5/9] hw/pci: Add pci_bus_range() to get PCI bus number range
  2021-07-08 12:55 [PATCH v5 0/9] IOMMU: Add support for IOMMU Bypass Feature Wang Xingang
                   ` (3 preceding siblings ...)
  2021-07-08 12:55 ` [PATCH v5 4/9] hw/i386: Add a default_bus_bypass_iommu pc " Wang Xingang
@ 2021-07-08 12:55 ` Wang Xingang
  2021-07-08 12:55 ` [PATCH v5 6/9] hw/arm/virt-acpi-build: Add IORT support to bypass SMMUv3 Wang Xingang
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Wang Xingang @ 2021-07-08 12:55 UTC (permalink / raw)
  To: qemu-devel, qemu-arm, eric.auger, shannon.zhaosl, imammedo, mst,
	marcel.apfelbaum, peter.maydell, ehabkost, richard.henderson,
	pbonzini
  Cc: xieyingtai, wangxingang5

From: Xingang Wang <wangxingang5@huawei.com>

This helps to get the min and max bus number of a PCI bus hierarchy.

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 hw/pci/pci.c         | 16 ++++++++++++++++
 include/hw/pci/pci.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 27d588e268..23d2ae2ab2 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -537,6 +537,22 @@ int pci_bus_num(PCIBus *s)
     return PCI_BUS_GET_CLASS(s)->bus_num(s);
 }
 
+/* Returns the min and max bus numbers of a PCI bus hierarchy */
+void pci_bus_range(PCIBus *bus, int *min_bus, int *max_bus)
+{
+    int i;
+    *min_bus = *max_bus = pci_bus_num(bus);
+
+    for (i = 0; i < ARRAY_SIZE(bus->devices); ++i) {
+        PCIDevice *dev = bus->devices[i];
+
+        if (dev && PCI_DEVICE_GET_CLASS(dev)->is_bridge) {
+            *min_bus = MIN(*min_bus, dev->config[PCI_SECONDARY_BUS]);
+            *max_bus = MAX(*max_bus, dev->config[PCI_SUBORDINATE_BUS]);
+        }
+    }
+}
+
 int pci_bus_numa_node(PCIBus *bus)
 {
     return PCI_BUS_GET_CLASS(bus)->numa_node(bus);
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index f4d51b672b..d0f4266e37 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -450,6 +450,7 @@ static inline PCIBus *pci_get_bus(const PCIDevice *dev)
     return PCI_BUS(qdev_get_parent_bus(DEVICE(dev)));
 }
 int pci_bus_num(PCIBus *s);
+void pci_bus_range(PCIBus *bus, int *min_bus, int *max_bus);
 static inline int pci_dev_bus_num(const PCIDevice *dev)
 {
     return pci_bus_num(pci_get_bus(dev));
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 6/9] hw/arm/virt-acpi-build: Add IORT support to bypass SMMUv3
  2021-07-08 12:55 [PATCH v5 0/9] IOMMU: Add support for IOMMU Bypass Feature Wang Xingang
                   ` (4 preceding siblings ...)
  2021-07-08 12:55 ` [PATCH v5 5/9] hw/pci: Add pci_bus_range() to get PCI bus number range Wang Xingang
@ 2021-07-08 12:55 ` Wang Xingang
  2021-07-08 12:55 ` [PATCH v5 7/9] hw/i386/acpi-build: Add DMAR support to bypass iommu Wang Xingang
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Wang Xingang @ 2021-07-08 12:55 UTC (permalink / raw)
  To: qemu-devel, qemu-arm, eric.auger, shannon.zhaosl, imammedo, mst,
	marcel.apfelbaum, peter.maydell, ehabkost, richard.henderson,
	pbonzini
  Cc: xieyingtai, wangxingang5

From: Xingang Wang <wangxingang5@huawei.com>

When we build IORT table with SMMUv3 and bypass iommu feature enabled,
we can no longer setup one map from RC to SMMUv3 covering the whole RIDs.
We need to walk the PCI bus and check whether the root bus will bypass
iommu, setup RC -> SMMUv3 -> ITS map for RC which will not bypass iommu.

When a SMMUv3 node exist, we setup the idmap from SMMUv3 to ITS
covering the whole RIDs, and only modify the map from RC to SMMUv3.
We build RC -> SMMUv3 -> ITS map for root bus with bypass_iommu
disabled, and build idmap from RC to ITS directly for the rest of
the whole RID space.

For example we run qemu with command line:

qemu/build/aarch64-softmmu/qemu-system-aarch64 \
 -kernel arch/arm64/boot/Image \
 -enable-kvm \
 -cpu host \
 -m 8G \
 -smp 8,sockets=2,cores=4,threads=1 \
 -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3,default_bus_bypass_iommu=true \
 -drive file=./QEMU_EFI-pflash.raw,if=pflash,format=raw,unit=0,readonly=on \
 -device pxb-pcie,bus_nr=0x10,id=pci.10,bus=pcie.0,addr=0x3.0x1 \
 -device pxb-pcie,bus_nr=0x20,id=pci.20,bus=pcie.0,addr=0x3.0x2,bypass_iommu=true \
 -device pcie-root-port,port=0x20,chassis=1,id=pci.1,bus=pcie.0,addr=0x2 \
 -device pcie-root-port,port=0x20,chassis=11,id=pci.11,bus=pci.10,addr=0x1 \
 -device pcie-root-port,port=0x20,chassis=21,id=pci.21,bus=pci.20,addr=0x1 \
 -device virtio-scsi-pci,id=scsi0,bus=pci.1,addr=0x1 \
 -device virtio-scsi-pci,id=scsi1,bus=pci.11,addr=0x1 \
 -device virtio-scsi-pci,id=scsi2,bus=pci.21,addr=0x1 \
 -initrd /mnt/davinci/wxg/kill-linux/rootfs/mfs.cpio.gz \
 -nographic \
 -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000 nokaslr" \

And we get guest configuration:

-+-[0000:20]---01.0-[21]--
 +-[0000:10]---01.0-[11]--
 \-[0000:00]-+-00.0  Device 1b36:0008
             +-01.0  Device 1af4:1000
             \-02.0-[01]--

With bypass_iommu enabled, the attached devices will bypass iommu.

/sys/class/iommu/smmu3.0x0000000009050000/
|-- device -> ../../../arm-smmu-v3.0.auto
|-- devices
|   `-- 0000:10:01.0 -> ../../../../../pci0000:10/0000:10:01.0

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
---
 hw/arm/virt-acpi-build.c | 114 +++++++++++++++++++++++++++++++++++----
 1 file changed, 103 insertions(+), 11 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f1024843dd..037cc1fd82 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -44,6 +44,7 @@
 #include "hw/acpi/tpm.h"
 #include "hw/pci/pcie_host.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/pci_bus.h"
 #include "hw/pci-host/gpex.h"
 #include "hw/arm/virt.h"
 #include "hw/mem/nvdimm.h"
@@ -239,23 +240,89 @@ static void acpi_dsdt_add_tpm(Aml *scope, VirtMachineState *vms)
 }
 #endif
 
+/* Build the iort ID mapping to SMMUv3 for a given PCI host bridge */
+static int
+iort_host_bridges(Object *obj, void *opaque)
+{
+    GArray *idmap_blob = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
+        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
+
+        if (bus && !pci_bus_bypass_iommu(bus)) {
+            int min_bus, max_bus;
+
+            pci_bus_range(bus, &min_bus, &max_bus);
+
+            AcpiIortIdMapping idmap = {
+                .input_base = min_bus << 8,
+                .id_count = (max_bus - min_bus + 1) << 8,
+            };
+            g_array_append_val(idmap_blob, idmap);
+        }
+    }
+
+    return 0;
+}
+
+static int iort_idmap_compare(gconstpointer a, gconstpointer b)
+{
+    AcpiIortIdMapping *idmap_a = (AcpiIortIdMapping *)a;
+    AcpiIortIdMapping *idmap_b = (AcpiIortIdMapping *)b;
+
+    return idmap_a->input_base - idmap_b->input_base;
+}
+
 static void
 build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 {
-    int nb_nodes, iort_start = table_data->len;
+    int i, nb_nodes, rc_mapping_count, iort_start = table_data->len;
     AcpiIortIdMapping *idmap;
     AcpiIortItsGroup *its;
     AcpiIortTable *iort;
     AcpiIortSmmu3 *smmu;
     size_t node_size, iort_node_offset, iort_length, smmu_offset = 0;
     AcpiIortRC *rc;
+    GArray *smmu_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
+    GArray *its_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
 
     iort = acpi_data_push(table_data, sizeof(*iort));
 
     if (vms->iommu == VIRT_IOMMU_SMMUV3) {
+        AcpiIortIdMapping next_range = {0};
+
+        object_child_foreach_recursive(object_get_root(),
+                                       iort_host_bridges, smmu_idmaps);
+
+        /* Sort the smmu idmap by input_base */
+        g_array_sort(smmu_idmaps, iort_idmap_compare);
+
+        /*
+         * Split the whole RIDs by mapping from RC to SMMU,
+         * build the ID mapping from RC to ITS directly.
+         */
+        for (i = 0; i < smmu_idmaps->len; i++) {
+            idmap = &g_array_index(smmu_idmaps, AcpiIortIdMapping, i);
+
+            if (next_range.input_base < idmap->input_base) {
+                next_range.id_count = idmap->input_base - next_range.input_base;
+                g_array_append_val(its_idmaps, next_range);
+            }
+
+            next_range.input_base = idmap->input_base + idmap->id_count;
+        }
+
+        /* Append the last RC -> ITS ID mapping */
+        if (next_range.input_base < 0xFFFF) {
+            next_range.id_count = 0xFFFF - next_range.input_base;
+            g_array_append_val(its_idmaps, next_range);
+        }
+
         nb_nodes = 3; /* RC, ITS, SMMUv3 */
+        rc_mapping_count = smmu_idmaps->len + its_idmaps->len;
     } else {
         nb_nodes = 2; /* RC, ITS */
+        rc_mapping_count = 1;
     }
 
     iort_length = sizeof(*iort);
@@ -307,13 +374,13 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     }
 
     /* Root Complex Node */
-    node_size = sizeof(*rc) + sizeof(*idmap);
+    node_size = sizeof(*rc) + sizeof(*idmap) * rc_mapping_count;
     iort_length += node_size;
     rc = acpi_data_push(table_data, node_size);
 
     rc->type = ACPI_IORT_NODE_PCI_ROOT_COMPLEX;
     rc->length = cpu_to_le16(node_size);
-    rc->mapping_count = cpu_to_le32(1);
+    rc->mapping_count = cpu_to_le32(rc_mapping_count);
     rc->mapping_offset = cpu_to_le32(sizeof(*rc));
 
     /* fully coherent device */
@@ -321,20 +388,45 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     rc->memory_properties.memory_flags = 0x3; /* CCA = CPM = DCAS = 1 */
     rc->pci_segment_number = 0; /* MCFG pci_segment */
 
-    /* Identity RID mapping covering the whole input RID range */
-    idmap = &rc->id_mapping_array[0];
-    idmap->input_base = 0;
-    idmap->id_count = cpu_to_le32(0xFFFF);
-    idmap->output_base = 0;
-
     if (vms->iommu == VIRT_IOMMU_SMMUV3) {
-        /* output IORT node is the smmuv3 node */
-        idmap->output_reference = cpu_to_le32(smmu_offset);
+        AcpiIortIdMapping *range;
+
+        /* translated RIDs connect to SMMUv3 node: RC -> SMMUv3 -> ITS */
+        for (i = 0; i < smmu_idmaps->len; i++) {
+            idmap = &rc->id_mapping_array[i];
+            range = &g_array_index(smmu_idmaps, AcpiIortIdMapping, i);
+
+            idmap->input_base = cpu_to_le32(range->input_base);
+            idmap->id_count = cpu_to_le32(range->id_count);
+            idmap->output_base = cpu_to_le32(range->input_base);
+            /* output IORT node is the smmuv3 node */
+            idmap->output_reference = cpu_to_le32(smmu_offset);
+        }
+
+        /* bypassed RIDs connect to ITS group node directly: RC -> ITS */
+        for (i = 0; i < its_idmaps->len; i++) {
+            idmap = &rc->id_mapping_array[smmu_idmaps->len + i];
+            range = &g_array_index(its_idmaps, AcpiIortIdMapping, i);
+
+            idmap->input_base = cpu_to_le32(range->input_base);
+            idmap->id_count = cpu_to_le32(range->id_count);
+            idmap->output_base = cpu_to_le32(range->input_base);
+            /* output IORT node is the ITS group node (the first node) */
+            idmap->output_reference = cpu_to_le32(iort_node_offset);
+        }
     } else {
+        /* Identity RID mapping covering the whole input RID range */
+        idmap = &rc->id_mapping_array[0];
+        idmap->input_base = cpu_to_le32(0);
+        idmap->id_count = cpu_to_le32(0xFFFF);
+        idmap->output_base = cpu_to_le32(0);
         /* output IORT node is the ITS group node (the first node) */
         idmap->output_reference = cpu_to_le32(iort_node_offset);
     }
 
+    g_array_free(smmu_idmaps, true);
+    g_array_free(its_idmaps, true);
+
     /*
      * Update the pointer address in case table_data->data moves during above
      * acpi_data_push operations.
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 7/9] hw/i386/acpi-build: Add DMAR support to bypass iommu
  2021-07-08 12:55 [PATCH v5 0/9] IOMMU: Add support for IOMMU Bypass Feature Wang Xingang
                   ` (5 preceding siblings ...)
  2021-07-08 12:55 ` [PATCH v5 6/9] hw/arm/virt-acpi-build: Add IORT support to bypass SMMUv3 Wang Xingang
@ 2021-07-08 12:55 ` Wang Xingang
  2021-07-08 12:55 ` [PATCH v5 8/9] hw/i386/acpi-build: Add IVRS " Wang Xingang
  2021-07-08 12:55 ` [PATCH v5 9/9] docs: Add documentation for iommu bypass Wang Xingang
  8 siblings, 0 replies; 10+ messages in thread
From: Wang Xingang @ 2021-07-08 12:55 UTC (permalink / raw)
  To: qemu-devel, qemu-arm, eric.auger, shannon.zhaosl, imammedo, mst,
	marcel.apfelbaum, peter.maydell, ehabkost, richard.henderson,
	pbonzini
  Cc: xieyingtai, wangxingang5

From: Xingang Wang <wangxingang5@huawei.com>

In DMAR table, the drhd is set to cover all PCI devices when intel_iommu
is on. To support bypass iommu feature, we need to walk the PCI bus with
bypass_iommu disabled and add explicit scope data in DMAR drhd structure.

/mnt/sdb/wxg/qemu-next/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
 -machine q35,accel=kvm,default_bus_bypass_iommu=true \
 -cpu host \
 -m 16G \
 -smp 36,sockets=2,cores=18,threads=1 \
 -device pxb-pcie,bus_nr=0x10,id=pci.10,bus=pcie.0,addr=0x3 \
 -device pxb-pcie,bus_nr=0x20,id=pci.20,bus=pcie.0,addr=0x4,bypass_iommu=true \
 -device pcie-root-port,port=0x1,chassis=1,id=pci.11,bus=pci.10,addr=0x0 \
 -device pcie-root-port,port=0x2,chassis=2,id=pci.21,bus=pci.20,addr=0x0 \
 -device virtio-scsi-pci,id=scsi0,bus=pci.11,addr=0x0 \
 -device virtio-scsi-pci,id=scsi1,bus=pci.21,addr=0x0 \
 -drive file=/mnt/sdb/wxg/fedora-48g.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=native \
 -device scsi-hd,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
 -device intel-iommu \
 -nographic \

And we get the guest configuration:

~ lspci -vt
-+-[0000:20]---00.0-[21]----00.0  Red Hat, Inc. Virtio SCSI
 +-[0000:10]---00.0-[11]----00.0  Red Hat, Inc. Virtio SCSI
 \-[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
             +-01.0  Device 1234:1111
             +-02.0  Intel Corporation 82574L Gigabit Network Connection
             +-03.0  Red Hat, Inc. QEMU PCIe Expander bridge
             +-04.0  Red Hat, Inc. QEMU PCIe Expander bridge
             +-1f.0  Intel Corporation 82801IB (ICH9) LPC Interface Controller
             +-1f.2  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
             \-1f.3  Intel Corporation 82801I (ICH9 Family) SMBus Controller

With bypass_iommu enabled on root bus, the attached devices will bypass iommu:

/sys/class/iommu/dmar0
├── devices
│   ├── 0000:10:00.0 -> ../../../../pci0000:10/0000:10:00.0
│   └── 0000:11:00.0 -> ../../../../pci0000:10/0000:10:00.0/0000:11:00.0

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
---
 hw/i386/acpi-build.c | 68 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 66 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 796ffc6f5c..1b972b4913 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2005,6 +2005,56 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
                  x86ms->oem_table_id);
 }
 
+/*
+ * Insert DMAR scope for PCI bridges and endpoint devcie
+ */
+static void
+insert_scope(PCIBus *bus, PCIDevice *dev, void *opaque)
+{
+    GArray *scope_blob = opaque;
+    AcpiDmarDeviceScope *scope = NULL;
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
+        /* Dmar Scope Type: 0x02 for PCI Bridge */
+        build_append_int_noprefix(scope_blob, 0x02, 1);
+    } else {
+        /* Dmar Scope Type: 0x01 for PCI Endpoint Device */
+        build_append_int_noprefix(scope_blob, 0x01, 1);
+    }
+
+    /* length */
+    build_append_int_noprefix(scope_blob,
+                              sizeof(*scope) + sizeof(scope->path[0]), 1);
+    /* reserved */
+    build_append_int_noprefix(scope_blob, 0, 2);
+    /* enumeration_id */
+    build_append_int_noprefix(scope_blob, 0, 1);
+    /* bus */
+    build_append_int_noprefix(scope_blob, pci_bus_num(bus), 1);
+    /* device */
+    build_append_int_noprefix(scope_blob, PCI_SLOT(dev->devfn), 1);
+    /* function */
+    build_append_int_noprefix(scope_blob, PCI_FUNC(dev->devfn), 1);
+}
+
+/* For a given PCI host bridge, walk and insert DMAR scope */
+static int
+dmar_host_bridges(Object *obj, void *opaque)
+{
+    GArray *scope_blob = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
+        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
+
+        if (bus && !pci_bus_bypass_iommu(bus)) {
+            pci_for_each_device(bus, pci_bus_num(bus), insert_scope,
+                                scope_blob);
+        }
+    }
+
+    return 0;
+}
+
 /*
  * VT-d spec 8.1 DMA Remapping Reporting Structure
  * (version Oct. 2014 or later)
@@ -2024,6 +2074,15 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker, const char *oem_id,
     /* Root complex IOAPIC use one path[0] only */
     size_t ioapic_scope_size = sizeof(*scope) + sizeof(scope->path[0]);
     IntelIOMMUState *intel_iommu = INTEL_IOMMU_DEVICE(iommu);
+    GArray *scope_blob = g_array_new(false, true, 1);
+
+    /*
+     * A PCI bus walk, for each PCI host bridge.
+     * Insert scope for each PCI bridge and endpoint device which
+     * is attached to a bus with iommu enabled.
+     */
+    object_child_foreach_recursive(object_get_root(),
+                                   dmar_host_bridges, scope_blob);
 
     assert(iommu);
     if (x86_iommu_ir_supported(iommu)) {
@@ -2037,8 +2096,9 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker, const char *oem_id,
     /* DMAR Remapping Hardware Unit Definition structure */
     drhd = acpi_data_push(table_data, sizeof(*drhd) + ioapic_scope_size);
     drhd->type = cpu_to_le16(ACPI_DMAR_TYPE_HARDWARE_UNIT);
-    drhd->length = cpu_to_le16(sizeof(*drhd) + ioapic_scope_size);
-    drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
+    drhd->length =
+        cpu_to_le16(sizeof(*drhd) + ioapic_scope_size + scope_blob->len);
+    drhd->flags = 0;            /* Don't include all pci device */
     drhd->pci_segment = cpu_to_le16(0);
     drhd->address = cpu_to_le64(Q35_HOST_BRIDGE_IOMMU_ADDR);
 
@@ -2052,6 +2112,10 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker, const char *oem_id,
     scope->path[0].device = PCI_SLOT(Q35_PSEUDO_DEVFN_IOAPIC);
     scope->path[0].function = PCI_FUNC(Q35_PSEUDO_DEVFN_IOAPIC);
 
+    /* Add scope found above */
+    g_array_append_vals(table_data, scope_blob->data, scope_blob->len);
+    g_array_free(scope_blob, true);
+
     if (iommu->dt_supported) {
         atsr = acpi_data_push(table_data, sizeof(*atsr));
         atsr->type = cpu_to_le16(ACPI_DMAR_TYPE_ATSR);
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 8/9] hw/i386/acpi-build: Add IVRS support to bypass iommu
  2021-07-08 12:55 [PATCH v5 0/9] IOMMU: Add support for IOMMU Bypass Feature Wang Xingang
                   ` (6 preceding siblings ...)
  2021-07-08 12:55 ` [PATCH v5 7/9] hw/i386/acpi-build: Add DMAR support to bypass iommu Wang Xingang
@ 2021-07-08 12:55 ` Wang Xingang
  2021-07-08 12:55 ` [PATCH v5 9/9] docs: Add documentation for iommu bypass Wang Xingang
  8 siblings, 0 replies; 10+ messages in thread
From: Wang Xingang @ 2021-07-08 12:55 UTC (permalink / raw)
  To: qemu-devel, qemu-arm, eric.auger, shannon.zhaosl, imammedo, mst,
	marcel.apfelbaum, peter.maydell, ehabkost, richard.henderson,
	pbonzini
  Cc: xieyingtai, wangxingang5

From: Xingang Wang <wangxingang5@huawei.com>

Check bypass_iommu to exclude the devices which will bypass iommu.

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
---
 hw/i386/acpi-build.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 1b972b4913..7c888f05bf 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2246,7 +2246,7 @@ ivrs_host_bridges(Object *obj, void *opaque)
     if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
         PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
 
-        if (bus) {
+        if (bus && !pci_bus_bypass_iommu(bus)) {
             pci_for_each_device(bus, pci_bus_num(bus), insert_ivhd, ivhd_blob);
         }
     }
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 9/9] docs: Add documentation for iommu bypass
  2021-07-08 12:55 [PATCH v5 0/9] IOMMU: Add support for IOMMU Bypass Feature Wang Xingang
                   ` (7 preceding siblings ...)
  2021-07-08 12:55 ` [PATCH v5 8/9] hw/i386/acpi-build: Add IVRS " Wang Xingang
@ 2021-07-08 12:55 ` Wang Xingang
  8 siblings, 0 replies; 10+ messages in thread
From: Wang Xingang @ 2021-07-08 12:55 UTC (permalink / raw)
  To: qemu-devel, qemu-arm, eric.auger, shannon.zhaosl, imammedo, mst,
	marcel.apfelbaum, peter.maydell, ehabkost, richard.henderson,
	pbonzini
  Cc: xieyingtai, wangxingang5

From: Xingang Wang <wangxingang5@huawei.com>

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
---
 docs/bypass-iommu.txt | 89 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)
 create mode 100644 docs/bypass-iommu.txt

diff --git a/docs/bypass-iommu.txt b/docs/bypass-iommu.txt
new file mode 100644
index 0000000000..e6677bddd3
--- /dev/null
+++ b/docs/bypass-iommu.txt
@@ -0,0 +1,89 @@
+BYPASS IOMMU PROPERTY
+=====================
+
+Description
+===========
+Traditionally, there is a global switch to enable/disable vIOMMU. All
+devices in the system can only support go through vIOMMU or not, which
+is not flexible. We introduce this bypass iommu property to support
+coexist of devices go through vIOMMU and devices not. This is useful to
+passthrough devices with no-iommu mode and devices go through vIOMMU in
+the same virtual machine.
+
+PCI host bridges have a bypass_iommu property. This property is used to
+determine whether the devices attached on the PCI host bridge will bypass
+virtual iommu. The bypass_iommu property is valid only when there is a
+virtual iommu in the system, it is implemented to allow some devices to
+bypass vIOMMU. When bypass_iommu property is not set for a host bridge,
+the attached devices will go through vIOMMU by default.
+
+Usage
+=====
+The bypass iommu feature support PXB host bridge and default main host
+bridge, we add a bypass_iommu property for PXB and default_bus_bypass_iommu
+for machine. Note that default_bus_bypass_iommu is available only when
+the 'q35' machine type on x86 architecture and the 'virt' machine type
+on AArch64. Other machine types do not support bypass iommu for default
+root bus.
+
+1. The following is the bypass iommu options:
+ (1) PCI expander bridge
+     qemu -device pxb-pcie,bus_nr=0x10,addr=0x1,bypass_iommu=true
+ (2) Arm default host bridge
+     qemu -machine virt,iommu=smmuv3,default_bus_bypass_iommu=true
+ (3) X86 default root bus bypass iommu:
+     qemu -machine q35,default_bus_bypass_iommu=true
+
+2. Here is the detailed qemu command line for 'virt' machine with PXB on
+AArch64:
+
+qemu-system-aarch64 \
+ -machine virt,kernel_irqchip=on,iommu=smmuv3,default_bus_bypass_iommu=true \
+ -device pxb-pcie,bus_nr=0x10,id=pci.10,bus=pcie.0,addr=0x3.0x1 \
+ -device pxb-pcie,bus_nr=0x20,id=pci.20,bus=pcie.0,addr=0x3.0x2,bypass_iommu=true \
+
+And we got:
+ - a default host bridge which bypass SMMUv3
+ - a pxb host bridge which go through SMMUv3
+ - a pxb host bridge which bypass SMMUv3
+
+3. Here is the detailed qemu command line for 'q35' machine with PXB on
+x86 architecture:
+
+qemu-system-x86_64 \
+ -machine q35,accel=kvm,default_bus_bypass_iommu=true \
+ -device pxb-pcie,bus_nr=0x10,id=pci.10,bus=pcie.0,addr=0x3 \
+ -device pxb-pcie,bus_nr=0x20,id=pci.20,bus=pcie.0,addr=0x4,bypass_iommu=true \
+ -device intel-iommu \
+
+And we got:
+ - a default host bridge which bypass iommu
+ - a pxb host bridge which go through iommu
+ - a pxb host bridge which bypass iommu
+
+Limitations
+===========
+There might be potential security risk when devices bypass iommu, because
+devices might send malicious dma request to virtual machine if there is no
+iommu isolation. So it would be necessary to only bypass iommu for trusted
+device.
+
+Implementation
+==============
+The bypass iommu feature includes:
+ - Address space
+   Add bypass iommu property check of PCI Host and do not get iommu address
+   space for devices bypass iommu.
+ - Arm SMMUv3 support
+   We traverse all PCI root bus and get bus number ranges, then build explicit
+   RID mapping for devices which do not bypass iommu.
+ - X86 IOMMU support
+   To support Intel iommu, we traverse all PCI host bridge and get information
+   of devices which do not bypass iommu, then fill the DMAR drhd struct with
+   explicit device scope info. To support AMD iommu, add check of bypass iommu
+   when traverse the PCI hsot bridge.
+ - Machine and PXB options
+   We add bypass iommu options in machine option for default root bus, and add
+   option for PXB also. Note that the default value of bypass iommu is false,
+   so that the devices will by default go through iommu if there exist one.
+
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-07-08 13:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-08 12:55 [PATCH v5 0/9] IOMMU: Add support for IOMMU Bypass Feature Wang Xingang
2021-07-08 12:55 ` [PATCH v5 1/9] hw/pci/pci_host: Allow PCI host to bypass iommu Wang Xingang
2021-07-08 12:55 ` [PATCH v5 2/9] hw/pxb: Add a bypass iommu property Wang Xingang
2021-07-08 12:55 ` [PATCH v5 3/9] hw/arm/virt: Add default_bus_bypass_iommu machine option Wang Xingang
2021-07-08 12:55 ` [PATCH v5 4/9] hw/i386: Add a default_bus_bypass_iommu pc " Wang Xingang
2021-07-08 12:55 ` [PATCH v5 5/9] hw/pci: Add pci_bus_range() to get PCI bus number range Wang Xingang
2021-07-08 12:55 ` [PATCH v5 6/9] hw/arm/virt-acpi-build: Add IORT support to bypass SMMUv3 Wang Xingang
2021-07-08 12:55 ` [PATCH v5 7/9] hw/i386/acpi-build: Add DMAR support to bypass iommu Wang Xingang
2021-07-08 12:55 ` [PATCH v5 8/9] hw/i386/acpi-build: Add IVRS " Wang Xingang
2021-07-08 12:55 ` [PATCH v5 9/9] docs: Add documentation for iommu bypass Wang Xingang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.