qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device
@ 2019-07-30 17:21 Eric Auger
From: Eric Auger @ 2019-07-30 17:21 UTC
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

This series rebases the virtio-iommu device on QEMU 4.1.0-rc2
and implements the v0.12 virtio-iommu spec. The Linux driver has just
been merged upstream in 5.3, so the kernel dependencies are now resolved.

The PCI proxy for the virtio-iommu device is now available and must
be instantiated from the command line using "-device virtio-iommu-pci".

At the moment, virtio-iommu-device only works in the ARM virt
machine with DT boot: besides instantiating the device, the links
between the PCIe root complex and the IOMMU must be described, and
the kernel does not yet support an ACPI description.
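For illustration, the DT links between the root complex and the IOMMU are normally expressed with the generic PCI `iommu-map` property, whose entries are `<rid-base iommu-phandle iommu-base length>` tuples mapping PCI requester IDs (RIDs) to IOMMU endpoint IDs. The sketch below models that lookup in C. The struct and function names are hypothetical, the phandle is omitted, and the identity mapping over the full 16-bit RID space is an assumption. Because virtio-iommu-pci is itself a PCI function, its own RID has to be left out of the map (see the v7 -> v8 change log), so the helper splits the RID space into two entries around it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One iommu-map tuple <rid-base &iommu iommu-base length>, phandle omitted.
 * Hypothetical type, for illustration only. */
struct iommu_map_entry {
    uint32_t rid_base;
    uint32_t iommu_base;
    uint32_t length;
};

/* Translate a PCI requester ID into an IOMMU endpoint ID.
 * Returns false when no entry covers the RID (device not behind the IOMMU). */
bool iommu_map_lookup(const struct iommu_map_entry *map, size_t n,
                      uint32_t rid, uint32_t *endpoint)
{
    for (size_t i = 0; i < n; i++) {
        if (rid >= map[i].rid_base && rid - map[i].rid_base < map[i].length) {
            *endpoint = map[i].iommu_base + (rid - map[i].rid_base);
            return true;
        }
    }
    return false;
}

/* Build an identity map covering RIDs 0..0xffff except the IOMMU's own RID.
 * Writes at most two entries to out and returns how many were written. */
size_t iommu_map_exclude_rid(uint32_t iommu_rid, struct iommu_map_entry out[2])
{
    size_t n = 0;

    if (iommu_rid > 0) {
        /* RIDs [0, iommu_rid) */
        out[n++] = (struct iommu_map_entry){ 0, 0, iommu_rid };
    }
    if (iommu_rid < 0xffff) {
        /* RIDs (iommu_rid, 0xffff] */
        out[n++] = (struct iommu_map_entry){ iommu_rid + 1, iommu_rid + 1,
                                             0x10000 - (iommu_rid + 1) };
    }
    return n;
}
```

With the IOMMU at RID 0x8, a lookup of RID 0x10 yields endpoint 0x10, while a lookup of RID 0x8 fails, so the IOMMU never translates its own DMA.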

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/v4.1.0-rc2-virtio-iommu-v10
The virtio-iommu kernel driver is available from 5.3-rc3 onwards.

Testing:
- tested with a guest using virtio-net-pci
  (,vhost=off,iommu_platform,disable-modern=off,disable-legacy=on)
  and virtio-blk-pci
- VFIO/VHOST integration is not part of this series. Please follow
  the respins of [PATCH RFC v5 0/5] virtio-iommu: VFIO integration

History:

v9 -> v10:
- rebase on 4.1.0-rc2, compliance with 0.12 spec
- removed ACPI part
- cleanup (see individual change logs)
- moved to a PATCH series

v8 -> v9:
- virtio-iommu-pci device needs to be instantiated from the command
  line (RID is not imposed anymore).
- tail structure properly initialized

v7 -> v8:
- virtio-iommu-pci added
- virt instantiation modified
- DT and ACPI modified to exclude the iommu RID from the mapping
- VIRTIO_IOMMU_F_BYPASS, VIRTIO_F_VERSION_1 features exposed

v6 -> v7:
- rebase on qemu 3.0.0-rc3
- minor update against v0.7
- fix issue with EP not on pci.0 and ACPI probing
- change the instantiation method

v5 -> v6:
- minor update against v0.6 spec
- fix g_hash_table_lookup in virtio_iommu_find_add_as
- replace some error_reports by qemu_log_mask(LOG_GUEST_ERROR, ...)

v4 -> v5:
- event queue and fault reporting
- we now return the IOAPIC MSI region if the virtio-iommu is instantiated
  in a PC machine.
- we bypass transactions on MSI HW region and fault on reserved ones.
- We support ACPI boot with mach-virt (based on IORT proposal)
- We moved to the new driver naming conventions
- simplified mach-virt instantiation
- worked around the removal of pci_find_primary_bus
- in virtio_iommu_translate, check that dev->as is not NULL
- initialize as->device_list in virtio_iommu_get_as
- initialize bufstate.error to false in virtio_iommu_probe
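The v4 -> v5 reserved-region behaviour (bypass transactions that hit the MSI doorbell region, fault on plain reserved regions) amounts to classifying each input address against regions tagged with the `VIRTIO_IOMMU_RESV_MEM_T_*` subtypes from the imported header. The sketch below is a simplified model, not the series' translate path: the struct, enum and function names are hypothetical, and the region list is sample data (0xfee00000-0xfeefffff is the x86 MSI address window).

```c
#include <stddef.h>
#include <stdint.h>

/* Subtype values as defined in virtio_iommu.h (v0.12). */
#define VIRTIO_IOMMU_RESV_MEM_T_RESERVED 0
#define VIRTIO_IOMMU_RESV_MEM_T_MSI      1

/* Hypothetical description of a reserved region; end is inclusive. */
struct resv_region {
    uint64_t start;
    uint64_t end;
    uint8_t subtype;
};

enum resv_action {
    RESV_NONE,   /* not reserved: translate through the domain mappings */
    RESV_BYPASS, /* MSI doorbell: let the access through untranslated */
    RESV_FAULT,  /* reserved: reject the access and report a fault */
};

enum resv_action classify_iova(const struct resv_region *regions, size_t n,
                               uint64_t iova)
{
    for (size_t i = 0; i < n; i++) {
        if (iova < regions[i].start || iova > regions[i].end) {
            continue;
        }
        switch (regions[i].subtype) {
        case VIRTIO_IOMMU_RESV_MEM_T_MSI:
            return RESV_BYPASS;
        case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
        default:
            /* Unknown subtypes are treated conservatively as reserved. */
            return RESV_FAULT;
        }
    }
    return RESV_NONE;
}
```

Treating unknown subtypes as reserved is a design choice of this sketch: faulting is the safer default when the device cannot tell what a region is for.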

v3 -> v4:
- probe request support although no reserved region is returned at
  the moment
- unmap semantics less strict, as specified in v0.4
- device registration, attach/detach revisited
- split into smaller patches to ease review
- propose a way to inform the IOMMU mr about the page_size_mask
  of underlying HW IOMMU, if any
- remove warning associated with the translation of the MSI doorbell
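A PROBE reply fills the `properties[]` array that follows the 64-byte reserved field of `struct virtio_iommu_req_probe`. The sketch below walks that array, assuming (as the Linux driver does) that each property starts with a 4-byte `struct virtio_iommu_probe_property` header whose little-endian `type` is masked with `VIRTIO_IOMMU_PROBE_T_MASK` and whose `length` counts only the value that follows the header, and that the list ends at a `VIRTIO_IOMMU_PROBE_T_NONE` entry or at the end of the buffer. The function names are hypothetical; the walker only counts RESV_MEM properties.

```c
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_IOMMU_PROBE_T_NONE     0
#define VIRTIO_IOMMU_PROBE_T_RESV_MEM 1
#define VIRTIO_IOMMU_PROBE_T_MASK     0xfff

/* Read a little-endian 16-bit field regardless of host endianness. */
static uint16_t le16_at(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Count RESV_MEM properties in a probe buffer of the given size.
 * Returns -1 if a property claims to extend past the end of the buffer. */
int count_resv_mem_props(const uint8_t *props, size_t size)
{
    size_t cur = 0;
    int count = 0;

    while (cur + 4 <= size) {
        uint16_t type = le16_at(props + cur) & VIRTIO_IOMMU_PROBE_T_MASK;
        size_t len = 4 + (size_t)le16_at(props + cur + 2); /* header + value */

        if (type == VIRTIO_IOMMU_PROBE_T_NONE) {
            break; /* end of the property list */
        }
        if (cur + len > size) {
            return -1; /* malformed: property overruns the buffer */
        }
        if (type == VIRTIO_IOMMU_PROBE_T_RESV_MEM) {
            count++;
        }
        cur += len;
    }
    return count;
}
```

A RESV_MEM property carries a 20-byte value (subtype, 3 reserved bytes, 64-bit start and end), so each one occupies 24 bytes, which keeps the 8-byte alignment the header comment mentions.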

v2 -> v3:
- rebase on top of 2.10-rc0 and especially
  [PATCH qemu v9 0/2] memory/iommu: QOM'fy IOMMU MemoryRegion
- add mutex init
- fix as->mappings deletion using g_tree_ref/unref
- when attaching a device that is already attached to another
  address space, detach it first
- fix some error values
- page_sizes = TARGET_PAGE_MASK;
- I haven't changed the unmap() semantics yet, waiting for the
  next virtio-iommu spec revision.
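In `struct virtio_iommu_config`, `page_size_mask` is a bitmap of supported page sizes: bit n set means a 2^n-byte page size is supported, so the lowest set bit gives the smallest mapping granule. Setting `page_sizes = TARGET_PAGE_MASK` therefore advertises every power-of-two size from the target page size upwards. The sketch assumes a 64-bit mask and a 4 KiB target page; the `SKETCH_*` macros only imitate `TARGET_PAGE_MASK` and the helper names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_BITS 12                          /* assumed 4 KiB page */
#define SKETCH_PAGE_SIZE (1ULL << SKETCH_PAGE_BITS)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))   /* imitates TARGET_PAGE_MASK */

/* Smallest page size advertised by page_size_mask (0 if the mask is empty).
 * x & (~x + 1) isolates the lowest set bit. */
uint64_t min_granule(uint64_t page_size_mask)
{
    return page_size_mask & (~page_size_mask + 1);
}

/* Nonzero if a 2^order-byte page size is advertised by the mask. */
int size_supported(uint64_t page_size_mask, unsigned order)
{
    return order < 64 && ((page_size_mask >> order) & 1);
}
```

With this mask, 4 KiB and 2 MiB (order 21) pages are both reported as supported, while 2 KiB (order 11) is not.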

v1 -> v2:
- fix redefinition of viommu_as typedef


Eric Auger (15):
  update-linux-headers: Import virtio_iommu.h
  linux-headers: update against 5.3-rc2
  virtio-iommu: Add skeleton
  virtio-iommu: Decode the command payload
  virtio-iommu: Add the iommu regions
  virtio-iommu: Endpoint and domains structs and helpers
  virtio-iommu: Implement attach/detach command
  virtio-iommu: Implement map/unmap
  virtio-iommu: Implement translate
  virtio-iommu: Implement probe request
  virtio-iommu: Expose the IOAPIC MSI reserved region when relevant
  virtio-iommu: Implement fault reporting
  virtio_iommu: Handle reserved regions in translation process
  virtio-iommu-pci: Add virtio iommu pci support
  hw/arm/virt: Add the virtio-iommu device tree mappings

 hw/arm/virt.c                                 |   54 +-
 hw/virtio/Kconfig                             |    5 +
 hw/virtio/Makefile.objs                       |    2 +
 hw/virtio/trace-events                        |   25 +
 hw/virtio/virtio-iommu-pci.c                  |   88 ++
 hw/virtio/virtio-iommu.c                      | 1004 +++++++++++++++++
 include/hw/arm/virt.h                         |    2 +
 include/hw/pci/pci.h                          |    1 +
 include/hw/virtio/virtio-iommu.h              |   66 ++
 include/standard-headers/asm-x86/bootparam.h  |    2 +
 include/standard-headers/asm-x86/kvm_para.h   |    3 +
 include/standard-headers/linux/ethtool.h      |    2 +
 include/standard-headers/linux/pci_regs.h     |    4 +
 include/standard-headers/linux/virtio_ids.h   |    1 +
 include/standard-headers/linux/virtio_iommu.h |  165 +++
 include/standard-headers/linux/virtio_pmem.h  |    6 +-
 linux-headers/asm-arm/kvm.h                   |   12 +
 linux-headers/asm-arm/unistd-common.h         |    2 +
 linux-headers/asm-arm64/kvm.h                 |   17 +
 linux-headers/asm-generic/mman-common.h       |   15 +-
 linux-headers/asm-generic/mman.h              |   10 +-
 linux-headers/asm-generic/unistd.h            |    8 +-
 linux-headers/asm-mips/unistd_n32.h           |    1 +
 linux-headers/asm-mips/unistd_n64.h           |    1 +
 linux-headers/asm-mips/unistd_o32.h           |    1 +
 linux-headers/asm-powerpc/mman.h              |    6 +-
 linux-headers/asm-powerpc/unistd_32.h         |    1 +
 linux-headers/asm-powerpc/unistd_64.h         |    1 +
 linux-headers/asm-s390/unistd_32.h            |    2 +
 linux-headers/asm-s390/unistd_64.h            |    2 +
 linux-headers/asm-x86/kvm.h                   |   28 +-
 linux-headers/asm-x86/unistd_32.h             |    2 +
 linux-headers/asm-x86/unistd_64.h             |    2 +
 linux-headers/asm-x86/unistd_x32.h            |    2 +
 linux-headers/linux/kvm.h                     |   11 +-
 linux-headers/linux/psp-sev.h                 |    5 +-
 linux-headers/linux/virtio_iommu.h            |    1 +
 qdev-monitor.c                                |    1 +
 scripts/update-linux-headers.sh               |    3 +
 39 files changed, 1522 insertions(+), 42 deletions(-)
 create mode 100644 hw/virtio/virtio-iommu-pci.c
 create mode 100644 hw/virtio/virtio-iommu.c
 create mode 100644 include/hw/virtio/virtio-iommu.h
 create mode 100644 include/standard-headers/linux/virtio_iommu.h
 create mode 100644 linux-headers/linux/virtio_iommu.h

-- 
2.20.1




* [Qemu-devel] [PATCH for-4.2 v10 01/15] update-linux-headers: Import virtio_iommu.h
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
From: Eric Auger @ 2019-07-30 17:21 UTC
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

Update the script so that it also imports the virtio_iommu.h header.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 scripts/update-linux-headers.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index f76d77363b..7805291ca0 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -175,6 +175,9 @@ fi
 cat <<EOF >$output/linux-headers/linux/virtio_config.h
 #include "standard-headers/linux/virtio_config.h"
 EOF
+cat <<EOF >$output/linux-headers/linux/virtio_iommu.h
+#include "standard-headers/linux/virtio_iommu.h"
+EOF
 cat <<EOF >$output/linux-headers/linux/virtio_ring.h
 #include "standard-headers/linux/virtio_ring.h"
 EOF
-- 
2.20.1




* [Qemu-devel] [PATCH for-4.2 v10 02/15] linux-headers: update against 5.3-rc2
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 01/15] update-linux-headers: Import virtio_iommu.h Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
From: Eric Auger @ 2019-07-30 17:21 UTC
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

Sync headers against 5.3-rc2 (commit 2a11c76e5301)

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/standard-headers/asm-x86/bootparam.h  |   2 +
 include/standard-headers/asm-x86/kvm_para.h   |   3 +
 include/standard-headers/linux/ethtool.h      |   2 +
 include/standard-headers/linux/pci_regs.h     |   4 +
 include/standard-headers/linux/virtio_ids.h   |   1 +
 include/standard-headers/linux/virtio_iommu.h | 165 ++++++++++++++++++
 include/standard-headers/linux/virtio_pmem.h  |   6 +-
 linux-headers/asm-arm/kvm.h                   |  12 ++
 linux-headers/asm-arm/unistd-common.h         |   2 +
 linux-headers/asm-arm64/kvm.h                 |  17 ++
 linux-headers/asm-generic/mman-common.h       |  15 +-
 linux-headers/asm-generic/mman.h              |  10 +-
 linux-headers/asm-generic/unistd.h            |   8 +-
 linux-headers/asm-mips/unistd_n32.h           |   1 +
 linux-headers/asm-mips/unistd_n64.h           |   1 +
 linux-headers/asm-mips/unistd_o32.h           |   1 +
 linux-headers/asm-powerpc/mman.h              |   6 +-
 linux-headers/asm-powerpc/unistd_32.h         |   1 +
 linux-headers/asm-powerpc/unistd_64.h         |   1 +
 linux-headers/asm-s390/unistd_32.h            |   2 +
 linux-headers/asm-s390/unistd_64.h            |   2 +
 linux-headers/asm-x86/kvm.h                   |  28 ++-
 linux-headers/asm-x86/unistd_32.h             |   2 +
 linux-headers/asm-x86/unistd_64.h             |   2 +
 linux-headers/asm-x86/unistd_x32.h            |   2 +
 linux-headers/linux/kvm.h                     |  11 +-
 linux-headers/linux/psp-sev.h                 |   5 +-
 linux-headers/linux/virtio_iommu.h            |   1 +
 28 files changed, 278 insertions(+), 35 deletions(-)
 create mode 100644 include/standard-headers/linux/virtio_iommu.h
 create mode 100644 linux-headers/linux/virtio_iommu.h

diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h
index 67d4f0119f..a6f7cf535e 100644
--- a/include/standard-headers/asm-x86/bootparam.h
+++ b/include/standard-headers/asm-x86/bootparam.h
@@ -29,6 +29,8 @@
 #define XLF_EFI_HANDOVER_32		(1<<2)
 #define XLF_EFI_HANDOVER_64		(1<<3)
 #define XLF_EFI_KEXEC			(1<<4)
+#define XLF_5LEVEL			(1<<5)
+#define XLF_5LEVEL_ENABLED		(1<<6)
 
 
 #endif /* _ASM_X86_BOOTPARAM_H */
diff --git a/include/standard-headers/asm-x86/kvm_para.h b/include/standard-headers/asm-x86/kvm_para.h
index 35cd8d651f..90604a8fb7 100644
--- a/include/standard-headers/asm-x86/kvm_para.h
+++ b/include/standard-headers/asm-x86/kvm_para.h
@@ -29,6 +29,8 @@
 #define KVM_FEATURE_PV_TLB_FLUSH	9
 #define KVM_FEATURE_ASYNC_PF_VMEXIT	10
 #define KVM_FEATURE_PV_SEND_IPI	11
+#define KVM_FEATURE_POLL_CONTROL	12
+#define KVM_FEATURE_PV_SCHED_YIELD	13
 
 #define KVM_HINTS_REALTIME      0
 
@@ -47,6 +49,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
+#define MSR_KVM_POLL_CONTROL	0x4b564d05
 
 struct kvm_steal_time {
 	uint64_t steal;
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
index 9b9919a8f6..16d0eeea86 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -1483,6 +1483,8 @@ enum ethtool_link_mode_bit_indices {
 	ETHTOOL_LINK_MODE_200000baseLR4_ER4_FR4_Full_BIT = 64,
 	ETHTOOL_LINK_MODE_200000baseDR4_Full_BIT	 = 65,
 	ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT	 = 66,
+	ETHTOOL_LINK_MODE_100baseT1_Full_BIT		 = 67,
+	ETHTOOL_LINK_MODE_1000baseT1_Full_BIT		 = 68,
 
 	/* must be last entry */
 	__ETHTOOL_LINK_MODE_MASK_NBITS
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index 27164769d1..f28e562d7c 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -528,6 +528,7 @@
 #define  PCI_EXP_LNKCAP_SLS_5_0GB 0x00000002 /* LNKCAP2 SLS Vector bit 1 */
 #define  PCI_EXP_LNKCAP_SLS_8_0GB 0x00000003 /* LNKCAP2 SLS Vector bit 2 */
 #define  PCI_EXP_LNKCAP_SLS_16_0GB 0x00000004 /* LNKCAP2 SLS Vector bit 3 */
+#define  PCI_EXP_LNKCAP_SLS_32_0GB 0x00000005 /* LNKCAP2 SLS Vector bit 4 */
 #define  PCI_EXP_LNKCAP_MLW	0x000003f0 /* Maximum Link Width */
 #define  PCI_EXP_LNKCAP_ASPMS	0x00000c00 /* ASPM Support */
 #define  PCI_EXP_LNKCAP_L0SEL	0x00007000 /* L0s Exit Latency */
@@ -556,6 +557,7 @@
 #define  PCI_EXP_LNKSTA_CLS_5_0GB 0x0002 /* Current Link Speed 5.0GT/s */
 #define  PCI_EXP_LNKSTA_CLS_8_0GB 0x0003 /* Current Link Speed 8.0GT/s */
 #define  PCI_EXP_LNKSTA_CLS_16_0GB 0x0004 /* Current Link Speed 16.0GT/s */
+#define  PCI_EXP_LNKSTA_CLS_32_0GB 0x0005 /* Current Link Speed 32.0GT/s */
 #define  PCI_EXP_LNKSTA_NLW	0x03f0	/* Negotiated Link Width */
 #define  PCI_EXP_LNKSTA_NLW_X1	0x0010	/* Current Link Width x1 */
 #define  PCI_EXP_LNKSTA_NLW_X2	0x0020	/* Current Link Width x2 */
@@ -661,6 +663,7 @@
 #define  PCI_EXP_LNKCAP2_SLS_5_0GB	0x00000004 /* Supported Speed 5GT/s */
 #define  PCI_EXP_LNKCAP2_SLS_8_0GB	0x00000008 /* Supported Speed 8GT/s */
 #define  PCI_EXP_LNKCAP2_SLS_16_0GB	0x00000010 /* Supported Speed 16GT/s */
+#define  PCI_EXP_LNKCAP2_SLS_32_0GB	0x00000020 /* Supported Speed 32GT/s */
 #define  PCI_EXP_LNKCAP2_CROSSLINK	0x00000100 /* Crosslink supported */
 #define PCI_EXP_LNKCTL2		48	/* Link Control 2 */
 #define  PCI_EXP_LNKCTL2_TLS		0x000f
@@ -668,6 +671,7 @@
 #define  PCI_EXP_LNKCTL2_TLS_5_0GT	0x0002 /* Supported Speed 5GT/s */
 #define  PCI_EXP_LNKCTL2_TLS_8_0GT	0x0003 /* Supported Speed 8GT/s */
 #define  PCI_EXP_LNKCTL2_TLS_16_0GT	0x0004 /* Supported Speed 16GT/s */
+#define  PCI_EXP_LNKCTL2_TLS_32_0GT	0x0005 /* Supported Speed 32GT/s */
 #define PCI_EXP_LNKSTA2		50	/* Link Status 2 */
 #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	52	/* v2 endpoints with link end here */
 #define PCI_EXP_SLTCAP2		52	/* Slot Capabilities 2 */
diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
index 32b2f94d1f..348fd0176f 100644
--- a/include/standard-headers/linux/virtio_ids.h
+++ b/include/standard-headers/linux/virtio_ids.h
@@ -43,6 +43,7 @@
 #define VIRTIO_ID_INPUT        18 /* virtio input */
 #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
 #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
+#define VIRTIO_ID_IOMMU        23 /* virtio IOMMU */
 #define VIRTIO_ID_PMEM         27 /* virtio pmem */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/standard-headers/linux/virtio_iommu.h b/include/standard-headers/linux/virtio_iommu.h
new file mode 100644
index 0000000000..b9443b83a1
--- /dev/null
+++ b/include/standard-headers/linux/virtio_iommu.h
@@ -0,0 +1,165 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+/*
+ * Virtio-iommu definition v0.12
+ *
+ * Copyright (C) 2019 Arm Ltd.
+ */
+#ifndef _LINUX_VIRTIO_IOMMU_H
+#define _LINUX_VIRTIO_IOMMU_H
+
+#include "standard-headers/linux/types.h"
+
+/* Feature bits */
+#define VIRTIO_IOMMU_F_INPUT_RANGE		0
+#define VIRTIO_IOMMU_F_DOMAIN_RANGE		1
+#define VIRTIO_IOMMU_F_MAP_UNMAP		2
+#define VIRTIO_IOMMU_F_BYPASS			3
+#define VIRTIO_IOMMU_F_PROBE			4
+#define VIRTIO_IOMMU_F_MMIO			5
+
+struct virtio_iommu_range_64 {
+	uint64_t					start;
+	uint64_t					end;
+};
+
+struct virtio_iommu_range_32 {
+	uint32_t					start;
+	uint32_t					end;
+};
+
+struct virtio_iommu_config {
+	/* Supported page sizes */
+	uint64_t					page_size_mask;
+	/* Supported IOVA range */
+	struct virtio_iommu_range_64		input_range;
+	/* Max domain ID size */
+	struct virtio_iommu_range_32		domain_range;
+	/* Probe buffer size */
+	uint32_t					probe_size;
+};
+
+/* Request types */
+#define VIRTIO_IOMMU_T_ATTACH			0x01
+#define VIRTIO_IOMMU_T_DETACH			0x02
+#define VIRTIO_IOMMU_T_MAP			0x03
+#define VIRTIO_IOMMU_T_UNMAP			0x04
+#define VIRTIO_IOMMU_T_PROBE			0x05
+
+/* Status types */
+#define VIRTIO_IOMMU_S_OK			0x00
+#define VIRTIO_IOMMU_S_IOERR			0x01
+#define VIRTIO_IOMMU_S_UNSUPP			0x02
+#define VIRTIO_IOMMU_S_DEVERR			0x03
+#define VIRTIO_IOMMU_S_INVAL			0x04
+#define VIRTIO_IOMMU_S_RANGE			0x05
+#define VIRTIO_IOMMU_S_NOENT			0x06
+#define VIRTIO_IOMMU_S_FAULT			0x07
+#define VIRTIO_IOMMU_S_NOMEM			0x08
+
+struct virtio_iommu_req_head {
+	uint8_t					type;
+	uint8_t					reserved[3];
+};
+
+struct virtio_iommu_req_tail {
+	uint8_t					status;
+	uint8_t					reserved[3];
+};
+
+struct virtio_iommu_req_attach {
+	struct virtio_iommu_req_head		head;
+	uint32_t					domain;
+	uint32_t					endpoint;
+	uint8_t					reserved[8];
+	struct virtio_iommu_req_tail		tail;
+};
+
+struct virtio_iommu_req_detach {
+	struct virtio_iommu_req_head		head;
+	uint32_t					domain;
+	uint32_t					endpoint;
+	uint8_t					reserved[8];
+	struct virtio_iommu_req_tail		tail;
+};
+
+#define VIRTIO_IOMMU_MAP_F_READ			(1 << 0)
+#define VIRTIO_IOMMU_MAP_F_WRITE		(1 << 1)
+#define VIRTIO_IOMMU_MAP_F_MMIO			(1 << 2)
+
+#define VIRTIO_IOMMU_MAP_F_MASK			(VIRTIO_IOMMU_MAP_F_READ |	\
+						 VIRTIO_IOMMU_MAP_F_WRITE |	\
+						 VIRTIO_IOMMU_MAP_F_MMIO)
+
+struct virtio_iommu_req_map {
+	struct virtio_iommu_req_head		head;
+	uint32_t					domain;
+	uint64_t					virt_start;
+	uint64_t					virt_end;
+	uint64_t					phys_start;
+	uint32_t					flags;
+	struct virtio_iommu_req_tail		tail;
+};
+
+struct virtio_iommu_req_unmap {
+	struct virtio_iommu_req_head		head;
+	uint32_t					domain;
+	uint64_t					virt_start;
+	uint64_t					virt_end;
+	uint8_t					reserved[4];
+	struct virtio_iommu_req_tail		tail;
+};
+
+#define VIRTIO_IOMMU_PROBE_T_NONE		0
+#define VIRTIO_IOMMU_PROBE_T_RESV_MEM		1
+
+#define VIRTIO_IOMMU_PROBE_T_MASK		0xfff
+
+struct virtio_iommu_probe_property {
+	uint16_t					type;
+	uint16_t					length;
+};
+
+#define VIRTIO_IOMMU_RESV_MEM_T_RESERVED	0
+#define VIRTIO_IOMMU_RESV_MEM_T_MSI		1
+
+struct virtio_iommu_probe_resv_mem {
+	struct virtio_iommu_probe_property	head;
+	uint8_t					subtype;
+	uint8_t					reserved[3];
+	uint64_t					start;
+	uint64_t					end;
+};
+
+struct virtio_iommu_req_probe {
+	struct virtio_iommu_req_head		head;
+	uint32_t					endpoint;
+	uint8_t					reserved[64];
+
+	uint8_t					properties[];
+
+	/*
+	 * Tail follows the variable-length properties array. No padding,
+	 * property lengths are all aligned on 8 bytes.
+	 */
+};
+
+/* Fault types */
+#define VIRTIO_IOMMU_FAULT_R_UNKNOWN		0
+#define VIRTIO_IOMMU_FAULT_R_DOMAIN		1
+#define VIRTIO_IOMMU_FAULT_R_MAPPING		2
+
+#define VIRTIO_IOMMU_FAULT_F_READ		(1 << 0)
+#define VIRTIO_IOMMU_FAULT_F_WRITE		(1 << 1)
+#define VIRTIO_IOMMU_FAULT_F_EXEC		(1 << 2)
+#define VIRTIO_IOMMU_FAULT_F_ADDRESS		(1 << 8)
+
+struct virtio_iommu_fault {
+	uint8_t					reason;
+	uint8_t					reserved[3];
+	uint32_t					flags;
+	uint32_t					endpoint;
+	uint8_t					reserved2[4];
+	uint64_t					address;
+};
+
+#endif
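The request structures in this header have fixed little-endian layouts, so a test harness can check their sizes at compile time and serialize requests byte by byte. The sketch mirrors `struct virtio_iommu_req_map` with `<stdint.h>` types (the header's `standard-headers/linux/types.h` include is assumed unavailable) and treats `virt_end` as the inclusive last address of the range, as the Linux driver does. `build_map_req` and `put_le` are hypothetical helpers, not part of this series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_IOMMU_T_MAP       0x03
#define VIRTIO_IOMMU_MAP_F_READ  (1 << 0)
#define VIRTIO_IOMMU_MAP_F_WRITE (1 << 1)
#define VIRTIO_IOMMU_MAP_F_MMIO  (1 << 2)
#define VIRTIO_IOMMU_MAP_F_MASK  (VIRTIO_IOMMU_MAP_F_READ |  \
                                  VIRTIO_IOMMU_MAP_F_WRITE | \
                                  VIRTIO_IOMMU_MAP_F_MMIO)

/* Mirror of the header's MAP request layout. */
struct virtio_iommu_req_head { uint8_t type; uint8_t reserved[3]; };
struct virtio_iommu_req_tail { uint8_t status; uint8_t reserved[3]; };
struct virtio_iommu_req_map {
    struct virtio_iommu_req_head head;
    uint32_t domain;
    uint64_t virt_start;
    uint64_t virt_end;
    uint64_t phys_start;
    uint32_t flags;
    struct virtio_iommu_req_tail tail;
};

_Static_assert(sizeof(struct virtio_iommu_req_map) == 40, "MAP request is 40 bytes");
_Static_assert(offsetof(struct virtio_iommu_req_map, tail) == 36, "tail at offset 36");

/* Store the low n bytes of v at p in little-endian order. */
static void put_le(uint8_t *p, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Serialize a MAP request for [iova, iova + size - 1] -> paddr.
 * Returns 0 on success, -1 for unknown flags or an empty/overflowing range. */
int build_map_req(uint8_t out[40], uint32_t domain, uint64_t iova,
                  uint64_t paddr, uint64_t size, uint32_t flags)
{
    if ((flags & ~(uint32_t)VIRTIO_IOMMU_MAP_F_MASK) || size == 0 ||
        iova + (size - 1) < iova) {
        return -1;
    }
    memset(out, 0, 40);
    out[0] = VIRTIO_IOMMU_T_MAP; /* head.type */
    put_le(out + offsetof(struct virtio_iommu_req_map, domain), domain, 4);
    put_le(out + offsetof(struct virtio_iommu_req_map, virt_start), iova, 8);
    put_le(out + offsetof(struct virtio_iommu_req_map, virt_end), iova + size - 1, 8);
    put_le(out + offsetof(struct virtio_iommu_req_map, phys_start), paddr, 8);
    put_le(out + offsetof(struct virtio_iommu_req_map, flags), flags, 4);
    return 0; /* tail.status is left zeroed for the device to fill in */
}
```

Serializing field by field rather than copying the struct keeps the wire format independent of host endianness.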
diff --git a/include/standard-headers/linux/virtio_pmem.h b/include/standard-headers/linux/virtio_pmem.h
index 7e3d43b121..fc029de798 100644
--- a/include/standard-headers/linux/virtio_pmem.h
+++ b/include/standard-headers/linux/virtio_pmem.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause */
 /*
  * Definitions for virtio-pmem devices.
  *
@@ -7,8 +7,8 @@
  * Author(s): Pankaj Gupta <pagupta@redhat.com>
  */
 
-#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
-#define _UAPI_LINUX_VIRTIO_PMEM_H
+#ifndef _LINUX_VIRTIO_PMEM_H
+#define _LINUX_VIRTIO_PMEM_H
 
 #include "standard-headers/linux/types.h"
 #include "standard-headers/linux/virtio_ids.h"
diff --git a/linux-headers/asm-arm/kvm.h b/linux-headers/asm-arm/kvm.h
index e1f8b74558..dfccc47092 100644
--- a/linux-headers/asm-arm/kvm.h
+++ b/linux-headers/asm-arm/kvm.h
@@ -214,6 +214,18 @@ struct kvm_vcpu_events {
 #define KVM_REG_ARM_FW_REG(r)		(KVM_REG_ARM | KVM_REG_SIZE_U64 | \
 					 KVM_REG_ARM_FW | ((r) & 0xffff))
 #define KVM_REG_ARM_PSCI_VERSION	KVM_REG_ARM_FW_REG(0)
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1	KVM_REG_ARM_FW_REG(1)
+	/* Higher values mean better protection. */
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL		0
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL		1
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED	2
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2	KVM_REG_ARM_FW_REG(2)
+	/* Higher values mean better protection. */
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL		0
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN		1
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL		2
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED	3
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED	(1U << 4)
 
 /* Device Control API: ARM VGIC */
 #define KVM_DEV_ARM_VGIC_GRP_ADDR	0
diff --git a/linux-headers/asm-arm/unistd-common.h b/linux-headers/asm-arm/unistd-common.h
index 27a9b6da27..eb5d361b11 100644
--- a/linux-headers/asm-arm/unistd-common.h
+++ b/linux-headers/asm-arm/unistd-common.h
@@ -388,5 +388,7 @@
 #define __NR_fsconfig (__NR_SYSCALL_BASE + 431)
 #define __NR_fsmount (__NR_SYSCALL_BASE + 432)
 #define __NR_fspick (__NR_SYSCALL_BASE + 433)
+#define __NR_pidfd_open (__NR_SYSCALL_BASE + 434)
+#define __NR_clone3 (__NR_SYSCALL_BASE + 435)
 
 #endif /* _ASM_ARM_UNISTD_COMMON_H */
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index 2431ec35a9..a95d3a4203 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -229,6 +229,16 @@ struct kvm_vcpu_events {
 #define KVM_REG_ARM_FW_REG(r)		(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
 					 KVM_REG_ARM_FW | ((r) & 0xffff))
 #define KVM_REG_ARM_PSCI_VERSION	KVM_REG_ARM_FW_REG(0)
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1	KVM_REG_ARM_FW_REG(1)
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL		0
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL		1
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED	2
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2	KVM_REG_ARM_FW_REG(2)
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL		0
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN		1
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL		2
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED	3
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED     	(1U << 4)
 
 /* SVE registers */
 #define KVM_REG_ARM64_SVE		(0x15 << KVM_REG_ARM_COPROC_SHIFT)
@@ -260,6 +270,13 @@ struct kvm_vcpu_events {
 	 KVM_REG_SIZE_U256 |						\
 	 ((i) & (KVM_ARM64_SVE_MAX_SLICES - 1)))
 
+/*
+ * Register values for KVM_REG_ARM64_SVE_ZREG(), KVM_REG_ARM64_SVE_PREG() and
+ * KVM_REG_ARM64_SVE_FFR() are represented in memory in an endianness-
+ * invariant layout which differs from the layout used for the FPSIMD
+ * V-registers on big-endian systems: see sigcontext.h for more explanation.
+ */
+
 #define KVM_ARM64_SVE_VQ_MIN __SVE_VQ_MIN
 #define KVM_ARM64_SVE_VQ_MAX __SVE_VQ_MAX
 
diff --git a/linux-headers/asm-generic/mman-common.h b/linux-headers/asm-generic/mman-common.h
index abd238d0f7..63b1f506ea 100644
--- a/linux-headers/asm-generic/mman-common.h
+++ b/linux-headers/asm-generic/mman-common.h
@@ -19,15 +19,18 @@
 #define MAP_TYPE	0x0f		/* Mask for type of mapping */
 #define MAP_FIXED	0x10		/* Interpret addr exactly */
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
-#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
-# define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be uninitialized */
-#else
-# define MAP_UNINITIALIZED 0x0		/* Don't support this flag */
-#endif
 
-/* 0x0100 - 0x80000 flags are defined in asm-generic/mman.h */
+/* 0x0100 - 0x4000 flags are defined in asm-generic/mman.h */
+#define MAP_POPULATE		0x008000	/* populate (prefault) pagetables */
+#define MAP_NONBLOCK		0x010000	/* do not block on IO */
+#define MAP_STACK		0x020000	/* give out an address that is best suited for process/thread stacks */
+#define MAP_HUGETLB		0x040000	/* create a huge page mapping */
+#define MAP_SYNC		0x080000 /* perform synchronous page faults for the mapping */
 #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
 
+#define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
+					 * uninitialized */
+
 /*
  * Flags for mlock
  */
diff --git a/linux-headers/asm-generic/mman.h b/linux-headers/asm-generic/mman.h
index 653687d977..57e8195d0b 100644
--- a/linux-headers/asm-generic/mman.h
+++ b/linux-headers/asm-generic/mman.h
@@ -9,13 +9,11 @@
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_LOCKED	0x2000		/* pages are locked */
 #define MAP_NORESERVE	0x4000		/* don't check for reservations */
-#define MAP_POPULATE	0x8000		/* populate (prefault) pagetables */
-#define MAP_NONBLOCK	0x10000		/* do not block on IO */
-#define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
-#define MAP_HUGETLB	0x40000		/* create a huge page mapping */
-#define MAP_SYNC	0x80000		/* perform synchronous page faults for the mapping */
 
-/* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
+/*
+ * Bits [26:31] are reserved, see asm-generic/hugetlb_encode.h
+ * for MAP_HUGETLB usage
+ */
 
 #define MCL_CURRENT	1		/* lock all current mappings */
 #define MCL_FUTURE	2		/* lock all future mappings */
diff --git a/linux-headers/asm-generic/unistd.h b/linux-headers/asm-generic/unistd.h
index a87904daf1..1be0e798e3 100644
--- a/linux-headers/asm-generic/unistd.h
+++ b/linux-headers/asm-generic/unistd.h
@@ -844,9 +844,15 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
 __SYSCALL(__NR_fsmount, sys_fsmount)
 #define __NR_fspick 433
 __SYSCALL(__NR_fspick, sys_fspick)
+#define __NR_pidfd_open 434
+__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
+#ifdef __ARCH_WANT_SYS_CLONE3
+#define __NR_clone3 435
+__SYSCALL(__NR_clone3, sys_clone3)
+#endif
 
 #undef __NR_syscalls
-#define __NR_syscalls 434
+#define __NR_syscalls 436
 
 /*
  * 32 bit systems traditionally used different
diff --git a/linux-headers/asm-mips/unistd_n32.h b/linux-headers/asm-mips/unistd_n32.h
index fb988de900..7dffe8e34e 100644
--- a/linux-headers/asm-mips/unistd_n32.h
+++ b/linux-headers/asm-mips/unistd_n32.h
@@ -363,6 +363,7 @@
 #define __NR_fsconfig	(__NR_Linux + 431)
 #define __NR_fsmount	(__NR_Linux + 432)
 #define __NR_fspick	(__NR_Linux + 433)
+#define __NR_pidfd_open	(__NR_Linux + 434)
 
 
 #endif /* _ASM_MIPS_UNISTD_N32_H */
diff --git a/linux-headers/asm-mips/unistd_n64.h b/linux-headers/asm-mips/unistd_n64.h
index 17359163c9..f4592d6fc5 100644
--- a/linux-headers/asm-mips/unistd_n64.h
+++ b/linux-headers/asm-mips/unistd_n64.h
@@ -339,6 +339,7 @@
 #define __NR_fsconfig	(__NR_Linux + 431)
 #define __NR_fsmount	(__NR_Linux + 432)
 #define __NR_fspick	(__NR_Linux + 433)
+#define __NR_pidfd_open	(__NR_Linux + 434)
 
 
 #endif /* _ASM_MIPS_UNISTD_N64_H */
diff --git a/linux-headers/asm-mips/unistd_o32.h b/linux-headers/asm-mips/unistd_o32.h
index 83c8d8fb83..04c6728352 100644
--- a/linux-headers/asm-mips/unistd_o32.h
+++ b/linux-headers/asm-mips/unistd_o32.h
@@ -409,6 +409,7 @@
 #define __NR_fsconfig	(__NR_Linux + 431)
 #define __NR_fsmount	(__NR_Linux + 432)
 #define __NR_fspick	(__NR_Linux + 433)
+#define __NR_pidfd_open	(__NR_Linux + 434)
 
 
 #endif /* _ASM_MIPS_UNISTD_O32_H */
diff --git a/linux-headers/asm-powerpc/mman.h b/linux-headers/asm-powerpc/mman.h
index 1c2b3fca05..8db7c2a3be 100644
--- a/linux-headers/asm-powerpc/mman.h
+++ b/linux-headers/asm-powerpc/mman.h
@@ -21,15 +21,11 @@
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 
+
 #define MCL_CURRENT     0x2000          /* lock all currently mapped pages */
 #define MCL_FUTURE      0x4000          /* lock all additions to address space */
 #define MCL_ONFAULT	0x8000		/* lock all pages that are faulted in */
 
-#define MAP_POPULATE	0x8000		/* populate (prefault) pagetables */
-#define MAP_NONBLOCK	0x10000		/* do not block on IO */
-#define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
-#define MAP_HUGETLB	0x40000		/* create a huge page mapping */
-
 /* Override any generic PKEY permission defines */
 #define PKEY_DISABLE_EXECUTE   0x4
 #undef PKEY_ACCESS_MASK
diff --git a/linux-headers/asm-powerpc/unistd_32.h b/linux-headers/asm-powerpc/unistd_32.h
index 04cb2d3e61..2af478a7fe 100644
--- a/linux-headers/asm-powerpc/unistd_32.h
+++ b/linux-headers/asm-powerpc/unistd_32.h
@@ -416,6 +416,7 @@
 #define __NR_fsconfig	431
 #define __NR_fsmount	432
 #define __NR_fspick	433
+#define __NR_pidfd_open	434
 
 
 #endif /* _ASM_POWERPC_UNISTD_32_H */
diff --git a/linux-headers/asm-powerpc/unistd_64.h b/linux-headers/asm-powerpc/unistd_64.h
index b1e6921490..4d76f18222 100644
--- a/linux-headers/asm-powerpc/unistd_64.h
+++ b/linux-headers/asm-powerpc/unistd_64.h
@@ -388,6 +388,7 @@
 #define __NR_fsconfig	431
 #define __NR_fsmount	432
 #define __NR_fspick	433
+#define __NR_pidfd_open	434
 
 
 #endif /* _ASM_POWERPC_UNISTD_64_H */
diff --git a/linux-headers/asm-s390/unistd_32.h b/linux-headers/asm-s390/unistd_32.h
index 941853f3e9..7cce3ee296 100644
--- a/linux-headers/asm-s390/unistd_32.h
+++ b/linux-headers/asm-s390/unistd_32.h
@@ -406,5 +406,7 @@
 #define __NR_fsconfig 431
 #define __NR_fsmount 432
 #define __NR_fspick 433
+#define __NR_pidfd_open 434
+#define __NR_clone3 435
 
 #endif /* _ASM_S390_UNISTD_32_H */
diff --git a/linux-headers/asm-s390/unistd_64.h b/linux-headers/asm-s390/unistd_64.h
index 90271d7f82..2371ff1e7a 100644
--- a/linux-headers/asm-s390/unistd_64.h
+++ b/linux-headers/asm-s390/unistd_64.h
@@ -354,5 +354,7 @@
 #define __NR_fsconfig 431
 #define __NR_fsmount 432
 #define __NR_fspick 433
+#define __NR_pidfd_open 434
+#define __NR_clone3 435
 
 #endif /* _ASM_S390_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 6e7dd792e4..503d3f42da 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -378,23 +378,24 @@ struct kvm_sync_regs {
 	struct kvm_vcpu_events events;
 };
 
-#define KVM_X86_QUIRK_LINT0_REENABLED	(1 << 0)
-#define KVM_X86_QUIRK_CD_NW_CLEARED	(1 << 1)
-#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE	(1 << 2)
-#define KVM_X86_QUIRK_OUT_7E_INC_RIP	(1 << 3)
+#define KVM_X86_QUIRK_LINT0_REENABLED	   (1 << 0)
+#define KVM_X86_QUIRK_CD_NW_CLEARED	   (1 << 1)
+#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE	   (1 << 2)
+#define KVM_X86_QUIRK_OUT_7E_INC_RIP	   (1 << 3)
+#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
 
 #define KVM_STATE_NESTED_FORMAT_VMX	0
-#define KVM_STATE_NESTED_FORMAT_SVM	1
+#define KVM_STATE_NESTED_FORMAT_SVM	1	/* unused */
 
 #define KVM_STATE_NESTED_GUEST_MODE	0x00000001
 #define KVM_STATE_NESTED_RUN_PENDING	0x00000002
 #define KVM_STATE_NESTED_EVMCS		0x00000004
 
-#define KVM_STATE_NESTED_VMX_VMCS_SIZE	0x1000
-
 #define KVM_STATE_NESTED_SMM_GUEST_MODE	0x00000001
 #define KVM_STATE_NESTED_SMM_VMXON	0x00000002
 
+#define KVM_STATE_NESTED_VMX_VMCS_SIZE	0x1000
+
 struct kvm_vmx_nested_state_data {
 	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
 	__u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
@@ -432,4 +433,17 @@ struct kvm_nested_state {
 	} data;
 };
 
+/* for KVM_CAP_PMU_EVENT_FILTER */
+struct kvm_pmu_event_filter {
+	__u32 action;
+	__u32 nevents;
+	__u32 fixed_counter_bitmap;
+	__u32 flags;
+	__u32 pad[4];
+	__u64 events[0];
+};
+
+#define KVM_PMU_EVENT_ALLOW 0
+#define KVM_PMU_EVENT_DENY 1
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/linux-headers/asm-x86/unistd_32.h b/linux-headers/asm-x86/unistd_32.h
index 57bb48854c..e8ebec1cdc 100644
--- a/linux-headers/asm-x86/unistd_32.h
+++ b/linux-headers/asm-x86/unistd_32.h
@@ -424,5 +424,7 @@
 #define __NR_fsconfig 431
 #define __NR_fsmount 432
 #define __NR_fspick 433
+#define __NR_pidfd_open 434
+#define __NR_clone3 435
 
 #endif /* _ASM_X86_UNISTD_32_H */
diff --git a/linux-headers/asm-x86/unistd_64.h b/linux-headers/asm-x86/unistd_64.h
index fe6aa0688a..a2f863d549 100644
--- a/linux-headers/asm-x86/unistd_64.h
+++ b/linux-headers/asm-x86/unistd_64.h
@@ -346,5 +346,7 @@
 #define __NR_fsconfig 431
 #define __NR_fsmount 432
 #define __NR_fspick 433
+#define __NR_pidfd_open 434
+#define __NR_clone3 435
 
 #endif /* _ASM_X86_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/unistd_x32.h b/linux-headers/asm-x86/unistd_x32.h
index 09cca49ba7..4cdc67d848 100644
--- a/linux-headers/asm-x86/unistd_x32.h
+++ b/linux-headers/asm-x86/unistd_x32.h
@@ -299,6 +299,8 @@
 #define __NR_fsconfig (__X32_SYSCALL_BIT + 431)
 #define __NR_fsmount (__X32_SYSCALL_BIT + 432)
 #define __NR_fspick (__X32_SYSCALL_BIT + 433)
+#define __NR_pidfd_open (__X32_SYSCALL_BIT + 434)
+#define __NR_clone3 (__X32_SYSCALL_BIT + 435)
 #define __NR_rt_sigaction (__X32_SYSCALL_BIT + 512)
 #define __NR_rt_sigreturn (__X32_SYSCALL_BIT + 513)
 #define __NR_ioctl (__X32_SYSCALL_BIT + 514)
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index c8423e760c..9cf351919c 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -116,7 +116,7 @@ struct kvm_irq_level {
 	 * ACPI gsi notion of irq.
 	 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47..
 	 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23..
-	 * For ARM: See Documentation/virtual/kvm/api.txt
+	 * For ARM: See Documentation/virt/kvm/api.txt
 	 */
 	union {
 		__u32 irq;
@@ -696,9 +696,11 @@ struct kvm_ioeventfd {
 #define KVM_X86_DISABLE_EXITS_MWAIT          (1 << 0)
 #define KVM_X86_DISABLE_EXITS_HLT            (1 << 1)
 #define KVM_X86_DISABLE_EXITS_PAUSE          (1 << 2)
+#define KVM_X86_DISABLE_EXITS_CSTATE         (1 << 3)
 #define KVM_X86_DISABLE_VALID_EXITS          (KVM_X86_DISABLE_EXITS_MWAIT | \
                                               KVM_X86_DISABLE_EXITS_HLT | \
-                                              KVM_X86_DISABLE_EXITS_PAUSE)
+                                              KVM_X86_DISABLE_EXITS_PAUSE | \
+                                              KVM_X86_DISABLE_EXITS_CSTATE)
 
 /* for KVM_ENABLE_CAP */
 struct kvm_enable_cap {
@@ -993,6 +995,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_SVE 170
 #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
+#define KVM_CAP_PMU_EVENT_FILTER 173
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1083,7 +1086,7 @@ struct kvm_xen_hvm_config {
  *
  * KVM_IRQFD_FLAG_RESAMPLE indicates resamplefd is valid and specifies
  * the irqfd to operate in resampling mode for level triggered interrupt
- * emulation.  See Documentation/virtual/kvm/api.txt.
+ * emulation.  See Documentation/virt/kvm/api.txt.
  */
 #define KVM_IRQFD_FLAG_RESAMPLE (1 << 1)
 
@@ -1327,6 +1330,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_GET_RMMU_INFO	  _IOW(KVMIO,  0xb0, struct kvm_ppc_rmmu_info)
 /* Available with KVM_CAP_PPC_GET_CPU_CHAR */
 #define KVM_PPC_GET_CPU_CHAR	  _IOR(KVMIO,  0xb1, struct kvm_ppc_cpu_char)
+/* Available with KVM_CAP_PMU_EVENT_FILTER */
+#define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
diff --git a/linux-headers/linux/psp-sev.h b/linux-headers/linux/psp-sev.h
index 36bbe17d8f..34c39690c0 100644
--- a/linux-headers/linux/psp-sev.h
+++ b/linux-headers/linux/psp-sev.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
 /*
  * Userspace interface for AMD Secure Encrypted Virtualization (SEV)
  * platform management commands.
@@ -7,10 +8,6 @@
  * Author: Brijesh Singh <brijesh.singh@amd.com>
  *
  * SEV API specification is available at: https://developer.amd.com/sev/
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
  */
 
 #ifndef __PSP_SEV_USER_H__
diff --git a/linux-headers/linux/virtio_iommu.h b/linux-headers/linux/virtio_iommu.h
new file mode 100644
index 0000000000..2dc4609c16
--- /dev/null
+++ b/linux-headers/linux/virtio_iommu.h
@@ -0,0 +1 @@
+#include "standard-headers/linux/virtio_iommu.h"
-- 
2.20.1
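
The new `struct kvm_pmu_event_filter` (consumed by `KVM_SET_PMU_EVENT_FILTER` when `KVM_CAP_PMU_EVENT_FILTER` is available) ends in a zero-length `events[0]` array. Userspace therefore has to allocate the fixed header plus one `__u64` per event code. The sketch below mirrors the struct locally as a hypothetical `pmu_event_filter` type so it builds without these headers. The event codes in the usage are placeholders, and issuing the ioctl itself is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of struct kvm_pmu_event_filter (linux-headers/asm-x86/kvm.h),
 * written with a C99 flexible array member instead of events[0]. */
struct pmu_event_filter {
    uint32_t action;               /* PMU_EVENT_ALLOW or PMU_EVENT_DENY */
    uint32_t nevents;              /* number of entries in events[] */
    uint32_t fixed_counter_bitmap;
    uint32_t flags;
    uint32_t pad[4];
    uint64_t events[];
};

#define PMU_EVENT_ALLOW 0  /* same value as KVM_PMU_EVENT_ALLOW */
#define PMU_EVENT_DENY  1  /* same value as KVM_PMU_EVENT_DENY */

/* Bytes needed for a filter carrying n event codes. */
static size_t pmu_filter_size(uint32_t n)
{
    return sizeof(struct pmu_event_filter) + (size_t)n * sizeof(uint64_t);
}

/* Allocate a zeroed filter and copy n event codes into it. */
static struct pmu_event_filter *pmu_filter_new(uint32_t action,
                                               const uint64_t *codes,
                                               uint32_t n)
{
    struct pmu_event_filter *f;

    assert(action == PMU_EVENT_ALLOW || action == PMU_EVENT_DENY);
    f = calloc(1, pmu_filter_size(n));
    if (!f) {
        return NULL;
    }
    f->action = action;
    f->nevents = n;
    if (n) {
        memcpy(f->events, codes, (size_t)n * sizeof(uint64_t));
    }
    return f;
}
```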




* [Qemu-devel] [PATCH for-4.2 v10 03/15] virtio-iommu: Add skeleton
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 01/15] update-linux-headers: Import virtio_iommu.h Eric Auger
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 02/15] linux-headers: update against 5.3-rc2 Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
  2019-08-15 13:54   ` Peter Xu
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 04/15] virtio-iommu: Decode the command payload Eric Auger
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Eric Auger @ 2019-07-30 17:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

This patch adds the skeleton for the virtio-iommu device.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v9 -> v10:
- expose VIRTIO_IOMMU_F_MMIO feature
- s/domain_bits/domain_range struct
- change error codes
- enforce unmigratable
- Kconfig

v7 -> v8:
- expose VIRTIO_IOMMU_F_BYPASS and VIRTIO_F_VERSION_1
  features
- set_config dummy implementation + tracing
- add trace in get_features
- set the features on realize() and store the acked ones
- remove inclusion of linux/virtio_iommu.h
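
The entries above describe the usual virtio feature handshake: the device builds a 64-bit feature mask in realize(), get_features() ORs it into what the transport offers, and set_features() stores what the driver acked. A minimal stand-alone sketch of that bookkeeping follows. `add_feature()`/`has_feature()` are hypothetical stand-ins for QEMU's `virtio_add_feature()`/`virtio_has_feature()`, and the bit numbers are the values I understand the v0.12 header and `virtio_config.h` to use; check them against `standard-headers/linux/virtio_iommu.h`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers (assumed to match virtio_iommu.h / virtio_config.h). */
enum {
    F_INPUT_RANGE        = 0,
    F_DOMAIN_RANGE       = 1,
    F_MAP_UNMAP          = 2,
    F_BYPASS             = 3,
    F_PROBE              = 4,
    F_MMIO               = 5,
    F_RING_INDIRECT_DESC = 28,
    F_RING_EVENT_IDX     = 29,
    F_VERSION_1          = 32,
};

/* Hypothetical equivalents of virtio_add_feature()/virtio_has_feature(). */
static void add_feature(uint64_t *features, unsigned int bit)
{
    assert(bit < 64);
    *features |= 1ULL << bit;
}

static bool has_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return (features >> bit) & 1;
}

/* Device side: the same set of features the skeleton enables in realize(). */
static uint64_t device_features(void)
{
    uint64_t f = 0;

    add_feature(&f, F_RING_EVENT_IDX);
    add_feature(&f, F_RING_INDIRECT_DESC);
    add_feature(&f, F_VERSION_1);
    add_feature(&f, F_INPUT_RANGE);
    add_feature(&f, F_DOMAIN_RANGE);
    add_feature(&f, F_MAP_UNMAP);
    add_feature(&f, F_BYPASS);
    add_feature(&f, F_MMIO);
    return f;
}

/* The driver may only accept a subset of what the device offered. */
static uint64_t negotiate(uint64_t offered, uint64_t driver_wants)
{
    return offered & driver_wants;
}
```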

v6 -> v7:
- removed qapi-event.h include
- add primary_bus and associated property

v4 -> v5:
- use the new v0.5 terminology (domain, endpoint)
- add the event virtqueue

v3 -> v4:
- use page_size_mask instead of page_sizes
- added set_features()
- added some traces (reset, set_status, set_features)
- empty virtio_iommu_set_config() as the driver MUST NOT
  write to device configuration fields
- add get_config trace

v2 -> v3:
- rebase on 2.10-rc0, i.e. use IOMMUMemoryRegion and remove
  iommu_ops.
- advertise VIRTIO_IOMMU_F_MAP_UNMAP feature
- page_sizes set to TARGET_PAGE_SIZE
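
With `page_size_mask`, each set bit n advertises a supported 2^n-byte page granule. Setting it to `TARGET_PAGE_MASK` (all bits from log2(TARGET_PAGE_SIZE) upward) therefore advertises every power-of-two size at or above the target page size, and the lowest set bit is the smallest granule a driver can map. The sketch below assumes a 4 KiB target page; `SIM_TARGET_PAGE_MASK`, `min_granule()` and `granule_supported()` are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_TARGET_PAGE_BITS 12   /* assumption: 4 KiB target pages */
/* Same shape as TARGET_PAGE_MASK: every bit from the page shift upward. */
#define SIM_TARGET_PAGE_MASK (~(uint64_t)0 << SIM_TARGET_PAGE_BITS)

/* Smallest granule advertised by a page_size_mask: its lowest set bit. */
static uint64_t min_granule(uint64_t page_size_mask)
{
    assert(page_size_mask != 0);
    return page_size_mask & (~page_size_mask + 1);   /* isolate lowest set bit */
}

/* Is a 2^order byte page size advertised by the mask? */
static int granule_supported(uint64_t page_size_mask, unsigned int order)
{
    assert(order < 64);
    return (page_size_mask >> order) & 1;
}
```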

---
 hw/virtio/Kconfig                |   5 +
 hw/virtio/Makefile.objs          |   1 +
 hw/virtio/trace-events           |   8 +
 hw/virtio/virtio-iommu.c         | 267 +++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-iommu.h |  62 +++++++
 5 files changed, 343 insertions(+)
 create mode 100644 hw/virtio/virtio-iommu.c
 create mode 100644 include/hw/virtio/virtio-iommu.h

diff --git a/hw/virtio/Kconfig b/hw/virtio/Kconfig
index 3724ff8bac..a30107b439 100644
--- a/hw/virtio/Kconfig
+++ b/hw/virtio/Kconfig
@@ -6,6 +6,11 @@ config VIRTIO_RNG
     default y
     depends on VIRTIO
 
+config VIRTIO_IOMMU
+    bool
+    default y
+    depends on VIRTIO
+
 config VIRTIO_PCI
     bool
     default y if PCI_DEVICES
diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 964ce78607..f42e4dd94f 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -14,6 +14,7 @@ obj-$(CONFIG_VIRTIO_CRYPTO) += virtio-crypto.o
 obj-$(call land,$(CONFIG_VIRTIO_CRYPTO),$(CONFIG_VIRTIO_PCI)) += virtio-crypto-pci.o
 obj-$(CONFIG_VIRTIO_PMEM) += virtio-pmem.o
 common-obj-$(call land,$(CONFIG_VIRTIO_PMEM),$(CONFIG_VIRTIO_PCI)) += virtio-pmem-pci.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
 
 ifeq ($(CONFIG_VIRTIO_PCI),y)
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index e28ba48da6..f7dac39213 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -53,3 +53,11 @@ virtio_mmio_write_offset(uint64_t offset, uint64_t value) "virtio_mmio_write off
 virtio_mmio_guest_page(uint64_t size, int shift) "guest page size 0x%" PRIx64 " shift %d"
 virtio_mmio_queue_write(uint64_t value, int max_size) "mmio_queue write 0x%" PRIx64 " max %d"
 virtio_mmio_setting_irq(int level) "virtio_mmio setting IRQ %d"
+
+# hw/virtio/virtio-iommu.c
+virtio_iommu_device_reset(void) "reset!"
+virtio_iommu_get_features(uint64_t features) "device supports features=0x%"PRIx64
+virtio_iommu_set_features(uint64_t features) "features accepted by the driver =0x%"PRIx64
+virtio_iommu_device_status(uint8_t status) "driver status = %d"
+virtio_iommu_get_config(uint64_t page_size_mask, uint64_t start, uint64_t end, uint32_t domain_range, uint32_t probe_size) "page_size_mask=0x%"PRIx64" start=0x%"PRIx64" end=0x%"PRIx64" domain_range=%d probe_size=0x%x"
+virtio_iommu_set_config(uint64_t page_size_mask, uint64_t start, uint64_t end, uint32_t domain_range, uint32_t probe_size) "page_size_mask=0x%"PRIx64" start=0x%"PRIx64" end=0x%"PRIx64" domain_bits=%d probe_size=0x%x"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
new file mode 100644
index 0000000000..f239954396
--- /dev/null
+++ b/hw/virtio/virtio-iommu.c
@@ -0,0 +1,267 @@
+/*
+ * virtio-iommu device
+ *
+ * Copyright (c) 2017 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/iov.h"
+#include "qemu-common.h"
+#include "hw/virtio/virtio.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+#include "standard-headers/linux/virtio_ids.h"
+
+#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-iommu.h"
+
+/* Max size */
+#define VIOMMU_DEFAULT_QUEUE_SIZE 256
+
+static int virtio_iommu_handle_attach(VirtIOIOMMU *s,
+                                      struct iovec *iov,
+                                      unsigned int iov_cnt)
+{
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+static int virtio_iommu_handle_detach(VirtIOIOMMU *s,
+                                      struct iovec *iov,
+                                      unsigned int iov_cnt)
+{
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+static int virtio_iommu_handle_map(VirtIOIOMMU *s,
+                                   struct iovec *iov,
+                                   unsigned int iov_cnt)
+{
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
+                                     struct iovec *iov,
+                                     unsigned int iov_cnt)
+{
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
+    struct virtio_iommu_req_head head;
+    struct virtio_iommu_req_tail tail;
+    VirtQueueElement *elem;
+    unsigned int iov_cnt;
+    struct iovec *iov;
+    size_t sz;
+
+    for (;;) {
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
+            return;
+        }
+
+        if (iov_size(elem->in_sg, elem->in_num) < sizeof(tail) ||
+            iov_size(elem->out_sg, elem->out_num) < sizeof(head)) {
+            virtio_error(vdev, "virtio-iommu bad head/tail size");
+            virtqueue_detach_element(vq, elem, 0);
+            g_free(elem);
+            break;
+        }
+
+        iov_cnt = elem->out_num;
+        iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem->out_num);
+        sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
+        if (unlikely(sz != sizeof(head))) {
+            tail.status = VIRTIO_IOMMU_S_DEVERR;
+            goto out;
+        }
+        qemu_mutex_lock(&s->mutex);
+        switch (head.type) {
+        case VIRTIO_IOMMU_T_ATTACH:
+            tail.status = virtio_iommu_handle_attach(s, iov, iov_cnt);
+            break;
+        case VIRTIO_IOMMU_T_DETACH:
+            tail.status = virtio_iommu_handle_detach(s, iov, iov_cnt);
+            break;
+        case VIRTIO_IOMMU_T_MAP:
+            tail.status = virtio_iommu_handle_map(s, iov, iov_cnt);
+            break;
+        case VIRTIO_IOMMU_T_UNMAP:
+            tail.status = virtio_iommu_handle_unmap(s, iov, iov_cnt);
+            break;
+        default:
+            tail.status = VIRTIO_IOMMU_S_UNSUPP;
+        }
+        qemu_mutex_unlock(&s->mutex);
+
+out:
+        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
+                          &tail, sizeof(tail));
+        assert(sz == sizeof(tail));
+
+        virtqueue_push(vq, elem, sizeof(tail));
+        virtio_notify(vdev, vq);
+        g_free(elem);
+    }
+}
+
+static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
+{
+    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
+    struct virtio_iommu_config *config = &dev->config;
+
+    trace_virtio_iommu_get_config(config->page_size_mask,
+                                  config->input_range.start,
+                                  config->input_range.end,
+                                  config->domain_range.end,
+                                  config->probe_size);
+    memcpy(config_data, &dev->config, sizeof(struct virtio_iommu_config));
+}
+
+static void virtio_iommu_set_config(VirtIODevice *vdev,
+                                      const uint8_t *config_data)
+{
+    struct virtio_iommu_config config;
+
+    memcpy(&config, config_data, sizeof(struct virtio_iommu_config));
+    trace_virtio_iommu_set_config(config.page_size_mask,
+                                  config.input_range.start,
+                                  config.input_range.end,
+                                  config.domain_range.end,
+                                  config.probe_size);
+}
+
+static uint64_t virtio_iommu_get_features(VirtIODevice *vdev, uint64_t f,
+                                          Error **errp)
+{
+    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
+
+    f |= dev->features;
+    trace_virtio_iommu_get_features(f);
+    return f;
+}
+
+static void virtio_iommu_set_features(VirtIODevice *vdev, uint64_t val)
+{
+    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
+
+    dev->acked_features = val;
+    trace_virtio_iommu_set_features(dev->acked_features);
+}
+
+static const VMStateDescription vmstate_virtio_iommu_device = {
+    .name = "virtio-iommu-device",
+    .unmigratable = 1,
+};
+
+static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
+
+    virtio_init(vdev, "virtio-iommu", VIRTIO_ID_IOMMU,
+                sizeof(struct virtio_iommu_config));
+
+    s->req_vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE,
+                             virtio_iommu_handle_command);
+    s->event_vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE, NULL);
+
+    s->config.page_size_mask = TARGET_PAGE_MASK;
+    s->config.input_range.end = -1UL;
+    s->config.domain_range.start = 0;
+    s->config.domain_range.end = 32;
+
+    virtio_add_feature(&s->features, VIRTIO_RING_F_EVENT_IDX);
+    virtio_add_feature(&s->features, VIRTIO_RING_F_INDIRECT_DESC);
+    virtio_add_feature(&s->features, VIRTIO_F_VERSION_1);
+    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_INPUT_RANGE);
+    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_DOMAIN_RANGE);
+    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MAP_UNMAP);
+    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_BYPASS);
+    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MMIO);
+}
+
+static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+
+    virtio_cleanup(vdev);
+}
+
+static void virtio_iommu_device_reset(VirtIODevice *vdev)
+{
+    trace_virtio_iommu_device_reset();
+}
+
+static void virtio_iommu_set_status(VirtIODevice *vdev, uint8_t status)
+{
+    trace_virtio_iommu_device_status(status);
+}
+
+static void virtio_iommu_instance_init(Object *obj)
+{
+}
+
+static const VMStateDescription vmstate_virtio_iommu = {
+    .name = "virtio-iommu",
+    .minimum_version_id = 1,
+    .version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_VIRTIO_DEVICE,
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static Property virtio_iommu_properties[] = {
+    DEFINE_PROP_LINK("primary-bus", VirtIOIOMMU, primary_bus, "PCI", PCIBus *),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtio_iommu_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
+
+    dc->props = virtio_iommu_properties;
+    dc->vmsd = &vmstate_virtio_iommu;
+
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    vdc->realize = virtio_iommu_device_realize;
+    vdc->unrealize = virtio_iommu_device_unrealize;
+    vdc->reset = virtio_iommu_device_reset;
+    vdc->get_config = virtio_iommu_get_config;
+    vdc->set_config = virtio_iommu_set_config;
+    vdc->get_features = virtio_iommu_get_features;
+    vdc->set_features = virtio_iommu_set_features;
+    vdc->set_status = virtio_iommu_set_status;
+    vdc->vmsd = &vmstate_virtio_iommu_device;
+}
+
+static const TypeInfo virtio_iommu_info = {
+    .name = TYPE_VIRTIO_IOMMU,
+    .parent = TYPE_VIRTIO_DEVICE,
+    .instance_size = sizeof(VirtIOIOMMU),
+    .instance_init = virtio_iommu_instance_init,
+    .class_init = virtio_iommu_class_init,
+};
+
+static void virtio_register_types(void)
+{
+    type_register_static(&virtio_iommu_info);
+}
+
+type_init(virtio_register_types)
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
new file mode 100644
index 0000000000..4d47b6abeb
--- /dev/null
+++ b/include/hw/virtio/virtio-iommu.h
@@ -0,0 +1,62 @@
+/*
+ * virtio-iommu device
+ *
+ * Copyright (c) 2017 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#ifndef QEMU_VIRTIO_IOMMU_H
+#define QEMU_VIRTIO_IOMMU_H
+
+#include "standard-headers/linux/virtio_iommu.h"
+#include "hw/virtio/virtio.h"
+#include "hw/pci/pci.h"
+
+#define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
+#define VIRTIO_IOMMU(obj) \
+        OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
+
+#define IOMMU_PCI_BUS_MAX      256
+#define IOMMU_PCI_DEVFN_MAX    256
+
+typedef struct IOMMUDevice {
+    void         *viommu;
+    PCIBus       *bus;
+    int           devfn;
+    IOMMUMemoryRegion  iommu_mr;
+    AddressSpace  as;
+} IOMMUDevice;
+
+typedef struct IOMMUPciBus {
+    PCIBus       *bus;
+    IOMMUDevice  *pbdev[0]; /* Parent array is sparse, so dynamically alloc */
+} IOMMUPciBus;
+
+typedef struct VirtIOIOMMU {
+    VirtIODevice parent_obj;
+    VirtQueue *req_vq;
+    VirtQueue *event_vq;
+    struct virtio_iommu_config config;
+    uint64_t features;
+    uint64_t acked_features;
+    GHashTable *as_by_busptr;
+    IOMMUPciBus *as_by_bus_num[IOMMU_PCI_BUS_MAX];
+    PCIBus *primary_bus;
+    GTree *domains;
+    QemuMutex mutex;
+    GTree *endpoints;
+} VirtIOIOMMU;
+
+#endif
-- 
2.20.1
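
`virtio_iommu_handle_command()` relies on a fixed request framing: the driver-written (out) buffers start with a 4-byte `virtio_iommu_req_head` whose `type` selects the handler, and the device writes a 4-byte `virtio_iommu_req_tail` carrying the status into the device-writable (in) buffers. The stand-alone sketch below reproduces only the head parsing and dispatch over a plain `struct iovec` array. `gather()` is a simplified stand-in for QEMU's `iov_to_buf()`, and the type/status values are local copies of what I understand the v0.12 header to define, so verify them against `virtio_iommu.h`. As in this patch, every request type is still answered with UNSUPP; writing the tail back to the in buffers is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Local copies of the request framing (assumed to match virtio_iommu.h). */
struct req_head { uint8_t type;   uint8_t reserved[3]; };
struct req_tail { uint8_t status; uint8_t reserved[3]; };

enum { T_ATTACH = 1, T_DETACH = 2, T_MAP = 3, T_UNMAP = 4 };
enum { S_OK = 0, S_UNSUPP = 2, S_DEVERR = 3 };

/* Copy up to len bytes, starting at offset 0, out of a scatter list
 * (a simplified stand-in for QEMU's iov_to_buf()). */
static size_t gather(const struct iovec *iov, unsigned int cnt,
                     void *buf, size_t len)
{
    size_t done = 0;

    for (unsigned int i = 0; i < cnt && done < len; i++) {
        size_t n = iov[i].iov_len;

        if (n > len - done) {
            n = len - done;
        }
        memcpy((uint8_t *)buf + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}

/* Parse the head and pick the tail status, mirroring the switch in the patch. */
static uint8_t dispatch(const struct iovec *out, unsigned int out_cnt)
{
    struct req_head head;

    if (gather(out, out_cnt, &head, sizeof(head)) != sizeof(head)) {
        return S_DEVERR;          /* short head: reported as a device error */
    }
    switch (head.type) {
    case T_ATTACH:
    case T_DETACH:
    case T_MAP:
    case T_UNMAP:
        return S_UNSUPP;          /* handlers are still stubs at this stage */
    default:
        return S_UNSUPP;          /* unknown request type */
    }
}
```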




* [Qemu-devel] [PATCH for-4.2 v10 04/15] virtio-iommu: Decode the command payload
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
                   ` (2 preceding siblings ...)
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 03/15] virtio-iommu: Add skeleton Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 05/15] virtio-iommu: Add the iommu regions Eric Auger
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 55+ messages in thread
From: Eric Auger @ 2019-07-30 17:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

This patch adds the command payload decoding and
introduces the functions that will do the actual
command handling. Those functions are not yet implemented.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v9 -> v10:
- make virtio_iommu_handle_* more compact and
  remove get_payload_size

v7 -> v8:
- handle new domain parameter in detach
- remove reserved checks

v5 -> v6:
- change map/unmap semantics (remove size)

v4 -> v5:
- adopt new v0.5 terminology

v3 -> v4:
- no flags field anymore in struct virtio_iommu_req_unmap
- test reserved on attach/detach, change trace proto
- rebase on v2.10.0.
---
 hw/virtio/trace-events   |  4 ++
 hw/virtio/virtio-iommu.c | 81 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index f7dac39213..c7276116e7 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -61,3 +61,7 @@ virtio_iommu_set_features(uint64_t features) "features accepted by the driver =0
 virtio_iommu_device_status(uint8_t status) "driver status = %d"
 virtio_iommu_get_config(uint64_t page_size_mask, uint64_t start, uint64_t end, uint32_t domain_range, uint32_t probe_size) "page_size_mask=0x%"PRIx64" start=0x%"PRIx64" end=0x%"PRIx64" domain_range=%d probe_size=0x%x"
 virtio_iommu_set_config(uint64_t page_size_mask, uint64_t start, uint64_t end, uint32_t domain_range, uint32_t probe_size) "page_size_mask=0x%"PRIx64" start=0x%"PRIx64" end=0x%"PRIx64" domain_bits=%d probe_size=0x%x"
+virtio_iommu_attach(uint32_t domain_id, uint32_t ep_id) "domain=%d endpoint=%d"
+virtio_iommu_detach(uint32_t domain_id, uint32_t ep_id) "domain=%d endpoint=%d"
+virtio_iommu_map(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end, uint64_t phys_start, uint32_t flags) "domain=%d virt_start=0x%"PRIx64" virt_end=0x%"PRIx64 " phys_start=0x%"PRIx64" flags=%d"
+virtio_iommu_unmap(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end) "domain=%d virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index f239954396..658249c81e 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -33,29 +33,102 @@
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
 
+static int virtio_iommu_attach(VirtIOIOMMU *s,
+                               struct virtio_iommu_req_attach *req)
+{
+    uint32_t domain_id = le32_to_cpu(req->domain);
+    uint32_t ep_id = le32_to_cpu(req->endpoint);
+
+    trace_virtio_iommu_attach(domain_id, ep_id);
+
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static int virtio_iommu_detach(VirtIOIOMMU *s,
+                               struct virtio_iommu_req_detach *req)
+{
+    uint32_t domain_id = le32_to_cpu(req->domain);
+    uint32_t ep_id = le32_to_cpu(req->endpoint);
+
+    trace_virtio_iommu_detach(domain_id, ep_id);
+
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static int virtio_iommu_map(VirtIOIOMMU *s,
+                            struct virtio_iommu_req_map *req)
+{
+    uint32_t domain_id = le32_to_cpu(req->domain);
+    uint64_t phys_start = le64_to_cpu(req->phys_start);
+    uint64_t virt_start = le64_to_cpu(req->virt_start);
+    uint64_t virt_end = le64_to_cpu(req->virt_end);
+    uint32_t flags = le32_to_cpu(req->flags);
+
+    trace_virtio_iommu_map(domain_id, virt_start, virt_end, phys_start, flags);
+
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static int virtio_iommu_unmap(VirtIOIOMMU *s,
+                              struct virtio_iommu_req_unmap *req)
+{
+    uint32_t domain_id = le32_to_cpu(req->domain);
+    uint64_t virt_start = le64_to_cpu(req->virt_start);
+    uint64_t virt_end = le64_to_cpu(req->virt_end);
+
+    trace_virtio_iommu_unmap(domain_id, virt_start, virt_end);
+
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static int virtio_iommu_iov_to_req(struct iovec *iov,
+                                   unsigned int iov_cnt,
+                                   void *req, size_t req_sz)
+{
+    size_t sz, payload_sz = req_sz - sizeof(struct virtio_iommu_req_tail);
+
+    sz = iov_to_buf(iov, iov_cnt, 0, req, payload_sz);
+    if (unlikely(sz != payload_sz)) {
+        return VIRTIO_IOMMU_S_INVAL;
+    }
+    return 0;
+}
+
 static int virtio_iommu_handle_attach(VirtIOIOMMU *s,
                                       struct iovec *iov,
                                       unsigned int iov_cnt)
 {
-    return VIRTIO_IOMMU_S_UNSUPP;
+    struct virtio_iommu_req_attach req;
+    int ret = virtio_iommu_iov_to_req(iov, iov_cnt, &req, sizeof(req));
+
+    return ret ? ret : virtio_iommu_attach(s, &req);
 }
 static int virtio_iommu_handle_detach(VirtIOIOMMU *s,
                                       struct iovec *iov,
                                       unsigned int iov_cnt)
 {
-    return VIRTIO_IOMMU_S_UNSUPP;
+    struct virtio_iommu_req_detach req;
+    int ret = virtio_iommu_iov_to_req(iov, iov_cnt, &req, sizeof(req));
+
+    return ret ? ret : virtio_iommu_detach(s, &req);
 }
 static int virtio_iommu_handle_map(VirtIOIOMMU *s,
                                    struct iovec *iov,
                                    unsigned int iov_cnt)
 {
-    return VIRTIO_IOMMU_S_UNSUPP;
+    struct virtio_iommu_req_map req;
+    int ret = virtio_iommu_iov_to_req(iov, iov_cnt, &req, sizeof(req));
+
+    return ret ? ret : virtio_iommu_map(s, &req);
 }
 static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
                                      struct iovec *iov,
                                      unsigned int iov_cnt)
 {
-    return VIRTIO_IOMMU_S_UNSUPP;
+    struct virtio_iommu_req_unmap req;
+    int ret = virtio_iommu_iov_to_req(iov, iov_cnt, &req, sizeof(req));
+
+    return ret ? ret : virtio_iommu_unmap(s, &req);
 }
 
 static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
-- 
2.20.1
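
`virtio_iommu_iov_to_req()` copies `req_sz - sizeof(struct virtio_iommu_req_tail)` bytes, i.e. everything except the device-writable tail, and the handlers then convert each little-endian field with `le32_to_cpu()`/`le64_to_cpu()`. The portable sketch below does the same for a MAP request straight from a byte buffer. It assumes the v0.12 layout (4-byte head, `domain`, `virt_start`, `virt_end`, `phys_start`, `flags`, 4-byte tail; 40 bytes with no padding); `get_le32()`, `get_le64()`, `struct map_req` and `decode_map()` are illustrative names, not QEMU APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_REQ_SIZE     40  /* assumed sizeof(struct virtio_iommu_req_map) */
#define REQ_TAIL_SIZE     4  /* sizeof(struct virtio_iommu_req_tail) */
#define MAP_PAYLOAD_SIZE (MAP_REQ_SIZE - REQ_TAIL_SIZE)  /* driver-written bytes */

/* Host-order view of a decoded MAP request. */
struct map_req {
    uint32_t domain;
    uint64_t virt_start;
    uint64_t virt_end;    /* last IOVA of the range (inclusive) */
    uint64_t phys_start;
    uint32_t flags;
};

/* Little-endian loads that work regardless of host byte order. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t get_le64(const uint8_t *p)
{
    return (uint64_t)get_le32(p) | (uint64_t)get_le32(p + 4) << 32;
}

/* Decode the driver-written part of a MAP request. Returns 0 on success,
 * -1 if the payload is too short (the patch answers VIRTIO_IOMMU_S_INVAL). */
static int decode_map(const uint8_t *buf, size_t len, struct map_req *req)
{
    if (len < MAP_PAYLOAD_SIZE) {
        return -1;
    }
    /* bytes 0..3 hold the request head (type + reserved) */
    req->domain     = get_le32(buf + 4);
    req->virt_start = get_le64(buf + 8);
    req->virt_end   = get_le64(buf + 16);
    req->phys_start = get_le64(buf + 24);
    req->flags      = get_le32(buf + 32);
    return 0;
}
```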




* [Qemu-devel] [PATCH for-4.2 v10 05/15] virtio-iommu: Add the iommu regions
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
                   ` (3 preceding siblings ...)
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 04/15] virtio-iommu: Decode the command payload Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
  2019-08-16  4:00   ` Peter Xu
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 06/15] virtio-iommu: Endpoint and domains structs and helpers Eric Auger
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Eric Auger @ 2019-07-30 17:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

This patch initializes the IOMMU memory regions so that
PCIe endpoint transactions get translated. The translation
function itself is not implemented yet.
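
For context on the stub translate callback: as far as I can tell from QEMU's memory core, a returned `IOMMUTLBEntry` is applied by keeping the input-address bits covered by `addr_mask` and taking the remaining bits from `translated_addr`, and the access is routed to unassigned memory unless `perm` contains the requested direction. An entry with `addr_mask = ~0` and `perm = IOMMU_NONE` should therefore make every DMA through these regions fail until translation is implemented. The sketch models that rule with a local `tlb_entry` type and a hypothetical `apply_entry()`; the permission encoding (RO = 1, WO = 2, RW = 3) is my reading of `IOMMUAccessFlags`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Permission bits, modeled on QEMU's IOMMUAccessFlags (assumed encoding). */
enum { PERM_NONE = 0, PERM_RO = 1, PERM_WO = 2, PERM_RW = 3 };

/* Local model of the IOMMUTLBEntry fields used when applying a translation. */
struct tlb_entry {
    uint64_t iova;
    uint64_t translated_addr;
    uint64_t addr_mask;   /* low bits passed through from the input address */
    int perm;
};

/* Apply an entry to addr for a read (is_write = false) or a write.
 * Returns false when the entry does not grant the requested access. */
static bool apply_entry(const struct tlb_entry *e, uint64_t addr,
                        bool is_write, uint64_t *out)
{
    if (!(e->perm & (1 << is_write))) {
        return false;
    }
    *out = (e->translated_addr & ~e->addr_mask) | (addr & e->addr_mask);
    return true;
}
```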

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v9 -> v10:
- remove pc/virt machine headers
- virtio_iommu_find_add_as: mr_index introduced in that patch
  and name properly freed

v6 -> v7:
- use primary_bus
- rebase on new translate proto featuring iommu_idx

v5 -> v6:
- include qapi/error.h
- fix g_hash_table_lookup key in virtio_iommu_find_add_as

v4 -> v5:
- use PCI bus handle as a key
- use get_primary_pci_bus() callback

v3 -> v4:
- add trace_virtio_iommu_init_iommu_mr

v2 -> v3:
- use IOMMUMemoryRegion
- iommu mr name built with BDF
- rename smmu_get_sid into virtio_iommu_get_sid and use PCI_BUILD_BDF
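
`virtio_iommu_get_sid()` derives the stream ID from the PCI requester ID, `PCI_BUILD_BDF(bus, devfn)`: bus number in bits 15:8, devfn in bits 7:0. `virtio_iommu_find_add_as()` lazily allocates one `IOMMUDevice` per (bus, devfn) inside a per-bus table sized for 256 devfns. The sketch below models both with plain arrays; unlike the patch it indexes buses by number rather than by `PCIBus` pointer in a `GHashTable` (the patch keys by pointer, presumably because bus numbers are only assigned later by guest firmware). `build_bdf()`, `struct dev_slot` and `find_add()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BUS_MAX   256
#define DEVFN_MAX 256

/* Requester ID: bus in bits 15:8, devfn (device << 3 | function) in bits 7:0. */
static uint16_t build_bdf(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}

struct dev_slot {
    uint8_t bus;
    uint8_t devfn;
};

/* A per-bus table is only allocated the first time that bus is seen. */
struct bus_table {
    struct dev_slot *slot[DEVFN_MAX];
};

static struct bus_table *buses[BUS_MAX];

/* Return the slot for (bus, devfn), creating table and slot on demand.
 * Returns NULL on allocation failure. */
static struct dev_slot *find_add(uint8_t bus, uint8_t devfn)
{
    struct bus_table *t = buses[bus];

    if (!t) {
        t = calloc(1, sizeof(*t));
        if (!t) {
            return NULL;
        }
        buses[bus] = t;
    }
    if (!t->slot[devfn]) {
        struct dev_slot *s = calloc(1, sizeof(*s));

        if (!s) {
            return NULL;
        }
        s->bus = bus;
        s->devfn = devfn;
        t->slot[devfn] = s;
    }
    return t->slot[devfn];
}
```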
---
 hw/virtio/trace-events           |  2 +
 hw/virtio/virtio-iommu.c         | 92 ++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-iommu.h |  2 +
 3 files changed, 96 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index c7276116e7..b32169d56c 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -65,3 +65,5 @@ virtio_iommu_attach(uint32_t domain_id, uint32_t ep_id) "domain=%d endpoint=%d"
 virtio_iommu_detach(uint32_t domain_id, uint32_t ep_id) "domain=%d endpoint=%d"
 virtio_iommu_map(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end, uint64_t phys_start, uint32_t flags) "domain=%d virt_start=0x%"PRIx64" virt_end=0x%"PRIx64 " phys_start=0x%"PRIx64" flags=%d"
 virtio_iommu_unmap(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end) "domain=%d virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
+virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
+virtio_iommu_init_iommu_mr(char *iommu_mr) "init %s"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 658249c81e..1610e2f773 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -22,6 +22,8 @@
 #include "qemu-common.h"
 #include "hw/virtio/virtio.h"
 #include "sysemu/kvm.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
 #include "trace.h"
 
 #include "standard-headers/linux/virtio_ids.h"
@@ -33,6 +35,50 @@
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
 
+static inline uint16_t virtio_iommu_get_sid(IOMMUDevice *dev)
+{
+    return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
+}
+
+static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
+                                              int devfn)
+{
+    VirtIOIOMMU *s = opaque;
+    IOMMUPciBus *sbus = g_hash_table_lookup(s->as_by_busptr, bus);
+    static uint32_t mr_index;
+    IOMMUDevice *sdev;
+
+    if (!sbus) {
+        sbus = g_malloc0(sizeof(IOMMUPciBus) +
+                         sizeof(IOMMUDevice *) * IOMMU_PCI_DEVFN_MAX);
+        sbus->bus = bus;
+        g_hash_table_insert(s->as_by_busptr, bus, sbus);
+    }
+
+    sdev = sbus->pbdev[devfn];
+    if (!sdev) {
+        char *name = g_strdup_printf("%s-%d-%d",
+                                     TYPE_VIRTIO_IOMMU_MEMORY_REGION,
+                                     mr_index++, devfn);
+        sdev = sbus->pbdev[devfn] = g_malloc0(sizeof(IOMMUDevice));
+
+        sdev->viommu = s;
+        sdev->bus = bus;
+        sdev->devfn = devfn;
+
+        trace_virtio_iommu_init_iommu_mr(name);
+
+        memory_region_init_iommu(&sdev->iommu_mr, sizeof(sdev->iommu_mr),
+                                 TYPE_VIRTIO_IOMMU_MEMORY_REGION,
+                                 OBJECT(s), name,
+                                 UINT64_MAX);
+        address_space_init(&sdev->as,
+                           MEMORY_REGION(&sdev->iommu_mr), TYPE_VIRTIO_IOMMU);
+        g_free(name);
+    }
+    return &sdev->as;
+}
+
 static int virtio_iommu_attach(VirtIOIOMMU *s,
                                struct virtio_iommu_req_attach *req)
 {
@@ -192,6 +238,27 @@ out:
     }
 }
 
+static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
+                                            IOMMUAccessFlags flag,
+                                            int iommu_idx)
+{
+    IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+    uint32_t sid;
+
+    IOMMUTLBEntry entry = {
+        .target_as = &address_space_memory,
+        .iova = addr,
+        .translated_addr = addr,
+        .addr_mask = ~(hwaddr)0,
+        .perm = IOMMU_NONE,
+    };
+
+    sid = virtio_iommu_get_sid(sdev);
+
+    trace_virtio_iommu_translate(mr->parent_obj.name, sid, addr, flag);
+    return entry;
+}
+
 static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
 {
     VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
@@ -266,6 +333,15 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
     virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MAP_UNMAP);
     virtio_add_feature(&s->features, VIRTIO_IOMMU_F_BYPASS);
     virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MMIO);
+
+    memset(s->as_by_bus_num, 0, sizeof(s->as_by_bus_num));
+    s->as_by_busptr = g_hash_table_new(NULL, NULL);
+
+    if (s->primary_bus) {
+        pci_setup_iommu(s->primary_bus, virtio_iommu_find_add_as, s);
+    } else {
+        error_setg(errp, "VIRTIO-IOMMU is not attached to any PCI bus!");
+    }
 }
 
 static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
@@ -324,6 +400,14 @@ static void virtio_iommu_class_init(ObjectClass *klass, void *data)
     vdc->vmsd = &vmstate_virtio_iommu_device;
 }
 
+static void virtio_iommu_memory_region_class_init(ObjectClass *klass,
+                                                  void *data)
+{
+    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
+
+    imrc->translate = virtio_iommu_translate;
+}
+
 static const TypeInfo virtio_iommu_info = {
     .name = TYPE_VIRTIO_IOMMU,
     .parent = TYPE_VIRTIO_DEVICE,
@@ -332,9 +416,17 @@ static const TypeInfo virtio_iommu_info = {
     .class_init = virtio_iommu_class_init,
 };
 
+static const TypeInfo virtio_iommu_memory_region_info = {
+    .parent = TYPE_IOMMU_MEMORY_REGION,
+    .name = TYPE_VIRTIO_IOMMU_MEMORY_REGION,
+    .class_init = virtio_iommu_memory_region_class_init,
+};
+
+
 static void virtio_register_types(void)
 {
     type_register_static(&virtio_iommu_info);
+    type_register_static(&virtio_iommu_memory_region_info);
 }
 
 type_init(virtio_register_types)
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index 4d47b6abeb..f55f48d304 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -28,6 +28,8 @@
 #define VIRTIO_IOMMU(obj) \
         OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
 
+#define TYPE_VIRTIO_IOMMU_MEMORY_REGION "virtio-iommu-memory-region"
+
 #define IOMMU_PCI_BUS_MAX      256
 #define IOMMU_PCI_DEVFN_MAX    256
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH for-4.2 v10 06/15] virtio-iommu: Endpoint and domains structs and helpers
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
                   ` (4 preceding siblings ...)
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 05/15] virtio-iommu: Add the iommu regions Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
  2019-08-16  4:17   ` Peter Xu
  2019-11-04 18:31   ` Jean-Philippe Brucker
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 07/15] virtio-iommu: Implement attach/detach command Eric Auger
                   ` (8 subsequent siblings)
  14 siblings, 2 replies; 55+ messages in thread
From: Eric Auger @ 2019-07-30 17:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

This patch introduces the domain and endpoint internal
datatypes. Both are stored in RB trees. The domain
owns a list of endpoints attached to it.

Helpers to get/put endpoints and domains are introduced.
The get() helpers will become static in subsequent patches.
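interval_cmp() relies on a specific trick. A comparator that returns 0 whenever two intervals overlap lets an ordered tree of non-overlapping ranges answer "which mapping contains this address?" with an ordinary lookup. The sketch below shows the idea in plain C. It uses bsearch() over a sorted array in place of GLib's GTree, and the type and function names are hypothetical. It also treats both bounds as inclusive, which is an assumption of the sketch and not a description of this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical inclusive interval [low, high], in the spirit of viommu_interval. */
typedef struct {
    uint64_t low;
    uint64_t high;
} interval_t;

/*
 * Overlap-aware comparator: two intervals compare "equal" (0) whenever
 * they overlap, so a lookup with a one-byte key finds the stored
 * interval that contains it. This is only a consistent ordering if the
 * stored intervals never overlap each other.
 */
static int interval_overlap_cmp(const void *a, const void *b)
{
    const interval_t *ia = a;
    const interval_t *ib = b;

    if (ia->high < ib->low) {
        return -1;
    }
    if (ib->high < ia->low) {
        return 1;
    }
    return 0;
}

/* Return the stored interval containing addr, or NULL if none does. */
static const interval_t *find_interval(const interval_t *sorted, size_t n,
                                       uint64_t addr)
{
    interval_t key = { .low = addr, .high = addr };

    assert(sorted != NULL || n == 0);
    return bsearch(&key, sorted, n, sizeof(*sorted), interval_overlap_cmp);
}
```

With mappings {0x1000-0x1fff, 0x4000-0x4fff}, a lookup of 0x1fff returns the first entry and a lookup of 0x3000 returns NULL.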

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Bharat Bhushan <bharat.bhushan@nxp.com>

---

v9 -> v10:
- added Bharat's R-b

v6 -> v7:
- on virtio_iommu_find_add_as the bus number computation may
  not be finalized yet so we cannot register the EPs at that time.
  Hence, let's remove the get_endpoint and also do not use the
  bus number for building the memory region name string (only
  used for debug though).

v4 -> v5:
- initialize as->endpoint_list

v3 -> v4:
- new separate patch
---
 hw/virtio/trace-events   |   4 ++
 hw/virtio/virtio-iommu.c | 121 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 125 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index b32169d56c..a373bdebb3 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -67,3 +67,7 @@ virtio_iommu_map(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end, uin
 virtio_iommu_unmap(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end) "domain=%d virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
 virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
 virtio_iommu_init_iommu_mr(char *iommu_mr) "init %s"
+virtio_iommu_get_endpoint(uint32_t ep_id) "Alloc endpoint=%d"
+virtio_iommu_put_endpoint(uint32_t ep_id) "Free endpoint=%d"
+virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
+virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 1610e2f773..77dccecc0a 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -31,15 +31,118 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "hw/virtio/virtio-iommu.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci.h"
 
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
 
+typedef struct viommu_domain {
+    uint32_t id;
+    GTree *mappings;
+    QLIST_HEAD(, viommu_endpoint) endpoint_list;
+} viommu_domain;
+
+typedef struct viommu_endpoint {
+    uint32_t id;
+    viommu_domain *domain;
+    QLIST_ENTRY(viommu_endpoint) next;
+    VirtIOIOMMU *viommu;
+} viommu_endpoint;
+
+typedef struct viommu_interval {
+    uint64_t low;
+    uint64_t high;
+} viommu_interval;
+
 static inline uint16_t virtio_iommu_get_sid(IOMMUDevice *dev)
 {
     return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
 }
 
+static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
+{
+    viommu_interval *inta = (viommu_interval *)a;
+    viommu_interval *intb = (viommu_interval *)b;
+
+    if (inta->high <= intb->low) {
+        return -1;
+    } else if (intb->high <= inta->low) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+
+static void virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
+{
+    QLIST_REMOVE(ep, next);
+    ep->domain = NULL;
+}
+
+viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id);
+viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id)
+{
+    viommu_endpoint *ep;
+
+    ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(ep_id));
+    if (ep) {
+        return ep;
+    }
+    ep = g_malloc0(sizeof(*ep));
+    ep->id = ep_id;
+    ep->viommu = s;
+    trace_virtio_iommu_get_endpoint(ep_id);
+    g_tree_insert(s->endpoints, GUINT_TO_POINTER(ep_id), ep);
+    return ep;
+}
+
+static void virtio_iommu_put_endpoint(gpointer data)
+{
+    viommu_endpoint *ep = (viommu_endpoint *)data;
+
+    if (ep->domain) {
+        g_tree_unref(ep->domain->mappings);
+        virtio_iommu_detach_endpoint_from_domain(ep);
+    }
+
+    trace_virtio_iommu_put_endpoint(ep->id);
+    g_free(ep);
+}
+
+viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id);
+viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id)
+{
+    viommu_domain *domain;
+
+    domain = g_tree_lookup(s->domains, GUINT_TO_POINTER(domain_id));
+    if (domain) {
+        return domain;
+    }
+    domain = g_malloc0(sizeof(*domain));
+    domain->id = domain_id;
+    domain->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
+                                   NULL, (GDestroyNotify)g_free,
+                                   (GDestroyNotify)g_free);
+    g_tree_insert(s->domains, GUINT_TO_POINTER(domain_id), domain);
+    QLIST_INIT(&domain->endpoint_list);
+    trace_virtio_iommu_get_domain(domain_id);
+    return domain;
+}
+
+static void virtio_iommu_put_domain(gpointer data)
+{
+    viommu_domain *domain = (viommu_domain *)data;
+    viommu_endpoint *iter, *tmp;
+
+    QLIST_FOREACH_SAFE(iter, &domain->endpoint_list, next, tmp) {
+        virtio_iommu_detach_endpoint_from_domain(iter);
+    }
+    g_tree_destroy(domain->mappings);
+    trace_virtio_iommu_put_domain(domain->id);
+    g_free(domain);
+}
+
 static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
                                               int devfn)
 {
@@ -308,6 +411,13 @@ static const VMStateDescription vmstate_virtio_iommu_device = {
     .unmigratable = 1,
 };
 
+static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
+{
+    guint ua = GPOINTER_TO_UINT(a);
+    guint ub = GPOINTER_TO_UINT(b);
+    return (ua > ub) - (ua < ub);
+}
+
 static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -334,6 +444,8 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
     virtio_add_feature(&s->features, VIRTIO_IOMMU_F_BYPASS);
     virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MMIO);
 
+    qemu_mutex_init(&s->mutex);
+
     memset(s->as_by_bus_num, 0, sizeof(s->as_by_bus_num));
     s->as_by_busptr = g_hash_table_new(NULL, NULL);
 
@@ -342,11 +454,20 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
     } else {
         error_setg(errp, "VIRTIO-IOMMU is not attached to any PCI bus!");
     }
+
+    s->domains = g_tree_new_full((GCompareDataFunc)int_cmp,
+                                 NULL, NULL, virtio_iommu_put_domain);
+    s->endpoints = g_tree_new_full((GCompareDataFunc)int_cmp,
+                                   NULL, NULL, virtio_iommu_put_endpoint);
 }
 
 static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
+
+    g_tree_destroy(s->domains);
+    g_tree_destroy(s->endpoints);
 
     virtio_cleanup(vdev);
 }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH for-4.2 v10 07/15] virtio-iommu: Implement attach/detach command
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
                   ` (5 preceding siblings ...)
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 06/15] virtio-iommu: Endpoint and domains structs and helpers Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
  2019-08-16  4:27   ` Peter Xu
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 08/15] virtio-iommu: Implement map/unmap Eric Auger
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Eric Auger @ 2019-07-30 17:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

This patch implements the endpoint attach/detach to/from
a domain.
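Here is a minimal model of the intended semantics. It assumes, as virtio_iommu_attach() does, that an endpoint belongs to at most one domain and that attaching it elsewhere first detaches it. The structs and helpers are hypothetical plain-C stand-ins for viommu_endpoint, viommu_domain and the QLIST macros. Reference counting of the mappings tree is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct domain;

/* Hypothetical stand-ins for viommu_endpoint / viommu_domain. */
struct endpoint {
    uint32_t id;
    struct domain *domain;      /* NULL when detached */
    struct endpoint *next;      /* link in domain->endpoints */
};

struct domain {
    uint32_t id;
    struct endpoint *endpoints; /* head of the attached-endpoint list */
};

static void ep_detach(struct endpoint *ep)
{
    struct endpoint **pp;

    if (!ep->domain) {
        return;
    }
    /* Unlink ep from its current domain's list. */
    for (pp = &ep->domain->endpoints; *pp; pp = &(*pp)->next) {
        if (*pp == ep) {
            *pp = ep->next;
            break;
        }
    }
    ep->next = NULL;
    ep->domain = NULL;
}

/* Attaching moves the endpoint: it is first detached from any old domain. */
static void ep_attach(struct endpoint *ep, struct domain *d)
{
    assert(ep != NULL && d != NULL);
    ep_detach(ep);
    ep->next = d->endpoints;
    d->endpoints = ep;
    ep->domain = d;
}
```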

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
---
 hw/virtio/virtio-iommu.c | 40 ++++++++++++++++++++++++++++++++++------
 1 file changed, 34 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 77dccecc0a..5ea0930cc2 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -80,8 +80,8 @@ static void virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
     ep->domain = NULL;
 }
 
-viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id);
-viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id)
+static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
+                                                  uint32_t ep_id)
 {
     viommu_endpoint *ep;
 
@@ -110,8 +110,8 @@ static void virtio_iommu_put_endpoint(gpointer data)
     g_free(ep);
 }
 
-viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id);
-viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id)
+static viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s,
+                                              uint32_t domain_id)
 {
     viommu_domain *domain;
 
@@ -187,10 +187,27 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
 {
     uint32_t domain_id = le32_to_cpu(req->domain);
     uint32_t ep_id = le32_to_cpu(req->endpoint);
+    viommu_domain *domain;
+    viommu_endpoint *ep;
 
     trace_virtio_iommu_attach(domain_id, ep_id);
 
-    return VIRTIO_IOMMU_S_UNSUPP;
+    ep = virtio_iommu_get_endpoint(s, ep_id);
+    if (ep->domain) {
+        /*
+         * the device is already attached to a domain,
+         * detach it first
+         */
+        virtio_iommu_detach_endpoint_from_domain(ep);
+    }
+
+    domain = virtio_iommu_get_domain(s, domain_id);
+    QLIST_INSERT_HEAD(&domain->endpoint_list, ep, next);
+
+    ep->domain = domain;
+    g_tree_ref(domain->mappings);
+
+    return VIRTIO_IOMMU_S_OK;
 }
 
 static int virtio_iommu_detach(VirtIOIOMMU *s,
@@ -198,10 +215,21 @@ static int virtio_iommu_detach(VirtIOIOMMU *s,
 {
     uint32_t domain_id = le32_to_cpu(req->domain);
     uint32_t ep_id = le32_to_cpu(req->endpoint);
+    viommu_endpoint *ep;
 
     trace_virtio_iommu_detach(domain_id, ep_id);
 
-    return VIRTIO_IOMMU_S_UNSUPP;
+    ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(ep_id));
+    if (!ep) {
+        return VIRTIO_IOMMU_S_NOENT;
+    }
+
+    if (!ep->domain) {
+        return VIRTIO_IOMMU_S_INVAL;
+    }
+
+    virtio_iommu_detach_endpoint_from_domain(ep);
+    return VIRTIO_IOMMU_S_OK;
 }
 
 static int virtio_iommu_map(VirtIOIOMMU *s,
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH for-4.2 v10 08/15] virtio-iommu: Implement map/unmap
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
                   ` (6 preceding siblings ...)
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 07/15] virtio-iommu: Implement attach/detach command Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
  2019-08-19  8:11   ` Peter Xu
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 09/15] virtio-iommu: Implement translate Eric Auger
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Eric Auger @ 2019-07-30 17:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

This patch implements virtio_iommu_map/unmap.
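The unmap code below roughly follows one rule: mappings entirely inside the range are removed, and a mapping that would have to be split is rejected. The following simplified plain-C sketch illustrates that rule over a flat array. It is hypothetical and differs from the patch in one respect. It checks for partial overlaps before removing anything, whereas virtio_iommu_unmap() may already have removed some mappings by the time it returns VIRTIO_IOMMU_S_INVAL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat mapping record, similar in spirit to viommu_mapping. */
struct mapping {
    uint64_t virt_start;   /* inclusive */
    uint64_t virt_end;     /* inclusive */
    int valid;
};

/*
 * Remove every mapping that lies entirely inside [start, end].
 * Returns 0 on success, or -1 if some mapping only partially overlaps
 * the range (splitting is not supported). The overlap check runs first,
 * so a rejected request leaves the table unchanged.
 */
static int unmap_range(struct mapping *m, size_t n,
                       uint64_t start, uint64_t end)
{
    size_t i;

    assert(start <= end);
    for (i = 0; i < n; i++) {
        int overlaps = m[i].valid &&
                       m[i].virt_start <= end && start <= m[i].virt_end;
        int contained = m[i].virt_start >= start && m[i].virt_end <= end;

        if (overlaps && !contained) {
            return -1;
        }
    }
    for (i = 0; i < n; i++) {
        if (m[i].valid && m[i].virt_start >= start && m[i].virt_end <= end) {
            m[i].valid = 0;
        }
    }
    return 0;
}
```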

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v5 -> v6:
- use new v0.6 fields
- replace error_report by qemu_log_mask

v3 -> v4:
- implement unmap semantics as specified in v0.4
---
 hw/virtio/trace-events   |  3 ++
 hw/virtio/virtio-iommu.c | 94 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index a373bdebb3..25a71b0505 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -71,3 +71,6 @@ virtio_iommu_get_endpoint(uint32_t ep_id) "Alloc endpoint=%d"
 virtio_iommu_put_endpoint(uint32_t ep_id) "Free endpoint=%d"
 virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
 virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
+virtio_iommu_unmap_left_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap left [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
+virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
+virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc [0x%"PRIx64",0x%"PRIx64"]"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 5ea0930cc2..4706b9da6e 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/log.h"
 #include "qemu/iov.h"
 #include "qemu-common.h"
 #include "hw/virtio/virtio.h"
@@ -55,6 +56,13 @@ typedef struct viommu_interval {
     uint64_t high;
 } viommu_interval;
 
+typedef struct viommu_mapping {
+    uint64_t virt_addr;
+    uint64_t phys_addr;
+    uint64_t size;
+    uint32_t flags;
+} viommu_mapping;
+
 static inline uint16_t virtio_iommu_get_sid(IOMMUDevice *dev)
 {
     return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
@@ -240,10 +248,37 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
     uint64_t virt_start = le64_to_cpu(req->virt_start);
     uint64_t virt_end = le64_to_cpu(req->virt_end);
     uint32_t flags = le32_to_cpu(req->flags);
+    viommu_domain *domain;
+    viommu_interval *interval;
+    viommu_mapping *mapping;
+
+    domain = g_tree_lookup(s->domains, GUINT_TO_POINTER(domain_id));
+    if (!domain) {
+        return VIRTIO_IOMMU_S_NOENT;
+    }
+
+    interval = g_malloc0(sizeof(*interval));
+
+    interval->low = virt_start;
+    interval->high = virt_end;
+
+    mapping = g_tree_lookup(domain->mappings, (gpointer)interval);
+    if (mapping) {
+        g_free(interval);
+        return VIRTIO_IOMMU_S_INVAL;
+    }
 
     trace_virtio_iommu_map(domain_id, virt_start, virt_end, phys_start, flags);
 
-    return VIRTIO_IOMMU_S_UNSUPP;
+    mapping = g_malloc0(sizeof(*mapping));
+    mapping->virt_addr = virt_start;
+    mapping->phys_addr = phys_start;
+    mapping->size = virt_end - virt_start + 1;
+    mapping->flags = flags;
+
+    g_tree_insert(domain->mappings, interval, mapping);
+
+    return VIRTIO_IOMMU_S_OK;
 }
 
 static int virtio_iommu_unmap(VirtIOIOMMU *s,
@@ -252,10 +287,65 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
     uint32_t domain_id = le32_to_cpu(req->domain);
     uint64_t virt_start = le64_to_cpu(req->virt_start);
     uint64_t virt_end = le64_to_cpu(req->virt_end);
+    uint64_t size = virt_end - virt_start + 1;
+    viommu_mapping *mapping;
+    viommu_interval interval;
+    viommu_domain *domain;
 
     trace_virtio_iommu_unmap(domain_id, virt_start, virt_end);
 
-    return VIRTIO_IOMMU_S_UNSUPP;
+    domain = g_tree_lookup(s->domains, GUINT_TO_POINTER(domain_id));
+    if (!domain) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: no domain\n", __func__);
+        return VIRTIO_IOMMU_S_NOENT;
+    }
+    interval.low = virt_start;
+    interval.high = virt_end;
+
+    mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
+
+    while (mapping) {
+        viommu_interval current;
+        uint64_t low  = mapping->virt_addr;
+        uint64_t high = mapping->virt_addr + mapping->size - 1;
+
+        current.low = low;
+        current.high = high;
+
+        if (low == interval.low && size >= mapping->size) {
+            g_tree_remove(domain->mappings, (gpointer)(&current));
+            interval.low = high + 1;
+            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
+                interval.low, interval.high);
+        } else if (high == interval.high && size >= mapping->size) {
+            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
+                interval.low, interval.high);
+            g_tree_remove(domain->mappings, (gpointer)(&current));
+            interval.high = low - 1;
+        } else if (low > interval.low && high < interval.high) {
+            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
+            g_tree_remove(domain->mappings, (gpointer)(&current));
+        } else {
+            break;
+        }
+        if (interval.low >= interval.high) {
+            return VIRTIO_IOMMU_S_OK;
+        } else {
+            mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
+        }
+    }
+
+    if (mapping) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
+                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported\n",
+                     __func__, interval.low, size,
+                     mapping->virt_addr, mapping->size);
+    } else {
+        return VIRTIO_IOMMU_S_OK;
+    }
+
+    return VIRTIO_IOMMU_S_INVAL;
 }
 
 static int virtio_iommu_iov_to_req(struct iovec *iov,
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH for-4.2 v10 09/15] virtio-iommu: Implement translate
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
                   ` (7 preceding siblings ...)
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 08/15] virtio-iommu: Implement map/unmap Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
  2019-08-19  8:24   ` Peter Xu
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 10/15] virtio-iommu: Implement probe request Eric Auger
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Eric Auger @ 2019-07-30 17:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

This patch implements the translate callback.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v6 -> v7:
- implemented bypass-mode

v5 -> v6:
- replace error_report by qemu_log_mask

v4 -> v5:
- check the device domain is not NULL
- s/printf/error_report
- set flags to IOMMU_NONE in case of all translation faults
---
 hw/virtio/trace-events   |  1 +
 hw/virtio/virtio-iommu.c | 58 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 25a71b0505..8257065159 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -74,3 +74,4 @@ virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
 virtio_iommu_unmap_left_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap left [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
 virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
 virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc [0x%"PRIx64",0x%"PRIx64"]"
+virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 4706b9da6e..a8de583f9a 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -464,19 +464,75 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
                                             int iommu_idx)
 {
     IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+    VirtIOIOMMU *s = sdev->viommu;
     uint32_t sid;
+    viommu_endpoint *ep;
+    viommu_mapping *mapping;
+    viommu_interval interval;
+    bool bypass_allowed;
+
+    interval.low = addr;
+    interval.high = addr + 1;
 
     IOMMUTLBEntry entry = {
         .target_as = &address_space_memory,
         .iova = addr,
         .translated_addr = addr,
-        .addr_mask = ~(hwaddr)0,
+        .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
         .perm = IOMMU_NONE,
     };
 
+    bypass_allowed = virtio_has_feature(s->acked_features,
+                                        VIRTIO_IOMMU_F_BYPASS);
+
     sid = virtio_iommu_get_sid(sdev);
 
     trace_virtio_iommu_translate(mr->parent_obj.name, sid, addr, flag);
+    qemu_mutex_lock(&s->mutex);
+
+    ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));
+    if (!ep) {
+        if (!bypass_allowed) {
+            error_report("%s sid=%d is not known!!", __func__, sid);
+        } else {
+            entry.perm = flag;
+        }
+        goto unlock;
+    }
+
+    if (!ep->domain) {
+        if (!bypass_allowed) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "%s %02x:%02x.%01x not attached to any domain\n",
+                          __func__, PCI_BUS_NUM(sid),
+                          PCI_SLOT(sid), PCI_FUNC(sid));
+        } else {
+            entry.perm = flag;
+        }
+        goto unlock;
+    }
+
+    mapping = g_tree_lookup(ep->domain->mappings, (gpointer)(&interval));
+    if (!mapping) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s no mapping for 0x%"PRIx64" for sid=%d\n",
+                      __func__, addr, sid);
+        goto unlock;
+    }
+
+    if (((flag & IOMMU_RO) && !(mapping->flags & VIRTIO_IOMMU_MAP_F_READ)) ||
+        ((flag & IOMMU_WO) && !(mapping->flags & VIRTIO_IOMMU_MAP_F_WRITE))) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "Permission error on 0x%"PRIx64"(%d): allowed=%d\n",
+                      addr, flag, mapping->flags);
+        goto unlock;
+    }
+    entry.translated_addr = addr - mapping->virt_addr + mapping->phys_addr;
+    entry.perm = flag;
+    trace_virtio_iommu_translate_out(addr, entry.translated_addr, sid);
+
+unlock:
+    qemu_mutex_unlock(&s->mutex);
     return entry;
 }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [Qemu-devel] [PATCH for-4.2 v10 10/15] virtio-iommu: Implement probe request
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
                   ` (8 preceding siblings ...)
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 09/15] virtio-iommu: Implement translate Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
  2019-08-19 12:08   ` Peter Xu
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 11/15] virtio-iommu: Expose the IOAPIC MSI reserved region when relevant Eric Auger
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Eric Auger @ 2019-07-30 17:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

This patch implements the PROBE request. At the moment,
no reserved regions are returned as none are registered
per device. Only a NONE property is returned.
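The buffer layout works as follows. Properties are packed back to back into properties[], each starting with a small header, and the list ends with a NONE property (type 0). The plain-C sketch below assumes a header made of a little-endian 16-bit type followed by a little-endian 16-bit length. It also assumes that this length counts only the payload bytes. The struct and helper names are hypothetical, and the 512-byte size simply mirrors VIOMMU_PROBE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROBE_BUF_SIZE 512   /* mirrors VIOMMU_PROBE_SIZE in the patch */
#define PROBE_T_NONE   0

/* Hypothetical cursor over the properties[] area of a probe reply. */
struct probe_buf {
    uint8_t data[PROBE_BUF_SIZE];
    size_t filled;
};

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/*
 * Append one property: a 4-byte header (le16 type, le16 length)
 * followed by `len` payload bytes. Returns -1 if it does not fit.
 */
static int probe_add_prop(struct probe_buf *b, uint16_t type,
                          const void *payload, uint16_t len)
{
    size_t need = 4 + (size_t)len;

    assert(b != NULL && (payload != NULL || len == 0));
    if (b->filled + need > sizeof(b->data)) {
        return -1;
    }
    put_le16(b->data + b->filled, type);
    put_le16(b->data + b->filled + 2, len);
    if (len) {
        memcpy(b->data + b->filled + 4, payload, len);
    }
    b->filled += need;
    return 0;
}
```

A reply would call probe_add_prop() once per property and finish with probe_add_prop(&b, PROBE_T_NONE, NULL, 0).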

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v8 -> v9:
- fix filling of properties (changes induced by v0.7 -> v0.8 spec
  evolution)
- return VIRTIO_IOMMU_S_INVAL in case of error

v7 -> v8:
- adapt to removal of value field in virtio_iommu_probe_property

v6 -> v7:
- adapt to the change in virtio_iommu_probe_resv_mem fields
- use get_endpoint() instead of directly checking the EP
  was registered.

v4 -> v5:
- initialize bufstate.error to false
- add cpu_to_le64(size)
---
 hw/virtio/trace-events   |   2 +
 hw/virtio/virtio-iommu.c | 168 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 168 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 8257065159..2e557dffb4 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -75,3 +75,5 @@ virtio_iommu_unmap_left_interval(uint64_t low, uint64_t high, uint64_t next_low,
 virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
 virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc [0x%"PRIx64",0x%"PRIx64"]"
 virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
+virtio_iommu_fill_resv_property(uint32_t devid, uint8_t subtype, uint64_t start, uint64_t end, uint32_t flags, size_t filled) "dev=%d, subtype=%d start=0x%"PRIx64" end=0x%"PRIx64" flags=%d filled=0x%zx"
+virtio_iommu_fill_none_property(uint32_t devid) "devid=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index a8de583f9a..66be9a4627 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -37,6 +37,10 @@
 
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
+#define VIOMMU_PROBE_SIZE 512
+
+#define SUPPORTED_PROBE_PROPERTIES (\
+    1 << VIRTIO_IOMMU_PROBE_T_RESV_MEM)
 
 typedef struct viommu_domain {
     uint32_t id;
@@ -49,6 +53,7 @@ typedef struct viommu_endpoint {
     viommu_domain *domain;
     QLIST_ENTRY(viommu_endpoint) next;
     VirtIOIOMMU *viommu;
+    GTree *reserved_regions;
 } viommu_endpoint;
 
 typedef struct viommu_interval {
@@ -63,6 +68,13 @@ typedef struct viommu_mapping {
     uint32_t flags;
 } viommu_mapping;
 
+typedef struct viommu_property_buffer {
+    viommu_endpoint *endpoint;
+    size_t filled;
+    uint8_t *start;
+    bool error;
+} viommu_property_buffer;
+
 static inline uint16_t virtio_iommu_get_sid(IOMMUDevice *dev)
 {
     return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
@@ -102,6 +114,9 @@ static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
     ep->viommu = s;
     trace_virtio_iommu_get_endpoint(ep_id);
     g_tree_insert(s->endpoints, GUINT_TO_POINTER(ep_id), ep);
+    ep->reserved_regions = g_tree_new_full((GCompareDataFunc)interval_cmp,
+                                            NULL, (GDestroyNotify)g_free,
+                                            (GDestroyNotify)g_free);
     return ep;
 }
 
@@ -115,6 +130,7 @@ static void virtio_iommu_put_endpoint(gpointer data)
     }
 
     trace_virtio_iommu_put_endpoint(ep->id);
+    g_tree_destroy(ep->reserved_regions);
     g_free(ep);
 }
 
@@ -348,6 +364,125 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
     return VIRTIO_IOMMU_S_INVAL;
 }
 
+/**
+ * virtio_iommu_fill_resv_mem_prop - Add a RESV_MEM probe
+ * property into the probe request buffer
+ *
+ * @key: interval handle
+ * @value: handle to the reserved memory region
+ * @data: handle to the probe request buffer state
+ */
+static gboolean virtio_iommu_fill_resv_mem_prop(gpointer key,
+                                                gpointer value,
+                                                gpointer data)
+{
+    struct virtio_iommu_probe_resv_mem *resv =
+        (struct virtio_iommu_probe_resv_mem *)value;
+    struct virtio_iommu_probe_resv_mem *buf_prop;
+    viommu_property_buffer *bufstate = (viommu_property_buffer *)data;
+    size_t prop_size = sizeof(*resv);
+
+    if (bufstate->filled + prop_size >= VIOMMU_PROBE_SIZE) {
+        bufstate->error = true;
+        /* get the traversal stopped by returning true */
+        return true;
+    }
+    buf_prop = (struct virtio_iommu_probe_resv_mem *)
+                (bufstate->start + bufstate->filled);
+    *buf_prop = *resv;
+
+    bufstate->filled += prop_size;
+    trace_virtio_iommu_fill_resv_property(bufstate->endpoint->id,
+                                          resv->subtype, resv->start,
+                                          resv->end, resv->subtype,
+                                          bufstate->filled);
+    return false;
+}
+
+static int virtio_iommu_fill_none_prop(viommu_property_buffer *bufstate)
+{
+    struct virtio_iommu_probe_property *prop;
+
+    prop = (struct virtio_iommu_probe_property *)
+                (bufstate->start + bufstate->filled);
+    prop->type = 0;
+    prop->length = 0;
+    bufstate->filled += sizeof(*prop);
+    trace_virtio_iommu_fill_none_property(bufstate->endpoint->id);
+    return 0;
+}
+
+/* Fill the properties[] buffer with properties of type @type */
+static int virtio_iommu_fill_property(int type,
+                                      viommu_property_buffer *bufstate)
+{
+    int ret = -ENOSPC;
+
+    if (bufstate->filled + sizeof(struct virtio_iommu_probe_property)
+            >= VIOMMU_PROBE_SIZE) {
+        /* no space left for the header */
+        bufstate->error = true;
+        goto out;
+    }
+
+    switch (type) {
+    case VIRTIO_IOMMU_PROBE_T_NONE:
+        ret = virtio_iommu_fill_none_prop(bufstate);
+        break;
+    case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
+    {
+        viommu_endpoint *ep = bufstate->endpoint;
+
+        g_tree_foreach(ep->reserved_regions,
+                       virtio_iommu_fill_resv_mem_prop,
+                       bufstate);
+        if (!bufstate->error) {
+            ret = 0;
+        }
+        break;
+    }
+    default:
+        ret = -ENOENT;
+        break;
+    }
+out:
+    if (ret) {
+        error_report("%s property of type=%d could not be filled (%d),"
+                     " filled size = 0x%zx",
+                     __func__, type, ret, bufstate->filled);
+    }
+    return ret;
+}
+
+/**
+ * virtio_iommu_probe - Fill the probe request buffer with all
+ * the properties the device is able to return and add a NONE
+ * property at the end. @buf points to properties[].
+ */
+static int virtio_iommu_probe(VirtIOIOMMU *s,
+                              struct virtio_iommu_req_probe *req,
+                              uint8_t *buf)
+{
+    uint32_t ep_id = le32_to_cpu(req->endpoint);
+    viommu_endpoint *ep = virtio_iommu_get_endpoint(s, ep_id);
+    int16_t prop_types = SUPPORTED_PROBE_PROPERTIES, type;
+    viommu_property_buffer bufstate = {.start = buf, .filled = 0,
+                                       .error = false, .endpoint = ep};
+
+    while ((type = ctz32(prop_types)) != 32) {
+        if (virtio_iommu_fill_property(type, &bufstate)) {
+            goto failure;
+        }
+        prop_types &= ~(1 << type);
+    }
+    if (virtio_iommu_fill_property(VIRTIO_IOMMU_PROBE_T_NONE, &bufstate)) {
+        goto failure;
+    }
+    return VIRTIO_IOMMU_S_OK;
+failure:
+    return VIRTIO_IOMMU_S_INVAL;
+}
+
 static int virtio_iommu_iov_to_req(struct iovec *iov,
                                    unsigned int iov_cnt,
                                    void *req, size_t req_sz)
@@ -398,6 +533,17 @@ static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
     return ret ? ret : virtio_iommu_unmap(s, &req);
 }
 
+static int virtio_iommu_handle_probe(VirtIOIOMMU *s,
+                                     struct iovec *iov,
+                                     unsigned int iov_cnt,
+                                     uint8_t *buf)
+{
+    struct virtio_iommu_req_probe req;
+    int ret = virtio_iommu_iov_to_req(iov, iov_cnt, &req, sizeof(req));
+
+    return ret ? ret : virtio_iommu_probe(s, &req, buf);
+}
+
 static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
@@ -443,17 +589,33 @@ static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
         case VIRTIO_IOMMU_T_UNMAP:
             tail.status = virtio_iommu_handle_unmap(s, iov, iov_cnt);
             break;
+        case VIRTIO_IOMMU_T_PROBE:
+        {
+            struct virtio_iommu_req_tail *ptail;
+            uint8_t *buf = g_malloc0(s->config.probe_size + sizeof(tail));
+
+            ptail = (struct virtio_iommu_req_tail *)
+                        (buf + s->config.probe_size);
+            ptail->status = virtio_iommu_handle_probe(s, iov, iov_cnt, buf);
+
+            sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
+                              buf, s->config.probe_size + sizeof(tail));
+            g_free(buf);
+            assert(sz == s->config.probe_size + sizeof(tail));
+            goto push;
+        }
         default:
             tail.status = VIRTIO_IOMMU_S_UNSUPP;
         }
-        qemu_mutex_unlock(&s->mutex);
 
 out:
         sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
                           &tail, sizeof(tail));
         assert(sz == sizeof(tail));
 
-        virtqueue_push(vq, elem, sizeof(tail));
+push:
+        qemu_mutex_unlock(&s->mutex);
+        virtqueue_push(vq, elem, sz);
         virtio_notify(vdev, vq);
         g_free(elem);
     }
@@ -608,6 +770,7 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
     s->config.input_range.end = -1UL;
     s->config.domain_range.start = 0;
     s->config.domain_range.end = 32;
+    s->config.probe_size = VIOMMU_PROBE_SIZE;
 
     virtio_add_feature(&s->features, VIRTIO_RING_F_EVENT_IDX);
     virtio_add_feature(&s->features, VIRTIO_RING_F_INDIRECT_DESC);
@@ -617,6 +780,7 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
     virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MAP_UNMAP);
     virtio_add_feature(&s->features, VIRTIO_IOMMU_F_BYPASS);
     virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MMIO);
+    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_PROBE);
 
     qemu_mutex_init(&s->mutex);
 
-- 
2.20.1
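The probe request implementation above fills properties[] with every supported property and then closes the list with a NONE property. The fill-then-terminate scheme can be sketched in plain C as below. This is a minimal sketch under stated assumptions, not the patch's code: probe_buf, probe_append() and probe_terminate() are hypothetical names, SKETCH_PROBE_SIZE simply mirrors the 512-byte VIOMMU_PROBE_SIZE, and the NONE property is taken to be a bare 4-byte header with type 0 and length 0, as virtio_iommu_fill_none_prop() writes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PROBE_SIZE     512  /* mirrors VIOMMU_PROBE_SIZE (assumption) */
#define SKETCH_PROP_HDR_SIZE  4    /* le16 type + le16 length */

/* Hypothetical probe buffer state: bytes written so far into buf[]. */
typedef struct {
    uint8_t buf[SKETCH_PROBE_SIZE];
    size_t filled;
} probe_buf;

/*
 * Append one already-encoded property. The last 4 bytes are always kept
 * free for the NONE terminator, so terminating can never run out of room.
 */
static bool probe_append(probe_buf *pb, const void *prop, size_t len)
{
    size_t room = SKETCH_PROBE_SIZE - SKETCH_PROP_HDR_SIZE;

    if (pb->filled > room || len > room - pb->filled) {
        return false;   /* caller reports an error, as the patch does */
    }
    memcpy(pb->buf + pb->filled, prop, len);
    pb->filled += len;
    return true;
}

/* Close the list with a NONE property: type 0, length 0. */
static void probe_terminate(probe_buf *pb)
{
    assert(pb->filled + SKETCH_PROP_HDR_SIZE <= SKETCH_PROBE_SIZE);
    memset(pb->buf + pb->filled, 0, SKETCH_PROP_HDR_SIZE);
    pb->filled += SKETCH_PROP_HDR_SIZE;
}
```

Unlike the patch, which checks the space left before each property, this sketch reserves the terminator's 4 bytes up front, so appending the NONE property is never the step that fails.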




* [Qemu-devel] [PATCH for-4.2 v10 11/15] virtio-iommu: Expose the IOAPIC MSI reserved region when relevant
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
                   ` (9 preceding siblings ...)
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 10/15] virtio-iommu: Implement probe request Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
  2019-07-30 19:38   ` Michael S. Tsirkin
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 12/15] virtio-iommu: Implement fault reporting Eric Auger
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Eric Auger @ 2019-07-30 17:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

We introduce a new msi_bypass field that indicates whether
the IOAPIC MSI window [0xFEE00000 - 0xFEEFFFFF] must be exposed
to the guest as an MSI reserved region. The field defaults to true
at instantiation time. A later patch introduces a property at the
virtio-pci proxy level to turn it off.
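How the window reaches the guest is left implicit here: virtio_iommu_register_resv_region() below stores it as a RESV_MEM probe property. As a minimal sketch, assuming the v0.12 layout of struct virtio_iommu_probe_resv_mem (a 4-byte header made of le16 type and le16 length, a u8 subtype, three reserved bytes, then le64 start and le64 end), the property can be serialized byte by byte. The SKETCH_* values (type 1 for RESV_MEM, subtype 1 for MSI) are assumed from the Linux uapi header and should be checked against it; put_le16(), put_le64() and encode_resv_mem() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values; verify against linux/virtio_iommu.h (v0.12). */
#define SKETCH_PROBE_T_RESV_MEM  1
#define SKETCH_RESV_MEM_T_MSI    1

#define SKETCH_PROP_HDR_SIZE     4   /* le16 type + le16 length */
#define SKETCH_RESV_MEM_SIZE     24  /* hdr + u8 subtype + u8[3] + le64 + le64 */

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Encode one RESV_MEM property into out[0..23], independent of host order. */
static void encode_resv_mem(uint8_t out[SKETCH_RESV_MEM_SIZE], uint8_t subtype,
                            uint64_t start, uint64_t end)
{
    assert(start <= end);                      /* inclusive bounds */
    memset(out, 0, SKETCH_RESV_MEM_SIZE);      /* also clears reserved[3] */
    put_le16(out + 0, SKETCH_PROBE_T_RESV_MEM);
    /* length counts only the bytes that follow the 4-byte header */
    put_le16(out + 2, SKETCH_RESV_MEM_SIZE - SKETCH_PROP_HDR_SIZE);
    out[4] = subtype;                          /* single byte: no swap */
    put_le64(out + 8, start);
    put_le64(out + 16, end);
}
```

Writing each multi-byte field explicitly in little-endian order makes the result independent of host byte order, and the 8-bit subtype needs no conversion at all.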

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v8 -> v9:
- pass IOAPIC_RANGE_END to virtio_iommu_register_resv_region
- take into account the change in the struct virtio_iommu_probe_resv_mem
  definition
- We just introduce the field here. A property will be introduced later on
  at pci proxy level.
---
 hw/virtio/virtio-iommu.c         | 36 ++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-iommu.h |  1 +
 2 files changed, 37 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 66be9a4627..74038288b0 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -39,6 +39,9 @@
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
 #define VIOMMU_PROBE_SIZE 512
 
+#define IOAPIC_RANGE_START      (0xfee00000)
+#define IOAPIC_RANGE_END        (0xfeefffff)
+
 #define SUPPORTED_PROBE_PROPERTIES (\
     1 << VIRTIO_IOMMU_PROBE_T_RESV_MEM)
 
@@ -100,6 +103,30 @@ static void virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
     ep->domain = NULL;
 }
 
+static void virtio_iommu_register_resv_region(viommu_endpoint *ep,
+                                              uint8_t subtype,
+                                              uint64_t start, uint64_t end)
+{
+    viommu_interval *interval;
+    struct virtio_iommu_probe_resv_mem *resv_reg_prop;
+    size_t prop_size = sizeof(struct virtio_iommu_probe_resv_mem);
+    size_t value_size = prop_size -
+                sizeof(struct virtio_iommu_probe_property);
+
+    interval = g_malloc0(sizeof(*interval));
+    interval->low = start;
+    interval->high = end;
+
+    resv_reg_prop = g_malloc0(prop_size);
+    resv_reg_prop->head.type = cpu_to_le16(VIRTIO_IOMMU_PROBE_T_RESV_MEM);
+    resv_reg_prop->head.length = cpu_to_le16(value_size);
+    resv_reg_prop->subtype = subtype;
+    resv_reg_prop->start = cpu_to_le64(start);
+    resv_reg_prop->end = cpu_to_le64(end);
+
+    g_tree_insert(ep->reserved_regions, interval, resv_reg_prop);
+}
+
 static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
                                                   uint32_t ep_id)
 {
@@ -117,6 +144,12 @@ static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
     ep->reserved_regions = g_tree_new_full((GCompareDataFunc)interval_cmp,
                                             NULL, (GDestroyNotify)g_free,
                                             (GDestroyNotify)g_free);
+    if (s->msi_bypass) {
+        virtio_iommu_register_resv_region(ep, VIRTIO_IOMMU_RESV_MEM_T_MSI,
+                                          IOAPIC_RANGE_START,
+                                          IOAPIC_RANGE_END);
+    }
+
     return ep;
 }
 
@@ -822,6 +855,9 @@ static void virtio_iommu_set_status(VirtIODevice *vdev, uint8_t status)
 
 static void virtio_iommu_instance_init(Object *obj)
 {
+    VirtIOIOMMU *s = VIRTIO_IOMMU(obj);
+
+    s->msi_bypass = true;
 }
 
 static const VMStateDescription vmstate_virtio_iommu = {
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index f55f48d304..56c8b4e57f 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -59,6 +59,7 @@ typedef struct VirtIOIOMMU {
     GTree *domains;
     QemuMutex mutex;
     GTree *endpoints;
+    bool msi_bypass;
 } VirtIOIOMMU;
 
 #endif
-- 
2.20.1




* [Qemu-devel] [PATCH for-4.2 v10 12/15] virtio-iommu: Implement fault reporting
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
                   ` (10 preceding siblings ...)
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 11/15] virtio-iommu: Expose the IOAPIC MSI reserved region when relevant Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 13/15] virtio_iommu: Handle reserved regions in translation process Eric Auger
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 55+ messages in thread
From: Eric Auger @ 2019-07-30 17:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

The event queue allows the device to report asynchronous errors
(faults) to the driver. The translate function now injects faults
when relevant.
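The read/write permission check that this patch adds to virtio_iommu_translate() boils down to a small pure function. A minimal sketch, assuming the fault flag values below (believed to match the Linux virtio_iommu.h header, but to be verified) and using a single hypothetical pair of ACC_* bits for both the access type and the mapping permissions, which the real code keeps separate (IOMMU_RO/IOMMU_WO versus VIRTIO_IOMMU_MAP_F_READ/WRITE):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values, mirroring the v0.12 fault flags. */
#define SKETCH_FAULT_F_READ     (1u << 0)
#define SKETCH_FAULT_F_WRITE    (1u << 1)
#define SKETCH_FAULT_F_ADDRESS  (1u << 8)

/* Hypothetical access / mapping permission bits for this sketch only. */
#define ACC_READ   (1u << 0)
#define ACC_WRITE  (1u << 1)

/*
 * Return the fault flags to report for an access, or 0 if the mapping
 * allows it. F_ADDRESS tells the driver the address field is meaningful.
 */
static uint32_t permission_fault_flags(uint32_t access, uint32_t map_perm)
{
    assert((access & ~(ACC_READ | ACC_WRITE)) == 0);

    bool read_fault = (access & ACC_READ) && !(map_perm & ACC_READ);
    bool write_fault = (access & ACC_WRITE) && !(map_perm & ACC_WRITE);
    uint32_t flags = 0;

    if (read_fault) {
        flags |= SKETCH_FAULT_F_READ;
    }
    if (write_fault) {
        flags |= SKETCH_FAULT_F_WRITE;
    }
    if (flags) {
        flags |= SKETCH_FAULT_F_ADDRESS;
    }
    return flags;
}
```

A non-zero result is what the patch passes to virtio_iommu_report_fault() with the MAPPING reason; zero means the translation proceeds.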

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/virtio/trace-events   |  1 +
 hw/virtio/virtio-iommu.c | 67 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 2e557dffb4..046290a971 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -77,3 +77,4 @@ virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc [0x%"PRI
 virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
 virtio_iommu_fill_resv_property(uint32_t devid, uint8_t subtype, uint64_t start, uint64_t end, uint32_t flags, size_t filled) "dev= %d, subtype=%d start=0x%"PRIx64" end=0x%"PRIx64" flags=%d filled=0x%lx"
 virtio_iommu_fill_none_property(uint32_t devid) "devid=%d"
+virtio_iommu_report_fault(uint8_t reason, uint32_t flags, uint32_t endpoint, uint64_t addr) "FAULT reason=%d flags=%d endpoint=%d address =0x%"PRIx64
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 74038288b0..8e54a17227 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -654,17 +654,63 @@ push:
     }
 }
 
+static void virtio_iommu_report_fault(VirtIOIOMMU *viommu, uint8_t reason,
+                                      uint32_t flags, uint32_t endpoint,
+                                      uint64_t address)
+{
+    VirtIODevice *vdev = &viommu->parent_obj;
+    VirtQueue *vq = viommu->event_vq;
+    struct virtio_iommu_fault fault;
+    VirtQueueElement *elem;
+    size_t sz;
+
+    memset(&fault, 0, sizeof(fault));
+    fault.reason = reason;
+    fault.flags = flags;
+    fault.endpoint = endpoint;
+    fault.address = address;
+
+    for (;;) {
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+
+        if (!elem) {
+            virtio_error(vdev,
+                         "no buffer available in event queue to report event");
+            return;
+        }
+
+        if (iov_size(elem->in_sg, elem->in_num) < sizeof(fault)) {
+            virtio_error(vdev, "error buffer of wrong size");
+            virtqueue_detach_element(vq, elem, 0);
+            g_free(elem);
+            continue;
+        }
+        break;
+    }
+    /* we have a buffer to fill in */
+    sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
+                      &fault, sizeof(fault));
+    assert(sz == sizeof(fault));
+
+    trace_virtio_iommu_report_fault(reason, flags, endpoint, address);
+    virtqueue_push(vq, elem, sz);
+    virtio_notify(vdev, vq);
+    g_free(elem);
+
+}
+
 static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
                                             IOMMUAccessFlags flag,
                                             int iommu_idx)
 {
     IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
     VirtIOIOMMU *s = sdev->viommu;
-    uint32_t sid;
+    uint32_t sid, flags;
     viommu_endpoint *ep;
     viommu_mapping *mapping;
     viommu_interval interval;
     bool bypass_allowed;
+    bool read_fault, write_fault;
 
     interval.low = addr;
     interval.high = addr + 1;
@@ -689,6 +735,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
     if (!ep) {
         if (!bypass_allowed) {
             error_report("%s sid=%d is not known!!", __func__, sid);
+            virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_UNKNOWN,
+                                      0, sid, 0);
         } else {
             entry.perm = flag;
         }
@@ -701,6 +749,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
                           "%s %02x:%02x.%01x not attached to any domain\n",
                           __func__, PCI_BUS_NUM(sid),
                           PCI_SLOT(sid), PCI_FUNC(sid));
+            virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_DOMAIN,
+                                      0, sid, 0);
         } else {
             entry.perm = flag;
         }
@@ -712,14 +762,25 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
         qemu_log_mask(LOG_GUEST_ERROR,
                       "%s no mapping for 0x%"PRIx64" for sid=%d\n",
                       __func__, addr, sid);
+        virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_MAPPING,
+                                  0, sid, addr);
         goto unlock;
     }
 
-    if (((flag & IOMMU_RO) && !(mapping->flags & VIRTIO_IOMMU_MAP_F_READ)) ||
-        ((flag & IOMMU_WO) && !(mapping->flags & VIRTIO_IOMMU_MAP_F_WRITE))) {
+    read_fault = (flag & IOMMU_RO) &&
+                    !(mapping->flags & VIRTIO_IOMMU_MAP_F_READ);
+    write_fault = (flag & IOMMU_WO) &&
+                    !(mapping->flags & VIRTIO_IOMMU_MAP_F_WRITE);
+
+    flags = read_fault ? VIRTIO_IOMMU_FAULT_F_READ : 0;
+    flags |= write_fault ? VIRTIO_IOMMU_FAULT_F_WRITE : 0;
+    if (flags) {
         qemu_log_mask(LOG_GUEST_ERROR,
                       "Permission error on 0x%"PRIx64"(%d): allowed=%d\n",
                       addr, flag, mapping->flags);
+        flags |= VIRTIO_IOMMU_FAULT_F_ADDRESS;
+        virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_MAPPING,
+                                  flags, sid, addr);
         goto unlock;
     }
     entry.translated_addr = addr - mapping->virt_addr + mapping->phys_addr;
-- 
2.20.1




* [Qemu-devel] [PATCH for-4.2 v10 13/15] virtio_iommu: Handle reserved regions in translation process
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
                   ` (11 preceding siblings ...)
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 12/15] virtio-iommu: Implement fault reporting Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
  2019-08-19 12:44   ` Peter Xu
  2019-09-01  6:38   ` Michael S. Tsirkin
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 14/15] virtio-iommu-pci: Add virtio iommu pci support Eric Auger
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 15/15] hw/arm/virt: Add the virtio-iommu device tree mappings Eric Auger
  14 siblings, 2 replies; 55+ messages in thread
From: Eric Auger @ 2019-07-30 17:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

When translating an address, we need to check whether it belongs to
a reserved virtual address range. If it does, there are two cases:

- It belongs to a RESERVED region: the guest should neither use
  this address in a MAP request nor instruct the endpoint to DMA
  to it. We report a fault.

- It belongs to an MSI region: we bypass the translation.
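The patch looks the address up in ep->reserved_regions with g_tree_lookup(). That only finds a containing region if interval_cmp treats overlapping intervals as equal, which is assumed here since interval_cmp itself is not part of this excerpt. The sketch below swaps the GTree for a plain linear scan over a hypothetical resv_region array, probes with the single-byte inclusive interval [addr, addr], and maps the two cases above to an action. The subtype values are assumed from the v0.12 spec; all other names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed subtype values (v0.12 spec). */
#define SKETCH_RESV_MEM_T_RESERVED 0
#define SKETCH_RESV_MEM_T_MSI      1

typedef struct {
    uint64_t low, high;   /* inclusive bounds */
    uint8_t subtype;
} resv_region;

typedef enum { XLATE_NORMAL, XLATE_BYPASS, XLATE_FAULT } xlate_action;

/* Two inclusive intervals compare "equal" (0) when they overlap. */
static int interval_overlap_cmp(uint64_t alow, uint64_t ahigh,
                                uint64_t blow, uint64_t bhigh)
{
    if (ahigh < blow) {
        return -1;
    } else if (bhigh < alow) {
        return 1;
    }
    return 0;
}

/* Decide how to handle an access to addr given the endpoint's reserved regions. */
static xlate_action classify(const resv_region *regs, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        assert(regs[i].low <= regs[i].high);
        if (interval_overlap_cmp(addr, addr, regs[i].low, regs[i].high) == 0) {
            /* MSI doorbells pass through untranslated; anything else faults. */
            return regs[i].subtype == SKETCH_RESV_MEM_T_MSI ? XLATE_BYPASS
                                                            : XLATE_FAULT;
        }
    }
    return XLATE_NORMAL;  /* not reserved: use the normal mapping lookup */
}
```

With an overlap-as-equal comparator, a balanced tree gives the same answer in logarithmic time, provided the stored regions never overlap one another.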

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v9 -> v10:
- in case of an MSI region, we immediately return
---
 hw/virtio/virtio-iommu.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 8e54a17227..20d92b7ab0 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -711,6 +711,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
     viommu_interval interval;
     bool bypass_allowed;
     bool read_fault, write_fault;
+    struct virtio_iommu_probe_resv_mem *reg;
 
     interval.low = addr;
     interval.high = addr + 1;
@@ -743,6 +744,21 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
         goto unlock;
     }
 
+    reg = g_tree_lookup(ep->reserved_regions, (gpointer)(&interval));
+    if (reg) {
+        switch (reg->subtype) {
+        case VIRTIO_IOMMU_RESV_MEM_T_MSI:
+            entry.perm = flag;
+            goto unlock;
+        case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
+        default:
+            virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_MAPPING,
+                                      0, sid, addr);
+            break;
+        }
+        goto unlock;
+    }
+
     if (!ep->domain) {
         if (!bypass_allowed) {
             qemu_log_mask(LOG_GUEST_ERROR,
-- 
2.20.1




* [Qemu-devel] [PATCH for-4.2 v10 14/15] virtio-iommu-pci: Add virtio iommu pci support
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
                   ` (12 preceding siblings ...)
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 13/15] virtio_iommu: Handle reserved regions in translation process Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
  2019-07-30 19:35   ` Michael S. Tsirkin
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 15/15] hw/arm/virt: Add the virtio-iommu device tree mappings Eric Auger
  14 siblings, 1 reply; 55+ messages in thread
From: Eric Auger @ 2019-07-30 17:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

This patch adds virtio-iommu-pci, which is the PCI proxy for
the virtio-iommu device.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v8 -> v9:
- add the msi-bypass property
- create virtio-iommu-pci.c
---
 hw/virtio/Makefile.objs          |  1 +
 hw/virtio/virtio-iommu-pci.c     | 88 ++++++++++++++++++++++++++++++++
 include/hw/pci/pci.h             |  1 +
 include/hw/virtio/virtio-iommu.h |  1 +
 qdev-monitor.c                   |  1 +
 5 files changed, 92 insertions(+)
 create mode 100644 hw/virtio/virtio-iommu-pci.c

diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index f42e4dd94f..80ca719f1c 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -27,6 +27,7 @@ obj-$(CONFIG_VIRTIO_INPUT_HOST) += virtio-input-host-pci.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio-input-pci.o
 obj-$(CONFIG_VIRTIO_RNG) += virtio-rng-pci.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio-balloon-pci.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu-pci.o
 obj-$(CONFIG_VIRTIO_9P) += virtio-9p-pci.o
 obj-$(CONFIG_VIRTIO_SCSI) += virtio-scsi-pci.o
 obj-$(CONFIG_VIRTIO_BLK) += virtio-blk-pci.o
diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
new file mode 100644
index 0000000000..f9977096bd
--- /dev/null
+++ b/hw/virtio/virtio-iommu-pci.c
@@ -0,0 +1,88 @@
+/*
+ * Virtio IOMMU PCI Bindings
+ *
+ * Copyright (c) 2019 Red Hat, Inc.
+ * Written by Eric Auger
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 or
+ *  (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+
+#include "virtio-pci.h"
+#include "hw/virtio/virtio-iommu.h"
+
+typedef struct VirtIOIOMMUPCI VirtIOIOMMUPCI;
+
+/*
+ * virtio-iommu-pci: This extends VirtioPCIProxy.
+ *
+ */
+#define VIRTIO_IOMMU_PCI(obj) \
+        OBJECT_CHECK(VirtIOIOMMUPCI, (obj), TYPE_VIRTIO_IOMMU_PCI)
+
+struct VirtIOIOMMUPCI {
+    VirtIOPCIProxy parent_obj;
+    VirtIOIOMMU vdev;
+};
+
+static Property virtio_iommu_pci_properties[] = {
+    DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
+    DEFINE_PROP_BOOL("msi-bypass", VirtIOIOMMUPCI, vdev.msi_bypass, true),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtio_iommu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(vpci_dev);
+    DeviceState *vdev = DEVICE(&dev->vdev);
+
+    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
+    object_property_set_link(OBJECT(dev),
+                             OBJECT(pci_get_bus(&vpci_dev->pci_dev)),
+                             "primary-bus", errp);
+    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
+}
+
+static void virtio_iommu_pci_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
+    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
+    k->realize = virtio_iommu_pci_realize;
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    dc->props = virtio_iommu_pci_properties;
+    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_IOMMU;
+    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
+    pcidev_k->class_id = PCI_CLASS_OTHERS;
+}
+
+static void virtio_iommu_pci_instance_init(Object *obj)
+{
+    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(obj);
+
+    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+                                TYPE_VIRTIO_IOMMU);
+}
+
+static const VirtioPCIDeviceTypeInfo virtio_iommu_pci_info = {
+    .base_name             = TYPE_VIRTIO_IOMMU_PCI,
+    .generic_name          = "virtio-iommu-pci",
+    .transitional_name     = "virtio-iommu-pci-transitional",
+    .non_transitional_name = "virtio-iommu-pci-non-transitional",
+    .instance_size = sizeof(VirtIOIOMMUPCI),
+    .instance_init = virtio_iommu_pci_instance_init,
+    .class_init    = virtio_iommu_pci_class_init,
+};
+
+static void virtio_iommu_pci_register(void)
+{
+    virtio_pci_types_register(&virtio_iommu_pci_info);
+}
+
+type_init(virtio_iommu_pci_register)
+
+
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index aaf1b9f70d..492ea7e68d 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -86,6 +86,7 @@ extern bool pci_available;
 #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
 #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
 #define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
+#define PCI_DEVICE_ID_VIRTIO_IOMMU       0x1014
 
 #define PCI_VENDOR_ID_REDHAT             0x1b36
 #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index 56c8b4e57f..893ac65c0b 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -25,6 +25,7 @@
 #include "hw/pci/pci.h"
 
 #define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
+#define TYPE_VIRTIO_IOMMU_PCI "virtio-iommu-device-base"
 #define VIRTIO_IOMMU(obj) \
         OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
 
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 58222c2211..74cf090c61 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -63,6 +63,7 @@ static const QDevAlias qdev_alias_table[] = {
     { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_S390X },
     { "virtio-input-host-pci", "virtio-input-host",
             QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
+    { "virtio-iommu-pci", "virtio-iommu", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
     { "virtio-keyboard-ccw", "virtio-keyboard", QEMU_ARCH_S390X },
     { "virtio-keyboard-pci", "virtio-keyboard",
             QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
-- 
2.20.1




* [Qemu-devel] [PATCH for-4.2 v10 15/15] hw/arm/virt: Add the virtio-iommu device tree mappings
  2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
                   ` (13 preceding siblings ...)
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 14/15] virtio-iommu-pci: Add virtio iommu pci support Eric Auger
@ 2019-07-30 17:21 ` Eric Auger
  14 siblings, 0 replies; 55+ messages in thread
From: Eric Auger @ 2019-07-30 17:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	peter.maydell, alex.williamson, jean-philippe, kevin.tian
  Cc: tn, bharat.bhushan, peterx

Add the "virtio,pci-iommu" node under the host bridge node, along
with the RID mapping, which excludes the IOMMU RID.
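The iommu-map property written by create_virtio_iommu() below splits the 64K RID space around the IOMMU's own BDF. A minimal sketch of that arithmetic in plain C follows; iommu_map_entry and build_iommu_map() are hypothetical, and each entry follows the (rid-base, iommu-phandle, iommu-base, length) layout of the standard PCI iommu-map binding, identity-mapping RIDs to stream IDs as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One iommu-map entry: (rid-base, phandle, iommu-base, length). */
typedef struct {
    uint32_t rid_base, phandle, iommu_base, length;
} iommu_map_entry;

/*
 * Fill out[] with entries covering RIDs 0..0xffff except the IOMMU's own
 * BDF, identity-mapping each RID. Returns the number of entries written.
 */
static size_t build_iommu_map(uint16_t bdf, uint32_t phandle,
                              iommu_map_entry out[2])
{
    size_t n = 0;

    if (bdf > 0) {                 /* RIDs [0, bdf) */
        out[n++] = (iommu_map_entry){ 0, phandle, 0, bdf };
    }
    if (bdf < 0xffff) {            /* RIDs [bdf + 1, 0xffff] */
        out[n++] = (iommu_map_entry){ bdf + 1u, phandle, bdf + 1u,
                                      0xffffu - bdf };
    }

    /* Sanity check: every RID except the IOMMU's own is covered once. */
    uint32_t covered = 0;
    for (size_t i = 0; i < n; i++) {
        covered += out[i].length;
    }
    assert(covered == 0xffffu);
    return n;
}
```

The patch always emits both entries, so a BDF of 0 yields a zero-length first range; skipping empty ranges, as this sketch does, is one possible alternative.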

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v8 -> v9:
- disable msi-bypass property
- addition of the subnode is handled in the hotplug handler
  and the IOMMU RID is not imposed anymore

v6 -> v7:
- align to the smmu instantiation code

v4 -> v5:
- VirtMachineClass no_iommu added in this patch
- Use object_resolve_path_type
---
 hw/arm/virt.c         | 54 +++++++++++++++++++++++++++++++++++++------
 include/hw/arm/virt.h |  2 ++
 2 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d9496c9363..8f6bcba99e 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -32,6 +32,7 @@
 #include "qemu-common.h"
 #include "qemu/units.h"
 #include "qemu/option.h"
+#include "monitor/qdev.h"
 #include "qapi/error.h"
 #include "hw/sysbus.h"
 #include "hw/arm/boot.h"
@@ -52,6 +53,7 @@
 #include "qemu/error-report.h"
 #include "qemu/module.h"
 #include "hw/pci-host/gpex.h"
+#include "hw/virtio/virtio-pci.h"
 #include "hw/arm/sysbus-fdt.h"
 #include "hw/platform-bus.h"
 #include "hw/arm/fdt.h"
@@ -64,6 +66,7 @@
 #include "hw/arm/smmuv3.h"
 #include "hw/acpi/acpi.h"
 #include "target/arm/internals.h"
+#include "hw/virtio/virtio-iommu.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
     static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -1147,6 +1150,30 @@ static void create_smmu(const VirtMachineState *vms, qemu_irq *pic,
     g_free(node);
 }
 
+static void create_virtio_iommu(VirtMachineState *vms, Error **errp)
+{
+    const char compat[] = "virtio,pci-iommu";
+    uint16_t bdf = vms->virtio_iommu_bdf;
+    char *node;
+
+    vms->iommu_phandle = qemu_fdt_alloc_phandle(vms->fdt);
+
+    node = g_strdup_printf("%s/virtio_iommu@%d", vms->pciehb_nodename, bdf);
+    qemu_fdt_add_subnode(vms->fdt, node);
+    qemu_fdt_setprop(vms->fdt, node, "compatible", compat, sizeof(compat));
+    qemu_fdt_setprop_sized_cells(vms->fdt, node, "reg",
+                                 1, bdf << 8, 1, 0, 1, 0,
+                                 1, 0, 1, 0);
+
+    qemu_fdt_setprop_cell(vms->fdt, node, "#iommu-cells", 1);
+    qemu_fdt_setprop_cell(vms->fdt, node, "phandle", vms->iommu_phandle);
+    g_free(node);
+
+    qemu_fdt_setprop_cells(vms->fdt, vms->pciehb_nodename, "iommu-map",
+                           0x0, vms->iommu_phandle, 0x0, bdf,
+                           bdf + 1, vms->iommu_phandle, bdf + 1, 0xffff - bdf);
+}
+
 static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
 {
     hwaddr base_mmio = vms->memmap[VIRT_PCIE_MMIO].base;
@@ -1224,7 +1251,7 @@ static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
         }
     }
 
-    nodename = g_strdup_printf("/pcie@%" PRIx64, base);
+    nodename = vms->pciehb_nodename = g_strdup_printf("/pcie@%" PRIx64, base);
     qemu_fdt_add_subnode(vms->fdt, nodename);
     qemu_fdt_setprop_string(vms->fdt, nodename,
                             "compatible", "pci-host-ecam-generic");
@@ -1267,13 +1294,17 @@ static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
     if (vms->iommu) {
         vms->iommu_phandle = qemu_fdt_alloc_phandle(vms->fdt);
 
-        create_smmu(vms, pic, pci->bus);
+        switch (vms->iommu) {
+        case VIRT_IOMMU_SMMUV3:
+            create_smmu(vms, pic, pci->bus);
+            qemu_fdt_setprop_cells(vms->fdt, nodename, "iommu-map",
+                                   0x0, vms->iommu_phandle, 0x0, 0x10000);
+            break;
+        default:
+            g_assert_not_reached();
+        }
 
-        qemu_fdt_setprop_cells(vms->fdt, nodename, "iommu-map",
-                               0x0, vms->iommu_phandle, 0x0, 0x10000);
     }
-
-    g_free(nodename);
 }
 
 static void create_platform_bus(VirtMachineState *vms, qemu_irq *pic)
@@ -1882,12 +1913,21 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
                                      SYS_BUS_DEVICE(dev));
         }
     }
+    if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
+        PCIDevice *pdev = PCI_DEVICE(dev);
+
+        vms->iommu = VIRT_IOMMU_VIRTIO;
+        vms->virtio_iommu_bdf = pci_get_bdf(pdev);
+        object_property_set_bool(OBJECT(dev), false, "msi-bypass", errp);
+        create_virtio_iommu(vms, errp);
+    }
 }
 
 static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
                                                         DeviceState *dev)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE)) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE) ||
+        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
         return HOTPLUG_HANDLER(machine);
     }
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index a72094204e..abdee94f3a 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -121,8 +121,10 @@ typedef struct {
     bool virt;
     int32_t gic_version;
     VirtIOMMUType iommu;
+    uint16_t virtio_iommu_bdf;
     struct arm_boot_info bootinfo;
     MemMapEntry *memmap;
+    char *pciehb_nodename;
     const int *irqmap;
     int smp_cpus;
     void *fdt;
-- 
2.20.1




* Re: [Qemu-devel] [PATCH for-4.2 v10 14/15] virtio-iommu-pci: Add virtio iommu pci support
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 14/15] virtio-iommu-pci: Add virtio iommu pci support Eric Auger
@ 2019-07-30 19:35   ` Michael S. Tsirkin
  2019-08-01 12:15     ` Auger Eric
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2019-07-30 19:35 UTC (permalink / raw)
  To: Eric Auger
  Cc: jean-philippe, kevin.tian, peter.maydell, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

On Tue, Jul 30, 2019 at 07:21:36PM +0200, Eric Auger wrote:
> This patch adds virtio-iommu-pci, which is the pci proxy for
> the virtio-iommu device.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

I'm not sure we should merge this part just yet. The reason is that I
think we should limit it to mmio, where DT can be used to describe the
iommu topology. For PCI, I don't see why we shouldn't always expose this
in the config space, and I think it's preferable not to need to support
a mix of DT, ACPI and PCI as options.

> ---
> 
> v8 -> v9:
> - add the msi-bypass property
> - create virtio-iommu-pci.c
> ---
>  hw/virtio/Makefile.objs          |  1 +
>  hw/virtio/virtio-iommu-pci.c     | 88 ++++++++++++++++++++++++++++++++
>  include/hw/pci/pci.h             |  1 +
>  include/hw/virtio/virtio-iommu.h |  1 +
>  qdev-monitor.c                   |  1 +
>  5 files changed, 92 insertions(+)
>  create mode 100644 hw/virtio/virtio-iommu-pci.c
> 
> diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> index f42e4dd94f..80ca719f1c 100644
> --- a/hw/virtio/Makefile.objs
> +++ b/hw/virtio/Makefile.objs
> @@ -27,6 +27,7 @@ obj-$(CONFIG_VIRTIO_INPUT_HOST) += virtio-input-host-pci.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio-input-pci.o
>  obj-$(CONFIG_VIRTIO_RNG) += virtio-rng-pci.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio-balloon-pci.o
> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu-pci.o
>  obj-$(CONFIG_VIRTIO_9P) += virtio-9p-pci.o
>  obj-$(CONFIG_VIRTIO_SCSI) += virtio-scsi-pci.o
>  obj-$(CONFIG_VIRTIO_BLK) += virtio-blk-pci.o
> diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
> new file mode 100644
> index 0000000000..f9977096bd
> --- /dev/null
> +++ b/hw/virtio/virtio-iommu-pci.c
> @@ -0,0 +1,88 @@
> +/*
> + * Virtio IOMMU PCI Bindings
> + *
> + * Copyright (c) 2019 Red Hat, Inc.
> + * Written by Eric Auger
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License version 2 or
> + *  (at your option) any later version.
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "virtio-pci.h"
> +#include "hw/virtio/virtio-iommu.h"
> +
> +typedef struct VirtIOIOMMUPCI VirtIOIOMMUPCI;
> +
> +/*
> + * virtio-iommu-pci: This extends VirtioPCIProxy.
> + *
> + */
> +#define VIRTIO_IOMMU_PCI(obj) \
> +        OBJECT_CHECK(VirtIOIOMMUPCI, (obj), TYPE_VIRTIO_IOMMU_PCI)
> +
> +struct VirtIOIOMMUPCI {
> +    VirtIOPCIProxy parent_obj;
> +    VirtIOIOMMU vdev;
> +};
> +
> +static Property virtio_iommu_pci_properties[] = {
> +    DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
> +    DEFINE_PROP_BOOL("msi-bypass", VirtIOIOMMUPCI, vdev.msi_bypass, true),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void virtio_iommu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> +{
> +    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(vpci_dev);
> +    DeviceState *vdev = DEVICE(&dev->vdev);
> +
> +    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
> +    object_property_set_link(OBJECT(dev),
> +                             OBJECT(pci_get_bus(&vpci_dev->pci_dev)),
> +                             "primary-bus", errp);
> +    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
> +}
> +
> +static void virtio_iommu_pci_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
> +    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
> +    k->realize = virtio_iommu_pci_realize;
> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +    dc->props = virtio_iommu_pci_properties;
> +    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
> +    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_IOMMU;
> +    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
> +    pcidev_k->class_id = PCI_CLASS_OTHERS;
> +}
> +
> +static void virtio_iommu_pci_instance_init(Object *obj)
> +{
> +    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(obj);
> +
> +    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
> +                                TYPE_VIRTIO_IOMMU);
> +}
> +
> +static const VirtioPCIDeviceTypeInfo virtio_iommu_pci_info = {
> +    .base_name             = TYPE_VIRTIO_IOMMU_PCI,
> +    .generic_name          = "virtio-iommu-pci",
> +    .transitional_name     = "virtio-iommu-pci-transitional",
> +    .non_transitional_name = "virtio-iommu-pci-non-transitional",
> +    .instance_size = sizeof(VirtIOIOMMUPCI),
> +    .instance_init = virtio_iommu_pci_instance_init,
> +    .class_init    = virtio_iommu_pci_class_init,
> +};
> +
> +static void virtio_iommu_pci_register(void)
> +{
> +    virtio_pci_types_register(&virtio_iommu_pci_info);
> +}
> +
> +type_init(virtio_iommu_pci_register)
> +
> +
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index aaf1b9f70d..492ea7e68d 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -86,6 +86,7 @@ extern bool pci_available;
>  #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
>  #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
>  #define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
> +#define PCI_DEVICE_ID_VIRTIO_IOMMU       0x1014
>  
>  #define PCI_VENDOR_ID_REDHAT             0x1b36
>  #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
> index 56c8b4e57f..893ac65c0b 100644
> --- a/include/hw/virtio/virtio-iommu.h
> +++ b/include/hw/virtio/virtio-iommu.h
> @@ -25,6 +25,7 @@
>  #include "hw/pci/pci.h"
>  
>  #define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
> +#define TYPE_VIRTIO_IOMMU_PCI "virtio-iommu-device-base"
>  #define VIRTIO_IOMMU(obj) \
>          OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
>  
> diff --git a/qdev-monitor.c b/qdev-monitor.c
> index 58222c2211..74cf090c61 100644
> --- a/qdev-monitor.c
> +++ b/qdev-monitor.c
> @@ -63,6 +63,7 @@ static const QDevAlias qdev_alias_table[] = {
>      { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_S390X },
>      { "virtio-input-host-pci", "virtio-input-host",
>              QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
> +    { "virtio-iommu-pci", "virtio-iommu", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
>      { "virtio-keyboard-ccw", "virtio-keyboard", QEMU_ARCH_S390X },
>      { "virtio-keyboard-pci", "virtio-keyboard",
>              QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
> -- 
> 2.20.1
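The ID added to pci.h, 0x1014, continues QEMU's own numbering after VSOCK (0x1012) and PMEM (0x1013). The virtio 1.x specification separately defines the PCI Device ID of a modern (non-transitional) device as 0x1040 plus the virtio device ID. For the IOMMU (device ID 23), that gives 0x1057. The small sketch below shows that rule. The function and macro names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Virtio device ID of the IOMMU device, as listed in the virtio spec. */
#define VIRTIO_DEV_ID_IOMMU 23

/*
 * Modern (non-transitional) virtio PCI devices use
 * PCI Device ID = 0x1040 + virtio device ID, i.e. within 0x1040..0x107f.
 */
static uint16_t virtio_modern_pci_device_id(uint16_t virtio_dev_id)
{
    assert(virtio_dev_id <= 0x3f);
    return (uint16_t)(0x1040 + virtio_dev_id);
}
```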



* Re: [Qemu-devel] [PATCH for-4.2 v10 11/15] virtio-iommu: Expose the IOAPIC MSI reserved region when relevant
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 11/15] virtio-iommu: Expose the IOAPIC MSI reserved region when relevant Eric Auger
@ 2019-07-30 19:38   ` Michael S. Tsirkin
  2019-07-30 23:20     ` Tian, Kevin
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2019-07-30 19:38 UTC (permalink / raw)
  To: Eric Auger
  Cc: jean-philippe, kevin.tian, peter.maydell, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

On Tue, Jul 30, 2019 at 07:21:33PM +0200, Eric Auger wrote:
> We introduce a new msi_bypass field which indicates whether
> the IOAPIC MSI window [0xFEE00000 - 0xFEEFFFFF] must be exposed
> as a reserved region. By default the field is set to true at
> instantiation time. Later on we will introduce a property at
> virtio pci proxy level to turn it off.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v8 -> v9:
> - pass IOAPIC_RANGE_END to virtio_iommu_register_resv_region
> - take into account the change in the struct virtio_iommu_probe_resv_mem
>   definition
> - We just introduce the field here. A property will be introduced later on
>   at pci proxy level.
> ---
>  hw/virtio/virtio-iommu.c         | 36 ++++++++++++++++++++++++++++++++
>  include/hw/virtio/virtio-iommu.h |  1 +
>  2 files changed, 37 insertions(+)
> 
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 66be9a4627..74038288b0 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -39,6 +39,9 @@
>  #define VIOMMU_DEFAULT_QUEUE_SIZE 256
>  #define VIOMMU_PROBE_SIZE 512
>  
> +#define IOAPIC_RANGE_START      (0xfee00000)
> +#define IOAPIC_RANGE_END        (0xfeefffff)
> +
>  #define SUPPORTED_PROBE_PROPERTIES (\
>      1 << VIRTIO_IOMMU_PROBE_T_RESV_MEM)
>  

Sorry where are these numbers coming from?
Does this really work on all platforms?
With all guests?

> @@ -100,6 +103,30 @@ static void virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
>      ep->domain = NULL;
>  }
>  
> +static void virtio_iommu_register_resv_region(viommu_endpoint *ep,
> +                                              uint8_t subtype,
> +                                              uint64_t start, uint64_t end)
> +{
> +    viommu_interval *interval;
> +    struct virtio_iommu_probe_resv_mem *resv_reg_prop;
> +    size_t prop_size = sizeof(struct virtio_iommu_probe_resv_mem);
> +    size_t value_size = prop_size -
> +                sizeof(struct virtio_iommu_probe_property);
> +
> +    interval = g_malloc0(sizeof(*interval));
> +    interval->low = start;
> +    interval->high = end;
> +
> +    resv_reg_prop = g_malloc0(prop_size);
> +    resv_reg_prop->head.type = VIRTIO_IOMMU_PROBE_T_RESV_MEM;
> +    resv_reg_prop->head.length = cpu_to_le64(value_size);
> +    resv_reg_prop->subtype = cpu_to_le64(subtype);
> +    resv_reg_prop->start = cpu_to_le64(start);
> +    resv_reg_prop->end = cpu_to_le64(end);
> +
> +    g_tree_insert(ep->reserved_regions, interval, resv_reg_prop);
> +}
> +
>  static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
>                                                    uint32_t ep_id)
>  {
> @@ -117,6 +144,12 @@ static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
>      ep->reserved_regions = g_tree_new_full((GCompareDataFunc)interval_cmp,
>                                              NULL, (GDestroyNotify)g_free,
>                                              (GDestroyNotify)g_free);
> +    if (s->msi_bypass) {
> +        virtio_iommu_register_resv_region(ep, VIRTIO_IOMMU_RESV_MEM_T_MSI,
> +                                          IOAPIC_RANGE_START,
> +                                          IOAPIC_RANGE_END);
> +    }
> +
>      return ep;
>  }
>  
> @@ -822,6 +855,9 @@ static void virtio_iommu_set_status(VirtIODevice *vdev, uint8_t status)
>  
>  static void virtio_iommu_instance_init(Object *obj)
>  {
> +    VirtIOIOMMU *s = VIRTIO_IOMMU(obj);
> +
> +    s->msi_bypass = true;
>  }
>  
>  static const VMStateDescription vmstate_virtio_iommu = {
> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
> index f55f48d304..56c8b4e57f 100644
> --- a/include/hw/virtio/virtio-iommu.h
> +++ b/include/hw/virtio/virtio-iommu.h
> @@ -59,6 +59,7 @@ typedef struct VirtIOIOMMU {
>      GTree *domains;
>      QemuMutex mutex;
>      GTree *endpoints;
> +    bool msi_bypass;
>  } VirtIOIOMMU;
>  
>  #endif
> -- 
> 2.20.1
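This sketch assumes the v0.12 layout in linux/virtio_iommu.h: a head with 16-bit type and length, an 8-bit subtype followed by three reserved bytes, then 64-bit start and end. Under that layout, only start and end are 64-bit little-endian fields in virtio_iommu_register_resv_region(). head.type and head.length would need cpu_to_le16(), and subtype needs no conversion. The helper instead stores type unconverted and applies cpu_to_le64() to length and subtype, which only goes wrong on big-endian hosts. The sketch below serializes one RESV_MEM property byte by byte so the layout is explicit. The constant values are assumed from that header, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed v0.12 values; check against linux/virtio_iommu.h. */
#define PROBE_T_RESV_MEM   1
#define RESV_MEM_T_MSI     1
#define RESV_MEM_PROP_SIZE 24  /* 4-byte head + subtype/reserved[3] + start + end */

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/*
 * Serialize one RESV_MEM probe property into 'buf' and return its size.
 * head.type/head.length are 16-bit LE, subtype is a single byte (no swap),
 * start/end are 64-bit LE and inclusive.
 */
static size_t encode_resv_mem(uint8_t buf[RESV_MEM_PROP_SIZE],
                              uint8_t subtype, uint64_t start, uint64_t end)
{
    assert(start <= end);
    memset(buf, 0, RESV_MEM_PROP_SIZE);         /* also clears reserved[3] */
    put_le16(buf + 0, PROBE_T_RESV_MEM);        /* head.type */
    put_le16(buf + 2, RESV_MEM_PROP_SIZE - 4);  /* head.length: payload only */
    buf[4] = subtype;                           /* subtype (u8) */
    put_le64(buf + 8, start);
    put_le64(buf + 16, end);
    return RESV_MEM_PROP_SIZE;
}
```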



* Re: [Qemu-devel] [PATCH for-4.2 v10 11/15] virtio-iommu: Expose the IOAPIC MSI reserved region when relevant
  2019-07-30 19:38   ` Michael S. Tsirkin
@ 2019-07-30 23:20     ` Tian, Kevin
  2019-07-31  9:05       ` Auger Eric
  2019-07-31 19:25       ` Michael S. Tsirkin
  0 siblings, 2 replies; 55+ messages in thread
From: Tian, Kevin @ 2019-07-30 23:20 UTC (permalink / raw)
  To: Michael S. Tsirkin, Eric Auger
  Cc: jean-philippe, peter.maydell, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Wednesday, July 31, 2019 3:38 AM
> 
> On Tue, Jul 30, 2019 at 07:21:33PM +0200, Eric Auger wrote:
> > We introduce a new msi_bypass field which indicates whether
> > the IOAPIC MSI window [0xFEE00000 - 0xFEEFFFFF] must be exposed

It's not good to call it the IOAPIC MSI window. Any write to this range,
either from the IOAPIC or from a PCI device, is interpreted by the platform
as an interrupt request. I'd call it "x86 interrupt address range".
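In other words, the platform's decision depends only on the target address, not on who issues the write. Checking a DMA access against the window is therefore a plain interval-overlap test. Below is a minimal sketch using the inclusive bounds from the patch. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_INTR_ADDR_START 0xfee00000ULL
#define X86_INTR_ADDR_END   0xfeefffffULL  /* inclusive */

/*
 * True if a write of 'len' bytes at 'addr' overlaps the x86 interrupt
 * address range, whatever agent (IOAPIC or PCI device) issues it.
 */
static bool hits_x86_intr_range(uint64_t addr, uint64_t len)
{
    uint64_t last;

    assert(len > 0);
    /* Inclusive last byte, saturating instead of wrapping past 2^64 - 1. */
    last = (len - 1 > UINT64_MAX - addr) ? UINT64_MAX : addr + len - 1;
    return addr <= X86_INTR_ADDR_END && last >= X86_INTR_ADDR_START;
}
```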

> > as a reserved region. By default the field is set to true at
> > instantiation time. Later on we will introduce a property at
> > virtio pci proxy level to turn it off.
> >
> > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >
> > ---
> >
> > v8 -> v9:
> > - pass IOAPIC_RANGE_END to virtio_iommu_register_resv_region
> > - take into account the change in the struct virtio_iommu_probe_resv_mem
> >   definition
> > - We just introduce the field here. A property will be introduced later on
> >   at pci proxy level.
> > ---
> >  hw/virtio/virtio-iommu.c         | 36 ++++++++++++++++++++++++++++++++
> >  include/hw/virtio/virtio-iommu.h |  1 +
> >  2 files changed, 37 insertions(+)
> >
> > diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> > index 66be9a4627..74038288b0 100644
> > --- a/hw/virtio/virtio-iommu.c
> > +++ b/hw/virtio/virtio-iommu.c
> > @@ -39,6 +39,9 @@
> >  #define VIOMMU_DEFAULT_QUEUE_SIZE 256
> >  #define VIOMMU_PROBE_SIZE 512
> >
> > +#define IOAPIC_RANGE_START      (0xfee00000)
> > +#define IOAPIC_RANGE_END        (0xfeefffff)
> > +
> >  #define SUPPORTED_PROBE_PROPERTIES (\
> >      1 << VIRTIO_IOMMU_PROBE_T_RESV_MEM)
> >
> 
> Sorry where are these numbers coming from?

This is architecturally defined in the x86 SDM.

> Does this really work on all platforms?

x86 only.

> With all guests?

Yes.

> 
> > @@ -100,6 +103,30 @@ static void
> virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
> >      ep->domain = NULL;
> >  }
> >
> > +static void virtio_iommu_register_resv_region(viommu_endpoint *ep,
> > +                                              uint8_t subtype,
> > +                                              uint64_t start, uint64_t end)
> > +{
> > +    viommu_interval *interval;
> > +    struct virtio_iommu_probe_resv_mem *resv_reg_prop;
> > +    size_t prop_size = sizeof(struct virtio_iommu_probe_resv_mem);
> > +    size_t value_size = prop_size -
> > +                sizeof(struct virtio_iommu_probe_property);
> > +
> > +    interval = g_malloc0(sizeof(*interval));
> > +    interval->low = start;
> > +    interval->high = end;
> > +
> > +    resv_reg_prop = g_malloc0(prop_size);
> > +    resv_reg_prop->head.type = VIRTIO_IOMMU_PROBE_T_RESV_MEM;
> > +    resv_reg_prop->head.length = cpu_to_le64(value_size);
> > +    resv_reg_prop->subtype = cpu_to_le64(subtype);
> > +    resv_reg_prop->start = cpu_to_le64(start);
> > +    resv_reg_prop->end = cpu_to_le64(end);
> > +
> > +    g_tree_insert(ep->reserved_regions, interval, resv_reg_prop);
> > +}
> > +
> >  static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
> >                                                    uint32_t ep_id)
> >  {
> > @@ -117,6 +144,12 @@ static viommu_endpoint
> *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
> >      ep->reserved_regions =
> g_tree_new_full((GCompareDataFunc)interval_cmp,
> >                                              NULL, (GDestroyNotify)g_free,
> >                                              (GDestroyNotify)g_free);
> > +    if (s->msi_bypass) {
> > +        virtio_iommu_register_resv_region(ep,
> VIRTIO_IOMMU_RESV_MEM_T_MSI,
> > +                                          IOAPIC_RANGE_START,
> > +                                          IOAPIC_RANGE_END);
> > +    }
> > +
> >      return ep;
> >  }
> >
> > @@ -822,6 +855,9 @@ static void virtio_iommu_set_status(VirtIODevice
> *vdev, uint8_t status)
> >
> >  static void virtio_iommu_instance_init(Object *obj)
> >  {
> > +    VirtIOIOMMU *s = VIRTIO_IOMMU(obj);
> > +
> > +    s->msi_bypass = true;
> >  }
> >
> >  static const VMStateDescription vmstate_virtio_iommu = {
> > diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-
> iommu.h
> > index f55f48d304..56c8b4e57f 100644
> > --- a/include/hw/virtio/virtio-iommu.h
> > +++ b/include/hw/virtio/virtio-iommu.h
> > @@ -59,6 +59,7 @@ typedef struct VirtIOIOMMU {
> >      GTree *domains;
> >      QemuMutex mutex;
> >      GTree *endpoints;
> > +    bool msi_bypass;
> >  } VirtIOIOMMU;
> >
> >  #endif
> > --
> > 2.20.1



* Re: [Qemu-devel] [PATCH for-4.2 v10 11/15] virtio-iommu: Expose the IOAPIC MSI reserved region when relevant
  2019-07-30 23:20     ` Tian, Kevin
@ 2019-07-31  9:05       ` Auger Eric
  2019-07-31 19:25       ` Michael S. Tsirkin
  1 sibling, 0 replies; 55+ messages in thread
From: Auger Eric @ 2019-07-31  9:05 UTC (permalink / raw)
  To: Tian, Kevin, Michael S. Tsirkin
  Cc: jean-philippe, peter.maydell, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

Hi Kevin, Michael,

On 7/31/19 1:20 AM, Tian, Kevin wrote:
>> From: Michael S. Tsirkin [mailto:mst@redhat.com]
>> Sent: Wednesday, July 31, 2019 3:38 AM
>>
>> On Tue, Jul 30, 2019 at 07:21:33PM +0200, Eric Auger wrote:
>>> We introduce a new msi_bypass field which indicates whether
>>> the IOAPIC MSI window [0xFEE00000 - 0xFEEFFFFF] must be exposed
> 
> it's not good to call it IOAPIC MSI window. any write to this range, either
> from IOAPIC or PCI device, is interpreted by the platform as interrupt
> request. I'd call it "x86 interrupt address range".
Thank you for the clarification. I will reword the commit message as
suggested.
> 
>>> as a reserved region. By default the field is set to true at
>>> instantiation time. Later on we will introduce a property at
>>> virtio pci proxy level to turn it off.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>
>>> ---
>>>
>>> v8 -> v9:
>>> - pass IOAPIC_RANGE_END to virtio_iommu_register_resv_region
>>> - take into account the change in the struct virtio_iommu_probe_resv_mem
>>>   definition
>>> - We just introduce the field here. A property will be introduced later on
>>>   at pci proxy level.
>>> ---
>>>  hw/virtio/virtio-iommu.c         | 36 ++++++++++++++++++++++++++++++++
>>>  include/hw/virtio/virtio-iommu.h |  1 +
>>>  2 files changed, 37 insertions(+)
>>>
>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>>> index 66be9a4627..74038288b0 100644
>>> --- a/hw/virtio/virtio-iommu.c
>>> +++ b/hw/virtio/virtio-iommu.c
>>> @@ -39,6 +39,9 @@
>>>  #define VIOMMU_DEFAULT_QUEUE_SIZE 256
>>>  #define VIOMMU_PROBE_SIZE 512
>>>
>>> +#define IOAPIC_RANGE_START      (0xfee00000)
>>> +#define IOAPIC_RANGE_END        (0xfeefffff)
>>> +
>>>  #define SUPPORTED_PROBE_PROPERTIES (\
>>>      1 << VIRTIO_IOMMU_PROBE_T_RESV_MEM)
>>>
>>
>> Sorry where are these numbers coming from?
> 
> this is architecturally defined in x86 SDM.
> 
>> Does this really work on all platforms?
> 
> x86 only. 
Yes, the initial goal was to enable the x86 integration. Maybe I should
instead let the machine pass reserved regions as device properties.

As integration with pc/q35 is beyond the scope of this initial series,
maybe I should remove that patch?

Thanks

Eric
> 
>> With all guests?
> 
> yes.
> 
>>
>>> @@ -100,6 +103,30 @@ static void
>> virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
>>>      ep->domain = NULL;
>>>  }
>>>
>>> +static void virtio_iommu_register_resv_region(viommu_endpoint *ep,
>>> +                                              uint8_t subtype,
>>> +                                              uint64_t start, uint64_t end)
>>> +{
>>> +    viommu_interval *interval;
>>> +    struct virtio_iommu_probe_resv_mem *resv_reg_prop;
>>> +    size_t prop_size = sizeof(struct virtio_iommu_probe_resv_mem);
>>> +    size_t value_size = prop_size -
>>> +                sizeof(struct virtio_iommu_probe_property);
>>> +
>>> +    interval = g_malloc0(sizeof(*interval));
>>> +    interval->low = start;
>>> +    interval->high = end;
>>> +
>>> +    resv_reg_prop = g_malloc0(prop_size);
>>> +    resv_reg_prop->head.type = VIRTIO_IOMMU_PROBE_T_RESV_MEM;
>>> +    resv_reg_prop->head.length = cpu_to_le64(value_size);
>>> +    resv_reg_prop->subtype = cpu_to_le64(subtype);
>>> +    resv_reg_prop->start = cpu_to_le64(start);
>>> +    resv_reg_prop->end = cpu_to_le64(end);
>>> +
>>> +    g_tree_insert(ep->reserved_regions, interval, resv_reg_prop);
>>> +}
>>> +
>>>  static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
>>>                                                    uint32_t ep_id)
>>>  {
>>> @@ -117,6 +144,12 @@ static viommu_endpoint
>> *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
>>>      ep->reserved_regions =
>> g_tree_new_full((GCompareDataFunc)interval_cmp,
>>>                                              NULL, (GDestroyNotify)g_free,
>>>                                              (GDestroyNotify)g_free);
>>> +    if (s->msi_bypass) {
>>> +        virtio_iommu_register_resv_region(ep,
>> VIRTIO_IOMMU_RESV_MEM_T_MSI,
>>> +                                          IOAPIC_RANGE_START,
>>> +                                          IOAPIC_RANGE_END);
>>> +    }
>>> +
>>>      return ep;
>>>  }
>>>
>>> @@ -822,6 +855,9 @@ static void virtio_iommu_set_status(VirtIODevice
>> *vdev, uint8_t status)
>>>
>>>  static void virtio_iommu_instance_init(Object *obj)
>>>  {
>>> +    VirtIOIOMMU *s = VIRTIO_IOMMU(obj);
>>> +
>>> +    s->msi_bypass = true;
>>>  }
>>>
>>>  static const VMStateDescription vmstate_virtio_iommu = {
>>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-
>> iommu.h
>>> index f55f48d304..56c8b4e57f 100644
>>> --- a/include/hw/virtio/virtio-iommu.h
>>> +++ b/include/hw/virtio/virtio-iommu.h
>>> @@ -59,6 +59,7 @@ typedef struct VirtIOIOMMU {
>>>      GTree *domains;
>>>      QemuMutex mutex;
>>>      GTree *endpoints;
>>> +    bool msi_bypass;
>>>  } VirtIOIOMMU;
>>>
>>>  #endif
>>> --
>>> 2.20.1



* Re: [Qemu-devel] [PATCH for-4.2 v10 11/15] virtio-iommu: Expose the IOAPIC MSI reserved region when relevant
  2019-07-30 23:20     ` Tian, Kevin
  2019-07-31  9:05       ` Auger Eric
@ 2019-07-31 19:25       ` Michael S. Tsirkin
  2019-07-31 19:44         ` Auger Eric
  1 sibling, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2019-07-31 19:25 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: peter.maydell, jean-philippe, tn, qemu-devel, peterx, Eric Auger,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

On Tue, Jul 30, 2019 at 11:20:44PM +0000, Tian, Kevin wrote:
> > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > Sent: Wednesday, July 31, 2019 3:38 AM
> > 
> > On Tue, Jul 30, 2019 at 07:21:33PM +0200, Eric Auger wrote:
> > > We introduce a new msi_bypass field which indicates whether
> > > the IOAPIC MSI window [0xFEE00000 - 0xFEEFFFFF] must be exposed
> 
> it's not good to call it IOAPIC MSI window. any write to this range, either
> from IOAPIC or PCI device, is interpreted by the platform as interrupt
> request. I'd call it "x86 interrupt address range".

Isn't this APIC_DEFAULT_ADDRESS? I'm not sure guests can't change it
even though I'm not sure qemu supports changing it.

And if so I'd say integrating IOAPIC defaults into the device itself is
inelegant.  How about having guest supply the range through config
space? It's a small change that won't be too late for Linux.

> > > as a reserved region. By default the field is set to true at
> > > instantiation time. Later on we will introduce a property at
> > > virtio pci proxy level to turn it off.
> > >
> > > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > >
> > > ---
> > >
> > > v8 -> v9:
> > > - pass IOAPIC_RANGE_END to virtio_iommu_register_resv_region
> > > - take into account the change in the struct virtio_iommu_probe_resv_mem
> > >   definition
> > > - We just introduce the field here. A property will be introduced later on
> > >   at pci proxy level.
> > > ---
> > >  hw/virtio/virtio-iommu.c         | 36 ++++++++++++++++++++++++++++++++
> > >  include/hw/virtio/virtio-iommu.h |  1 +
> > >  2 files changed, 37 insertions(+)
> > >
> > > diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> > > index 66be9a4627..74038288b0 100644
> > > --- a/hw/virtio/virtio-iommu.c
> > > +++ b/hw/virtio/virtio-iommu.c
> > > @@ -39,6 +39,9 @@
> > >  #define VIOMMU_DEFAULT_QUEUE_SIZE 256
> > >  #define VIOMMU_PROBE_SIZE 512
> > >
> > > +#define IOAPIC_RANGE_START      (0xfee00000)
> > > +#define IOAPIC_RANGE_END        (0xfeefffff)
> > > +
> > >  #define SUPPORTED_PROBE_PROPERTIES (\
> > >      1 << VIRTIO_IOMMU_PROBE_T_RESV_MEM)
> > >
> > 
> > Sorry where are these numbers coming from?
> 
> this is architecturally defined in x86 SDM.
> 
> > Does this really work on all platforms?
> 
> x86 only. 

But you seem to add this code for all platforms:

	@@ -6,6 +6,11 @@ config VIRTIO_RNG
	     default y
	     depends on VIRTIO

	+config VIRTIO_IOMMU
	+    bool
	+    default y
	+    depends on VIRTIO
	+    


> > With all guests?
> 
> yes.
> 
> > 
> > > @@ -100,6 +103,30 @@ static void
> > virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
> > >      ep->domain = NULL;
> > >  }
> > >
> > > +static void virtio_iommu_register_resv_region(viommu_endpoint *ep,
> > > +                                              uint8_t subtype,
> > > +                                              uint64_t start, uint64_t end)
> > > +{
> > > +    viommu_interval *interval;
> > > +    struct virtio_iommu_probe_resv_mem *resv_reg_prop;
> > > +    size_t prop_size = sizeof(struct virtio_iommu_probe_resv_mem);
> > > +    size_t value_size = prop_size -
> > > +                sizeof(struct virtio_iommu_probe_property);
> > > +
> > > +    interval = g_malloc0(sizeof(*interval));
> > > +    interval->low = start;
> > > +    interval->high = end;
> > > +
> > > +    resv_reg_prop = g_malloc0(prop_size);
> > > +    resv_reg_prop->head.type = VIRTIO_IOMMU_PROBE_T_RESV_MEM;
> > > +    resv_reg_prop->head.length = cpu_to_le64(value_size);
> > > +    resv_reg_prop->subtype = cpu_to_le64(subtype);
> > > +    resv_reg_prop->start = cpu_to_le64(start);
> > > +    resv_reg_prop->end = cpu_to_le64(end);
> > > +
> > > +    g_tree_insert(ep->reserved_regions, interval, resv_reg_prop);
> > > +}
> > > +
> > >  static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
> > >                                                    uint32_t ep_id)
> > >  {
> > > @@ -117,6 +144,12 @@ static viommu_endpoint
> > *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
> > >      ep->reserved_regions =
> > g_tree_new_full((GCompareDataFunc)interval_cmp,
> > >                                              NULL, (GDestroyNotify)g_free,
> > >                                              (GDestroyNotify)g_free);
> > > +    if (s->msi_bypass) {
> > > +        virtio_iommu_register_resv_region(ep,
> > VIRTIO_IOMMU_RESV_MEM_T_MSI,
> > > +                                          IOAPIC_RANGE_START,
> > > +                                          IOAPIC_RANGE_END);
> > > +    }
> > > +
> > >      return ep;
> > >  }
> > >
> > > @@ -822,6 +855,9 @@ static void virtio_iommu_set_status(VirtIODevice
> > *vdev, uint8_t status)
> > >
> > >  static void virtio_iommu_instance_init(Object *obj)
> > >  {
> > > +    VirtIOIOMMU *s = VIRTIO_IOMMU(obj);
> > > +
> > > +    s->msi_bypass = true;
> > >  }
> > >
> > >  static const VMStateDescription vmstate_virtio_iommu = {
> > > diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-
> > iommu.h
> > > index f55f48d304..56c8b4e57f 100644
> > > --- a/include/hw/virtio/virtio-iommu.h
> > > +++ b/include/hw/virtio/virtio-iommu.h
> > > @@ -59,6 +59,7 @@ typedef struct VirtIOIOMMU {
> > >      GTree *domains;
> > >      QemuMutex mutex;
> > >      GTree *endpoints;
> > > +    bool msi_bypass;
> > >  } VirtIOIOMMU;
> > >
> > >  #endif
> > > --
> > > 2.20.1



* Re: [Qemu-devel] [PATCH for-4.2 v10 11/15] virtio-iommu: Expose the IOAPIC MSI reserved region when relevant
  2019-07-31 19:25       ` Michael S. Tsirkin
@ 2019-07-31 19:44         ` Auger Eric
  2019-07-31 23:23           ` Tian, Kevin
  0 siblings, 1 reply; 55+ messages in thread
From: Auger Eric @ 2019-07-31 19:44 UTC (permalink / raw)
  To: Michael S. Tsirkin, Tian, Kevin
  Cc: jean-philippe, peter.maydell, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

Hi Michael,

On 7/31/19 9:25 PM, Michael S. Tsirkin wrote:
> On Tue, Jul 30, 2019 at 11:20:44PM +0000, Tian, Kevin wrote:
>>> From: Michael S. Tsirkin [mailto:mst@redhat.com]
>>> Sent: Wednesday, July 31, 2019 3:38 AM
>>>
>>> On Tue, Jul 30, 2019 at 07:21:33PM +0200, Eric Auger wrote:
>>>> We introduce a new msi_bypass field which indicates whether
>>>> the IOAPIC MSI window [0xFEE00000 - 0xFEEFFFFF] must be exposed
>>
>> it's not good to call it IOAPIC MSI window. any write to this range, either
>> from IOAPIC or PCI device, is interpreted by the platform as interrupt
>> request. I'd call it "x86 interrupt address range".
> 
> Isn't this APIC_DEFAULT_ADDRESS? I'm not sure guests can't change it
> even though I'm not sure qemu supports changing it.

That's indeed matching:

#define APIC_DEFAULT_ADDRESS 0xfee00000
#define APIC_SPACE_SIZE      0x100000
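The match comes from simple arithmetic. The patch's inclusive end equals the APIC base plus the 1 MiB APIC space minus one: 0xfee00000 + 0x100000 - 1 = 0xfeefffff. The sketch below derives the bound instead of hard-coding it and checks both spellings at compile time. region_last() is a hypothetical helper, and _Static_assert requires C11.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted above. */
#define APIC_DEFAULT_ADDRESS 0xfee00000
#define APIC_SPACE_SIZE      0x100000

/* Values hard-coded by the patch. */
#define IOAPIC_RANGE_START   (0xfee00000)
#define IOAPIC_RANGE_END     (0xfeefffff)

/* Inclusive last address of a region of 'size' bytes starting at 'base'. */
static uint64_t region_last(uint64_t base, uint64_t size)
{
    assert(size > 0);
    return base + size - 1;
}

/* Compile-time check that both spellings describe the same 1 MiB window. */
_Static_assert(IOAPIC_RANGE_START == APIC_DEFAULT_ADDRESS,
               "range start differs from the APIC base");
_Static_assert(IOAPIC_RANGE_END == APIC_DEFAULT_ADDRESS + APIC_SPACE_SIZE - 1,
               "range end differs from base + size - 1");
```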

> 
> And if so I'd say integrating IOAPIC defaults into the device itself is
> inelegant.

I agree.

> How about having guest supply the range through config
> space? It's a small change that won't be too late for Linux.

Isn't it a property of the platform instead? I mean, isn't it the job of
the machine model to set this? The guest driver is arch agnostic, if I am
not wrong.
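The sketch below assumes the machine model owns the platform knowledge. The device carries no x86 defaults and simply reports whatever regions the machine hands it. An ARM virt machine would pass none, and a pc/q35 machine would pass its interrupt address range. All types and names below are hypothetical; this is not QEMU's property API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor handed from the machine to the device. */
typedef struct {
    uint64_t start;   /* inclusive */
    uint64_t end;     /* inclusive */
    uint8_t subtype;  /* probe RESV_MEM subtype; 1 assumed to mean MSI */
} ResvRegion;

/* Hypothetical device-side view: no built-in defaults, only what the machine sets. */
typedef struct {
    const ResvRegion *regions;
    size_t nr_regions;
} ViommuResvConfig;

/* What an x86 machine model might pass: its interrupt address range. */
static const ResvRegion x86_machine_regions[] = {
    { 0xfee00000ULL, 0xfeefffffULL, 1 },
};

/* Copy up to 'max' machine-provided regions for one endpoint's probe reply. */
static size_t endpoint_resv_regions(const ViommuResvConfig *cfg,
                                    ResvRegion *out, size_t max)
{
    size_t n = cfg->nr_regions < max ? cfg->nr_regions : max;

    for (size_t i = 0; i < n; i++) {
        assert(cfg->regions[i].start <= cfg->regions[i].end);
        out[i] = cfg->regions[i];
    }
    return n;
}
```

With this split, the x86 constants would live in the x86 machine code rather than in the generic device, which is compiled for every target.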


> 
>>>> as a reserved region. By default the field is set to true at
>>>> instantiation time. Later on we will introduce a property at
>>>> virtio pci proxy level to turn it off.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>
>>>> ---
>>>>
>>>> v8 -> v9:
>>>> - pass IOAPIC_RANGE_END to virtio_iommu_register_resv_region
>>>> - take into account the change in the struct virtio_iommu_probe_resv_mem
>>>>   definition
>>>> - We just introduce the field here. A property will be introduced later on
>>>>   at pci proxy level.
>>>> ---
>>>>  hw/virtio/virtio-iommu.c         | 36 ++++++++++++++++++++++++++++++++
>>>>  include/hw/virtio/virtio-iommu.h |  1 +
>>>>  2 files changed, 37 insertions(+)
>>>>
>>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>>>> index 66be9a4627..74038288b0 100644
>>>> --- a/hw/virtio/virtio-iommu.c
>>>> +++ b/hw/virtio/virtio-iommu.c
>>>> @@ -39,6 +39,9 @@
>>>>  #define VIOMMU_DEFAULT_QUEUE_SIZE 256
>>>>  #define VIOMMU_PROBE_SIZE 512
>>>>
>>>> +#define IOAPIC_RANGE_START      (0xfee00000)
>>>> +#define IOAPIC_RANGE_END        (0xfeefffff)
>>>> +
>>>>  #define SUPPORTED_PROBE_PROPERTIES (\
>>>>      1 << VIRTIO_IOMMU_PROBE_T_RESV_MEM)
>>>>
>>>
>>> Sorry where are these numbers coming from?
>>
>> this is architecturally defined in x86 SDM.
>>
>>> Does this really work on all platforms?
>>
>> x86 only. 
> 
> But you seem to add this code for all platforms:
> 
> 	@@ -6,6 +6,11 @@ config VIRTIO_RNG
> 	     default y
> 	     depends on VIRTIO
> 
> 	+config VIRTIO_IOMMU
> 	+    bool
> 	+    default y
> 	+    depends on VIRTIO
> 	+
Actually it was supposed to be integrated with ARM first and then with x86.

Thanks

Eric
> 
> 
>>> With all guests?
>>
>> yes.
>>
>>>
>>>> @@ -100,6 +103,30 @@ static void
>>> virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
>>>>      ep->domain = NULL;
>>>>  }
>>>>
>>>> +static void virtio_iommu_register_resv_region(viommu_endpoint *ep,
>>>> +                                              uint8_t subtype,
>>>> +                                              uint64_t start, uint64_t end)
>>>> +{
>>>> +    viommu_interval *interval;
>>>> +    struct virtio_iommu_probe_resv_mem *resv_reg_prop;
>>>> +    size_t prop_size = sizeof(struct virtio_iommu_probe_resv_mem);
>>>> +    size_t value_size = prop_size -
>>>> +                sizeof(struct virtio_iommu_probe_property);
>>>> +
>>>> +    interval = g_malloc0(sizeof(*interval));
>>>> +    interval->low = start;
>>>> +    interval->high = end;
>>>> +
>>>> +    resv_reg_prop = g_malloc0(prop_size);
>>>> +    resv_reg_prop->head.type = VIRTIO_IOMMU_PROBE_T_RESV_MEM;
>>>> +    resv_reg_prop->head.length = cpu_to_le64(value_size);
>>>> +    resv_reg_prop->subtype = cpu_to_le64(subtype);
>>>> +    resv_reg_prop->start = cpu_to_le64(start);
>>>> +    resv_reg_prop->end = cpu_to_le64(end);
>>>> +
>>>> +    g_tree_insert(ep->reserved_regions, interval, resv_reg_prop);
>>>> +}
>>>> +
>>>>  static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
>>>>                                                    uint32_t ep_id)
>>>>  {
>>>> @@ -117,6 +144,12 @@ static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
>>>>      ep->reserved_regions = g_tree_new_full((GCompareDataFunc)interval_cmp,
>>>>                                              NULL, (GDestroyNotify)g_free,
>>>>                                              (GDestroyNotify)g_free);
>>>> +    if (s->msi_bypass) {
>>>> +        virtio_iommu_register_resv_region(ep, VIRTIO_IOMMU_RESV_MEM_T_MSI,
>>>> +                                          IOAPIC_RANGE_START,
>>>> +                                          IOAPIC_RANGE_END);
>>>> +    }
>>>> +
>>>>      return ep;
>>>>  }
>>>>
>>>> @@ -822,6 +855,9 @@ static void virtio_iommu_set_status(VirtIODevice *vdev, uint8_t status)
>>>>
>>>>  static void virtio_iommu_instance_init(Object *obj)
>>>>  {
>>>> +    VirtIOIOMMU *s = VIRTIO_IOMMU(obj);
>>>> +
>>>> +    s->msi_bypass = true;
>>>>  }
>>>>
>>>>  static const VMStateDescription vmstate_virtio_iommu = {
>>>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
>>>> index f55f48d304..56c8b4e57f 100644
>>>> --- a/include/hw/virtio/virtio-iommu.h
>>>> +++ b/include/hw/virtio/virtio-iommu.h
>>>> @@ -59,6 +59,7 @@ typedef struct VirtIOIOMMU {
>>>>      GTree *domains;
>>>>      QemuMutex mutex;
>>>>      GTree *endpoints;
>>>> +    bool msi_bypass;
>>>>  } VirtIOIOMMU;
>>>>
>>>>  #endif
>>>> --
>>>> 2.20.1
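
For reference, a minimal sketch of the RESV_MEM probe property as laid out in the Linux 5.3 `virtio_iommu.h` UAPI header that this series targets: `head.type` and `head.length` are `__le16`, `subtype` is a single `__u8` followed by three reserved bytes, and `start`/`end` are `__le64`; `head.length` counts only the value, not the 4-byte head. The sketch serializes the property byte by byte so the result does not depend on host endianness. Against that layout, the `cpu_to_le64()` calls on `head.length` and `subtype` in the hunk above would appear to store the wrong bytes on a big-endian host (and `head.type` is not converted at all); in QEMU proper, `cpu_to_le16()` for the head fields and a plain assignment for `subtype` seem to be the matching idioms. The helper names `put_le16`, `put_le64` and `encode_resv_mem` are hypothetical and not part of QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values as defined in the Linux virtio_iommu.h UAPI header. */
#define VIRTIO_IOMMU_PROBE_T_RESV_MEM  1
#define VIRTIO_IOMMU_RESV_MEM_T_MSI    1

/* head (le16 type, le16 length) + u8 subtype + u8 reserved[3]
 * + le64 start + le64 end = 24 bytes in total */
#define PROBE_PROP_HEAD_SIZE  4
#define RESV_MEM_PROP_SIZE    24

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Hypothetical helper: serialize one RESV_MEM probe property into buf
 * and return the number of bytes written. */
static size_t encode_resv_mem(uint8_t buf[RESV_MEM_PROP_SIZE], uint8_t subtype,
                              uint64_t start, uint64_t end)
{
    memset(buf, 0, RESV_MEM_PROP_SIZE);               /* zeroes reserved[3] */
    put_le16(buf + 0, VIRTIO_IOMMU_PROBE_T_RESV_MEM);  /* head.type */
    /* head.length covers the value only, not the 4-byte head */
    put_le16(buf + 2, RESV_MEM_PROP_SIZE - PROBE_PROP_HEAD_SIZE);
    buf[4] = subtype;                                  /* one byte, no swap */
    put_le64(buf + 8, start);
    put_le64(buf + 16, end);
    return RESV_MEM_PROP_SIZE;
}
```

Encoding the x86 window [0xfee00000, 0xfeefffff] with subtype VIRTIO_IOMMU_RESV_MEM_T_MSI yields a 24-byte property whose length field reads 20.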


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 v10 11/15] virtio-iommu: Expose the IOAPIC MSI reserved region when relevant
  2019-07-31 19:44         ` Auger Eric
@ 2019-07-31 23:23           ` Tian, Kevin
  0 siblings, 0 replies; 55+ messages in thread
From: Tian, Kevin @ 2019-07-31 23:23 UTC (permalink / raw)
  To: Auger Eric, Michael S. Tsirkin
  Cc: jean-philippe, peter.maydell, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: Thursday, August 1, 2019 3:45 AM
> 
> Hi Michael,
> 
> On 7/31/19 9:25 PM, Michael S. Tsirkin wrote:
> > On Tue, Jul 30, 2019 at 11:20:44PM +0000, Tian, Kevin wrote:
> >>> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> >>> Sent: Wednesday, July 31, 2019 3:38 AM
> >>>
> >>> On Tue, Jul 30, 2019 at 07:21:33PM +0200, Eric Auger wrote:
> >>>> We introduce a new msi_bypass field which indicates whether
> >>>> the IOAPIC MSI window [0xFEE00000 - 0xFEEFFFFF] must be exposed
> >>
> >> it's not good to call it IOAPIC MSI window. any write to this range, either
> >> from IOAPIC or PCI device, is interpreted by the platform as interrupt
> >> request. I'd call it "x86 interrupt address range".
> >
> > Isn't this APIC_DEFAULT_ADDRESS? I'm not sure guests can't change it
> > even though I'm not sure qemu supports changing it.
> 
> That's indeed matching:
> 
> #define APIC_DEFAULT_ADDRESS 0xfee00000
> #define APIC_SPACE_SIZE      0x100000
> 

They are different things, though the values match. The APIC default
address is the memory-mapped region through which software accesses the
APIC registers. It can be relocated by software, with 0xfee00000 as the
default. On the other hand, the interrupt address range is what the root
complex uses to interpret interrupt messages from devices. You can look at
Intel SDM 3A, 10.11 Message Signalled Interrupts, where the message address
register format is defined with 0xfee as the hard prefix.
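
To make the distinction concrete: whether a device write is decoded as an interrupt request depends only on the fixed 0xFEE prefix in bits 31:20 of the address, not on where software has relocated the APIC register window. A minimal C sketch, assuming 32-bit message addresses (a write with non-zero upper address bits is treated as ordinary memory here) and decoding only the destination ID field (bits 19:12) of the compatibility format; the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 31:20 of an x86 MSI address must be 0xFEE (Intel SDM Vol. 3A, 10.11). */
#define X86_MSI_ADDR_PREFIX_MASK  0xfff00000u
#define X86_MSI_ADDR_PREFIX       0xfee00000u

/* True if a DMA write address falls in the x86 interrupt address range
 * 0xFEE00000-0xFEEFFFFF. Assumes the upper 32 address bits must be zero. */
static bool x86_is_interrupt_address(uint64_t addr)
{
    return (addr >> 32) == 0 &&
           ((uint32_t)addr & X86_MSI_ADDR_PREFIX_MASK) == X86_MSI_ADDR_PREFIX;
}

/* Destination ID field (bits 19:12) of a compatibility-format address. */
static uint8_t x86_msi_dest_id(uint32_t addr)
{
    return (uint8_t)((addr >> 12) & 0xff);
}
```

Because the check is a fixed mask on the address, relocating the APIC register window (the APIC_DEFAULT_ADDRESS case) has no effect on it.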

Thanks
Kevin

* Re: [Qemu-devel] [PATCH for-4.2 v10 14/15] virtio-iommu-pci: Add virtio iommu pci support
  2019-07-30 19:35   ` Michael S. Tsirkin
@ 2019-08-01 12:15     ` Auger Eric
  2019-08-01 13:06       ` Michael S. Tsirkin
  0 siblings, 1 reply; 55+ messages in thread
From: Auger Eric @ 2019-08-01 12:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jean-philippe, kevin.tian, peter.maydell, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

Hi Michael,

On 7/30/19 9:35 PM, Michael S. Tsirkin wrote:
> On Tue, Jul 30, 2019 at 07:21:36PM +0200, Eric Auger wrote:
>> This patch adds virtio-iommu-pci, which is the pci proxy for
>> the virtio-iommu device.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> This part I'm not sure we should merge just yet.  The reason being I
> think we should limit it to mmio where DT can be used to describe iommu
> topology. For PCI I don't see why we shouldn't always expose this
> in the config space, and I think it's preferable not to
> need to support a mix of DT,ACPI and PCI as options.

For context, some discussion related to this topic already arose on the v7
revision of the driver:

[1] Re: [PATCH v7 0/7] Add virtio-iommu driver
https://lore.kernel.org/linux-pci/87a7ioby9u.fsf@morokweng.localdomain/

Some additional thoughts.

First considering DT boot.

The DT description features an iommu-map property in the
pci-host-ecam-generic node that describes which RIDs are handled by the
virtio-iommu and a possible offset/mask to be applied between the RID
and the stream ID at the input of the IOMMU
(Documentation/devicetree/bindings/pci/pci-iommu.txt).

As far as I understand, when a DMA-capable device is set up, its DMA
configuration is built using this call chain:

pci_dma_configure
|_ of_dma_configure
   |_ of_iommu_configure
      |_ of_pci_iommu_init
         |_ of_map_rid
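
The RID-to-stream-ID translation at the end of that chain follows the iommu-map binding: the RID is first ANDed with iommu-map-mask (all ones when the property is absent), then matched against each (rid-base, iommu, iommu-base, length) tuple, and a RID r in [rid-base, rid-base + length) maps to r - rid-base + iommu-base. Below is a simplified userspace sketch of that lookup, not the kernel's of_map_rid() itself (which also resolves the target node and reports errors); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One (rid-base, iommu, iommu-base, length) tuple of an iommu-map property. */
struct iommu_map_entry {
    uint32_t rid_base;
    uint32_t iommu_phandle;
    uint32_t iommu_base;
    uint32_t length;
};

/* Mask the RID, find the tuple covering it, then rebase it into the
 * IOMMU's stream ID space. Returns false if no tuple matches, i.e. the
 * device is not translated by any IOMMU listed in the map. */
static bool map_rid(const struct iommu_map_entry *map, size_t n,
                    uint32_t map_mask, uint32_t rid,
                    uint32_t *phandle, uint32_t *sid)
{
    uint32_t masked = rid & map_mask;

    for (size_t i = 0; i < n; i++) {
        const struct iommu_map_entry *e = &map[i];
        /* written as a subtraction to avoid overflowing rid_base + length */
        if (masked >= e->rid_base && masked - e->rid_base < e->length) {
            *phandle = e->iommu_phandle;
            *sid = masked - e->rid_base + e->iommu_base;
            return true;
        }
    }
    return false;
}
```

With the hypothetical two-entry map {0x0, ph, 0x0, 0x8} and {0x9, ph, 0x9, 0xfff7}, RID 0x8 (standing in for the IOMMU's own RID, which the series changelog describes excluding from the mapping) is not translated, while every other RID in 0x0-0xffff maps 1:1.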

I understand you would like the iommu-map/iommu-map-mask info to be
exposed directly in the config space of the device instead of inside
the DT or IORT table. Assuming a module is initialized sufficiently
early to retrieve this info, we would need the resulting info to be
consolidated to allow the pci_dma_configure chain to work seamlessly. This
sounds like a significant impact on the above kernel infrastructure.

This comes in addition to the development of the "small module that
loads early and pokes at the IOMMU sufficiently to get the data about
which devices use the IOMMU out of it using standard virtio config
space" mentioned in [1], plus the definition of the data formats to be
put in that config space.

With ACPI, I understand we have the same kind of infrastructure:
drivers/acpi/arm64/iort.c currently extracts the mapping between RC RIDs
and IOMMU stream IDs:

pci_dma_configure
|_ acpi_dma_configure
   |_ iort_iommu_configure
      |_ iort_pci_iommu_init
         |_ iort_node_map_id
            |_ iort_id_map
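
The IORT path performs the equivalent lookup on ID mapping entries. One difference worth noting when comparing with iommu-map: per the IORT specification, the Number of IDs field holds the size of the range minus one, so the upper bound is inclusive. The following is a simplified sketch of a single translation step. It ignores the single-mapping flag and the multi-level walk that iort_node_map_id() may perform through further nodes, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified view of an IORT ID mapping entry (hypothetical struct name). */
struct iort_id_map_entry {
    uint32_t input_base;       /* first input ID, e.g. an RC requester ID */
    uint32_t id_count;         /* number of IDs in the range MINUS ONE */
    uint32_t output_base;      /* first output ID, e.g. an IOMMU stream ID */
    uint32_t output_reference; /* offset of the target node in the IORT */
};

/* Map one input ID through a list of ID mappings; returns false when no
 * range covers it. The single-mapping flag is not handled here. */
static bool iort_map_id(const struct iort_id_map_entry *maps, size_t n,
                        uint32_t id_in, uint32_t *out_ref, uint32_t *id_out)
{
    for (size_t i = 0; i < n; i++) {
        const struct iort_id_map_entry *m = &maps[i];
        /* inclusive upper bound because id_count is "count minus one" */
        if (id_in >= m->input_base && id_in - m->input_base <= m->id_count) {
            *out_ref = m->output_reference;
            *id_out = m->output_base + (id_in - m->input_base);
            return true;
        }
    }
    return false;
}
```

Expressing the same hypothetical RID 0x8 exclusion as the DT example needs id_count values of 0x7 and 0xfff6 rather than lengths of 0x8 and 0xfff7.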

Maybe I fail to see the easy and right way to do the integration at
kernel level, but I am a bit frightened by the effort that would be
required to follow your suggestion, whereas the DT infrastructure is ready
and fully upstreamed to accept this use case.

For ACPI, I agree that, AFAIK, IORT was primarily defined by ARM, for ARM,
but we prototyped IORT integration with x86 and it worked for the pc
machine without major trouble.

I sent the kernel and qemu patches prototyping this IORT integration:

https://github.com/eauger/linux/tree/virtio-iommu-v0.9-iort-x86
https://github.com/eauger/qemu/tree/v3.1.0-rc3-virtio-iommu-v0.9-x86

There, the ACPI IORT was built for the PC machine and the integration
effort at both kernel and QEMU level was low. This work would need to be
rebased, though, and it depends on kernel ACPI-related patches that are
not yet upstreamed.

Thanks

Eric
> 
>> ---
>>
>> v8 -> v9:
>> - add the msi-bypass property
>> - create virtio-iommu-pci.c
>> ---
>>  hw/virtio/Makefile.objs          |  1 +
>>  hw/virtio/virtio-iommu-pci.c     | 88 ++++++++++++++++++++++++++++++++
>>  include/hw/pci/pci.h             |  1 +
>>  include/hw/virtio/virtio-iommu.h |  1 +
>>  qdev-monitor.c                   |  1 +
>>  5 files changed, 92 insertions(+)
>>  create mode 100644 hw/virtio/virtio-iommu-pci.c
>>
>> diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
>> index f42e4dd94f..80ca719f1c 100644
>> --- a/hw/virtio/Makefile.objs
>> +++ b/hw/virtio/Makefile.objs
>> @@ -27,6 +27,7 @@ obj-$(CONFIG_VIRTIO_INPUT_HOST) += virtio-input-host-pci.o
>>  obj-$(CONFIG_VIRTIO_INPUT) += virtio-input-pci.o
>>  obj-$(CONFIG_VIRTIO_RNG) += virtio-rng-pci.o
>>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio-balloon-pci.o
>> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu-pci.o
>>  obj-$(CONFIG_VIRTIO_9P) += virtio-9p-pci.o
>>  obj-$(CONFIG_VIRTIO_SCSI) += virtio-scsi-pci.o
>>  obj-$(CONFIG_VIRTIO_BLK) += virtio-blk-pci.o
>> diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
>> new file mode 100644
>> index 0000000000..f9977096bd
>> --- /dev/null
>> +++ b/hw/virtio/virtio-iommu-pci.c
>> @@ -0,0 +1,88 @@
>> +/*
>> + * Virtio IOMMU PCI Bindings
>> + *
>> + * Copyright (c) 2019 Red Hat, Inc.
>> + * Written by Eric Auger
>> + *
>> + *  This program is free software; you can redistribute it and/or modify
>> + *  it under the terms of the GNU General Public License version 2 or
>> + *  (at your option) any later version.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +
>> +#include "virtio-pci.h"
>> +#include "hw/virtio/virtio-iommu.h"
>> +
>> +typedef struct VirtIOIOMMUPCI VirtIOIOMMUPCI;
>> +
>> +/*
>> + * virtio-iommu-pci: This extends VirtioPCIProxy.
>> + *
>> + */
>> +#define VIRTIO_IOMMU_PCI(obj) \
>> +        OBJECT_CHECK(VirtIOIOMMUPCI, (obj), TYPE_VIRTIO_IOMMU_PCI)
>> +
>> +struct VirtIOIOMMUPCI {
>> +    VirtIOPCIProxy parent_obj;
>> +    VirtIOIOMMU vdev;
>> +};
>> +
>> +static Property virtio_iommu_pci_properties[] = {
>> +    DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>> +    DEFINE_PROP_BOOL("msi-bypass", VirtIOIOMMUPCI, vdev.msi_bypass, true),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>> +static void virtio_iommu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>> +{
>> +    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(vpci_dev);
>> +    DeviceState *vdev = DEVICE(&dev->vdev);
>> +
>> +    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
>> +    object_property_set_link(OBJECT(dev),
>> +                             OBJECT(pci_get_bus(&vpci_dev->pci_dev)),
>> +                             "primary-bus", errp);
>> +    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
>> +}
>> +
>> +static void virtio_iommu_pci_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
>> +    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
>> +    k->realize = virtio_iommu_pci_realize;
>> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>> +    dc->props = virtio_iommu_pci_properties;
>> +    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
>> +    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_IOMMU;
>> +    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
>> +    pcidev_k->class_id = PCI_CLASS_OTHERS;
>> +}
>> +
>> +static void virtio_iommu_pci_instance_init(Object *obj)
>> +{
>> +    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(obj);
>> +
>> +    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
>> +                                TYPE_VIRTIO_IOMMU);
>> +}
>> +
>> +static const VirtioPCIDeviceTypeInfo virtio_iommu_pci_info = {
>> +    .base_name             = TYPE_VIRTIO_IOMMU_PCI,
>> +    .generic_name          = "virtio-iommu-pci",
>> +    .transitional_name     = "virtio-iommu-pci-transitional",
>> +    .non_transitional_name = "virtio-iommu-pci-non-transitional",
>> +    .instance_size = sizeof(VirtIOIOMMUPCI),
>> +    .instance_init = virtio_iommu_pci_instance_init,
>> +    .class_init    = virtio_iommu_pci_class_init,
>> +};
>> +
>> +static void virtio_iommu_pci_register(void)
>> +{
>> +    virtio_pci_types_register(&virtio_iommu_pci_info);
>> +}
>> +
>> +type_init(virtio_iommu_pci_register)
>> +
>> +
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index aaf1b9f70d..492ea7e68d 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -86,6 +86,7 @@ extern bool pci_available;
>>  #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
>>  #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
>>  #define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
>> +#define PCI_DEVICE_ID_VIRTIO_IOMMU       0x1014
>>  
>>  #define PCI_VENDOR_ID_REDHAT             0x1b36
>>  #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
>> index 56c8b4e57f..893ac65c0b 100644
>> --- a/include/hw/virtio/virtio-iommu.h
>> +++ b/include/hw/virtio/virtio-iommu.h
>> @@ -25,6 +25,7 @@
>>  #include "hw/pci/pci.h"
>>  
>>  #define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
>> +#define TYPE_VIRTIO_IOMMU_PCI "virtio-iommu-device-base"
>>  #define VIRTIO_IOMMU(obj) \
>>          OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
>>  
>> diff --git a/qdev-monitor.c b/qdev-monitor.c
>> index 58222c2211..74cf090c61 100644
>> --- a/qdev-monitor.c
>> +++ b/qdev-monitor.c
>> @@ -63,6 +63,7 @@ static const QDevAlias qdev_alias_table[] = {
>>      { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_S390X },
>>      { "virtio-input-host-pci", "virtio-input-host",
>>              QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
>> +    { "virtio-iommu-pci", "virtio-iommu", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
>>      { "virtio-keyboard-ccw", "virtio-keyboard", QEMU_ARCH_S390X },
>>      { "virtio-keyboard-pci", "virtio-keyboard",
>>              QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
>> -- 
>> 2.20.1
> 


* Re: [Qemu-devel] [PATCH for-4.2 v10 14/15] virtio-iommu-pci: Add virtio iommu pci support
  2019-08-01 12:15     ` Auger Eric
@ 2019-08-01 13:06       ` Michael S. Tsirkin
  2019-08-01 13:49         ` Auger Eric
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2019-08-01 13:06 UTC (permalink / raw)
  To: Auger Eric
  Cc: jean-philippe, kevin.tian, peter.maydell, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

On Thu, Aug 01, 2019 at 02:15:03PM +0200, Auger Eric wrote:
> Hi Michael,
> 
> On 7/30/19 9:35 PM, Michael S. Tsirkin wrote:
> > On Tue, Jul 30, 2019 at 07:21:36PM +0200, Eric Auger wrote:
> >> This patch adds virtio-iommu-pci, which is the pci proxy for
> >> the virtio-iommu device.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > 
> > This part I'm not sure we should merge just yet.  The reason being I
> > think we should limit it to mmio where DT can be used to describe iommu
> > topology. For PCI I don't see why we shouldn't always expose this
> > in the config space, and I think it's preferable not to
> > need to support a mix of DT,ACPI and PCI as options.
> 
> For context, some discussion related to this topic already arose on v7
> revision of the driver:
> 
> [1] Re: [PATCH v7 0/7] Add virtio-iommu driver
> https://lore.kernel.org/linux-pci/87a7ioby9u.fsf@morokweng.localdomain/
> 
> Some additional thoughts.
> 
> First considering DT boot.
> 
> The DT description features an iommu-map property in the
> pci-host-ecam-generic node that describes which RIDs are handled by the
> virtio-iommu and a possible offset/mask to be applied between the RID
> and the streamID at the input of the IOMMU
> (Documentation/devicetree/bindings/pci/pci-iommu.txt)
> 
> As far as I understand when a DMA capable device is setup, its DMA
> configuration is built using that call chain:
> 
> pci_dma_configure
> |_ of_dma_configure
>    |_ of_iommu_configure
>       |_ of_pci_iommu_init
>          |_ of_map_rid
> 
> I understand you would like the iommu-map/iommu-map-mask info to be
> exposed directly into the config space of the device instead of inside
> the DT or IORT table. Assuming a module is initialized sufficiently
> early to retrieve this info, we would need the resulting info to be
> consolidated to allow the pci_dma_configure chain to work seamlessly. This
> sounds like a significant impact on the above kernel infrastructure.

I don't really know what consolidated means.
It is pretty common for IOMMUs to expose config through
PCI registers. This typically happens as a fixup.

I would write a tiny driver to do exactly that,
and run it from the fixup.


> This comes in addition to the development of the "small module that
> loads early and pokes at the IOMMU sufficiently to get the data about
> which devices use the IOMMU out of it using standard virtio config
> space" evoked in [1] + the definition of the data formats to be put in
> the very cfg space.

That last part is true but that's exactly why I propose we
wait on this patch a bit.

> With ACPI I understand we have the same kind of infrastructure:
> drivers/acpi/arm64/iort.c currently extracts the mapping between RC RIDs
> and IOMMU streamids
> 
> pci_dma_configure
> |_ acpi_dma_configure
>    |_ iort_iommu_configure
>       |_ iort_pci_iommu_init
>          |_ iort_node_map_id
>             |_ iort_id_map
> 
> Maybe I fail to see the easy and right way to do the integration at
> kernel level but I am a bit frightened by the efforts that would be
> requested to follow your suggestion, whereas the DT infra is ready and
> fully upstreamed to accept the use case.

Did you take a look at drivers/pci/quirks.c and how these run?
I think it's just a question of adding DECLARE_PCI_FIXUP_CLASS_EARLY
and running your hook from there.


> For ACPI I agree AFAIK IORT was primarily defined by ARM, for ARM but we
> prototyped IORT integration with x86 and it worked for pc machine
> without major trouble.
> 
> I sent the kernel and qemu patches prototyping this IORT integration:
> 
> https://github.com/eauger/linux/tree/virtio-iommu-v0.9-iort-x86
> https://github.com/eauger/qemu/tree/v3.1.0-rc3-virtio-iommu-v0.9-x86
> 
> There ACPI IORT was built for PC machine and the integration effort at
> both kernel and QEMU level was low. This work would need to be rebased
> and depends on kernel ACPI related patches that are not yet upstreamed
> though.
> 
> Thanks
> 
> Eric

In the end it might turn out you are right.  But it does us no harm to
delay this just a bit, and for now limit things to ARM where it's
already used and where alternatives exist.


> > 
> >> ---
> >>
> >> v8 -> v9:
> >> - add the msi-bypass property
> >> - create virtio-iommu-pci.c
> >> ---
> >>  hw/virtio/Makefile.objs          |  1 +
> >>  hw/virtio/virtio-iommu-pci.c     | 88 ++++++++++++++++++++++++++++++++
> >>  include/hw/pci/pci.h             |  1 +
> >>  include/hw/virtio/virtio-iommu.h |  1 +
> >>  qdev-monitor.c                   |  1 +
> >>  5 files changed, 92 insertions(+)
> >>  create mode 100644 hw/virtio/virtio-iommu-pci.c
> >>
> >> diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> >> index f42e4dd94f..80ca719f1c 100644
> >> --- a/hw/virtio/Makefile.objs
> >> +++ b/hw/virtio/Makefile.objs
> >> @@ -27,6 +27,7 @@ obj-$(CONFIG_VIRTIO_INPUT_HOST) += virtio-input-host-pci.o
> >>  obj-$(CONFIG_VIRTIO_INPUT) += virtio-input-pci.o
> >>  obj-$(CONFIG_VIRTIO_RNG) += virtio-rng-pci.o
> >>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio-balloon-pci.o
> >> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu-pci.o
> >>  obj-$(CONFIG_VIRTIO_9P) += virtio-9p-pci.o
> >>  obj-$(CONFIG_VIRTIO_SCSI) += virtio-scsi-pci.o
> >>  obj-$(CONFIG_VIRTIO_BLK) += virtio-blk-pci.o
> >> diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
> >> new file mode 100644
> >> index 0000000000..f9977096bd
> >> --- /dev/null
> >> +++ b/hw/virtio/virtio-iommu-pci.c
> >> @@ -0,0 +1,88 @@
> >> +/*
> >> + * Virtio IOMMU PCI Bindings
> >> + *
> >> + * Copyright (c) 2019 Red Hat, Inc.
> >> + * Written by Eric Auger
> >> + *
> >> + *  This program is free software; you can redistribute it and/or modify
> >> + *  it under the terms of the GNU General Public License version 2 or
> >> + *  (at your option) any later version.
> >> + */
> >> +
> >> +#include "qemu/osdep.h"
> >> +
> >> +#include "virtio-pci.h"
> >> +#include "hw/virtio/virtio-iommu.h"
> >> +
> >> +typedef struct VirtIOIOMMUPCI VirtIOIOMMUPCI;
> >> +
> >> +/*
> >> + * virtio-iommu-pci: This extends VirtioPCIProxy.
> >> + *
> >> + */
> >> +#define VIRTIO_IOMMU_PCI(obj) \
> >> +        OBJECT_CHECK(VirtIOIOMMUPCI, (obj), TYPE_VIRTIO_IOMMU_PCI)
> >> +
> >> +struct VirtIOIOMMUPCI {
> >> +    VirtIOPCIProxy parent_obj;
> >> +    VirtIOIOMMU vdev;
> >> +};
> >> +
> >> +static Property virtio_iommu_pci_properties[] = {
> >> +    DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
> >> +    DEFINE_PROP_BOOL("msi-bypass", VirtIOIOMMUPCI, vdev.msi_bypass, true),
> >> +    DEFINE_PROP_END_OF_LIST(),
> >> +};
> >> +
> >> +static void virtio_iommu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> >> +{
> >> +    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(vpci_dev);
> >> +    DeviceState *vdev = DEVICE(&dev->vdev);
> >> +
> >> +    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
> >> +    object_property_set_link(OBJECT(dev),
> >> +                             OBJECT(pci_get_bus(&vpci_dev->pci_dev)),
> >> +                             "primary-bus", errp);
> >> +    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
> >> +}
> >> +
> >> +static void virtio_iommu_pci_class_init(ObjectClass *klass, void *data)
> >> +{
> >> +    DeviceClass *dc = DEVICE_CLASS(klass);
> >> +    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
> >> +    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
> >> +    k->realize = virtio_iommu_pci_realize;
> >> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> >> +    dc->props = virtio_iommu_pci_properties;
> >> +    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
> >> +    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_IOMMU;
> >> +    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
> >> +    pcidev_k->class_id = PCI_CLASS_OTHERS;
> >> +}
> >> +
> >> +static void virtio_iommu_pci_instance_init(Object *obj)
> >> +{
> >> +    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(obj);
> >> +
> >> +    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
> >> +                                TYPE_VIRTIO_IOMMU);
> >> +}
> >> +
> >> +static const VirtioPCIDeviceTypeInfo virtio_iommu_pci_info = {
> >> +    .base_name             = TYPE_VIRTIO_IOMMU_PCI,
> >> +    .generic_name          = "virtio-iommu-pci",
> >> +    .transitional_name     = "virtio-iommu-pci-transitional",
> >> +    .non_transitional_name = "virtio-iommu-pci-non-transitional",
> >> +    .instance_size = sizeof(VirtIOIOMMUPCI),
> >> +    .instance_init = virtio_iommu_pci_instance_init,
> >> +    .class_init    = virtio_iommu_pci_class_init,
> >> +};
> >> +
> >> +static void virtio_iommu_pci_register(void)
> >> +{
> >> +    virtio_pci_types_register(&virtio_iommu_pci_info);
> >> +}
> >> +
> >> +type_init(virtio_iommu_pci_register)
> >> +
> >> +
> >> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >> index aaf1b9f70d..492ea7e68d 100644
> >> --- a/include/hw/pci/pci.h
> >> +++ b/include/hw/pci/pci.h
> >> @@ -86,6 +86,7 @@ extern bool pci_available;
> >>  #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
> >>  #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
> >>  #define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
> >> +#define PCI_DEVICE_ID_VIRTIO_IOMMU       0x1014
> >>  
> >>  #define PCI_VENDOR_ID_REDHAT             0x1b36
> >>  #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
> >> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
> >> index 56c8b4e57f..893ac65c0b 100644
> >> --- a/include/hw/virtio/virtio-iommu.h
> >> +++ b/include/hw/virtio/virtio-iommu.h
> >> @@ -25,6 +25,7 @@
> >>  #include "hw/pci/pci.h"
> >>  
> >>  #define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
> >> +#define TYPE_VIRTIO_IOMMU_PCI "virtio-iommu-device-base"
> >>  #define VIRTIO_IOMMU(obj) \
> >>          OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
> >>  
> >> diff --git a/qdev-monitor.c b/qdev-monitor.c
> >> index 58222c2211..74cf090c61 100644
> >> --- a/qdev-monitor.c
> >> +++ b/qdev-monitor.c
> >> @@ -63,6 +63,7 @@ static const QDevAlias qdev_alias_table[] = {
> >>      { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_S390X },
> >>      { "virtio-input-host-pci", "virtio-input-host",
> >>              QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
> >> +    { "virtio-iommu-pci", "virtio-iommu", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
> >>      { "virtio-keyboard-ccw", "virtio-keyboard", QEMU_ARCH_S390X },
> >>      { "virtio-keyboard-pci", "virtio-keyboard",
> >>              QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
> >> -- 
> >> 2.20.1
> > 


* Re: [Qemu-devel] [PATCH for-4.2 v10 14/15] virtio-iommu-pci: Add virtio iommu pci support
  2019-08-01 13:06       ` Michael S. Tsirkin
@ 2019-08-01 13:49         ` Auger Eric
  2019-09-01  6:40           ` Michael S. Tsirkin
  0 siblings, 1 reply; 55+ messages in thread
From: Auger Eric @ 2019-08-01 13:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jean-philippe, kevin.tian, peter.maydell, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

Hi Michael,

On 8/1/19 3:06 PM, Michael S. Tsirkin wrote:
> On Thu, Aug 01, 2019 at 02:15:03PM +0200, Auger Eric wrote:
>> Hi Michael,
>>
>> On 7/30/19 9:35 PM, Michael S. Tsirkin wrote:
>>> On Tue, Jul 30, 2019 at 07:21:36PM +0200, Eric Auger wrote:
>>>> This patch adds virtio-iommu-pci, which is the pci proxy for
>>>> the virtio-iommu device.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>
>>> This part I'm not sure we should merge just yet.  The reason being I
>>> think we should limit it to mmio where DT can be used to describe iommu
>>> topology. For PCI I don't see why we shouldn't always expose this
>>> in the config space, and I think it's preferable not to
>>> need to support a mix of DT,ACPI and PCI as options.
>>
>> For context, some discussion related to this topic already arose on v7
>> revision of the driver:
>>
>> [1] Re: [PATCH v7 0/7] Add virtio-iommu driver
>> https://lore.kernel.org/linux-pci/87a7ioby9u.fsf@morokweng.localdomain/
>>
>> Some additional thoughts.
>>
>> First considering DT boot.
>>
>> The DT description features an iommu-map property in the
>> pci-host-ecam-generic node that describes which RIDs are handled by the
>> virtio-iommu and a possible offset/mask to be applied between the RID
>> and the streamID at the input of the IOMMU
>> (Documentation/devicetree/bindings/pci/pci-iommu.txt)
>>
>> As far as I understand when a DMA capable device is setup, its DMA
>> configuration is built using that call chain:
>>
>> pci_dma_configure
>> |_ of_dma_configure
>>    |_ of_iommu_configure
>>       |_ of_pci_iommu_init
>>          |_ of_map_rid
>>
>> I understand you would like the iommu-map/iommu-map-mask info to be
>> exposed directly into the config space of the device instead of inside
>> the DT or IORT table. Assuming a module is initialized sufficiently
>> early to retrieve this info, we would need the resulting info to be
>> consolidated to allow the pci_dma_configure chain to work seamlessly. This
>> sounds like a significant impact on the above kernel infrastructure.
> 
> I don't really know what consolidated means.
> It is pretty common for IOMMUs to expose config through
> PCI registers. This typically happens as a fixup.
I meant: instead of retrieving the info through the of_* code, you need
to interoperate with the module to retrieve the same info, and detect
when you need to take that path instead of the OF one.
> 
> I would write a tiny driver to do exactly that,
> and run it from the fixup.
> 
> 
>> This comes in addition to the development of the "small module that
>> loads early and pokes at the IOMMU sufficiently to get the data about
>> which devices use the IOMMU out of it using standard virtio config
>> space" evoked in [1] + the definition of the data formats to be put in
>> the very cfg space.
> 
> That last part is true but that's exactly why I propose we
> wait on this patch a bit.
> 
>> With ACPI I understand we have the same kind of infrastructure:
>> drivers/acpi/arm64/iort.c currently extracts the mapping between RC RIDs
>> and IOMMU streamids
>>
>> pci_dma_configure
>> |_ acpi_dma_configure
>>    |_ iort_iommu_configure
>>       |_ iort_pci_iommu_init
>>          |_ iort_node_map_id
>>             |_ iort_id_map
>>
>> Maybe I fail to see the easy and right way to do the integration at
>> kernel level but I am a bit frightened by the efforts that would be
>> requested to follow your suggestion, whereas the DT infra is ready and
>> fully upstreamed to accept the use case.
> 
> Did you take a look at drivers/pci/quirks.c and how these run?
> I think it's just a question of adding DECLARE_PCI_FIXUP_CLASS_EARLY
> and running your hook from there.
I will do so and trace the code.
> 
> 
>> For ACPI I agree AFAIK IORT was primarily defined by ARM, for ARM but we
>> prototyped IORT integration with x86 and it worked for pc machine
>> without major trouble.
>>
>> I sent the kernel and qemu patches prototyping this IORT integration:
>>
>> https://github.com/eauger/linux/tree/virtio-iommu-v0.9-iort-x86
>> https://github.com/eauger/qemu/tree/v3.1.0-rc3-virtio-iommu-v0.9-x86
>>
>> There ACPI IORT was built for PC machine and the integration effort at
>> both kernel and QEMU level was low. This work would need to be rebased
>> and depends on kernel ACPI related patches that are not yet upstreamed
>> though.
>>
>> Thanks
>>
>> Eric
> 
> In the end it might turn out you are right.  But it does us no harm to
> delay this just a bit, and for now limit things to ARM where it's
> already used and where alternatives exist.
If I understand correctly, at the moment you would accept a DT
integration using MMIO. Is that correct? Meanwhile, we can prototype
your suggestion.

Thanks

Eric
> 
> 
>>>
>>>> ---
>>>>
>>>> v8 -> v9:
>>>> - add the msi-bypass property
>>>> - create virtio-iommu-pci.c
>>>> ---
>>>>  hw/virtio/Makefile.objs          |  1 +
>>>>  hw/virtio/virtio-iommu-pci.c     | 88 ++++++++++++++++++++++++++++++++
>>>>  include/hw/pci/pci.h             |  1 +
>>>>  include/hw/virtio/virtio-iommu.h |  1 +
>>>>  qdev-monitor.c                   |  1 +
>>>>  5 files changed, 92 insertions(+)
>>>>  create mode 100644 hw/virtio/virtio-iommu-pci.c
>>>>
>>>> diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
>>>> index f42e4dd94f..80ca719f1c 100644
>>>> --- a/hw/virtio/Makefile.objs
>>>> +++ b/hw/virtio/Makefile.objs
>>>> @@ -27,6 +27,7 @@ obj-$(CONFIG_VIRTIO_INPUT_HOST) += virtio-input-host-pci.o
>>>>  obj-$(CONFIG_VIRTIO_INPUT) += virtio-input-pci.o
>>>>  obj-$(CONFIG_VIRTIO_RNG) += virtio-rng-pci.o
>>>>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio-balloon-pci.o
>>>> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu-pci.o
>>>>  obj-$(CONFIG_VIRTIO_9P) += virtio-9p-pci.o
>>>>  obj-$(CONFIG_VIRTIO_SCSI) += virtio-scsi-pci.o
>>>>  obj-$(CONFIG_VIRTIO_BLK) += virtio-blk-pci.o
>>>> diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
>>>> new file mode 100644
>>>> index 0000000000..f9977096bd
>>>> --- /dev/null
>>>> +++ b/hw/virtio/virtio-iommu-pci.c
>>>> @@ -0,0 +1,88 @@
>>>> +/*
>>>> + * Virtio IOMMU PCI Bindings
>>>> + *
>>>> + * Copyright (c) 2019 Red Hat, Inc.
>>>> + * Written by Eric Auger
>>>> + *
>>>> + *  This program is free software; you can redistribute it and/or modify
>>>> + *  it under the terms of the GNU General Public License version 2 or
>>>> + *  (at your option) any later version.
>>>> + */
>>>> +
>>>> +#include "qemu/osdep.h"
>>>> +
>>>> +#include "virtio-pci.h"
>>>> +#include "hw/virtio/virtio-iommu.h"
>>>> +
>>>> +typedef struct VirtIOIOMMUPCI VirtIOIOMMUPCI;
>>>> +
>>>> +/*
>>>> + * virtio-iommu-pci: This extends VirtioPCIProxy.
>>>> + *
>>>> + */
>>>> +#define VIRTIO_IOMMU_PCI(obj) \
>>>> +        OBJECT_CHECK(VirtIOIOMMUPCI, (obj), TYPE_VIRTIO_IOMMU_PCI)
>>>> +
>>>> +struct VirtIOIOMMUPCI {
>>>> +    VirtIOPCIProxy parent_obj;
>>>> +    VirtIOIOMMU vdev;
>>>> +};
>>>> +
>>>> +static Property virtio_iommu_pci_properties[] = {
>>>> +    DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>>>> +    DEFINE_PROP_BOOL("msi-bypass", VirtIOIOMMUPCI, vdev.msi_bypass, true),
>>>> +    DEFINE_PROP_END_OF_LIST(),
>>>> +};
>>>> +
>>>> +static void virtio_iommu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>>>> +{
>>>> +    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(vpci_dev);
>>>> +    DeviceState *vdev = DEVICE(&dev->vdev);
>>>> +
>>>> +    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
>>>> +    object_property_set_link(OBJECT(dev),
>>>> +                             OBJECT(pci_get_bus(&vpci_dev->pci_dev)),
>>>> +                             "primary-bus", errp);
>>>> +    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
>>>> +}
>>>> +
>>>> +static void virtio_iommu_pci_class_init(ObjectClass *klass, void *data)
>>>> +{
>>>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>>>> +    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
>>>> +    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
>>>> +    k->realize = virtio_iommu_pci_realize;
>>>> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>>> +    dc->props = virtio_iommu_pci_properties;
>>>> +    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
>>>> +    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_IOMMU;
>>>> +    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
>>>> +    pcidev_k->class_id = PCI_CLASS_OTHERS;
>>>> +}
>>>> +
>>>> +static void virtio_iommu_pci_instance_init(Object *obj)
>>>> +{
>>>> +    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(obj);
>>>> +
>>>> +    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
>>>> +                                TYPE_VIRTIO_IOMMU);
>>>> +}
>>>> +
>>>> +static const VirtioPCIDeviceTypeInfo virtio_iommu_pci_info = {
>>>> +    .base_name             = TYPE_VIRTIO_IOMMU_PCI,
>>>> +    .generic_name          = "virtio-iommu-pci",
>>>> +    .transitional_name     = "virtio-iommu-pci-transitional",
>>>> +    .non_transitional_name = "virtio-iommu-pci-non-transitional",
>>>> +    .instance_size = sizeof(VirtIOIOMMUPCI),
>>>> +    .instance_init = virtio_iommu_pci_instance_init,
>>>> +    .class_init    = virtio_iommu_pci_class_init,
>>>> +};
>>>> +
>>>> +static void virtio_iommu_pci_register(void)
>>>> +{
>>>> +    virtio_pci_types_register(&virtio_iommu_pci_info);
>>>> +}
>>>> +
>>>> +type_init(virtio_iommu_pci_register)
>>>> +
>>>> +
>>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>>>> index aaf1b9f70d..492ea7e68d 100644
>>>> --- a/include/hw/pci/pci.h
>>>> +++ b/include/hw/pci/pci.h
>>>> @@ -86,6 +86,7 @@ extern bool pci_available;
>>>>  #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
>>>>  #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
>>>>  #define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
>>>> +#define PCI_DEVICE_ID_VIRTIO_IOMMU       0x1014
>>>>  
>>>>  #define PCI_VENDOR_ID_REDHAT             0x1b36
>>>>  #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
>>>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
>>>> index 56c8b4e57f..893ac65c0b 100644
>>>> --- a/include/hw/virtio/virtio-iommu.h
>>>> +++ b/include/hw/virtio/virtio-iommu.h
>>>> @@ -25,6 +25,7 @@
>>>>  #include "hw/pci/pci.h"
>>>>  
>>>>  #define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
>>>> +#define TYPE_VIRTIO_IOMMU_PCI "virtio-iommu-device-base"
>>>>  #define VIRTIO_IOMMU(obj) \
>>>>          OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
>>>>  
>>>> diff --git a/qdev-monitor.c b/qdev-monitor.c
>>>> index 58222c2211..74cf090c61 100644
>>>> --- a/qdev-monitor.c
>>>> +++ b/qdev-monitor.c
>>>> @@ -63,6 +63,7 @@ static const QDevAlias qdev_alias_table[] = {
>>>>      { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_S390X },
>>>>      { "virtio-input-host-pci", "virtio-input-host",
>>>>              QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
>>>> +    { "virtio-iommu-pci", "virtio-iommu", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
>>>>      { "virtio-keyboard-ccw", "virtio-keyboard", QEMU_ARCH_S390X },
>>>>      { "virtio-keyboard-pci", "virtio-keyboard",
>>>>              QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
>>>> -- 
>>>> 2.20.1
>>>
> 



* Re: [Qemu-devel] [PATCH for-4.2 v10 03/15] virtio-iommu: Add skeleton
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 03/15] virtio-iommu: Add skeleton Eric Auger
@ 2019-08-15 13:54   ` Peter Xu
  2019-08-29 12:18     ` Auger Eric
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Xu @ 2019-08-15 13:54 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

On Tue, Jul 30, 2019 at 07:21:25PM +0200, Eric Auger wrote:
> +static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
> +    struct virtio_iommu_req_head head;
> +    struct virtio_iommu_req_tail tail;

[1]

> +    VirtQueueElement *elem;
> +    unsigned int iov_cnt;
> +    struct iovec *iov;
> +    size_t sz;
> +
> +    for (;;) {
> +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> +        if (!elem) {
> +            return;
> +        }
> +
> +        if (iov_size(elem->in_sg, elem->in_num) < sizeof(tail) ||
> +            iov_size(elem->out_sg, elem->out_num) < sizeof(head)) {
> +            virtio_error(vdev, "virtio-iommu bad head/tail size");
> +            virtqueue_detach_element(vq, elem, 0);
> +            g_free(elem);
> +            break;
> +        }
> +
> +        iov_cnt = elem->out_num;
> +        iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem->out_num);

Could I ask why memdup is needed here?

> +        sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
> +        if (unlikely(sz != sizeof(head))) {
> +            tail.status = VIRTIO_IOMMU_S_DEVERR;

Do you need to zero the reserved bits to make sure they won't contain
garbage?  Same question for the uses of tail below.
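A minimal sketch of the zeroing in question, assuming the v0.12 spec's tail layout (one status byte followed by three reserved bytes). demo_req_tail and demo_make_tail() are hypothetical names, not QEMU code. Clearing the whole struct before setting status guarantees that the reserved bytes copied back to the guest are zero:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed layout, after the v0.12 spec's request tail: one status byte
 * followed by three reserved bytes. Hypothetical name, not QEMU's struct.
 */
struct demo_req_tail {
    uint8_t status;
    uint8_t reserved[3];
};

/*
 * Return a tail whose reserved bytes are zero, so no stale stack
 * contents are copied back into the guest's buffer.
 */
struct demo_req_tail demo_make_tail(uint8_t status)
{
    struct demo_req_tail tail;

    memset(&tail, 0, sizeof(tail));   /* clear status and reserved[] */
    tail.status = status;
    return tail;
}
```

In the patch itself, a memset() or an empty initializer on the local declared at [1] would have the same effect.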

> +            goto out;
> +        }
> +        qemu_mutex_lock(&s->mutex);
> +        switch (head.type) {
> +        case VIRTIO_IOMMU_T_ATTACH:
> +            tail.status = virtio_iommu_handle_attach(s, iov, iov_cnt);
> +            break;
> +        case VIRTIO_IOMMU_T_DETACH:
> +            tail.status = virtio_iommu_handle_detach(s, iov, iov_cnt);
> +            break;
> +        case VIRTIO_IOMMU_T_MAP:
> +            tail.status = virtio_iommu_handle_map(s, iov, iov_cnt);
> +            break;
> +        case VIRTIO_IOMMU_T_UNMAP:
> +            tail.status = virtio_iommu_handle_unmap(s, iov, iov_cnt);
> +            break;
> +        default:
> +            tail.status = VIRTIO_IOMMU_S_UNSUPP;
> +        }
> +        qemu_mutex_unlock(&s->mutex);
> +
> +out:
> +        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
> +                          &tail, sizeof(tail));
> +        assert(sz == sizeof(tail));
> +
> +        virtqueue_push(vq, elem, sizeof(tail));

s/tail/head/ (though they are the same size)?

> +        virtio_notify(vdev, vq);
> +        g_free(elem);
> +    }
> +}

[...]

> +static void virtio_iommu_set_features(VirtIODevice *vdev, uint64_t val)
> +{
> +    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
> +
> +    dev->acked_features = val;
> +    trace_virtio_iommu_set_features(dev->acked_features);
> +}
> +
> +static const VMStateDescription vmstate_virtio_iommu_device = {
> +    .name = "virtio-iommu-device",
> +    .unmigratable = 1,

Curious: is there an explicit reason not to support migration in the
first version? :)

> +};
> +
> +static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
> +{
> +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
> +
> +    virtio_init(vdev, "virtio-iommu", VIRTIO_ID_IOMMU,
> +                sizeof(struct virtio_iommu_config));
> +
> +    s->req_vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE,
> +                             virtio_iommu_handle_command);
> +    s->event_vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE, NULL);
> +
> +    s->config.page_size_mask = TARGET_PAGE_MASK;
> +    s->config.input_range.end = -1UL;
> +    s->config.domain_range.start = 0;

Should input_range.start be zeroed too?  After all, domain_range.start is zeroed.

> +    s->config.domain_range.end = 32;
> +
> +    virtio_add_feature(&s->features, VIRTIO_RING_F_EVENT_IDX);
> +    virtio_add_feature(&s->features, VIRTIO_RING_F_INDIRECT_DESC);
> +    virtio_add_feature(&s->features, VIRTIO_F_VERSION_1);
> +    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_INPUT_RANGE);
> +    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_DOMAIN_RANGE);
> +    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MAP_UNMAP);
> +    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_BYPASS);
> +    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MMIO);
> +}

Regards,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH for-4.2 v10 05/15] virtio-iommu: Add the iommu regions
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 05/15] virtio-iommu: Add the iommu regions Eric Auger
@ 2019-08-16  4:00   ` Peter Xu
  2019-08-29 12:51     ` Auger Eric
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Xu @ 2019-08-16  4:00 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

On Tue, Jul 30, 2019 at 07:21:27PM +0200, Eric Auger wrote:

[...]

>  static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  {
>      VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
> @@ -266,6 +333,15 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>      virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MAP_UNMAP);
>      virtio_add_feature(&s->features, VIRTIO_IOMMU_F_BYPASS);
>      virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MMIO);
> +
> +    memset(s->as_by_bus_num, 0, sizeof(s->as_by_bus_num));
> +    s->as_by_busptr = g_hash_table_new(NULL, NULL);

VT-d uses g_hash_table_new_full() so that the VTDBus can still be
freed.  Here, I think the IOMMUPCIBus allocated in
virtio_iommu_find_add_as() will be leaked if we remove entries from
the hash table?
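For reference, g_hash_table_new_full(hash_func, key_equal_func, key_destroy_func, value_destroy_func) differs from g_hash_table_new() in that the table calls the destroy callbacks itself when an entry is removed or the table is destroyed. The sketch below models that ownership rule in plain C, without GLib (demo_map and its helpers are hypothetical), to show why a table created without a value destroy function leaks the values it drops. Assuming IOMMUPCIBus is a plain g_malloc()'d struct, passing g_free as value_destroy_func might be enough, but that is an assumption about the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for GLib's GDestroyNotify. */
typedef void (*demo_destroy_fn)(void *data);

#define DEMO_SLOTS 8

/* Tiny pointer-keyed map; keys must be non-NULL (NULL marks a free slot). */
struct demo_map {
    const void *keys[DEMO_SLOTS];
    void *values[DEMO_SLOTS];
    demo_destroy_fn value_destroy;   /* NULL: the map does not own values */
};

int demo_map_insert(struct demo_map *m, const void *key, void *value)
{
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (!m->keys[i]) {
            m->keys[i] = key;
            m->values[i] = value;
            return 0;
        }
    }
    return -1;   /* full */
}

/* Remove an entry; its value is freed only if the map owns it. */
int demo_map_remove(struct demo_map *m, const void *key)
{
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (m->keys[i] == key) {
            if (m->value_destroy) {
                m->value_destroy(m->values[i]);
            }
            m->keys[i] = NULL;
            m->values[i] = NULL;
            return 0;
        }
    }
    return -1;   /* not found */
}

/* Destroy callback that also counts its calls, for demonstration. */
int demo_freed;

void demo_counting_free(void *data)
{
    free(data);
    demo_freed++;
}
```

With value_destroy left NULL (the g_hash_table_new() case), demo_map_remove() simply forgets the pointer, and the allocation leaks unless the caller frees it separately.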

So I started to wonder whether PCI/PCIe buses can be plugged/unplugged
at all, because I have never tried it.  With the latest 5.3.0-rc4 guest
I gave it a shot and I see the error below.  It could be something I
did wrong, or it could simply be that it doesn't work at all.  Have you
tried anything like that?  Michael/Alex?

bin=x86_64-softmmu/qemu-system-x86_64
$bin -M q35,accel=kvm,kernel-irqchip=on -smp 8 -m 2G -cpu host \
     -monitor telnet::6666,server,nowait -nographic \
     -device e1000,netdev=net0 \
     -netdev user,id=net0,hostfwd=tcp::5555-:22 \
     -device pcie-pci-bridge,bus=pcie.0,id=pci.1 \
     -drive file=/images/default.qcow2,if=none,cache=none,id=drive0 \
     -device virtio-blk,drive=drive0

(qemu) device_add pci-bridge,bus=pci.1,id=pci.2,chassis_nr=1,addr=1.0

[   66.172352] pci 0000:01:01.0: [1b36:0001] type 01 class 0x060400
[   66.176897] pci 0000:01:01.0: reg 0x10: [mem 0x00000000-0x000000ff 64bit]
[   66.186130] pci 0000:01:01.0: No bus number available for hot-added bridge
[   66.189489] shpchp 0000:00:03.0: BAR 14: assigned [mem 0x80000000-0x800fffff]
[   66.193235] pci 0000:01:01.0: BAR 0: assigned [mem 0x80000000-0x800000ff 64bit]
[   66.198587] shpchp 0000:00:03.0: PCI bridge to [bus 01]
[   66.204113] shpchp 0000:00:03.0:   bridge window [mem 0x80000000-0x800fffff]
[   66.215212] shpchp 0000:01:01.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 ss_did 0
[   66.218531] shpchp 0000:01:01.0: enabling device (0000 -> 0002)
[   66.229204] BUG: kernel NULL pointer dereference, address: 00000000000000e2
[   66.232124] #PF: supervisor write access in kernel mode
[   66.234369] #PF: error_code(0x0002) - not-present page
[   66.236585] PGD 0 P4D 0
[   66.237431] Oops: 0002 [#1] SMP PTI
[   66.238617] CPU: 2 PID: 277 Comm: kworker/2:1 Kdump: loaded Not tainted 5.3.0-rc4 #85
[   66.241200] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[   66.244916] Workqueue: shpchp-1 shpchp_pushbutton_thread
[   66.246583] RIP: 0010:shpc_init.cold+0x5c3/0x8a1
[   66.248041] Code: 24 90 01 00 00 8b 49 08 40 80 fe 02 0f 85 f4 01 00 00 f7 c1 00 00 00 f0 0f 84 b2 01 00 00 b9 13 00 00 00 80 3d 33 40 38 02 00 <88> 8a e26
[   66.253771] RSP: 0018:ffffc9000025bb68 EFLAGS: 00010246
[   66.255418] RAX: 00000000000000ff RBX: 0000000000000000 RCX: 0000000000000000
[   66.257763] RDX: 0000000000000000 RSI: ffffffff826bcd01 RDI: ffffffff826bcd60
[   66.260065] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[   66.263184] R10: 0000000000000005 R11: 0000000000000000 R12: ffff888032425400
[   66.265706] R13: ffffc9000017109c R14: ffff888033da7000 R15: 000000000000001f
[   66.268200] FS:  0000000000000000(0000) GS:ffff88807fc80000(0000) knlGS:0000000000000000
[   66.270826] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   66.272731] CR2: 00000000000000e2 CR3: 0000000033afc002 CR4: 0000000000360ee0
[   66.275373] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   66.277947] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   66.279965] Call Trace:
[   66.280627]  shpc_probe+0x91/0x32b
[   66.281644]  local_pci_probe+0x42/0x80
[   66.282752]  pci_device_probe+0x107/0x1a0
[   66.283877]  really_probe+0xf0/0x380
[   66.284862]  driver_probe_device+0x59/0xd0
[   66.285988]  ? driver_allows_async_probing+0x50/0x50
[   66.287937]  bus_for_each_drv+0x7e/0xc0
[   66.289752]  __device_attach+0xe1/0x160
[   66.292076]  pci_bus_add_device+0x4b/0x70
[   66.295244]  pci_bus_add_devices+0x2c/0x64
[   66.297429]  shpchp_configure_device+0xc1/0xe0
[   66.299692]  board_added+0x117/0x240
[   66.301589]  shpchp_enable_slot+0x121/0x2e0
[   66.303686]  shpchp_pushbutton_thread+0x70/0xa0
[   66.305941]  process_one_work+0x221/0x500
[   66.308253]  worker_thread+0x50/0x3b0
[   66.310512]  kthread+0xfb/0x130
[   66.312422]  ? process_one_work+0x500/0x500
[   66.314617]  ? kthread_park+0x80/0x80
[   66.316489]  ret_from_fork+0x3a/0x50
[   66.318293] Modules linked in: intel_rapl_msr intel_rapl_common kvm_intel kvm crct10dif_pclmul bochs_drm crc32_pclmul drm_vram_helper ghash_clmulni_intel o
[   66.331179] CR2: 00000000000000e2
[   66.333090] ---[ end trace cfc73b2e92e207d4 ]---
[   66.335431] RIP: 0010:shpc_init.cold+0x5c3/0x8a1
[   66.337790] Code: 24 90 01 00 00 8b 49 08 40 80 fe 02 0f 85 f4 01 00 00 f7 c1 00 00 00 f0 0f 84 b2 01 00 00 b9 13 00 00 00 80 3d 33 40 38 02 00 <88> 8a e26
[   66.346561] RSP: 0018:ffffc9000025bb68 EFLAGS: 00010246
[   66.348659] RAX: 00000000000000ff RBX: 0000000000000000 RCX: 0000000000000000
[   66.351412] RDX: 0000000000000000 RSI: ffffffff826bcd01 RDI: ffffffff826bcd60
[   66.354204] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[   66.357013] R10: 0000000000000005 R11: 0000000000000000 R12: ffff888032425400
[   66.360117] R13: ffffc9000017109c R14: ffff888033da7000 R15: 000000000000001f
[   66.362953] FS:  0000000000000000(0000) GS:ffff88807fc80000(0000) knlGS:0000000000000000
[   66.366003] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   66.368756] CR2: 00000000000000e2 CR3: 0000000033afc002 CR4: 0000000000360ee0
[   66.371769] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   66.376036] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Regards,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH for-4.2 v10 06/15] virtio-iommu: Endpoint and domains structs and helpers
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 06/15] virtio-iommu: Endpoint and domains structs and helpers Eric Auger
@ 2019-08-16  4:17   ` Peter Xu
  2019-11-04 18:31   ` Jean-Philippe Brucker
  1 sibling, 0 replies; 55+ messages in thread
From: Peter Xu @ 2019-08-16  4:17 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

On Tue, Jul 30, 2019 at 07:21:28PM +0200, Eric Auger wrote:
>  static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> @@ -334,6 +444,8 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>      virtio_add_feature(&s->features, VIRTIO_IOMMU_F_BYPASS);
>      virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MMIO);
>  
> +    qemu_mutex_init(&s->mutex);

It's a bit strange to init a mutex which has already been used in
patch 3. :)

Thanks,

> +
>      memset(s->as_by_bus_num, 0, sizeof(s->as_by_bus_num));
>      s->as_by_busptr = g_hash_table_new(NULL, NULL);
>  
> @@ -342,11 +454,20 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>      } else {
>          error_setg(errp, "VIRTIO-IOMMU is not attached to any PCI bus!");
>      }
> +
> +    s->domains = g_tree_new_full((GCompareDataFunc)int_cmp,
> +                                 NULL, NULL, virtio_iommu_put_domain);
> +    s->endpoints = g_tree_new_full((GCompareDataFunc)int_cmp,
> +                                   NULL, NULL, virtio_iommu_put_endpoint);
>  }

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH for-4.2 v10 07/15] virtio-iommu: Implement attach/detach command
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 07/15] virtio-iommu: Implement attach/detach command Eric Auger
@ 2019-08-16  4:27   ` Peter Xu
  2019-08-29 14:24     ` Auger Eric
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Xu @ 2019-08-16  4:27 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

On Tue, Jul 30, 2019 at 07:21:29PM +0200, Eric Auger wrote:
> This patch implements the endpoint attach/detach to/from
> a domain.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> ---
>  hw/virtio/virtio-iommu.c | 40 ++++++++++++++++++++++++++++++++++------
>  1 file changed, 34 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 77dccecc0a..5ea0930cc2 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -80,8 +80,8 @@ static void virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
>      ep->domain = NULL;
>  }
>  
> -viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id);
> -viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id)

These lines were just introduced in the previous patch.  I wanted to ask
why the declaration was needed, but I didn't know whether it would be
used in follow-up patches.  It looks like it wasn't really used.

I would prefer patches like these to be squashed together, not only to
avoid maintaining diffs like this one between patches, but also because
it is easier for a reviewer to have all the context together.
But I won't ask for it, since it may just be a personal preference...

> +static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
> +                                                  uint32_t ep_id)
>  {
>      viommu_endpoint *ep;
>  
> @@ -110,8 +110,8 @@ static void virtio_iommu_put_endpoint(gpointer data)
>      g_free(ep);
>  }
>  
> -viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id);
> -viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id)
> +static viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s,
> +                                              uint32_t domain_id)
>  {
>      viommu_domain *domain;
>  
> @@ -187,10 +187,27 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>  {
>      uint32_t domain_id = le32_to_cpu(req->domain);
>      uint32_t ep_id = le32_to_cpu(req->endpoint);
> +    viommu_domain *domain;
> +    viommu_endpoint *ep;
>  
>      trace_virtio_iommu_attach(domain_id, ep_id);
>  
> -    return VIRTIO_IOMMU_S_UNSUPP;
> +    ep = virtio_iommu_get_endpoint(s, ep_id);
> +    if (ep->domain) {
> +        /*
> +         * the device is already attached to a domain,
> +         * detach it first
> +         */
> +        virtio_iommu_detach_endpoint_from_domain(ep);

Hmm... so this can be called without virtio_iommu_put_endpoint().
Then I think we'd better move:

        g_tree_unref(ep->domain->mappings);

from virtio_iommu_put_endpoint() into
virtio_iommu_detach_endpoint_from_domain(), otherwise domain refs might
leak?
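A minimal sketch of the suggested move, using hypothetical simplified types in which an integer counter stands in for the g_tree_ref()/g_tree_unref() calls on domain->mappings. Once the unref lives in the detach helper, both re-attaching an endpoint to another domain and releasing the endpoint drop exactly the reference that attach took:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for viommu_domain / viommu_endpoint. */
struct demo_domain {
    int mappings_refcount;   /* models g_tree_ref()/g_tree_unref() on mappings */
};

struct demo_endpoint {
    struct demo_domain *domain;
};

/* Detach drops the reference that attach took, so every path is balanced. */
void demo_detach(struct demo_endpoint *ep)
{
    if (!ep->domain) {
        return;
    }
    ep->domain->mappings_refcount--;   /* the unref now lives here */
    ep->domain = NULL;
}

void demo_attach(struct demo_endpoint *ep, struct demo_domain *domain)
{
    if (ep->domain) {
        demo_detach(ep);               /* re-attach: release the old domain */
    }
    ep->domain = domain;
    domain->mappings_refcount++;
}

/* "put" no longer needs its own unref: detaching is enough. */
void demo_put_endpoint(struct demo_endpoint *ep)
{
    demo_detach(ep);
}
```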

> +    }
> +
> +    domain = virtio_iommu_get_domain(s, domain_id);
> +    QLIST_INSERT_HEAD(&domain->endpoint_list, ep, next);
> +
> +    ep->domain = domain;
> +    g_tree_ref(domain->mappings);
> +
> +    return VIRTIO_IOMMU_S_OK;
>  }

Regards,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH for-4.2 v10 08/15] virtio-iommu: Implement map/unmap
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 08/15] virtio-iommu: Implement map/unmap Eric Auger
@ 2019-08-19  8:11   ` Peter Xu
  2019-09-03 11:37     ` Auger Eric
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Xu @ 2019-08-19  8:11 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

On Tue, Jul 30, 2019 at 07:21:30PM +0200, Eric Auger wrote:

[...]

> +    mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
> +
> +    while (mapping) {
> +        viommu_interval current;
> +        uint64_t low  = mapping->virt_addr;
> +        uint64_t high = mapping->virt_addr + mapping->size - 1;
> +
> +        current.low = low;
> +        current.high = high;
> +
> +        if (low == interval.low && size >= mapping->size) {
> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> +            interval.low = high + 1;
> +            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
> +                interval.low, interval.high);
> +        } else if (high == interval.high && size >= mapping->size) {
> +            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
> +                interval.low, interval.high);
> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> +            interval.high = low - 1;
> +        } else if (low > interval.low && high < interval.high) {
> +            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> +        } else {
> +            break;
> +        }
> +        if (interval.low >= interval.high) {
> +            return VIRTIO_IOMMU_S_OK;
> +        } else {
> +            mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
> +        }
> +    }
> +
> +    if (mapping) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                      "****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
> +                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported\n",
> +                     __func__, interval.low, size,
> +                     mapping->virt_addr, mapping->size);
> +    } else {
> +        return VIRTIO_IOMMU_S_OK;
> +    }
> +
> +    return VIRTIO_IOMMU_S_INVAL;

Could the above chunk be simplified to something like the following?

  while ((mapping = g_tree_lookup(domain->mappings, &interval))) {
    g_tree_remove(domain->mappings, mapping);
  }
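The suggested loop relies on the lookup returning any mapping that overlaps the key interval, which is how the patch uses g_tree_lookup() with an interval key. Below is a plain-C model of that idea, with a flat array standing in for the GTree (all names hypothetical). Note that, unlike the patch, this loop also removes mappings that only partially overlap the unmap range instead of returning VIRTIO_IOMMU_S_INVAL; which behaviour is wanted is a separate question:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat stand-in for the domain's mapping tree. */
#define DEMO_MAX_MAPPINGS 16

struct demo_interval {
    uint64_t low, high;   /* inclusive bounds */
};

struct demo_mappings {
    struct demo_interval m[DEMO_MAX_MAPPINGS];
    size_t n;
};

/* Models g_tree_lookup() with an overlap comparator: any overlapping entry. */
struct demo_interval *demo_lookup(struct demo_mappings *t,
                                  const struct demo_interval *key)
{
    for (size_t i = 0; i < t->n; i++) {
        if (t->m[i].low <= key->high && key->low <= t->m[i].high) {
            return &t->m[i];
        }
    }
    return NULL;
}

/* Models g_tree_remove(): overwrite the entry with the last one. */
void demo_tree_remove(struct demo_mappings *t, struct demo_interval *entry)
{
    *entry = t->m[--t->n];
}

/* The suggested loop: keep removing until nothing overlaps the range. */
void demo_unmap(struct demo_mappings *t, uint64_t low, uint64_t high)
{
    struct demo_interval key = { low, high };
    struct demo_interval *mapping;

    while ((mapping = demo_lookup(t, &key))) {
        demo_tree_remove(t, mapping);
    }
}
```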

Thanks,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH for-4.2 v10 09/15] virtio-iommu: Implement translate
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 09/15] virtio-iommu: Implement translate Eric Auger
@ 2019-08-19  8:24   ` Peter Xu
  2019-09-03 11:45     ` Auger Eric
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Xu @ 2019-08-19  8:24 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

On Tue, Jul 30, 2019 at 07:21:31PM +0200, Eric Auger wrote:
> @@ -464,19 +464,75 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>                                              int iommu_idx)
>  {
>      IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
> +    VirtIOIOMMU *s = sdev->viommu;
>      uint32_t sid;
> +    viommu_endpoint *ep;
> +    viommu_mapping *mapping;
> +    viommu_interval interval;
> +    bool bypass_allowed;
> +
> +    interval.low = addr;
> +    interval.high = addr + 1;
>  
>      IOMMUTLBEntry entry = {
>          .target_as = &address_space_memory,
>          .iova = addr,
>          .translated_addr = addr,
> -        .addr_mask = ~(hwaddr)0,
> +        .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
>          .perm = IOMMU_NONE,
>      };
>  
> +    bypass_allowed = virtio_has_feature(s->acked_features,
> +                                        VIRTIO_IOMMU_F_BYPASS);
> +
>      sid = virtio_iommu_get_sid(sdev);
>  
>      trace_virtio_iommu_translate(mr->parent_obj.name, sid, addr, flag);
> +    qemu_mutex_lock(&s->mutex);
> +
> +    ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));
> +    if (!ep) {
> +        if (!bypass_allowed) {
> +            error_report("%s sid=%d is not known!!", __func__, sid);

Maybe use error_report_once() to avoid a DoS attack?  Also, would it be
good to unify the debug prints?  I see both error_report() and
qemu_log_mask() used throughout the patchset.  Or is that intended?
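The DoS concern is that a guest can hit this path on every DMA and flood the host log. Below is a sketch of the report-once idea, with one static flag per call site. As far as I know, QEMU's error_report_once() is built on the same principle, but demo_report_once_cond() and the demo_report_once() macro are hypothetical names, and the macro relies on a GNU statement expression:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Print to stderr only the first time; *printed is the caller's
 * per-call-site flag. Returns true if the message was printed.
 */
bool demo_report_once_cond(bool *printed, const char *fmt, ...)
{
    va_list ap;

    if (*printed) {
        return false;   /* already reported once: stay quiet */
    }
    *printed = true;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    return true;
}

/* One static flag per expansion site (GNU statement expression). */
#define demo_report_once(fmt, ...)                                  \
    ({                                                              \
        static bool demo_printed_;                                  \
        demo_report_once_cond(&demo_printed_, fmt, ##__VA_ARGS__);  \
    })
```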

> +        } else {
> +            entry.perm = flag;
> +        }
> +        goto unlock;
> +    }
> +
> +    if (!ep->domain) {
> +        if (!bypass_allowed) {
> +            qemu_log_mask(LOG_GUEST_ERROR,
> +                          "%s %02x:%02x.%01x not attached to any domain\n",
> +                          __func__, PCI_BUS_NUM(sid),
> +                          PCI_SLOT(sid), PCI_FUNC(sid));
> +        } else {
> +            entry.perm = flag;
> +        }
> +        goto unlock;
> +    }
> +
> +    mapping = g_tree_lookup(ep->domain->mappings, (gpointer)(&interval));
> +    if (!mapping) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                      "%s no mapping for 0x%"PRIx64" for sid=%d\n",
> +                      __func__, addr, sid);
> +        goto unlock;
> +    }
> +
> +    if (((flag & IOMMU_RO) && !(mapping->flags & VIRTIO_IOMMU_MAP_F_READ)) ||
> +        ((flag & IOMMU_WO) && !(mapping->flags & VIRTIO_IOMMU_MAP_F_WRITE))) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                      "Permission error on 0x%"PRIx64"(%d): allowed=%d\n",
> +                      addr, flag, mapping->flags);
> +        goto unlock;
> +    }
> +    entry.translated_addr = addr - mapping->virt_addr + mapping->phys_addr;
> +    entry.perm = flag;
> +    trace_virtio_iommu_translate_out(addr, entry.translated_addr, sid);
> +
> +unlock:
> +    qemu_mutex_unlock(&s->mutex);
>      return entry;
>  }
>  
> -- 
> 2.20.1
> 
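A worked sketch of the two computations in the hunk above, with hypothetical helper names and GCC's __builtin_ctzll() standing in for QEMU's ctz32(). With a 4 KiB minimum page size (page_size_mask = ~0xfff), the lowest set bit is bit 12, so addr_mask = (1 << 12) - 1 = 0xfff, and the translated address keeps the IOVA's offset within the mapping:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same arithmetic as the patch's addr_mask. page_size_mask must be
 * non-zero (ctz of 0 is undefined); its lowest set bit is assumed to
 * be the smallest supported page size, e.g. ~0xfff for 4 KiB pages.
 */
uint64_t demo_addr_mask(uint64_t page_size_mask)
{
    /* ctz gives log2(smallest page size); subtract 1 for the offset mask */
    return (UINT64_C(1) << __builtin_ctzll(page_size_mask)) - 1;
}

/* IOVA -> physical: keep the offset of addr within the mapping. */
uint64_t demo_translate(uint64_t addr, uint64_t virt_addr, uint64_t phys_addr)
{
    return addr - virt_addr + phys_addr;
}
```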

Regards,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH for-4.2 v10 10/15] virtio-iommu: Implement probe request
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 10/15] virtio-iommu: Implement probe request Eric Auger
@ 2019-08-19 12:08   ` Peter Xu
  2019-09-03 12:23     ` Auger Eric
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Xu @ 2019-08-19 12:08 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

On Tue, Jul 30, 2019 at 07:21:32PM +0200, Eric Auger wrote:

[...]

> +/* Fill the properties[] buffer with properties of type @type */
> +static int virtio_iommu_fill_property(int type,
> +                                      viommu_property_buffer *bufstate)
> +{
> +    int ret = -ENOSPC;
> +
> +    if (bufstate->filled + sizeof(struct virtio_iommu_probe_property)
> +            >= VIOMMU_PROBE_SIZE) {
> +        /* no space left for the header */
> +        bufstate->error = true;
> +        goto out;
> +    }
> +
> +    switch (type) {
> +    case VIRTIO_IOMMU_PROBE_T_NONE:
> +        ret = virtio_iommu_fill_none_prop(bufstate);
> +        break;
> +    case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
> +    {
> +        viommu_endpoint *ep = bufstate->endpoint;
> +
> +        g_tree_foreach(ep->reserved_regions,
> +                       virtio_iommu_fill_resv_mem_prop,
> +                       bufstate);
> +        if (!bufstate->error) {
> +            ret = 0;
> +        }
> +        break;
> +    }
> +    default:
> +        ret = -ENOENT;
> +        break;
> +    }
> +out:
> +    if (ret) {
> +        error_report("%s property of type=%d could not be filled (%d),"
> +                     " remaining size = 0x%lx",
> +                     __func__, type, ret, bufstate->filled);

Nit: if this can really be triggered, then we might still change it to
error_report_once().  If it can't (which seems to be the case), maybe
assert directly?
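Whether the -ENOSPC path is reachable depends on how much gets written: the RESV_MEM case iterates over ep->reserved_regions, so the total size grows with the number of reserved regions an endpoint has. Below is a minimal sketch of a bounds-checked append into a fixed-size probe buffer, assuming a property header made of a 16-bit type and a 16-bit length. DEMO_PROBE_SIZE and the other names are hypothetical, and byte-order conversion is omitted:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size; the real VIOMMU_PROBE_SIZE comes from the patch. */
#define DEMO_PROBE_SIZE 64

/* Assumed header layout; real code would store both fields little-endian. */
struct demo_probe_prop_hdr {
    uint16_t type;
    uint16_t length;
};

struct demo_probe_buf {
    uint8_t data[DEMO_PROBE_SIZE];
    size_t filled;
};

/* Append header + payload, or fail with -ENOSPC leaving the buffer as is. */
int demo_append_prop(struct demo_probe_buf *b, uint16_t type,
                     const void *payload, uint16_t len)
{
    struct demo_probe_prop_hdr hdr = { .type = type, .length = len };

    if (b->filled + sizeof(hdr) + len > sizeof(b->data)) {
        return -ENOSPC;
    }
    memcpy(b->data + b->filled, &hdr, sizeof(hdr));
    if (len) {
        memcpy(b->data + b->filled + sizeof(hdr), payload, len);
    }
    b->filled += sizeof(hdr) + len;
    return 0;
}
```

If the number of properties written is bounded so that they always fit, the -ENOSPC branch is a programming error and an assert fits; if it depends on configuration, a rate-limited report is the safer choice.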

Other than that it looks good to me:

Reviewed-by: Peter Xu <peterx@redhat.com>

Regards,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH for-4.2 v10 13/15] virtio_iommu: Handle reserved regions in translation process
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 13/15] virtio_iommu: Handle reserved regions in translation process Eric Auger
@ 2019-08-19 12:44   ` Peter Xu
  2019-09-01  6:38   ` Michael S. Tsirkin
  1 sibling, 0 replies; 55+ messages in thread
From: Peter Xu @ 2019-08-19 12:44 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

On Tue, Jul 30, 2019 at 07:21:35PM +0200, Eric Auger wrote:
> When translating an address we need to check if it belongs to
> a reserved virtual address range. If it does, there are 2 cases:
> 
> - It belongs to a RESERVED region: the guest should neither use
>   this address in a MAP nor instruct the endpoint to DMA on
>   it. We report an error.
> 
> - It belongs to an MSI region: we bypass the translation.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>
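The two cases in the commit message amount to a classification step performed before the normal mapping lookup. Below is a minimal sketch with hypothetical types and illustrative addresses; it is not the patch's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region types mirroring the two cases above. */
enum demo_resv_type { DEMO_RESV_RESERVED, DEMO_RESV_MSI };

struct demo_resv_region {
    uint64_t low, high;          /* inclusive bounds */
    enum demo_resv_type type;
};

enum demo_verdict {
    DEMO_TRANSLATE,   /* not reserved: do the normal mapping lookup */
    DEMO_FAULT,       /* RESERVED region: report an error */
    DEMO_BYPASS,      /* MSI region: pass the address through untranslated */
};

enum demo_verdict demo_check_resv(const struct demo_resv_region *r, size_t n,
                                  uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= r[i].low && addr <= r[i].high) {
            return r[i].type == DEMO_RESV_MSI ? DEMO_BYPASS : DEMO_FAULT;
        }
    }
    return DEMO_TRANSLATE;
}
```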

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH for-4.2 v10 03/15] virtio-iommu: Add skeleton
  2019-08-15 13:54   ` Peter Xu
@ 2019-08-29 12:18     ` Auger Eric
  2019-08-30  1:26       ` Peter Xu
  0 siblings, 1 reply; 55+ messages in thread
From: Auger Eric @ 2019-08-29 12:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

Hi Peter,

First of all, please forgive me for the delay.
On 8/15/19 3:54 PM, Peter Xu wrote:
> On Tue, Jul 30, 2019 at 07:21:25PM +0200, Eric Auger wrote:
>> +static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
>> +{
>> +    VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
>> +    struct virtio_iommu_req_head head;
>> +    struct virtio_iommu_req_tail tail;
> 
> [1]
> 
>> +    VirtQueueElement *elem;
>> +    unsigned int iov_cnt;
>> +    struct iovec *iov;
>> +    size_t sz;
>> +
>> +    for (;;) {
>> +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
>> +        if (!elem) {
>> +            return;
>> +        }
>> +
>> +        if (iov_size(elem->in_sg, elem->in_num) < sizeof(tail) ||
>> +            iov_size(elem->out_sg, elem->out_num) < sizeof(head)) {
>> +            virtio_error(vdev, "virtio-iommu bad head/tail size");
>> +            virtqueue_detach_element(vq, elem, 0);
>> +            g_free(elem);
>> +            break;
>> +        }
>> +
>> +        iov_cnt = elem->out_num;
>> +        iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem->out_num);
> 
> Could I ask why memdup is needed here?
Indeed I don't think it is needed and besides iov is not freed!

I was inspired by hw/net/virtio-net.c. To be honest, I don't see why
g_memdup() is needed there either: the out_sg gets duplicated and
commands work on the duplicated data rather than in place.
> 
>> +        sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
>> +        if (unlikely(sz != sizeof(head))) {
>> +            tail.status = VIRTIO_IOMMU_S_DEVERR;
> 
> Do you need to zero the reserved bits to make sure they won't contain
> garbage? Same question for the uses of tail below.
Yes, tail is now initialized.
> 
>> +            goto out;
>> +        }
>> +        qemu_mutex_lock(&s->mutex);
>> +        switch (head.type) {
>> +        case VIRTIO_IOMMU_T_ATTACH:
>> +            tail.status = virtio_iommu_handle_attach(s, iov, iov_cnt);
>> +            break;
>> +        case VIRTIO_IOMMU_T_DETACH:
>> +            tail.status = virtio_iommu_handle_detach(s, iov, iov_cnt);
>> +            break;
>> +        case VIRTIO_IOMMU_T_MAP:
>> +            tail.status = virtio_iommu_handle_map(s, iov, iov_cnt);
>> +            break;
>> +        case VIRTIO_IOMMU_T_UNMAP:
>> +            tail.status = virtio_iommu_handle_unmap(s, iov, iov_cnt);
>> +            break;
>> +        default:
>> +            tail.status = VIRTIO_IOMMU_S_UNSUPP;
>> +        }
>> +        qemu_mutex_unlock(&s->mutex);
>> +
>> +out:
>> +        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
>> +                          &tail, sizeof(tail));
>> +        assert(sz == sizeof(tail));
>> +
>> +        virtqueue_push(vq, elem, sizeof(tail));
> 
> s/tail/head/ (though they are the same size)?
That's unclear to me. Similarly, in virtio-net.c the element is pushed
back to the used ring with len set to the size of the status, as
described in:

/*
 * Control virtqueue data structures
 *
 * The control virtqueue expects a header in the first sg entry
 * and an ack/status response in the last entry.  Data for the
 * command goes in between.
 */
> 
>> +        virtio_notify(vdev, vq);
>> +        g_free(elem);
>> +    }
>> +}
> 
> [...]
> 
>> +static void virtio_iommu_set_features(VirtIODevice *vdev, uint64_t val)
>> +{
>> +    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
>> +
>> +    dev->acked_features = val;
>> +    trace_virtio_iommu_set_features(dev->acked_features);
>> +}
>> +
>> +static const VMStateDescription vmstate_virtio_iommu_device = {
>> +    .name = "virtio-iommu-device",
>> +    .unmigratable = 1,
> 
> Curious, is there an explicit reason not to support migration from the
> first version? :)
The state is made of red-black trees and lists. For the former, there
is no VMSTATE* support ready yet. I am working on it, but I think this
should be handled separately.
> 
>> +};
>> +
>> +static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>> +{
>> +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>> +    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
>> +
>> +    virtio_init(vdev, "virtio-iommu", VIRTIO_ID_IOMMU,
>> +                sizeof(struct virtio_iommu_config));
>> +
>> +    s->req_vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE,
>> +                             virtio_iommu_handle_command);
>> +    s->event_vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE, NULL);
>> +
>> +    s->config.page_size_mask = TARGET_PAGE_MASK;
>> +    s->config.input_range.end = -1UL;
>> +    s->config.domain_range.start = 0;
> 
> Also set input_range.start = 0? After all, domain_range.start is zeroed.
virtio_init does:
    if (vdev->config_len) {
        vdev->config = g_malloc0(config_size);

but I should be consistent and remove
s->config.domain_range.start = 0;
> 
>> +    s->config.domain_range.end = 32;
>> +
>> +    virtio_add_feature(&s->features, VIRTIO_RING_F_EVENT_IDX);
>> +    virtio_add_feature(&s->features, VIRTIO_RING_F_INDIRECT_DESC);
>> +    virtio_add_feature(&s->features, VIRTIO_F_VERSION_1);
>> +    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_INPUT_RANGE);
>> +    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_DOMAIN_RANGE);
>> +    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MAP_UNMAP);
>> +    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_BYPASS);
>> +    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MMIO);
>> +}
> 
> Regards,
> 

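The points discussed above (head read in place from the driver-readable buffers, tail zero-initialized so its reserved bytes never carry stack garbage, used length equal to what was written into the device-writable buffers) can be seen together in a reduced standalone model. It assumes the v0.12 head/tail layout (one `type`/`status` byte followed by three reserved bytes) and type/status values assumed to match Linux's `include/uapi/linux/virtio_iommu.h`; `gather()`, `scatter()` and `handle_request()` are hypothetical stand-ins for `iov_to_buf()`, `iov_from_buf()` and `virtio_iommu_handle_command()`, and only ATTACH is recognized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Values assumed to match include/uapi/linux/virtio_iommu.h (v0.12). */
#define VIRTIO_IOMMU_T_ATTACH  0x01
#define VIRTIO_IOMMU_S_OK      0x00
#define VIRTIO_IOMMU_S_UNSUPP  0x02
#define VIRTIO_IOMMU_S_DEVERR  0x03

struct virtio_iommu_req_head { uint8_t type;   uint8_t reserved[3]; };
struct virtio_iommu_req_tail { uint8_t status; uint8_t reserved[3]; };

/* Hypothetical stand-in for iov_to_buf(): copy up to len bytes out. */
static size_t gather(const struct iovec *iov, unsigned int cnt,
                     void *buf, size_t len)
{
    size_t done = 0;
    for (unsigned int i = 0; i < cnt && done < len; i++) {
        size_t n = iov[i].iov_len < len - done ? iov[i].iov_len : len - done;
        memcpy((char *)buf + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}

/* Hypothetical stand-in for iov_from_buf(): copy up to len bytes in. */
static size_t scatter(const struct iovec *iov, unsigned int cnt,
                      const void *buf, size_t len)
{
    size_t done = 0;
    for (unsigned int i = 0; i < cnt && done < len; i++) {
        size_t n = iov[i].iov_len < len - done ? iov[i].iov_len : len - done;
        memcpy(iov[i].iov_base, (const char *)buf + done, n);
        done += n;
    }
    return done;
}

/*
 * Hypothetical handler: reads the head from the out buffers in place (no
 * copy of out_sg, since nothing trims it), writes a zero-initialized tail
 * into the in buffers and returns the used length for virtqueue_push().
 */
size_t handle_request(const struct iovec *out, unsigned int out_num,
                      const struct iovec *in, unsigned int in_num)
{
    struct virtio_iommu_req_head head;
    struct virtio_iommu_req_tail tail = { 0 };   /* reserved bytes zeroed */

    if (gather(out, out_num, &head, sizeof(head)) != sizeof(head)) {
        tail.status = VIRTIO_IOMMU_S_DEVERR;
    } else if (head.type == VIRTIO_IOMMU_T_ATTACH) {
        tail.status = VIRTIO_IOMMU_S_OK;         /* real work elided */
    } else {
        tail.status = VIRTIO_IOMMU_S_UNSUPP;
    }

    size_t written = scatter(in, in_num, &tail, sizeof(tail));
    assert(written == sizeof(tail));  /* caller checked the in size */
    return written;
}
```

As in the patch, the caller is expected to have rejected elements whose in buffers are smaller than the tail before calling the handler.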


* Re: [Qemu-devel] [PATCH for-4.2 v10 05/15] virtio-iommu: Add the iommu regions
  2019-08-16  4:00   ` Peter Xu
@ 2019-08-29 12:51     ` Auger Eric
  0 siblings, 0 replies; 55+ messages in thread
From: Auger Eric @ 2019-08-29 12:51 UTC (permalink / raw)
  To: Peter Xu
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

Hi Peter,

On 8/16/19 6:00 AM, Peter Xu wrote:
> On Tue, Jul 30, 2019 at 07:21:27PM +0200, Eric Auger wrote:
> 
> [...]
> 
>>  static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
>>  {
>>      VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
>> @@ -266,6 +333,15 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>>      virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MAP_UNMAP);
>>      virtio_add_feature(&s->features, VIRTIO_IOMMU_F_BYPASS);
>>      virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MMIO);
>> +
>> +    memset(s->as_by_bus_num, 0, sizeof(s->as_by_bus_num));
>> +    s->as_by_busptr = g_hash_table_new(NULL, NULL);
> 
> VT-d uses g_hash_table_new_full() so that a VTDBus can still be
> freed. Here, I think the IOMMUPCIBus allocated in
> virtio_iommu_find_add_as() will be leaked if we remove entries
> from the hash table?
> 
> So I started to wonder whether PCI/PCIe buses can be plugged/unplugged
> at all, because I had never tried. With the latest 5.3.0-rc4 guest I
> gave it a shot and I see the error below. It could be something that I
> did wrong, or it could simply be that it's not working at all. Have you
> tried anything like that? Michael/Alex?

I have never tried this on my end.

However, looking at docs/pcie_pci_bridge.txt, it seems possible to hotplug
a pcie_pci_bridge downstream of a pcie-root-port under specific
conditions (see the limitations section). So I guess the situation you
describe may happen. I switched to the _full version.

Thanks

Eric


> 
> bin=x86_64-softmmu/qemu-system-x86_64
> $bin -M q35,accel=kvm,kernel-irqchip=on -smp 8 -m 2G -cpu host \
>      -monitor telnet::6666,server,nowait -nographic \
>      -device e1000,netdev=net0 \
>      -netdev user,id=net0,hostfwd=tcp::5555-:22 \
>      -device pcie-pci-bridge,bus=pcie.0,id=pci.1 \
>      -drive file=/images/default.qcow2,if=none,cache=none,id=drive0 \
>      -device virtio-blk,drive=drive0
> 
> (qemu) device_add pci-bridge,bus=pci.1,id=pci.2,chassis_nr=1,addr=1.0
> 
> [   66.172352] pci 0000:01:01.0: [1b36:0001] type 01 class 0x060400
> [   66.176897] pci 0000:01:01.0: reg 0x10: [mem 0x00000000-0x000000ff 64bit]
> [   66.186130] pci 0000:01:01.0: No bus number available for hot-added bridge
> [   66.189489] shpchp 0000:00:03.0: BAR 14: assigned [mem 0x80000000-0x800fffff]
> [   66.193235] pci 0000:01:01.0: BAR 0: assigned [mem 0x80000000-0x800000ff 64bit]
> [   66.198587] shpchp 0000:00:03.0: PCI bridge to [bus 01]
> [   66.204113] shpchp 0000:00:03.0:   bridge window [mem 0x80000000-0x800fffff]
> [   66.215212] shpchp 0000:01:01.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 ss_did 0
> [   66.218531] shpchp 0000:01:01.0: enabling device (0000 -> 0002)
> [   66.229204] BUG: kernel NULL pointer dereference, address: 00000000000000e2
> [   66.232124] #PF: supervisor write access in kernel mode
> [   66.234369] #PF: error_code(0x0002) - not-present page
> [   66.236585] PGD 0 P4D 0
> [   66.237431] Oops: 0002 [#1] SMP PTI
> [   66.238617] CPU: 2 PID: 277 Comm: kworker/2:1 Kdump: loaded Not tainted 5.3.0-rc4 #85
> [   66.241200] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [   66.244916] Workqueue: shpchp-1 shpchp_pushbutton_thread
> [   66.246583] RIP: 0010:shpc_init.cold+0x5c3/0x8a1
> [   66.248041] Code: 24 90 01 00 00 8b 49 08 40 80 fe 02 0f 85 f4 01 00 00 f7 c1 00 00 00 f0 0f 84 b2 01 00 00 b9 13 00 00 00 80 3d 33 40 38 02 00 <88> 8a e26
> [   66.253771] RSP: 0018:ffffc9000025bb68 EFLAGS: 00010246
> [   66.255418] RAX: 00000000000000ff RBX: 0000000000000000 RCX: 0000000000000000
> [   66.257763] RDX: 0000000000000000 RSI: ffffffff826bcd01 RDI: ffffffff826bcd60
> [   66.260065] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> [   66.263184] R10: 0000000000000005 R11: 0000000000000000 R12: ffff888032425400
> [   66.265706] R13: ffffc9000017109c R14: ffff888033da7000 R15: 000000000000001f
> [   66.268200] FS:  0000000000000000(0000) GS:ffff88807fc80000(0000) knlGS:0000000000000000
> [   66.270826] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   66.272731] CR2: 00000000000000e2 CR3: 0000000033afc002 CR4: 0000000000360ee0
> [   66.275373] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   66.277947] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   66.279965] Call Trace:
> [   66.280627]  shpc_probe+0x91/0x32b
> [   66.281644]  local_pci_probe+0x42/0x80
> [   66.282752]  pci_device_probe+0x107/0x1a0
> [   66.283877]  really_probe+0xf0/0x380
> [   66.284862]  driver_probe_device+0x59/0xd0
> [   66.285988]  ? driver_allows_async_probing+0x50/0x50
> [   66.287937]  bus_for_each_drv+0x7e/0xc0
> [   66.289752]  __device_attach+0xe1/0x160
> [   66.292076]  pci_bus_add_device+0x4b/0x70
> [   66.295244]  pci_bus_add_devices+0x2c/0x64
> [   66.297429]  shpchp_configure_device+0xc1/0xe0
> [   66.299692]  board_added+0x117/0x240
> [   66.301589]  shpchp_enable_slot+0x121/0x2e0
> [   66.303686]  shpchp_pushbutton_thread+0x70/0xa0
> [   66.305941]  process_one_work+0x221/0x500
> [   66.308253]  worker_thread+0x50/0x3b0
> [   66.310512]  kthread+0xfb/0x130
> [   66.312422]  ? process_one_work+0x500/0x500
> [   66.314617]  ? kthread_park+0x80/0x80
> [   66.316489]  ret_from_fork+0x3a/0x50
> [   66.318293] Modules linked in: intel_rapl_msr intel_rapl_common kvm_intel kvm crct10dif_pclmul bochs_drm crc32_pclmul drm_vram_helper ghash_clmulni_intel o
> [   66.331179] CR2: 00000000000000e2
> [   66.333090] ---[ end trace cfc73b2e92e207d4 ]---
> [   66.335431] RIP: 0010:shpc_init.cold+0x5c3/0x8a1
> [   66.337790] Code: 24 90 01 00 00 8b 49 08 40 80 fe 02 0f 85 f4 01 00 00 f7 c1 00 00 00 f0 0f 84 b2 01 00 00 b9 13 00 00 00 80 3d 33 40 38 02 00 <88> 8a e26
> [   66.346561] RSP: 0018:ffffc9000025bb68 EFLAGS: 00010246
> [   66.348659] RAX: 00000000000000ff RBX: 0000000000000000 RCX: 0000000000000000
> [   66.351412] RDX: 0000000000000000 RSI: ffffffff826bcd01 RDI: ffffffff826bcd60
> [   66.354204] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> [   66.357013] R10: 0000000000000005 R11: 0000000000000000 R12: ffff888032425400
> [   66.360117] R13: ffffc9000017109c R14: ffff888033da7000 R15: 000000000000001f
> [   66.362953] FS:  0000000000000000(0000) GS:ffff88807fc80000(0000) knlGS:0000000000000000
> [   66.366003] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   66.368756] CR2: 00000000000000e2 CR3: 0000000033afc002 CR4: 0000000000360ee0
> [   66.371769] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   66.376036] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> 
> Regards,
> 

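The leak concern is about ownership: a table created with `g_hash_table_new(NULL, NULL)` never frees its values, whereas `g_hash_table_new_full()` takes key and value destroy functions that run when an entry is removed or the table is destroyed (e.g. `g_hash_table_new_full(NULL, NULL, NULL, g_free)`). The sketch below mimics that contract with a tiny fixed-size table in plain C so it runs without GLib; `struct ptr_table`, `table_insert()`, `table_remove()` and `struct bus_entry` (standing in for IOMMUPCIBus) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*destroy_fn)(void *value);

#define TABLE_SLOTS 16

/* Hypothetical pointer-keyed table; owns values only if value_destroy set. */
struct ptr_table {
    const void *keys[TABLE_SLOTS];
    void *values[TABLE_SLOTS];
    destroy_fn value_destroy;   /* NULL: like g_hash_table_new(NULL, NULL) */
};

/* Hypothetical stand-in for IOMMUPCIBus, with a live-object counter. */
struct bus_entry { int bus_num; };
static int live_entries;

struct bus_entry *bus_entry_new(int bus_num)
{
    struct bus_entry *e = malloc(sizeof(*e));
    if (e) {
        e->bus_num = bus_num;
        live_entries++;
    }
    return e;
}

void bus_entry_free(void *p)
{
    if (p) {
        free(p);
        live_entries--;
    }
}

bool table_insert(struct ptr_table *t, const void *key, void *value)
{
    assert(key != NULL);        /* NULL marks an empty slot */
    for (size_t i = 0; i < TABLE_SLOTS; i++) {
        if (t->keys[i] == NULL) {
            t->keys[i] = key;
            t->values[i] = value;
            return true;
        }
    }
    return false;               /* table full */
}

/* Remove key and, if the table owns its values, destroy the value. */
bool table_remove(struct ptr_table *t, const void *key)
{
    assert(key != NULL);
    for (size_t i = 0; i < TABLE_SLOTS; i++) {
        if (t->keys[i] == key) {
            if (t->value_destroy) {
                t->value_destroy(t->values[i]);
            }
            t->keys[i] = NULL;
            t->values[i] = NULL;
            return true;
        }
    }
    return false;
}
```

Removing an entry from the owning table frees the bus object; removing it from the non-owning table leaves it allocated, which is a leak unless something else still holds the pointer.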


* Re: [Qemu-devel] [PATCH for-4.2 v10 07/15] virtio-iommu: Implement attach/detach command
  2019-08-16  4:27   ` Peter Xu
@ 2019-08-29 14:24     ` Auger Eric
  0 siblings, 0 replies; 55+ messages in thread
From: Auger Eric @ 2019-08-29 14:24 UTC (permalink / raw)
  To: Peter Xu
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

Hi Peter,
On 8/16/19 6:27 AM, Peter Xu wrote:
> On Tue, Jul 30, 2019 at 07:21:29PM +0200, Eric Auger wrote:
>> This patch implements the endpoint attach/detach to/from
>> a domain.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>> ---
>>  hw/virtio/virtio-iommu.c | 40 ++++++++++++++++++++++++++++++++++------
>>  1 file changed, 34 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index 77dccecc0a..5ea0930cc2 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -80,8 +80,8 @@ static void virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
>>      ep->domain = NULL;
>>  }
>>  
>> -viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id);
>> -viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id)
> 
> These lines were just introduced in the previous patch. I wanted to ask
> why the definition was needed, but I didn't know whether it would be
> used in follow-up patches. It looks like it wasn't really used.
> 
> I would prefer patches like these to be squashed together, not only to
> avoid maintaining diffs like this between patches, but also because as
> a reviewer it's easier to have all the context together.
> But I won't ask for it because it may only be a personal preference...

Yes, that's a trade-off. I tried to split the series to ease the review:
helpers were introduced separately in the previous patch but not yet used.
Here I introduce the call sites and they become static.
> 
>> +static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
>> +                                                  uint32_t ep_id)
>>  {
>>      viommu_endpoint *ep;
>>  
>> @@ -110,8 +110,8 @@ static void virtio_iommu_put_endpoint(gpointer data)
>>      g_free(ep);
>>  }
>>  
>> -viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id);
>> -viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id)
>> +static viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s,
>> +                                              uint32_t domain_id)
>>  {
>>      viommu_domain *domain;
>>  
>> @@ -187,10 +187,27 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>>  {
>>      uint32_t domain_id = le32_to_cpu(req->domain);
>>      uint32_t ep_id = le32_to_cpu(req->endpoint);
>> +    viommu_domain *domain;
>> +    viommu_endpoint *ep;
>>  
>>      trace_virtio_iommu_attach(domain_id, ep_id);
>>  
>> -    return VIRTIO_IOMMU_S_UNSUPP;
>> +    ep = virtio_iommu_get_endpoint(s, ep_id);
>> +    if (ep->domain) {
>> +        /*
>> +         * the device is already attached to a domain,
>> +         * detach it first
>> +         */
>> +        virtio_iommu_detach_endpoint_from_domain(ep);
> 
> Hmm... so this can be called without virtio_iommu_put_endpoint().
> Then I think we'd better move:
> 
>         g_tree_unref(ep->domain->mappings);
> 
> from virtio_iommu_put_endpoint() into
> virtio_iommu_detach_endpoint_from_domain(), otherwise domain refs might
> leak?

I agree with you. I also removed g_tree_destroy() from
virtio_iommu_put_domain(), as detaching all its endpoints should now do
the job.

Thanks

Eric
> 
>> +    }
>> +
>> +    domain = virtio_iommu_get_domain(s, domain_id);
>> +    QLIST_INSERT_HEAD(&domain->endpoint_list, ep, next);
>> +
>> +    ep->domain = domain;
>> +    g_tree_ref(domain->mappings);
>> +
>> +    return VIRTIO_IOMMU_S_OK;
>>  }
> 
> Regards,
> 

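The refcount rule agreed here (attach takes a reference on the domain's mappings tree, detach drops it, so a re-ATTACH to another domain cannot leak a reference) can be modelled in a few lines. This is a simplified standalone sketch: `struct mappings`, `struct domain`, `struct endpoint`, `attach()` and `detach()` are hypothetical, a plain integer replaces the GTree reference count managed by `g_tree_ref()`/`g_tree_unref()`, and the domain's endpoint list is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the GTree of mappings and its reference count. */
struct mappings { int refcount; };

struct domain { struct mappings *mappings; };

struct endpoint { struct domain *domain; };

/* Drop the endpoint's reference, as proposed for
 * virtio_iommu_detach_endpoint_from_domain(). */
void detach(struct endpoint *ep)
{
    if (!ep->domain) {
        return;
    }
    assert(ep->domain->mappings->refcount > 0);
    ep->domain->mappings->refcount--;   /* g_tree_unref() */
    ep->domain = NULL;
}

/* Attach, detaching from any previous domain first. */
void attach(struct endpoint *ep, struct domain *d)
{
    if (ep->domain) {
        detach(ep);
    }
    ep->domain = d;
    d->mappings->refcount++;            /* g_tree_ref() */
}
```

Because the unref lives in `detach()`, every path that leaves a domain (explicit DETACH, re-ATTACH, endpoint teardown) releases exactly the reference taken by `attach()`.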


* Re: [Qemu-devel] [PATCH for-4.2 v10 03/15] virtio-iommu: Add skeleton
  2019-08-29 12:18     ` Auger Eric
@ 2019-08-30  1:26       ` Peter Xu
  2019-08-30  8:12         ` Auger Eric
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Xu @ 2019-08-30  1:26 UTC (permalink / raw)
  To: Auger Eric
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

On Thu, Aug 29, 2019 at 02:18:42PM +0200, Auger Eric wrote:
> Hi Peter,
> 
> First of all, please forgive me for the delay.
> On 8/15/19 3:54 PM, Peter Xu wrote:
> > On Tue, Jul 30, 2019 at 07:21:25PM +0200, Eric Auger wrote:
> >> +static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
> >> +{
> >> +    VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
> >> +    struct virtio_iommu_req_head head;
> >> +    struct virtio_iommu_req_tail tail;
> > 
> > [1]
> > 
> >> +    VirtQueueElement *elem;
> >> +    unsigned int iov_cnt;
> >> +    struct iovec *iov;
> >> +    size_t sz;
> >> +
> >> +    for (;;) {
> >> +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> >> +        if (!elem) {
> >> +            return;
> >> +        }
> >> +
> >> +        if (iov_size(elem->in_sg, elem->in_num) < sizeof(tail) ||
> >> +            iov_size(elem->out_sg, elem->out_num) < sizeof(head)) {
> >> +            virtio_error(vdev, "virtio-iommu bad head/tail size");
> >> +            virtqueue_detach_element(vq, elem, 0);
> >> +            g_free(elem);
> >> +            break;
> >> +        }
> >> +
> >> +        iov_cnt = elem->out_num;
> >> +        iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem->out_num);
> > 
> > Could I ask why memdup is needed here?
> Indeed I don't think it is needed and besides iov is not freed!
> 
> I was inspired by hw/net/virtio-net.c. To be honest, I don't see why
> g_memdup() is needed there either: the out_sg gets duplicated and
> commands work on the duplicated data rather than in place.

Oh true, I found that it's because of the call to iov_discard_front().
Please have a look at commit 771b6ed37e3. That said, it seems to me that
virtio-iommu does not truncate iovs, so it should not be needed.

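To make the reason concrete: QEMU's `iov_discard_front()` (util/iov.c) advances the caller's iovec pointer and count and shrinks the first partially consumed entry in place, so applying it to `elem->out_sg` directly would alter the element that is later unmapped; virtio-net therefore works on a duplicate. Below is a simplified standalone re-implementation, deliberately named `discard_front()` because it is not the QEMU function, showing that trimming a copy leaves the original array untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Simplified model of iov_discard_front(): drop `bytes` from the front of
 * the vector, advancing *iov / *cnt past fully consumed entries and
 * shrinking a partially consumed entry in place. Returns bytes dropped.
 */
size_t discard_front(struct iovec **iov, unsigned int *cnt, size_t bytes)
{
    size_t done = 0;

    assert(iov != NULL && cnt != NULL);
    while (*cnt > 0 && done < bytes) {
        struct iovec *cur = *iov;
        size_t left = bytes - done;

        if (cur->iov_len <= left) {       /* consume the whole entry */
            done += cur->iov_len;
            (*iov)++;
            (*cnt)--;
        } else {                          /* trim this entry in place */
            cur->iov_base = (char *)cur->iov_base + left;
            cur->iov_len -= left;
            done += left;
        }
    }
    return done;
}
```

Usage mirrors the virtio-net pattern: `memcpy()` the iovec array (what `g_memdup()` does), then call `discard_front()` on the copy; the original `out_sg` keeps its bases and lengths for the later unmap.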
> > 
> >> +        sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
> >> +        if (unlikely(sz != sizeof(head))) {
> >> +            tail.status = VIRTIO_IOMMU_S_DEVERR;
> > 
> > Do you need to zero the reserved bits to make sure they won't contain
> > garbage? Same question for the uses of tail below.
> Yes, tail is now initialized.
> > 
> >> +            goto out;
> >> +        }
> >> +        qemu_mutex_lock(&s->mutex);
> >> +        switch (head.type) {
> >> +        case VIRTIO_IOMMU_T_ATTACH:
> >> +            tail.status = virtio_iommu_handle_attach(s, iov, iov_cnt);
> >> +            break;
> >> +        case VIRTIO_IOMMU_T_DETACH:
> >> +            tail.status = virtio_iommu_handle_detach(s, iov, iov_cnt);
> >> +            break;
> >> +        case VIRTIO_IOMMU_T_MAP:
> >> +            tail.status = virtio_iommu_handle_map(s, iov, iov_cnt);
> >> +            break;
> >> +        case VIRTIO_IOMMU_T_UNMAP:
> >> +            tail.status = virtio_iommu_handle_unmap(s, iov, iov_cnt);
> >> +            break;
> >> +        default:
> >> +            tail.status = VIRTIO_IOMMU_S_UNSUPP;
> >> +        }
> >> +        qemu_mutex_unlock(&s->mutex);
> >> +
> >> +out:
> >> +        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
> >> +                          &tail, sizeof(tail));
> >> +        assert(sz == sizeof(tail));
> >> +
> >> +        virtqueue_push(vq, elem, sizeof(tail));
> > 
> > s/tail/head/ (though they are the same size)?
> That's unclear to me. Similarly, in virtio-net.c the element is pushed
> back to the used ring with len set to the size of the status, as
> described in:
> 
> /*
>  * Control virtqueue data structures
>  *
>  * The control virtqueue expects a header in the first sg entry
>  * and an ack/status response in the last entry.  Data for the
>  * command goes in between.
>  */

I was referencing the balloon code when reading the patch, e.g.,
virtio_balloon_handle_output(). However, after reading more carefully I
see that other places use it as you described. Now I tend to agree with
you, because virtqueue_push(), which calls virtqueue_unmap_sg(), uses
len to unmap in_sg[] rather than out_sg[]. So please ignore my previous
comment.

(Now I'm not sure whether the usage in the balloon code is correct...)

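The accounting described above can be modelled on its own: when the element is unmapped on push, the `len` passed to `virtqueue_push()` is spread across `in_sg[]` only, giving each device-writable entry an access length of at most its size, while every `out_sg[]` entry is unmapped with its full length. The sketch below reproduces only the `in_sg[]` part; `written_per_in_entry()` is a hypothetical name, and the real `virtqueue_unmap_sg()` also performs the DMA unmaps.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Model of the in_sg accounting in virtqueue_unmap_sg(): given the used
 * length, compute how many bytes of each device-writable entry count as
 * written (the access length handed to the unmap). out_sg plays no part.
 */
void written_per_in_entry(const struct iovec *in_sg, unsigned int in_num,
                          size_t len, size_t *written)
{
    size_t offset = 0;

    assert(written != NULL || in_num == 0);
    for (unsigned int i = 0; i < in_num; i++) {
        size_t remaining = len - offset;   /* offset never exceeds len */
        size_t size = remaining < in_sg[i].iov_len ? remaining
                                                   : in_sg[i].iov_len;

        written[i] = size;
        offset += size;
    }
}
```

With a 4-byte tail as the used length, only the first 4 bytes of the in buffers count as written, whatever the size of the request in `out_sg[]`, which is why `sizeof(tail)` is the natural value here.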
> > 
> >> +        virtio_notify(vdev, vq);
> >> +        g_free(elem);
> >> +    }
> >> +}
> > 
> > [...]
> > 
> >> +static void virtio_iommu_set_features(VirtIODevice *vdev, uint64_t val)
> >> +{
> >> +    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
> >> +
> >> +    dev->acked_features = val;
> >> +    trace_virtio_iommu_set_features(dev->acked_features);
> >> +}
> >> +
> >> +static const VMStateDescription vmstate_virtio_iommu_device = {
> >> +    .name = "virtio-iommu-device",
> >> +    .unmigratable = 1,
> > 
> > Curious, is there an explicit reason not to support migration from the
> > first version? :)
> The state is made of red-black trees and lists. For the former, there
> is no VMSTATE* support ready yet. I am working on it, but I think this
> should be handled separately.

Fair enough. Would you mind adding a similar comment above
unmigratable?

Thanks,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH for-4.2 v10 03/15] virtio-iommu: Add skeleton
  2019-08-30  1:26       ` Peter Xu
@ 2019-08-30  8:12         ` Auger Eric
  0 siblings, 0 replies; 55+ messages in thread
From: Auger Eric @ 2019-08-30  8:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

Hi Peter,
On 8/30/19 3:26 AM, Peter Xu wrote:
> On Thu, Aug 29, 2019 at 02:18:42PM +0200, Auger Eric wrote:
>> Hi Peter,
>>
>> First of all, please forgive me for the delay.
>> On 8/15/19 3:54 PM, Peter Xu wrote:
>>> On Tue, Jul 30, 2019 at 07:21:25PM +0200, Eric Auger wrote:
>>>> +static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
>>>> +{
>>>> +    VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
>>>> +    struct virtio_iommu_req_head head;
>>>> +    struct virtio_iommu_req_tail tail;
>>>
>>> [1]
>>>
>>>> +    VirtQueueElement *elem;
>>>> +    unsigned int iov_cnt;
>>>> +    struct iovec *iov;
>>>> +    size_t sz;
>>>> +
>>>> +    for (;;) {
>>>> +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
>>>> +        if (!elem) {
>>>> +            return;
>>>> +        }
>>>> +
>>>> +        if (iov_size(elem->in_sg, elem->in_num) < sizeof(tail) ||
>>>> +            iov_size(elem->out_sg, elem->out_num) < sizeof(head)) {
>>>> +            virtio_error(vdev, "virtio-iommu bad head/tail size");
>>>> +            virtqueue_detach_element(vq, elem, 0);
>>>> +            g_free(elem);
>>>> +            break;
>>>> +        }
>>>> +
>>>> +        iov_cnt = elem->out_num;
>>>> +        iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem->out_num);
>>>
>>> Could I ask why memdup is needed here?
>> Indeed I don't think it is needed and besides iov is not freed!
>>
>> I was inspired by hw/net/virtio-net.c. To be honest, I don't see why
>> g_memdup() is needed there either: the out_sg gets duplicated and
>> commands work on the duplicated data rather than in place.
> 
> Oh true, I found that it's because of the call to iov_discard_front().
> Please have a look at commit 771b6ed37e3. That said, it seems to me that
> virtio-iommu does not truncate iovs, so it should not be needed.

Thanks for the SHA-1. Indeed, virtio-iommu does not use
iov_discard_front(), so I shouldn't need it.
> 
>>>
>>>> +        sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
>>>> +        if (unlikely(sz != sizeof(head))) {
>>>> +            tail.status = VIRTIO_IOMMU_S_DEVERR;
>>>
>>> Do you need to zero the reserved bits to make sure they won't contain
>>> garbage? Same question for the uses of tail below.
>> Yes, tail is now initialized.
>>>
>>>> +            goto out;
>>>> +        }
>>>> +        qemu_mutex_lock(&s->mutex);
>>>> +        switch (head.type) {
>>>> +        case VIRTIO_IOMMU_T_ATTACH:
>>>> +            tail.status = virtio_iommu_handle_attach(s, iov, iov_cnt);
>>>> +            break;
>>>> +        case VIRTIO_IOMMU_T_DETACH:
>>>> +            tail.status = virtio_iommu_handle_detach(s, iov, iov_cnt);
>>>> +            break;
>>>> +        case VIRTIO_IOMMU_T_MAP:
>>>> +            tail.status = virtio_iommu_handle_map(s, iov, iov_cnt);
>>>> +            break;
>>>> +        case VIRTIO_IOMMU_T_UNMAP:
>>>> +            tail.status = virtio_iommu_handle_unmap(s, iov, iov_cnt);
>>>> +            break;
>>>> +        default:
>>>> +            tail.status = VIRTIO_IOMMU_S_UNSUPP;
>>>> +        }
>>>> +        qemu_mutex_unlock(&s->mutex);
>>>> +
>>>> +out:
>>>> +        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
>>>> +                          &tail, sizeof(tail));
>>>> +        assert(sz == sizeof(tail));
>>>> +
>>>> +        virtqueue_push(vq, elem, sizeof(tail));
>>>
>>> s/tail/head/ (though they are the same size)?
>> That's unclear to me. Similarly, in virtio-net.c the element is pushed
>> back to the used ring with len set to the size of the status, as
>> described in:
>>
>> /*
>>  * Control virtqueue data structures
>>  *
>>  * The control virtqueue expects a header in the first sg entry
>>  * and an ack/status response in the last entry.  Data for the
>>  * command goes in between.
>>  */
> 
> I was referencing the balloon code when reading the patch, e.g.,
> virtio_balloon_handle_output(). However, after reading more carefully I
> see that other places use it as you described. Now I tend to agree with
> you, because virtqueue_push(), which calls virtqueue_unmap_sg(), uses
> len to unmap in_sg[] rather than out_sg[]. So please ignore my previous
> comment.

OK
> 
> (Now I'm not sure whether the usage in the balloon code is correct...)
> 
>>>
>>>> +        virtio_notify(vdev, vq);
>>>> +        g_free(elem);
>>>> +    }
>>>> +}
>>>
>>> [...]
>>>
>>>> +static void virtio_iommu_set_features(VirtIODevice *vdev, uint64_t val)
>>>> +{
>>>> +    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
>>>> +
>>>> +    dev->acked_features = val;
>>>> +    trace_virtio_iommu_set_features(dev->acked_features);
>>>> +}
>>>> +
>>>> +static const VMStateDescription vmstate_virtio_iommu_device = {
>>>> +    .name = "virtio-iommu-device",
>>>> +    .unmigratable = 1,
>>>
>>> Curious, is there an explicit reason not to support migration from the
>>> first version? :)
>> The state is made of red-black trees and lists. For the former, there
>> is no VMSTATE* support ready yet. I am working on it, but I think this
>> should be handled separately.
> 
> Fair enough. Would you mind adding a similar comment above
> unmigratable?
Sure.

Thanks!

Eric
> 
> Thanks,
> 



* Re: [Qemu-devel] [PATCH for-4.2 v10 13/15] virtio_iommu: Handle reserved regions in translation process
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 13/15] virtio_iommu: Handle reserved regions in translation process Eric Auger
  2019-08-19 12:44   ` Peter Xu
@ 2019-09-01  6:38   ` Michael S. Tsirkin
  1 sibling, 0 replies; 55+ messages in thread
From: Michael S. Tsirkin @ 2019-09-01  6:38 UTC (permalink / raw)
  To: Eric Auger
  Cc: jean-philippe, kevin.tian, peter.maydell, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

On Tue, Jul 30, 2019 at 07:21:35PM +0200, Eric Auger wrote:
> When translating an address, we need to check if it belongs to
> a reserved virtual address range. If it does, there are 2 cases:
> 
> - it belongs to a RESERVED region: the guest should neither use
>   this address in a MAP nor instruct the endpoint to DMA on
>   it. We report an error.
> 
> - it belongs to an MSI region: we bypass the translation.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Something is wrong with the subject here.

> ---
> 
> v9 -> v10:
> - in case of an MSI region, we return immediately
> ---
>  hw/virtio/virtio-iommu.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 8e54a17227..20d92b7ab0 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -711,6 +711,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>      viommu_interval interval;
>      bool bypass_allowed;
>      bool read_fault, write_fault;
> +    struct virtio_iommu_probe_resv_mem *reg;
>  
>      interval.low = addr;
>      interval.high = addr + 1;
> @@ -743,6 +744,21 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>          goto unlock;
>      }
>  
> +    reg = g_tree_lookup(ep->reserved_regions, (gpointer)(&interval));
> +    if (reg) {
> +        switch (reg->subtype) {
> +        case VIRTIO_IOMMU_RESV_MEM_T_MSI:
> +            entry.perm = flag;
> +            return entry;
> +        case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
> +        default:
> +            virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_MAPPING,
> +                                      0, sid, addr);
> +            break;
> +        }
> +        goto unlock;
> +    }
> +
>      if (!ep->domain) {
>          if (!bypass_allowed) {
>              qemu_log_mask(LOG_GUEST_ERROR,
> -- 
> 2.20.1



* Re: [Qemu-devel] [PATCH for-4.2 v10 14/15] virtio-iommu-pci: Add virtio iommu pci support
  2019-08-01 13:49         ` Auger Eric
@ 2019-09-01  6:40           ` Michael S. Tsirkin
  2019-09-04 14:19             ` Auger Eric
  0 siblings, 1 reply; 55+ messages in thread
From: Michael S. Tsirkin @ 2019-09-01  6:40 UTC (permalink / raw)
  To: Auger Eric
  Cc: jean-philippe, kevin.tian, peter.maydell, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

On Thu, Aug 01, 2019 at 03:49:37PM +0200, Auger Eric wrote:
> Hi Michael,
> 
> On 8/1/19 3:06 PM, Michael S. Tsirkin wrote:
> > On Thu, Aug 01, 2019 at 02:15:03PM +0200, Auger Eric wrote:
> >> Hi Michael,
> >>
> >> On 7/30/19 9:35 PM, Michael S. Tsirkin wrote:
> >>> On Tue, Jul 30, 2019 at 07:21:36PM +0200, Eric Auger wrote:
> >>>> This patch adds virtio-iommu-pci, which is the pci proxy for
> >>>> the virtio-iommu device.
> >>>>
> >>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>
> >>> This part I'm not sure we should merge just yet. The reason is that I
> >>> think we should limit it to mmio, where DT can be used to describe the
> >>> iommu topology. For PCI I don't see why we shouldn't always expose this
> >>> in the config space, and I think it's preferable not to
> >>> need to support a mix of DT, ACPI and PCI as options.
> >>
> >> For context, some discussion related to this topic already arose on the
> >> v7 revision of the driver:
> >>
> >> [1] Re: [PATCH v7 0/7] Add virtio-iommu driver
> >> https://lore.kernel.org/linux-pci/87a7ioby9u.fsf@morokweng.localdomain/
> >>
> >> Some additional thoughts.
> >>
> >> First, considering DT boot.
> >>
> >> The DT description features an iommu-map property in the
> >> pci-host-ecam-generic node that describes which RIDs are handled by the
> >> virtio-iommu, and a possible offset/mask to be applied between the RID
> >> and the stream ID at the input of the IOMMU
> >> (Documentation/devicetree/bindings/pci/pci-iommu.txt).
> >>
> >> As far as I understand, when a DMA-capable device is set up, its DMA
> >> configuration is built using this call chain:
> >>
> >> pci_dma_configure
> >> |_ of_dma_configure
> >>    |_ of_iommu_configure
> >>       |_ of_pci_iommu_init
> >>          |_ of_map_rid
> >>
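For reference, the translation at the end of that chain is the one described in pci-iommu.txt: the RID is first ANDed with iommu-map-mask, then looked up in the (rid-base, iommu, iommu-base, length) tuples of iommu-map, and a RID r inside [rid-base, rid-base + length) becomes the IOMMU input ID r - rid-base + iommu-base. A standalone C sketch of that lookup follows; `struct iommu_map_entry` and `map_rid()` are hypothetical names, not the kernel's of_map_rid() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One iommu-map entry: (rid-base, iommu-phandle, iommu-base, length). */
struct iommu_map_entry {
    uint32_t rid_base;
    uint32_t phandle;     /* which IOMMU node handles this range */
    uint32_t iommu_base;
    uint32_t length;
};

/*
 * Apply iommu-map-mask, find the entry whose [rid_base, rid_base + length)
 * contains the masked RID, and translate it to an IOMMU input ID.
 * Returns false if no entry covers the RID.
 */
bool map_rid(const struct iommu_map_entry *map, size_t n, uint32_t mask,
             uint32_t rid, uint32_t *phandle_out, uint32_t *id_out)
{
    uint32_t masked = rid & mask;

    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct iommu_map_entry *e = &map[i];

        /* Subtraction form avoids overflow of rid_base + length. */
        if (masked < e->rid_base || masked - e->rid_base >= e->length) {
            continue;
        }
        *phandle_out = e->phandle;
        *id_out = masked - e->rid_base + e->iommu_base;
        return true;
    }
    return false;
}
```

For example, with the IOMMU itself at a hypothetical RID 0x8, the two entries {0x0, iommu, 0x0, 0x8} and {0x9, iommu, 0x9, 0xfff7} translate every other RID one-to-one while leaving 0x8 out of the mapping.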
> >> I understand you would like the iommu-map/iommu-map-mask info to be
> >> exposed directly in the config space of the device instead of in
> >> the DT or IORT table. Assuming a module is initialized sufficiently
> >> early to retrieve this info, we would need the resulting info to be
> >> consolidated to allow the pci_dma_configure chain to work seamlessly.
> >> This sounds like a significant impact on the above kernel infrastructure.
> > 
> > I don't really know what consolidated means.
> > It is pretty common for IOMMUs to expose config through
> > PCI registers. This typically happens as a fixup.
> I meant: instead of retrieving the info through the of_* code, you need
> to interoperate with the module to retrieve the same info, and detect
> when you need to take that path instead of the of_* one.

The way to do it would be with a quirk,
and the quirk would not be part of the
virtio module - it can poke at the device using
virtio_pci_cfg_cap.

> > 
> > I would write a tiny driver to do exactly that,
> > and run it from the fixup.
> > 
> > 
> >> This comes in addition to the development of the "small module that
> >> loads early and pokes at the IOMMU sufficiently to get the data about
> >> which devices use the IOMMU out of it using standard virtio config
> >> space" mentioned in [1], plus the definition of the data formats to be
> >> put in that cfg space.
> > 
> > That last part is true but that's exactly why I propose we
> > wait on this patch a bit.
> > 
> >> With ACPI I understand we have the same kind of infrastructure:
> >> drivers/acpi/arm64/iort.c currently extracts the mapping between RC RIDs
> >> and IOMMU streamids
> >>
> >> pci_dma_configure
> >> |_ acpi_dma_configure
> >>    |_ iort_iommu_configure
> >>       |_ iort_pci_iommu_init
> >>          |_ iort_node_map_id
> >>             |_ iort_id_map
> >>
> >> Maybe I fail to see the easy and right way to do the integration at
> >> kernel level, but I am a bit frightened by the effort that would be
> >> required to follow your suggestion, whereas the DT infra is ready and
> >> fully upstreamed to accept the use case.
> > 
> > Did you take a look at drivers/pci/quirks.c and how these run?
> > I think it's just a question of adding DECLARE_PCI_FIXUP_CLASS_EARLY
> > and running your hook from there.
> I will do so and trace the code.
> > 
> > 
> >> For ACPI, I agree that AFAIK IORT was primarily defined by ARM, for ARM,
> >> but we prototyped IORT integration with x86 and it worked for the pc
> >> machine without major trouble.
> >>
> >> I sent the kernel and qemu patches prototyping this IORT integration:
> >>
> >> https://github.com/eauger/linux/tree/virtio-iommu-v0.9-iort-x86
> >> https://github.com/eauger/qemu/tree/v3.1.0-rc3-virtio-iommu-v0.9-x86
> >>
> >> There, the ACPI IORT was built for the PC machine and the integration
> >> effort at both kernel and QEMU level was low. This work would need to be
> >> rebased, though, and depends on kernel ACPI related patches that are not
> >> yet upstreamed.
> >>
> >> Thanks
> >>
> >> Eric
> > 
> > In the end it might turn out you are right.  But it does us no harm to
> > delay this just a bit, and for now limit things to ARM where it's
> > already used and where alternatives exist.
> So if my understanding is correct, at the moment you would accept a DT
> integration using MMIO. Is that correct? Meanwhile we can prototype your
> suggestion.
> 
> Thanks
> 
> Eric

Right.

> > 
> > 
> >>>
> >>>> ---
> >>>>
> >>>> v8 -> v9:
> >>>> - add the msi-bypass property
> >>>> - create virtio-iommu-pci.c
> >>>> ---
> >>>>  hw/virtio/Makefile.objs          |  1 +
> >>>>  hw/virtio/virtio-iommu-pci.c     | 88 ++++++++++++++++++++++++++++++++
> >>>>  include/hw/pci/pci.h             |  1 +
> >>>>  include/hw/virtio/virtio-iommu.h |  1 +
> >>>>  qdev-monitor.c                   |  1 +
> >>>>  5 files changed, 92 insertions(+)
> >>>>  create mode 100644 hw/virtio/virtio-iommu-pci.c
> >>>>
> >>>> diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> >>>> index f42e4dd94f..80ca719f1c 100644
> >>>> --- a/hw/virtio/Makefile.objs
> >>>> +++ b/hw/virtio/Makefile.objs
> >>>> @@ -27,6 +27,7 @@ obj-$(CONFIG_VIRTIO_INPUT_HOST) += virtio-input-host-pci.o
> >>>>  obj-$(CONFIG_VIRTIO_INPUT) += virtio-input-pci.o
> >>>>  obj-$(CONFIG_VIRTIO_RNG) += virtio-rng-pci.o
> >>>>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio-balloon-pci.o
> >>>> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu-pci.o
> >>>>  obj-$(CONFIG_VIRTIO_9P) += virtio-9p-pci.o
> >>>>  obj-$(CONFIG_VIRTIO_SCSI) += virtio-scsi-pci.o
> >>>>  obj-$(CONFIG_VIRTIO_BLK) += virtio-blk-pci.o
> >>>> diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
> >>>> new file mode 100644
> >>>> index 0000000000..f9977096bd
> >>>> --- /dev/null
> >>>> +++ b/hw/virtio/virtio-iommu-pci.c
> >>>> @@ -0,0 +1,88 @@
> >>>> +/*
> >>>> + * Virtio IOMMU PCI Bindings
> >>>> + *
> >>>> + * Copyright (c) 2019 Red Hat, Inc.
> >>>> + * Written by Eric Auger
> >>>> + *
> >>>> + *  This program is free software; you can redistribute it and/or modify
> >>>> + *  it under the terms of the GNU General Public License version 2 or
> >>>> + *  (at your option) any later version.
> >>>> + */
> >>>> +
> >>>> +#include "qemu/osdep.h"
> >>>> +
> >>>> +#include "virtio-pci.h"
> >>>> +#include "hw/virtio/virtio-iommu.h"
> >>>> +
> >>>> +typedef struct VirtIOIOMMUPCI VirtIOIOMMUPCI;
> >>>> +
> >>>> +/*
> >>>> + * virtio-iommu-pci: This extends VirtioPCIProxy.
> >>>> + *
> >>>> + */
> >>>> +#define VIRTIO_IOMMU_PCI(obj) \
> >>>> +        OBJECT_CHECK(VirtIOIOMMUPCI, (obj), TYPE_VIRTIO_IOMMU_PCI)
> >>>> +
> >>>> +struct VirtIOIOMMUPCI {
> >>>> +    VirtIOPCIProxy parent_obj;
> >>>> +    VirtIOIOMMU vdev;
> >>>> +};
> >>>> +
> >>>> +static Property virtio_iommu_pci_properties[] = {
> >>>> +    DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
> >>>> +    DEFINE_PROP_BOOL("msi-bypass", VirtIOIOMMUPCI, vdev.msi_bypass, true),
> >>>> +    DEFINE_PROP_END_OF_LIST(),
> >>>> +};
> >>>> +
> >>>> +static void virtio_iommu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> >>>> +{
> >>>> +    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(vpci_dev);
> >>>> +    DeviceState *vdev = DEVICE(&dev->vdev);
> >>>> +
> >>>> +    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
> >>>> +    object_property_set_link(OBJECT(dev),
> >>>> +                             OBJECT(pci_get_bus(&vpci_dev->pci_dev)),
> >>>> +                             "primary-bus", errp);
> >>>> +    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
> >>>> +}
> >>>> +
> >>>> +static void virtio_iommu_pci_class_init(ObjectClass *klass, void *data)
> >>>> +{
> >>>> +    DeviceClass *dc = DEVICE_CLASS(klass);
> >>>> +    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
> >>>> +    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
> >>>> +    k->realize = virtio_iommu_pci_realize;
> >>>> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> >>>> +    dc->props = virtio_iommu_pci_properties;
> >>>> +    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
> >>>> +    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_IOMMU;
> >>>> +    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
> >>>> +    pcidev_k->class_id = PCI_CLASS_OTHERS;
> >>>> +}
> >>>> +
> >>>> +static void virtio_iommu_pci_instance_init(Object *obj)
> >>>> +{
> >>>> +    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(obj);
> >>>> +
> >>>> +    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
> >>>> +                                TYPE_VIRTIO_IOMMU);
> >>>> +}
> >>>> +
> >>>> +static const VirtioPCIDeviceTypeInfo virtio_iommu_pci_info = {
> >>>> +    .base_name             = TYPE_VIRTIO_IOMMU_PCI,
> >>>> +    .generic_name          = "virtio-iommu-pci",
> >>>> +    .transitional_name     = "virtio-iommu-pci-transitional",
> >>>> +    .non_transitional_name = "virtio-iommu-pci-non-transitional",
> >>>> +    .instance_size = sizeof(VirtIOIOMMUPCI),
> >>>> +    .instance_init = virtio_iommu_pci_instance_init,
> >>>> +    .class_init    = virtio_iommu_pci_class_init,
> >>>> +};
> >>>> +
> >>>> +static void virtio_iommu_pci_register(void)
> >>>> +{
> >>>> +    virtio_pci_types_register(&virtio_iommu_pci_info);
> >>>> +}
> >>>> +
> >>>> +type_init(virtio_iommu_pci_register)
> >>>> +
> >>>> +
> >>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >>>> index aaf1b9f70d..492ea7e68d 100644
> >>>> --- a/include/hw/pci/pci.h
> >>>> +++ b/include/hw/pci/pci.h
> >>>> @@ -86,6 +86,7 @@ extern bool pci_available;
> >>>>  #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
> >>>>  #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
> >>>>  #define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
> >>>> +#define PCI_DEVICE_ID_VIRTIO_IOMMU       0x1014
> >>>>  
> >>>>  #define PCI_VENDOR_ID_REDHAT             0x1b36
> >>>>  #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
> >>>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
> >>>> index 56c8b4e57f..893ac65c0b 100644
> >>>> --- a/include/hw/virtio/virtio-iommu.h
> >>>> +++ b/include/hw/virtio/virtio-iommu.h
> >>>> @@ -25,6 +25,7 @@
> >>>>  #include "hw/pci/pci.h"
> >>>>  
> >>>>  #define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
> >>>> +#define TYPE_VIRTIO_IOMMU_PCI "virtio-iommu-device-base"
> >>>>  #define VIRTIO_IOMMU(obj) \
> >>>>          OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
> >>>>  
> >>>> diff --git a/qdev-monitor.c b/qdev-monitor.c
> >>>> index 58222c2211..74cf090c61 100644
> >>>> --- a/qdev-monitor.c
> >>>> +++ b/qdev-monitor.c
> >>>> @@ -63,6 +63,7 @@ static const QDevAlias qdev_alias_table[] = {
> >>>>      { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_S390X },
> >>>>      { "virtio-input-host-pci", "virtio-input-host",
> >>>>              QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
> >>>> +    { "virtio-iommu-pci", "virtio-iommu", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
> >>>>      { "virtio-keyboard-ccw", "virtio-keyboard", QEMU_ARCH_S390X },
> >>>>      { "virtio-keyboard-pci", "virtio-keyboard",
> >>>>              QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
> >>>> -- 
> >>>> 2.20.1
> >>>
> > 
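
For reference, the RID-to-stream-ID translation described by the iommu-map/iommu-map-mask properties discussed in this thread (the step of_map_rid() performs in the DT path) can be sketched in plain C. This is a minimal illustration under stated assumptions, not the kernel code: struct iommu_map_entry and map_rid() are hypothetical names, and the IOMMU phandle of each tuple is omitted because the sketch assumes a single IOMMU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One (rid-base, iommu, iommu-base, length) tuple of an iommu-map
 * property; the IOMMU phandle is left out of this sketch. */
struct iommu_map_entry {
    uint32_t rid_base;
    uint32_t iommu_base;
    uint32_t length;
};

/*
 * Hypothetical helper: mask the RID with iommu-map-mask first, then find
 * the tuple whose [rid_base, rid_base + length) range contains it and
 * apply the offset. Returns false when no tuple matches, i.e. the RID
 * is not translated by this IOMMU.
 */
static bool map_rid(const struct iommu_map_entry *map, size_t n,
                    uint32_t map_mask, uint32_t rid, uint32_t *sid)
{
    uint32_t masked = rid & map_mask;

    for (size_t i = 0; i < n; i++) {
        if (masked >= map[i].rid_base &&
            masked - map[i].rid_base < map[i].length) {
            *sid = masked - map[i].rid_base + map[i].iommu_base;
            return true;
        }
    }
    return false;
}
```

For example, with the tuples {0x0, 0x0, 0x8} and {0x9, 0x9, 0xfff7}, RID 0x8 falls in the gap and is not translated, which is one way to keep a single RID (such as the IOMMU's own) out of the map.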



* Re: [Qemu-devel] [PATCH for-4.2 v10 08/15] virtio-iommu: Implement map/unmap
  2019-08-19  8:11   ` Peter Xu
@ 2019-09-03 11:37     ` Auger Eric
  2019-09-04  1:44       ` Peter Xu
  0 siblings, 1 reply; 55+ messages in thread
From: Auger Eric @ 2019-09-03 11:37 UTC (permalink / raw)
  To: Peter Xu
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

Hi Peter,

On 8/19/19 10:11 AM, Peter Xu wrote:
> On Tue, Jul 30, 2019 at 07:21:30PM +0200, Eric Auger wrote:
> 
> [...]
> 
>> +    mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
>> +
>> +    while (mapping) {
>> +        viommu_interval current;
>> +        uint64_t low  = mapping->virt_addr;
>> +        uint64_t high = mapping->virt_addr + mapping->size - 1;
>> +
>> +        current.low = low;
>> +        current.high = high;
>> +
>> +        if (low == interval.low && size >= mapping->size) {
>> +            g_tree_remove(domain->mappings, (gpointer)(&current));
>> +            interval.low = high + 1;
>> +            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
>> +                interval.low, interval.high);
>> +        } else if (high == interval.high && size >= mapping->size) {
>> +            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
>> +                interval.low, interval.high);
>> +            g_tree_remove(domain->mappings, (gpointer)(&current));
>> +            interval.high = low - 1;
>> +        } else if (low > interval.low && high < interval.high) {
>> +            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
>> +            g_tree_remove(domain->mappings, (gpointer)(&current));
>> +        } else {
>> +            break;
>> +        }
>> +        if (interval.low >= interval.high) {
>> +            return VIRTIO_IOMMU_S_OK;
>> +        } else {
>> +            mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
>> +        }
>> +    }
>> +
>> +    if (mapping) {
>> +        qemu_log_mask(LOG_GUEST_ERROR,
>> +                      "****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
>> +                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported\n",
>> +                     __func__, interval.low, size,
>> +                     mapping->virt_addr, mapping->size);
>> +    } else {
>> +        return VIRTIO_IOMMU_S_OK;
>> +    }
>> +
>> +    return VIRTIO_IOMMU_S_INVAL;
> 
> Could the above chunk be simplified as something like below?
> 
>   while ((mapping = g_tree_lookup(domain->mappings, &interval))) {
>     g_tree_remove(domain->mappings, mapping);
>   }
Indeed the code could be simplified. I only need to make sure I don't
split an existing mapping.

Also I needed to use g_tree_lookup_extended() to retrieve the actual key
to remove. Using g_tree_lookup_extended() allows me to remove the
virt_addr and size fields from the mapping value struct, as that
info can be retrieved from the key.
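
A sketch of the simplified unmap loop suggested above, assuming (as the patch's g_tree_lookup() on an interval key implies) that the tree's comparator treats overlapping intervals as equal. A small array stands in for the GTree so the example needs no GLib; store_lookup() plays the role of g_tree_lookup_extended() by returning the stored key, which is why the value no longer needs virt_addr and size. All names are hypothetical, and the no-split check is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Closed interval [low, high] of IOVAs, used as the lookup key. */
typedef struct {
    uint64_t low;
    uint64_t high;
} interval;

/* Overlapping intervals compare equal, so looking up any interval that
 * overlaps a stored mapping finds that mapping. */
static int interval_cmp(const interval *a, const interval *b)
{
    if (a->high < b->low) {
        return -1;
    } else if (b->high < a->low) {
        return 1;
    }
    return 0;
}

/* Stand-in for a GTree node: the key carries virt_addr and size, so the
 * value only keeps phys_addr and flags. */
typedef struct {
    interval key;
    uint64_t phys_addr;
    uint32_t flags;
} entry;

typedef struct {
    entry e[16];
    size_t n;
} mapping_store;

/* Return the stored entry (key included) overlapping @iv, or NULL. */
static entry *store_lookup(mapping_store *s, const interval *iv)
{
    for (size_t i = 0; i < s->n; i++) {
        if (interval_cmp(&s->e[i].key, iv) == 0) {
            return &s->e[i];
        }
    }
    return NULL;
}

static void store_remove(mapping_store *s, entry *victim)
{
    size_t i = (size_t)(victim - s->e);

    s->e[i] = s->e[--s->n];   /* order is irrelevant for this sketch */
}

/* Remove every mapping overlapping @iv using the key found by the
 * lookup. Returns the number of mappings removed. */
static size_t unmap_range(mapping_store *s, interval iv)
{
    entry *m;
    size_t removed = 0;

    while ((m = store_lookup(s, &iv)) != NULL) {
        store_remove(s, m);
        removed++;
    }
    return removed;
}
```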

Thanks!

Eric
> 
> Thanks,
> 



* Re: [Qemu-devel] [PATCH for-4.2 v10 09/15] virtio-iommu: Implement translate
  2019-08-19  8:24   ` Peter Xu
@ 2019-09-03 11:45     ` Auger Eric
  2019-09-04  1:58       ` Peter Xu
  0 siblings, 1 reply; 55+ messages in thread
From: Auger Eric @ 2019-09-03 11:45 UTC (permalink / raw)
  To: Peter Xu
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

Hi Peter,

On 8/19/19 10:24 AM, Peter Xu wrote:
> On Tue, Jul 30, 2019 at 07:21:31PM +0200, Eric Auger wrote:
>> @@ -464,19 +464,75 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>>                                              int iommu_idx)
>>  {
>>      IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
>> +    VirtIOIOMMU *s = sdev->viommu;
>>      uint32_t sid;
>> +    viommu_endpoint *ep;
>> +    viommu_mapping *mapping;
>> +    viommu_interval interval;
>> +    bool bypass_allowed;
>> +
>> +    interval.low = addr;
>> +    interval.high = addr + 1;
>>  
>>      IOMMUTLBEntry entry = {
>>          .target_as = &address_space_memory,
>>          .iova = addr,
>>          .translated_addr = addr,
>> -        .addr_mask = ~(hwaddr)0,
>> +        .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
>>          .perm = IOMMU_NONE,
>>      };
>>  
>> +    bypass_allowed = virtio_has_feature(s->acked_features,
>> +                                        VIRTIO_IOMMU_F_BYPASS);
>> +
>>      sid = virtio_iommu_get_sid(sdev);
>>  
>>      trace_virtio_iommu_translate(mr->parent_obj.name, sid, addr, flag);
>> +    qemu_mutex_lock(&s->mutex);
>> +
>> +    ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));
>> +    if (!ep) {
>> +        if (!bypass_allowed) {
>> +            error_report("%s sid=%d is not known!!", __func__, sid);
> 
> Maybe use error_report_once() to avoid a DoS attack?  Also, would it be
> good to unify the debug prints?  I see both error_report() and
> qemu_log_mask() are used in the whole patchset.  Or is that intended?

I switched to error_report_once()

I understand that qemu_log_mask() should be used whenever the root cause
is a bad action of the guest OS (in the case below, the EP was not
attached to any domain). Above, an unexpected EP attempts to talk through
the IOMMU, which rather points to a platform description issue or a qemu
bug.
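
The point of error_report_once() here is that a guest repeatedly issuing DMA from an unknown SID cannot flood the log. My understanding is that QEMU implements it with a static flag per call site; the sketch below reproduces that idea in portable C with hypothetical names (REPORT_ONCE, report_once_cond) and is not QEMU's actual macro.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of messages actually emitted (lets the behaviour be checked). */
static int reports_emitted;

/* Print the message only if *printed is still false, then latch it. */
static bool report_once_cond(bool *printed, const char *fmt, ...)
{
    va_list ap;

    if (*printed) {
        return false;
    }
    *printed = true;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    reports_emitted++;
    return true;
}

/* One static latch per call site: repeated triggers of the same path
 * produce a single line. */
#define REPORT_ONCE(...)                                  \
    do {                                                  \
        static bool printed_;                             \
        report_once_cond(&printed_, __VA_ARGS__);         \
    } while (0)

/* Hypothetical caller modelled on the "sid is not known" path. */
static void translate_unknown_sid(unsigned sid)
{
    REPORT_ONCE("%s sid=%u is not known", __func__, sid);
}
```

Calling translate_unknown_sid() a thousand times prints one line; the trade-off is that later occurrences, possibly with different SIDs, are silently dropped.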

Thanks

Eric
> 
>> +        } else {
>> +            entry.perm = flag;
>> +        }
>> +        goto unlock;
>> +    }
>> +
>> +    if (!ep->domain) {
>> +        if (!bypass_allowed) {
>> +            qemu_log_mask(LOG_GUEST_ERROR,
>> +                          "%s %02x:%02x.%01x not attached to any domain\n",
>> +                          __func__, PCI_BUS_NUM(sid),
>> +                          PCI_SLOT(sid), PCI_FUNC(sid));
>> +        } else {
>> +            entry.perm = flag;
>> +        }
>> +        goto unlock;
>> +    }
>> +
>> +    mapping = g_tree_lookup(ep->domain->mappings, (gpointer)(&interval));
>> +    if (!mapping) {
>> +        qemu_log_mask(LOG_GUEST_ERROR,
>> +                      "%s no mapping for 0x%"PRIx64" for sid=%d\n",
>> +                      __func__, addr, sid);
>> +        goto unlock;
>> +    }
>> +
>> +    if (((flag & IOMMU_RO) && !(mapping->flags & VIRTIO_IOMMU_MAP_F_READ)) ||
>> +        ((flag & IOMMU_WO) && !(mapping->flags & VIRTIO_IOMMU_MAP_F_WRITE))) {
>> +        qemu_log_mask(LOG_GUEST_ERROR,
>> +                      "Permission error on 0x%"PRIx64"(%d): allowed=%d\n",
>> +                      addr, flag, mapping->flags);
>> +        goto unlock;
>> +    }
>> +    entry.translated_addr = addr - mapping->virt_addr + mapping->phys_addr;
>> +    entry.perm = flag;
>> +    trace_virtio_iommu_translate_out(addr, entry.translated_addr, sid);
>> +
>> +unlock:
>> +    qemu_mutex_unlock(&s->mutex);
>>      return entry;
>>  }
>>  
>> -- 
>> 2.20.1
>>
> 
> Regards,
> 
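
Stripped of locking and tracing, the translate path in the quoted patch boils down to three pieces of arithmetic: the IOTLB addr_mask derived from the lowest set bit of page_size_mask, the permission check against the mapping flags, and translated_addr = addr - virt_addr + phys_addr. The sketch below isolates them; the permission constants, the mapping type and the helper names are local stand-ins, not the QEMU or virtio definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Access and mapping permission bits, defined locally for this sketch. */
enum { ACC_READ = 1 << 0, ACC_WRITE = 1 << 1 };
enum { MAP_READ = 1 << 0, MAP_WRITE = 1 << 1 };

typedef struct {
    uint64_t virt_addr;  /* first IOVA covered */
    uint64_t size;       /* bytes */
    uint64_t phys_addr;  /* GPA backing virt_addr */
    uint32_t flags;      /* MAP_READ / MAP_WRITE */
} mapping;

/* Index of the lowest set bit, i.e. log2 of the smallest page size
 * advertised in page_size_mask (mask must be non-zero). */
static unsigned lowest_bit(uint64_t mask)
{
    unsigned n = 0;

    while (!(mask & 1)) {
        mask >>= 1;
        n++;
    }
    return n;
}

/* addr_mask for an IOTLB entry: smallest granule size minus one. */
static uint64_t granule_mask(uint64_t page_size_mask)
{
    return (UINT64_C(1) << lowest_bit(page_size_mask)) - 1;
}

/* Translate @addr through @m for an access of kind @acc. Fails if addr
 * is outside the mapping or a requested permission is not granted. */
static bool translate(const mapping *m, uint64_t addr, unsigned acc,
                      uint64_t *out)
{
    if (addr < m->virt_addr || addr - m->virt_addr >= m->size) {
        return false;
    }
    if (((acc & ACC_READ) && !(m->flags & MAP_READ)) ||
        ((acc & ACC_WRITE) && !(m->flags & MAP_WRITE))) {
        return false;
    }
    *out = addr - m->virt_addr + m->phys_addr;
    return true;
}
```

For instance, a page_size_mask of 0xfffffffffffff000 yields an addr_mask of 0xfff (4 KiB granule), and a read at IOVA 0x10abc through a read-only mapping of 0x10000 -> 0x80000000 (size 0x2000) lands at 0x80000abc, while a write to the same address is refused.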



* Re: [Qemu-devel] [PATCH for-4.2 v10 10/15] virtio-iommu: Implement probe request
  2019-08-19 12:08   ` Peter Xu
@ 2019-09-03 12:23     ` Auger Eric
  0 siblings, 0 replies; 55+ messages in thread
From: Auger Eric @ 2019-09-03 12:23 UTC (permalink / raw)
  To: Peter Xu
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

Hi Peter,

On 8/19/19 2:08 PM, Peter Xu wrote:
> On Tue, Jul 30, 2019 at 07:21:32PM +0200, Eric Auger wrote:
> 
> [...]
> 
>> +/* Fill the properties[] buffer with properties of type @type */
>> +static int virtio_iommu_fill_property(int type,
>> +                                      viommu_property_buffer *bufstate)
>> +{
>> +    int ret = -ENOSPC;
>> +
>> +    if (bufstate->filled + sizeof(struct virtio_iommu_probe_property)
>> +            >= VIOMMU_PROBE_SIZE) {
>> +        /* no space left for the header */
>> +        bufstate->error = true;
>> +        goto out;
>> +    }
>> +
>> +    switch (type) {
>> +    case VIRTIO_IOMMU_PROBE_T_NONE:
>> +        ret = virtio_iommu_fill_none_prop(bufstate);
>> +        break;
>> +    case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
>> +    {
>> +        viommu_endpoint *ep = bufstate->endpoint;
>> +
>> +        g_tree_foreach(ep->reserved_regions,
>> +                       virtio_iommu_fill_resv_mem_prop,
>> +                       bufstate);
>> +        if (!bufstate->error) {
>> +            ret = 0;
>> +        }
>> +        break;
>> +    }
>> +    default:
>> +        ret = -ENOENT;
>> +        break;
>> +    }
>> +out:
>> +    if (ret) {
>> +        error_report("%s property of type=%d could not be filled (%d),"
>> +                     " remaining size = 0x%lx",
>> +                     __func__, type, ret, bufstate->filled);
> 
> Nit: If this can really be triggered, then we might still change it to
> error_report_once()?  If it can't (which seems to be the case), maybe
> assert directly?
I used error_report_once() for now. The reserved regions may be
passed through cfg or device properties, so I think their total size may
get larger than the size set in the device config.
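
Since the reserved regions can be supplied through cfg or device properties, their total size is not guaranteed to fit in the probe buffer advertised in the device config, so the fill routine has to refuse a property that does not fit rather than assert. A minimal sketch of such a bounded appender follows; the header and payload layouts are simplified assumptions (endianness and the real virtio_iommu_probe_property fields are ignored) and every name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified property header and reserved-region payload. */
struct prop_head {
    uint16_t type;
    uint16_t length;   /* payload bytes following the header */
};

struct resv_region {
    uint64_t start;
    uint64_t end;
};

struct prop_buf {
    uint8_t *base;
    size_t size;       /* probe buffer size from the device config */
    size_t filled;
    bool error;
};

/* Append one property; refuse (and latch the error) if header plus
 * payload would not fit, instead of writing past the buffer. */
static int append_prop(struct prop_buf *b, uint16_t type,
                       const void *payload, uint16_t len)
{
    struct prop_head h = { type, len };

    if (b->error || b->size - b->filled < sizeof(h) + len) {
        b->error = true;
        return -1;
    }
    memcpy(b->base + b->filled, &h, sizeof(h));
    memcpy(b->base + b->filled + sizeof(h), payload, len);
    b->filled += sizeof(h) + len;
    return 0;
}
```

With a 64-byte buffer and 20 bytes per reserved-region property, three regions fit and the fourth is rejected, leaving the buffer contents untouched.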


> 
> Other than that it looks good to me:
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>

Thank you for the review!

Best Regards

Eric
> 
> Regards,
> 



* Re: [Qemu-devel] [PATCH for-4.2 v10 08/15] virtio-iommu: Implement map/unmap
  2019-09-03 11:37     ` Auger Eric
@ 2019-09-04  1:44       ` Peter Xu
  2019-09-04  4:23         ` Tian, Kevin
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Xu @ 2019-09-04  1:44 UTC (permalink / raw)
  To: Auger Eric
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

On Tue, Sep 03, 2019 at 01:37:11PM +0200, Auger Eric wrote:
> Hi Peter,
> 
> On 8/19/19 10:11 AM, Peter Xu wrote:
> > On Tue, Jul 30, 2019 at 07:21:30PM +0200, Eric Auger wrote:
> > 
> > [...]
> > 
> >> +    mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
> >> +
> >> +    while (mapping) {
> >> +        viommu_interval current;
> >> +        uint64_t low  = mapping->virt_addr;
> >> +        uint64_t high = mapping->virt_addr + mapping->size - 1;
> >> +
> >> +        current.low = low;
> >> +        current.high = high;
> >> +
> >> +        if (low == interval.low && size >= mapping->size) {
> >> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> >> +            interval.low = high + 1;
> >> +            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
> >> +                interval.low, interval.high);
> >> +        } else if (high == interval.high && size >= mapping->size) {
> >> +            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
> >> +                interval.low, interval.high);
> >> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> >> +            interval.high = low - 1;
> >> +        } else if (low > interval.low && high < interval.high) {
> >> +            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
> >> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> >> +        } else {
> >> +            break;
> >> +        }
> >> +        if (interval.low >= interval.high) {
> >> +            return VIRTIO_IOMMU_S_OK;
> >> +        } else {
> >> +            mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
> >> +        }
> >> +    }
> >> +
> >> +    if (mapping) {
> >> +        qemu_log_mask(LOG_GUEST_ERROR,
> >> +                      "****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
> >> +                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported\n",
> >> +                     __func__, interval.low, size,
> >> +                     mapping->virt_addr, mapping->size);
> >> +    } else {
> >> +        return VIRTIO_IOMMU_S_OK;
> >> +    }
> >> +
> >> +    return VIRTIO_IOMMU_S_INVAL;
> > 
> > Could the above chunk be simplified as something like below?
> > 
> >   while ((mapping = g_tree_lookup(domain->mappings, &interval))) {
> >     g_tree_remove(domain->mappings, mapping);
> >   }
> Indeed the code could be simplified. I only need to make sure I don't
> split an existing mapping.

Hmm... Do we still need to split an existing mapping if necessary?
For example, with this mapping:

  iova=0x1000, size=0x2000, phys=ADDR1, flags=FLAGS1

And if we want to unmap the range (iova=0, size=0x2000), then we
should split the existing mapping and leave this one:

  iova=0x2000, size=0x1000, phys=(ADDR1+0x1000), flags=FLAGS1

Right?

> 
> Also I needed to use g_tree_lookup_extended() to retrieve the actual key
> to remove. Using g_tree_lookup_extended() allows me to remove the
> virt_addr and size fields from the mapping value struct, as that
> info can be retrieved from the key.

True.  Thanks,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH for-4.2 v10 09/15] virtio-iommu: Implement translate
  2019-09-03 11:45     ` Auger Eric
@ 2019-09-04  1:58       ` Peter Xu
  0 siblings, 0 replies; 55+ messages in thread
From: Peter Xu @ 2019-09-04  1:58 UTC (permalink / raw)
  To: Auger Eric
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

On Tue, Sep 03, 2019 at 01:45:22PM +0200, Auger Eric wrote:
> Hi Peter,
> 
> On 8/19/19 10:24 AM, Peter Xu wrote:
> > On Tue, Jul 30, 2019 at 07:21:31PM +0200, Eric Auger wrote:
> >> @@ -464,19 +464,75 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
> >>                                              int iommu_idx)
> >>  {
> >>      IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
> >> +    VirtIOIOMMU *s = sdev->viommu;
> >>      uint32_t sid;
> >> +    viommu_endpoint *ep;
> >> +    viommu_mapping *mapping;
> >> +    viommu_interval interval;
> >> +    bool bypass_allowed;
> >> +
> >> +    interval.low = addr;
> >> +    interval.high = addr + 1;
> >>  
> >>      IOMMUTLBEntry entry = {
> >>          .target_as = &address_space_memory,
> >>          .iova = addr,
> >>          .translated_addr = addr,
> >> -        .addr_mask = ~(hwaddr)0,
> >> +        .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
> >>          .perm = IOMMU_NONE,
> >>      };
> >>  
> >> +    bypass_allowed = virtio_has_feature(s->acked_features,
> >> +                                        VIRTIO_IOMMU_F_BYPASS);
> >> +
> >>      sid = virtio_iommu_get_sid(sdev);
> >>  
> >>      trace_virtio_iommu_translate(mr->parent_obj.name, sid, addr, flag);
> >> +    qemu_mutex_lock(&s->mutex);
> >> +
> >> +    ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));
> >> +    if (!ep) {
> >> +        if (!bypass_allowed) {
> >> +            error_report("%s sid=%d is not known!!", __func__, sid);
> > 
> > Maybe use error_report_once() to avoid a DoS attack?  Also, would it be
> > good to unify the debug prints?  I see both error_report() and
> > qemu_log_mask() are used in the whole patchset.  Or is that intended?
> 
> I switched to error_report_once()
> 
> I understand that qemu_log_mask() should be used whenever the root cause
> is a bad action of the guest OS (in the case below, the EP was not
> attached to any domain). Above, an unexpected EP attempts to talk through
> the IOMMU, which rather points to a platform description issue or a qemu
> bug.

I see. It's a bit unclear, at least to me, how these should be used.  I
have seen, and used, error_report*() to report guest misbehavior as well,
just for debugging and triaging, simply because error_report*() output is
always there even without "-d" (when an issue happens, most users are
running without it...).  With this information captured either by libvirt
or by direct QEMU users, we can triage guest bugs more easily.  I hope
I'm not severely wrong; please feel free to use qemu_log_mask() no matter
what.
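
The practical difference described above is that error_report() output is always emitted, while qemu_log_mask() output only appears when the matching log category is enabled (for LOG_GUEST_ERROR, with -d guest_errors). The sketch below models that gating with hypothetical names; it is not QEMU's logging code.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical log category; the real program enables categories from
 * the command line. */
enum { LOG_GUEST_ERR = 1 << 0 };

static unsigned enabled_log_mask;   /* nothing enabled by default */
static int lines_written;

static void vlog_line(const char *fmt, va_list ap)
{
    vfprintf(stderr, fmt, ap);
    fputc('\n', stderr);
    lines_written++;
}

/* Always printed, like error_report(). */
static void report_error(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vlog_line(fmt, ap);
    va_end(ap);
}

/* Printed only when the category is enabled, like qemu_log_mask(). */
static void log_mask(unsigned mask, const char *fmt, ...)
{
    va_list ap;

    if (!(enabled_log_mask & mask)) {
        return;
    }
    va_start(ap, fmt);
    vlog_line(fmt, ap);
    va_end(ap);
}
```

A guest-error message logged through log_mask() is invisible in a default run and appears once the category is enabled, whereas report_error() output is always captured.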

Regards,

-- 
Peter Xu



* Re: [Qemu-devel] [PATCH for-4.2 v10 08/15] virtio-iommu: Implement map/unmap
  2019-09-04  1:44       ` Peter Xu
@ 2019-09-04  4:23         ` Tian, Kevin
  2019-09-04  5:37           ` Peter Xu
  0 siblings, 1 reply; 55+ messages in thread
From: Tian, Kevin @ 2019-09-04  4:23 UTC (permalink / raw)
  To: Peter Xu, Auger Eric
  Cc: peter.maydell, mst, tn, qemu-devel, alex.williamson, qemu-arm,
	jean-philippe, bharat.bhushan, eric.auger.pro

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Wednesday, September 4, 2019 9:44 AM
> 
> On Tue, Sep 03, 2019 at 01:37:11PM +0200, Auger Eric wrote:
> > Hi Peter,
> >
> > On 8/19/19 10:11 AM, Peter Xu wrote:
> > > On Tue, Jul 30, 2019 at 07:21:30PM +0200, Eric Auger wrote:
> > >
> > > [...]
> > >
> > >> +    mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
> > >> +
> > >> +    while (mapping) {
> > >> +        viommu_interval current;
> > >> +        uint64_t low  = mapping->virt_addr;
> > >> +        uint64_t high = mapping->virt_addr + mapping->size - 1;
> > >> +
> > >> +        current.low = low;
> > >> +        current.high = high;
> > >> +
> > >> +        if (low == interval.low && size >= mapping->size) {
> > >> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> > >> +            interval.low = high + 1;
> > >> +            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
> > >> +                interval.low, interval.high);
> > >> +        } else if (high == interval.high && size >= mapping->size) {
> > >> +            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
> > >> +                interval.low, interval.high);
> > >> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> > >> +            interval.high = low - 1;
> > >> +        } else if (low > interval.low && high < interval.high) {
> > >> +            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
> > >> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> > >> +        } else {
> > >> +            break;
> > >> +        }
> > >> +        if (interval.low >= interval.high) {
> > >> +            return VIRTIO_IOMMU_S_OK;
> > >> +        } else {
> > >> +            mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
> > >> +        }
> > >> +    }
> > >> +
> > >> +    if (mapping) {
> > >> +        qemu_log_mask(LOG_GUEST_ERROR,
> > >> +                      "****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
> > >> +                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported\n",
> > >> +                     __func__, interval.low, size,
> > >> +                     mapping->virt_addr, mapping->size);
> > >> +    } else {
> > >> +        return VIRTIO_IOMMU_S_OK;
> > >> +    }
> > >> +
> > >> +    return VIRTIO_IOMMU_S_INVAL;
> > >
> > > Could the above chunk be simplified as something like below?
> > >
> > >   while ((mapping = g_tree_lookup(domain->mappings, &interval))) {
> > >     g_tree_remove(domain->mappings, mapping);
> > >   }
> > Indeed the code could be simplified. I only need to make sure I don't
> > split an existing mapping.
> 
> Hmm... Do we still need to split an existing mapping if necessary?
> For example, with this mapping:
> 
>   iova=0x1000, size=0x2000, phys=ADDR1, flags=FLAGS1
> 
> And if we want to unmap the range (iova=0, size=0x2000), then we
> should split the existing mapping and leave this one:
> 
>   iova=0x2000, size=0x1000, phys=(ADDR1+0x1000), flags=FLAGS1
> 
> Right?
> 

virtio-iommu spec explicitly disallows partial unmap.

5.11.6.6.1 Driver Requirements: UNMAP request

The first address of a range MUST either be the first address of a 
mapping or be outside any mapping. The last address of a range 
MUST either be the last address of a mapping or be outside any 
mapping.

5.11.6.6.2 Device Requirements: UNMAP request

If a mapping affected by the range is not covered in its entirety 
by the range (the UNMAP request would split the mapping), 
then the device SHOULD set the request status to
VIRTIO_IOMMU_S_RANGE, and SHOULD NOT remove any mapping.
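
The device-side rule quoted above can be implemented as a validate-then-remove pass, so that an UNMAP which would split a mapping fails without touching anything. A sketch under simplifying assumptions: mappings are kept in a small array of inclusive [low, high] ranges, and UNMAP_OK/UNMAP_RANGE are hypothetical stand-ins for VIRTIO_IOMMU_S_OK and VIRTIO_IOMMU_S_RANGE.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for VIRTIO_IOMMU_S_OK and VIRTIO_IOMMU_S_RANGE. */
enum unmap_status { UNMAP_OK, UNMAP_RANGE };

/* A mapping covering IOVAs [low, high] (inclusive). */
struct map {
    uint64_t low;
    uint64_t high;
};

struct map_list {
    struct map m[16];
    size_t n;
};

static int overlaps(const struct map *m, uint64_t low, uint64_t high)
{
    return m->low <= high && low <= m->high;
}

/* UNMAP [low, high]: if any overlapping mapping is not entirely inside
 * the range, report UNMAP_RANGE and leave every mapping in place;
 * otherwise remove all overlapping mappings. */
static enum unmap_status unmap(struct map_list *l, uint64_t low,
                               uint64_t high)
{
    size_t i;

    /* Pass 1: validate without modifying anything. */
    for (i = 0; i < l->n; i++) {
        const struct map *m = &l->m[i];

        if (overlaps(m, low, high) && (m->low < low || m->high > high)) {
            return UNMAP_RANGE;
        }
    }
    /* Pass 2: compact the list, dropping the covered mappings. */
    size_t kept = 0;
    for (i = 0; i < l->n; i++) {
        if (!overlaps(&l->m[i], low, high)) {
            l->m[kept++] = l->m[i];
        }
    }
    l->n = kept;
    return UNMAP_OK;
}
```

With Peter's example, a mapping at iova=0x1000, size=0x2000 covers [0x1000, 0x2fff]; unmapping [0x0, 0x1fff] would split it, so the sketch returns UNMAP_RANGE and keeps the mapping, while unmapping [0x0, 0x3fff] removes it.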

Thanks
Kevin


* Re: [Qemu-devel] [PATCH for-4.2 v10 08/15] virtio-iommu: Implement map/unmap
  2019-09-04  4:23         ` Tian, Kevin
@ 2019-09-04  5:37           ` Peter Xu
  2019-09-04  5:46             ` Tian, Kevin
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Xu @ 2019-09-04  5:37 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: peter.maydell, mst, tn, qemu-devel, Auger Eric, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

On Wed, Sep 04, 2019 at 04:23:50AM +0000, Tian, Kevin wrote:
> > From: Peter Xu [mailto:peterx@redhat.com]
> > Sent: Wednesday, September 4, 2019 9:44 AM
> > 
> > On Tue, Sep 03, 2019 at 01:37:11PM +0200, Auger Eric wrote:
> > > Hi Peter,
> > >
> > > On 8/19/19 10:11 AM, Peter Xu wrote:
> > > > On Tue, Jul 30, 2019 at 07:21:30PM +0200, Eric Auger wrote:
> > > >
> > > > [...]
> > > >
> > > >> +    mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
> > > >> +
> > > >> +    while (mapping) {
> > > >> +        viommu_interval current;
> > > >> +        uint64_t low  = mapping->virt_addr;
> > > >> +        uint64_t high = mapping->virt_addr + mapping->size - 1;
> > > >> +
> > > >> +        current.low = low;
> > > >> +        current.high = high;
> > > >> +
> > > >> +        if (low == interval.low && size >= mapping->size) {
> > > >> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> > > >> +            interval.low = high + 1;
> > > >> +            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
> > > >> +                interval.low, interval.high);
> > > >> +        } else if (high == interval.high && size >= mapping->size) {
> > > >> +            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
> > > >> +                interval.low, interval.high);
> > > >> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> > > >> +            interval.high = low - 1;
> > > >> +        } else if (low > interval.low && high < interval.high) {
> > > >> +            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
> > > >> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> > > >> +        } else {
> > > >> +            break;
> > > >> +        }
> > > >> +        if (interval.low >= interval.high) {
> > > >> +            return VIRTIO_IOMMU_S_OK;
> > > >> +        } else {
> > > >> +            mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
> > > >> +        }
> > > >> +    }
> > > >> +
> > > >> +    if (mapping) {
> > > >> +        qemu_log_mask(LOG_GUEST_ERROR,
> > > >> +                      "****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
> > > >> +                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported\n",
> > > >> +                     __func__, interval.low, size,
> > > >> +                     mapping->virt_addr, mapping->size);
> > > >> +    } else {
> > > >> +        return VIRTIO_IOMMU_S_OK;
> > > >> +    }
> > > >> +
> > > >> +    return VIRTIO_IOMMU_S_INVAL;
> > > >
> > > > Could the above chunk be simplified as something like below?
> > > >
> > > >   while ((mapping = g_tree_lookup(domain->mappings, &interval))) {
> > > >     g_tree_remove(domain->mappings, mapping);
> > > >   }
> > > Indeed the code could be simplified. I only need to make sure I don't
> > > split an existing mapping.
> > 
> > Hmm... Do we need to still split an existing mapping if necessary?
> > For example, with this mapping:
> > 
> >   iova=0x1000, size=0x2000, phys=ADDR1, flags=FLAGS1
> > 
> > And if we want to unmap the range (iova=0, size=0x2000), then we
> > should split the existing mapping and leave this one:
> > 
> >   iova=0x2000, size=0x1000, phys=(ADDR1+0x1000), flags=FLAGS1
> > 
> > Right?
> > 
> 
> virtio-iommu spec explicitly disallows partial unmap.
> 
> 5.11.6.6.1 Driver Requirements: UNMAP request
> 
> The first address of a range MUST either be the first address of a 
> mapping or be outside any mapping. The last address of a range 
> MUST either be the last address of a mapping or be outside any 
> mapping.
> 
> 5.11.6.6.2 Device Requirements: UNMAP request
> 
> If a mapping affected by the range is not covered in its entirety 
> by the range (the UNMAP request would split the mapping), 
> then the device SHOULD set the request status to
> VIRTIO_IOMMU_S_RANGE, and SHOULD NOT remove any mapping.

I see, thanks Kevin.

Though why so strict?  (Sorry if I missed some discussions
... pointers welcomed...)

What I have in mind is the case where we allocate a bunch of buffers
(e.g., 1M) but need to be able to free them in smaller chunks (e.g.,
4K).  It would then be better to allow the guest to allocate the whole
1M buffer and map it as a whole, then selectively unmap the pages once
they have been used.  With the strict rule, we'll need to map them one
by one, which can be a total of 1M/4K roundtrips.
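
To put a number on the roundtrip argument: with 4K pages, a 1M buffer needs 1M/4K = 256 MAP requests under the strict rule, versus a single MAP if a later partial UNMAP were allowed. This assumes each 4K page is eventually unmapped on its own, so the UNMAP count is the same in both cases. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: number of MAP requests needed when every page has to
 * be mapped separately so that it can later be unmapped on its own. */
static uint64_t strict_map_requests(uint64_t buf_size, uint64_t page_size)
{
    return (buf_size + page_size - 1) / page_size;   /* round up */
}

/* Under a relaxed rule that allowed partial unmaps, one MAP would do. */
static uint64_t relaxed_map_requests(uint64_t buf_size)
{
    return buf_size ? 1 : 0;
}
```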

Regards,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH for-4.2 v10 08/15] virtio-iommu: Implement map/unmap
  2019-09-04  5:37           ` Peter Xu
@ 2019-09-04  5:46             ` Tian, Kevin
  2019-09-04  7:54               ` Auger Eric
  0 siblings, 1 reply; 55+ messages in thread
From: Tian, Kevin @ 2019-09-04  5:46 UTC (permalink / raw)
  To: Peter Xu
  Cc: peter.maydell, mst, tn, qemu-devel, Auger Eric, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Wednesday, September 4, 2019 1:37 PM
> 
> On Wed, Sep 04, 2019 at 04:23:50AM +0000, Tian, Kevin wrote:
> > > From: Peter Xu [mailto:peterx@redhat.com]
> > > Sent: Wednesday, September 4, 2019 9:44 AM
> > >
> > > On Tue, Sep 03, 2019 at 01:37:11PM +0200, Auger Eric wrote:
> > > > Hi Peter,
> > > >
> > > > On 8/19/19 10:11 AM, Peter Xu wrote:
> > > > > On Tue, Jul 30, 2019 at 07:21:30PM +0200, Eric Auger wrote:
> > > > >
> > > > > [...]
> > > > >
> > > > >> +    mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
> > > > >> +
> > > > >> +    while (mapping) {
> > > > >> +        viommu_interval current;
> > > > >> +        uint64_t low  = mapping->virt_addr;
> > > > >> +        uint64_t high = mapping->virt_addr + mapping->size - 1;
> > > > >> +
> > > > >> +        current.low = low;
> > > > >> +        current.high = high;
> > > > >> +
> > > > >> +        if (low == interval.low && size >= mapping->size) {
> > > > >> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> > > > >> +            interval.low = high + 1;
> > > > >> +            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
> > > > >> +                interval.low, interval.high);
> > > > >> +        } else if (high == interval.high && size >= mapping->size) {
> > > > >> +            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
> > > > >> +                interval.low, interval.high);
> > > > >> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> > > > >> +            interval.high = low - 1;
> > > > >> +        } else if (low > interval.low && high < interval.high) {
> > > > >> +            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
> > > > >> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> > > > >> +        } else {
> > > > >> +            break;
> > > > >> +        }
> > > > >> +        if (interval.low >= interval.high) {
> > > > >> +            return VIRTIO_IOMMU_S_OK;
> > > > >> +        } else {
> > > > >> +            mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
> > > > >> +        }
> > > > >> +    }
> > > > >> +
> > > > >> +    if (mapping) {
> > > > >> +        qemu_log_mask(LOG_GUEST_ERROR,
> > > > >> +                      "****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
> > > > >> +                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported\n",
> > > > >> +                     __func__, interval.low, size,
> > > > >> +                     mapping->virt_addr, mapping->size);
> > > > >> +    } else {
> > > > >> +        return VIRTIO_IOMMU_S_OK;
> > > > >> +    }
> > > > >> +
> > > > >> +    return VIRTIO_IOMMU_S_INVAL;
> > > > >
> > > > > Could the above chunk be simplified as something like below?
> > > > >
> > > > >   while ((mapping = g_tree_lookup(domain->mappings, &interval))) {
> > > > >     g_tree_remove(domain->mappings, mapping);
> > > > >   }
> > > > Indeed the code could be simplified. I only need to make sure I don't
> > > > split an existing mapping.
> > >
> > > Hmm... Do we need to still split an existing mapping if necessary?
> > > For example, with this mapping:
> > >
> > >   iova=0x1000, size=0x2000, phys=ADDR1, flags=FLAGS1
> > >
> > > And if we want to unmap the range (iova=0, size=0x2000), then we
> > > should split the existing mapping and leave this one:
> > >
> > >   iova=0x2000, size=0x1000, phys=(ADDR1+0x1000), flags=FLAGS1
> > >
> > > Right?
> > >
> >
> > virtio-iommu spec explicitly disallows partial unmap.
> >
> > 5.11.6.6.1 Driver Requirements: UNMAP request
> >
> > The first address of a range MUST either be the first address of a
> > mapping or be outside any mapping. The last address of a range
> > MUST either be the last address of a mapping or be outside any
> > mapping.
> >
> > 5.11.6.6.2 Device Requirements: UNMAP request
> >
> > If a mapping affected by the range is not covered in its entirety
> > by the range (the UNMAP request would split the mapping),
> > then the device SHOULD set the request status to
> > VIRTIO_IOMMU_S_RANGE, and SHOULD NOT remove any mapping.
> 
> I see, thanks Kevin.
> 
> Though why so strict?  (Sorry if I missed some discussions
> ... pointers welcomed...)
> 
> What I have in mind is the case where we allocate a bunch of buffers
> (e.g., 1M) but need to be able to free them in smaller chunks (e.g.,
> 4K).  It would then be better to allow the guest to allocate the whole
> 1M buffer and map it as a whole, then selectively unmap the pages once
> they have been used.  With the strict rule, we'll need to map them one
> by one, which can be a total of 1M/4K roundtrips.
> 

Sorry I forgot the original discussion. Need Jean to respond. :-)

A possible reason is that no such usage exists today, thus simplification
was made? 

Thanks
Kevin

* Re: [Qemu-devel] [PATCH for-4.2 v10 08/15] virtio-iommu: Implement map/unmap
  2019-09-04  5:46             ` Tian, Kevin
@ 2019-09-04  7:54               ` Auger Eric
  2019-09-04  8:32                 ` Peter Xu
  0 siblings, 1 reply; 55+ messages in thread
From: Auger Eric @ 2019-09-04  7:54 UTC (permalink / raw)
  To: Tian, Kevin, Peter Xu
  Cc: peter.maydell, mst, tn, qemu-devel, alex.williamson, qemu-arm,
	jean-philippe, bharat.bhushan, eric.auger.pro

Hi,

On 9/4/19 7:46 AM, Tian, Kevin wrote:
>> From: Peter Xu [mailto:peterx@redhat.com]
>> Sent: Wednesday, September 4, 2019 1:37 PM
>>
>> On Wed, Sep 04, 2019 at 04:23:50AM +0000, Tian, Kevin wrote:
>>>> From: Peter Xu [mailto:peterx@redhat.com]
>>>> Sent: Wednesday, September 4, 2019 9:44 AM
>>>>
>>>> On Tue, Sep 03, 2019 at 01:37:11PM +0200, Auger Eric wrote:
>>>>> Hi Peter,
>>>>>
>>>>> On 8/19/19 10:11 AM, Peter Xu wrote:
>>>>>> On Tue, Jul 30, 2019 at 07:21:30PM +0200, Eric Auger wrote:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> +    mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
>>>>>>> +
>>>>>>> +    while (mapping) {
>>>>>>> +        viommu_interval current;
>>>>>>> +        uint64_t low  = mapping->virt_addr;
>>>>>>> +        uint64_t high = mapping->virt_addr + mapping->size - 1;
>>>>>>> +
>>>>>>> +        current.low = low;
>>>>>>> +        current.high = high;
>>>>>>> +
>>>>>>> +        if (low == interval.low && size >= mapping->size) {
>>>>>>> +            g_tree_remove(domain->mappings, (gpointer)(&current));
>>>>>>> +            interval.low = high + 1;
>>>>>>> +            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
>>>>>>> +                interval.low, interval.high);
>>>>>>> +        } else if (high == interval.high && size >= mapping->size) {
>>>>>>> +            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
>>>>>>> +                interval.low, interval.high);
>>>>>>> +            g_tree_remove(domain->mappings, (gpointer)(&current));
>>>>>>> +            interval.high = low - 1;
>>>>>>> +        } else if (low > interval.low && high < interval.high) {
>>>>>>> +            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
>>>>>>> +            g_tree_remove(domain->mappings, (gpointer)(&current));
>>>>>>> +        } else {
>>>>>>> +            break;
>>>>>>> +        }
>>>>>>> +        if (interval.low >= interval.high) {
>>>>>>> +            return VIRTIO_IOMMU_S_OK;
>>>>>>> +        } else {
>>>>>>> +            mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (mapping) {
>>>>>>> +        qemu_log_mask(LOG_GUEST_ERROR,
>>>>>>> +                      "****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
>>>>>>> +                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported\n",
>>>>>>> +                     __func__, interval.low, size,
>>>>>>> +                     mapping->virt_addr, mapping->size);
>>>>>>> +    } else {
>>>>>>> +        return VIRTIO_IOMMU_S_OK;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return VIRTIO_IOMMU_S_INVAL;
>>>>>>
>>>>>> Could the above chunk be simplified as something like below?
>>>>>>
>>>>>>   while ((mapping = g_tree_lookup(domain->mappings, &interval))) {
>>>>>>     g_tree_remove(domain->mappings, mapping);
>>>>>>   }
>>>>> Indeed the code could be simplified. I only need to make sure I don't
>>>>> split an existing mapping.
>>>>
>>>> Hmm... Do we need to still split an existing mapping if necessary?
>>>> For example, with this mapping:
>>>>
>>>>   iova=0x1000, size=0x2000, phys=ADDR1, flags=FLAGS1
>>>>
>>>> And if we want to unmap the range (iova=0, size=0x2000), then we
>>>> should split the existing mapping and leave this one:
>>>>
>>>>   iova=0x2000, size=0x1000, phys=(ADDR1+0x1000), flags=FLAGS1
>>>>
>>>> Right?
>>>>
>>>
>>> virtio-iommu spec explicitly disallows partial unmap.
>>>
>>> 5.11.6.6.1 Driver Requirements: UNMAP request
>>>
>>> The first address of a range MUST either be the first address of a
>>> mapping or be outside any mapping. The last address of a range
>>> MUST either be the last address of a mapping or be outside any
>>> mapping.
>>>
>>> 5.11.6.6.2 Device Requirements: UNMAP request
>>>
>>> If a mapping affected by the range is not covered in its entirety
>>> by the range (the UNMAP request would split the mapping),
>>> then the device SHOULD set the request status to
>>> VIRTIO_IOMMU_S_RANGE, and SHOULD NOT remove any mapping.
>>
>> I see, thanks Kevin.
>>
>> Though why so strict?  (Sorry if I missed some discussions
>> ... pointers welcomed...)
>>
>> What I have in mind is the case where we allocate a bunch of buffers
>> (e.g., 1M) but need to be able to free them in smaller chunks (e.g.,
>> 4K).  It would then be better to allow the guest to allocate the whole
>> 1M buffer and map it as a whole, then selectively unmap the pages once
>> they have been used.  With the strict rule, we'll need to map them one
>> by one, which can be a total of 1M/4K roundtrips.
>>
> 
> Sorry I forgot the original discussion. Need Jean to respond. :-)
> 
> A possible reason is that no such usage exists today, thus simplification
> was made? 

In
https://virtualization.linux-foundation.narkive.com/q6XOkO76/rfc-0-3-virtio-iommu-a-paravirtualized-iommu

I found

"
(Note: the semantics of unmap are chosen to be compatible with VFIO's
type1 v2 IOMMU API. This way a device serving as intermediary between
guest and VFIO doesn't have to keep an internal tree of mappings. They are
a bit tighter than VFIO, in that they don't allow unmap spilling outside
mapped regions. Spilling is 'undefined' at the moment, because it should
work in most cases but I don't know if it's worth the added complexity in
devices that are not simply transmitting requests to VFIO. Splitting
mappings won't ever be allowed, but see the relaxed proposal in 3/3 for
more lenient semantics)
"

Thanks

Eric
> 
> Thanks
> Kevin
> 


* Re: [Qemu-devel] [PATCH for-4.2 v10 08/15] virtio-iommu: Implement map/unmap
  2019-09-04  7:54               ` Auger Eric
@ 2019-09-04  8:32                 ` Peter Xu
  0 siblings, 0 replies; 55+ messages in thread
From: Peter Xu @ 2019-09-04  8:32 UTC (permalink / raw)
  To: Auger Eric
  Cc: peter.maydell, Tian, Kevin, mst, tn, qemu-devel, alex.williamson,
	qemu-arm, jean-philippe, bharat.bhushan, eric.auger.pro

On Wed, Sep 04, 2019 at 09:54:12AM +0200, Auger Eric wrote:
> Hi,
> 
> On 9/4/19 7:46 AM, Tian, Kevin wrote:
> >> From: Peter Xu [mailto:peterx@redhat.com]
> >> Sent: Wednesday, September 4, 2019 1:37 PM
> >>
> >> On Wed, Sep 04, 2019 at 04:23:50AM +0000, Tian, Kevin wrote:
> >>>> From: Peter Xu [mailto:peterx@redhat.com]
> >>>> Sent: Wednesday, September 4, 2019 9:44 AM
> >>>>
> >>>> On Tue, Sep 03, 2019 at 01:37:11PM +0200, Auger Eric wrote:
> >>>>> Hi Peter,
> >>>>>
> >>>>> On 8/19/19 10:11 AM, Peter Xu wrote:
> >>>>>> On Tue, Jul 30, 2019 at 07:21:30PM +0200, Eric Auger wrote:
> >>>>>>
> >>>>>> [...]
> >>>>>>
> >>>>>>> +    mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
> >>>>>>> +
> >>>>>>> +    while (mapping) {
> >>>>>>> +        viommu_interval current;
> >>>>>>> +        uint64_t low  = mapping->virt_addr;
> >>>>>>> +        uint64_t high = mapping->virt_addr + mapping->size - 1;
> >>>>>>> +
> >>>>>>> +        current.low = low;
> >>>>>>> +        current.high = high;
> >>>>>>> +
> >>>>>>> +        if (low == interval.low && size >= mapping->size) {
> >>>>>>> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> >>>>>>> +            interval.low = high + 1;
> >>>>>>> +            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
> >>>>>>> +                interval.low, interval.high);
> >>>>>>> +        } else if (high == interval.high && size >= mapping->size) {
> >>>>>>> +            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
> >>>>>>> +                interval.low, interval.high);
> >>>>>>> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> >>>>>>> +            interval.high = low - 1;
> >>>>>>> +        } else if (low > interval.low && high < interval.high) {
> >>>>>>> +            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
> >>>>>>> +            g_tree_remove(domain->mappings, (gpointer)(&current));
> >>>>>>> +        } else {
> >>>>>>> +            break;
> >>>>>>> +        }
> >>>>>>> +        if (interval.low >= interval.high) {
> >>>>>>> +            return VIRTIO_IOMMU_S_OK;
> >>>>>>> +        } else {
> >>>>>>> +            mapping = g_tree_lookup(domain->mappings, (gpointer)(&interval));
> >>>>>>> +        }
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    if (mapping) {
> >>>>>>> +        qemu_log_mask(LOG_GUEST_ERROR,
> >>>>>>> +                      "****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
> >>>>>>> +                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported\n",
> >>>>>>> +                     __func__, interval.low, size,
> >>>>>>> +                     mapping->virt_addr, mapping->size);
> >>>>>>> +    } else {
> >>>>>>> +        return VIRTIO_IOMMU_S_OK;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    return VIRTIO_IOMMU_S_INVAL;
> >>>>>>
> >>>>>> Could the above chunk be simplified as something like below?
> >>>>>>
> >>>>>>   while ((mapping = g_tree_lookup(domain->mappings, &interval))) {
> >>>>>>     g_tree_remove(domain->mappings, mapping);
> >>>>>>   }
> >>>>> Indeed the code could be simplified. I only need to make sure I don't
> >>>>> split an existing mapping.
> >>>>
> >>>> Hmm... Do we need to still split an existing mapping if necessary?
> >>>> For example, with this mapping:
> >>>>
> >>>>   iova=0x1000, size=0x2000, phys=ADDR1, flags=FLAGS1
> >>>>
> >>>> And if we want to unmap the range (iova=0, size=0x2000), then we
> >>>> should split the existing mapping and leave this one:
> >>>>
> >>>>   iova=0x2000, size=0x1000, phys=(ADDR1+0x1000), flags=FLAGS1
> >>>>
> >>>> Right?
> >>>>
> >>>
> >>> virtio-iommu spec explicitly disallows partial unmap.
> >>>
> >>> 5.11.6.6.1 Driver Requirements: UNMAP request
> >>>
> >>> The first address of a range MUST either be the first address of a
> >>> mapping or be outside any mapping. The last address of a range
> >>> MUST either be the last address of a mapping or be outside any
> >>> mapping.
> >>>
> >>> 5.11.6.6.2 Device Requirements: UNMAP request
> >>>
> >>> If a mapping affected by the range is not covered in its entirety
> >>> by the range (the UNMAP request would split the mapping),
> >>> then the device SHOULD set the request status to
> >>> VIRTIO_IOMMU_S_RANGE, and SHOULD NOT remove any mapping.
> >>
> >> I see, thanks Kevin.
> >>
> >> Though why so strict?  (Sorry if I missed some discussions
> >> ... pointers welcomed...)
> >>
> >> What I have in mind is the case where we allocate a bunch of buffers
> >> (e.g., 1M) but need to be able to free them in smaller chunks (e.g.,
> >> 4K).  It would then be better to allow the guest to allocate the whole
> >> 1M buffer and map it as a whole, then selectively unmap the pages once
> >> they have been used.  With the strict rule, we'll need to map them one
> >> by one, which can be a total of 1M/4K roundtrips.
> >>
> > 
> > Sorry I forgot the original discussion. Need Jean to respond. :-)
> > 
> > A possible reason is that no such usage exists today, thus simplification
> > was made? 
> 
> In
> https://virtualization.linux-foundation.narkive.com/q6XOkO76/rfc-0-3-virtio-iommu-a-paravirtualized-iommu
> 
> I found
> 
> "
> (Note: the semantics of unmap are chosen to be compatible with VFIO's
> type1 v2 IOMMU API. This way a device serving as intermediary between
> guest and VFIO doesn't have to keep an internal tree of mappings. They are
> a bit tighter than VFIO, in that they don't allow unmap spilling outside
> mapped regions. Spilling is 'undefined' at the moment, because it should
> work in most cases but I don't know if it's worth the added complexity in
> devices that are not simply transmitting requests to VFIO. Splitting
> mappings won't ever be allowed, but see the relaxed proposal in 3/3 for
> more lenient semantics)
> "

Yes, it makes sense to follow vfio type1v2 here.  Though I'm not sure
whether the maintenance of "an internal tree of mappings" can be
avoided this way, at least when using the current QEMU IOMMU notifier
framework.  The problem is that currently the IOMMU notifiers cannot
fail (e.g., the VFIO_IOMMU_MAP_DMA ioctl from the vfio-pci device will
assume the messages delivered from vIOMMUs are always valid), so
AFAICT the vIOMMU needs to tell which mapping requests from the guest
driver are valid before delivering them to vfio.  That seems
impossible to do without the internal tree of mappings.  But that
seems to be another story.
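
The validate-then-notify constraint described above can be sketched as follows. Everything here is hypothetical and heavily simplified. A fixed-size shadow table stands in for the internal tree of mappings. The notifier type is not QEMU's IOMMU notifier API. Rejecting a MAP that overlaps an existing mapping is an assumed validation rule, chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical notifier type: returns void, so the consumer cannot refuse. */
typedef void (*map_notify_fn)(uint64_t iova, uint64_t size, uint64_t phys,
                              void *opaque);

struct shadow_entry {
    uint64_t iova, size, phys;
};

/* Hypothetical shadow of the guest's mappings (the "internal tree"). */
struct shadow_table {
    struct shadow_entry e[16];
    size_t n;
};

/* Validate a guest MAP request against the shadow table; only forward it
 * through the (infallible) notifier once it is known to be acceptable. */
static bool viommu_map(struct shadow_table *t, uint64_t iova, uint64_t size,
                       uint64_t phys, map_notify_fn notify, void *opaque)
{
    if (size == 0 || t->n == sizeof(t->e) / sizeof(t->e[0]))
        return false;
    uint64_t last = iova + size - 1;
    if (last < iova)
        return false;                       /* range wraps around */
    for (size_t i = 0; i < t->n; i++) {
        uint64_t elast = t->e[i].iova + t->e[i].size - 1;
        if (iova <= elast && last >= t->e[i].iova)
            return false;                   /* overlaps an existing mapping */
    }
    t->e[t->n++] = (struct shadow_entry){ iova, size, phys };
    notify(iova, size, phys, opaque);       /* cannot fail from here on */
    return true;
}

/* Example consumer standing in for the VFIO side: counts deliveries. */
static void count_map(uint64_t iova, uint64_t size, uint64_t phys, void *opaque)
{
    (void)iova;
    (void)size;
    (void)phys;
    (*(unsigned *)opaque)++;
}
```

The point is the ordering. Every check that could fail runs before notify() is called, because once a request reaches the consumer there is no way to report an error back to the guest.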

In any case, I think I'm fine with the approach in this patch.

Thanks!

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH for-4.2 v10 14/15] virtio-iommu-pci: Add virtio iommu pci support
  2019-09-01  6:40           ` Michael S. Tsirkin
@ 2019-09-04 14:19             ` Auger Eric
  2019-09-04 21:36               ` Michael S. Tsirkin
  0 siblings, 1 reply; 55+ messages in thread
From: Auger Eric @ 2019-09-04 14:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jean-philippe, kevin.tian, peter.maydell, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

Hi Michael,

On 9/1/19 8:40 AM, Michael S. Tsirkin wrote:
> On Thu, Aug 01, 2019 at 03:49:37PM +0200, Auger Eric wrote:
>> Hi Michael,
>>
>> On 8/1/19 3:06 PM, Michael S. Tsirkin wrote:
>>> On Thu, Aug 01, 2019 at 02:15:03PM +0200, Auger Eric wrote:
>>>> Hi Michael,
>>>>
>>>> On 7/30/19 9:35 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Jul 30, 2019 at 07:21:36PM +0200, Eric Auger wrote:
>>>>>> This patch adds virtio-iommu-pci, which is the pci proxy for
>>>>>> the virtio-iommu device.
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>
>>>>> This part I'm not sure we should merge just yet.  The reason being I
>>>>> think we should limit it to mmio where DT can be used to describe iommu
>>>>> topology. For PCI I don't see why we shouldn't always expose this
>>>>> in the config space, and I think it's preferable not to
>>>>> need to support a mix of DT,ACPI and PCI as options.
>>>>
>>>> For context, some discussion related to this topic already arose on v7
>>>> revision of the driver:
>>>>
>>>> [1] Re: [PATCH v7 0/7] Add virtio-iommu driver
>>>> https://lore.kernel.org/linux-pci/87a7ioby9u.fsf@morokweng.localdomain/
>>>>
>>>> Some additional thoughts.
>>>>
>>>> First considering DT boot.
>>>>
>>>> The DT description features an iommu-map property in the
>>>> pci-host-ecam-generic node that describes which RIDs are handled by the
>>>> virtio-iommu and a possible offset/mask to be applied between the RID
>>>> and the stream ID at the input of the IOMMU
>>>> (Documentation/devicetree/bindings/pci/pci-iommu.txt).
>>>>
>>>> As far as I understand when a DMA capable device is setup, its DMA
>>>> configuration is built using that call chain:
>>>>
>>>> pci_dma_configure
>>>> |_ of_dma_configure
>>>>    |_ of_iommu_configure
>>>>       |_ of_pci_iommu_init
>>>>          |_ of_map_rid
>>>>
>>>> I understand you would like the iommu-map/iommu-map-mask info to be
>>>> exposed directly into the config space of the device instead of inside
>>>> the DT or IORT table. Assuming a module is initialized sufficiently
>>>> early to retrieve this info, we would need the resulting info to be
>>>> consolidated to allow the pci_dma_configure chain to work seamlessly.
>>>> This sounds like a significant impact on the above kernel infrastructure.
>>>
>>> I don't really know what consolidated means.
>>> It is pretty common for IOMMUs to expose config through
>>> PCI registers. This typically happens as a fixup.
>> I meant: instead of retrieving the info through the of_* code you need
>> to interoperate with the module to retrieve the same info and detect
>> when you need to take that path instead of the of one.
> 
> The way to do it would be with a quirk,
> and the quirk would not be part of the
> virtio module - it can poke at the device using
> virtio_pci_cfg_cap.

I got a preliminary quirk function working. However, it only works as a
DECLARE_PCI_FIXUP_ENABLE quirk. As an EARLY quirk, the guest crashes on
the first ioread that attempts to read the BAR, presumably because
memory accesses are not enabled yet at that point.

So assuming I get the proper system config data in the device
configuration, the iommu bindings will be set up late.

By the way, I have not yet attempted to generate the iommu bindings from
the quirk function (the job done in drivers/acpi/arm64/iort.c), which
does not sound straightforward.
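
For reference, here is the RID-to-stream-ID translation that of_map_rid() performs from the iommu-map and iommu-map-mask properties described in Documentation/devicetree/bindings/pci/pci-iommu.txt. A config-space-based quirk would have to produce the same information. This is a simplified userspace sketch, not the kernel code. The struct and function names are hypothetical, and special cases such as a node without an iommu-map property are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One iommu-map entry: <rid-base iommu-phandle iommu-base length>. */
struct iommu_map_entry {
    uint32_t rid_base;
    uint32_t phandle;     /* which IOMMU handles this RID window */
    uint32_t iommu_base;
    uint32_t length;
};

/* Translate a PCI requester ID into (IOMMU phandle, stream ID).
 * Returns false if no entry covers the RID, i.e. the device is not
 * behind any IOMMU described by the map. */
static bool map_rid(const struct iommu_map_entry *map, size_t n,
                    uint32_t map_mask, uint32_t rid,
                    uint32_t *phandle, uint32_t *id)
{
    uint32_t masked = rid & map_mask;   /* iommu-map-mask is applied first */

    for (size_t i = 0; i < n; i++) {
        if (masked >= map[i].rid_base &&
            masked - map[i].rid_base < map[i].length) {
            *phandle = map[i].phandle;
            *id = masked - map[i].rid_base + map[i].iommu_base;
            return true;
        }
    }
    return false;
}
```

For example, a map of <0x0 &viommu 0x0 0x8>, <0x9 &viommu 0x9 0xfff7> translates every RID except 0x8 one-to-one. That is one way of excluding an IOMMU's own RID from the mapping.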
> 
>>>
>>> I would write a tiny driver to do exactly that,
>>> and run it from the fixup.
>>>
>>>
>>>> This comes in addition to the development of the "small module that
>>>> loads early and pokes at the IOMMU sufficiently to get the data about
>>>> which devices use the IOMMU out of it using standard virtio config
>>>> space" evoked in [1] + the definition of the data formats to be put in
>>>> the very cfg space.
>>>
>>> That last part is true but that's exactly why I propose we
>>> wait on this patch a bit.
>>>
>>>> With ACPI, I understand we have the same kind of infrastructure:
>>>> drivers/acpi/arm64/iort.c currently extracts the mapping between RC RIDs
>>>> and IOMMU stream IDs:
>>>>
>>>> pci_dma_configure
>>>> |_ acpi_dma_configure
>>>>    |_ iort_iommu_configure
>>>>       |_ iort_pci_iommu_init
>>>>          |_ iort_node_map_id
>>>>             |_ iort_id_map
>>>>
>>>> Maybe I fail to see the easy and right way to do the integration at
>>>> kernel level but I am a bit frightened by the efforts that would be
>>>> requested to follow your suggestion, whereas the DT infra is ready and
>>>> fully upstreamed to accept the use case.
>>>
>>> Did you take a look at drivers/pci/quirks.c and how these run?
>>> I think it's just a question of adding DECLARE_PCI_FIXUP_CLASS_EARLY
>>> and running your hook from there.
>> I will do and trace the code.
>>>
>>>
>>>> For ACPI, I agree that AFAIK IORT was primarily defined by ARM, for ARM,
>>>> but we prototyped IORT integration with x86 and it worked for the pc
>>>> machine without major trouble.
>>>>
>>>> I sent the kernel and qemu patches prototyping this IORT integration:
>>>>
>>>> https://github.com/eauger/linux/tree/virtio-iommu-v0.9-iort-x86
>>>> https://github.com/eauger/qemu/tree/v3.1.0-rc3-virtio-iommu-v0.9-x86
>>>>
>>>> There, the ACPI IORT was built for the PC machine and the integration
>>>> effort at both kernel and QEMU level was low. This work would need to
>>>> be rebased, though, and depends on kernel ACPI-related patches that
>>>> are not yet upstreamed.
>>>>
>>>> Thanks
>>>>
>>>> Eric
>>>
>>> In the end it might turn out you are right.  But it does us no harm to
>>> delay this just a bit, and for now limit things to ARM where it's
>>> already used and where alternatives exist.
>> So if my understanding is correct, at the moment you would accept a DT
>> integration using MMIO. Is that correct? Meanwhile we can prototype your
>> suggestion.
>>
>> Thanks
>>
>> Eric
> 
> Right.

Thank you for the confirmation. However, I am not sure Peter will accept
integrating the device as a virtio-mmio device that then gets deprecated
in favor of a virtio-pci device, all the more so as work was already
done to get a PCI integration. Peter?

Thanks

Eric
> 
>>>
>>>
>>>>>
>>>>>> ---
>>>>>>
>>>>>> v8 -> v9:
>>>>>> - add the msi-bypass property
>>>>>> - create virtio-iommu-pci.c
>>>>>> ---
>>>>>>  hw/virtio/Makefile.objs          |  1 +
>>>>>>  hw/virtio/virtio-iommu-pci.c     | 88 ++++++++++++++++++++++++++++++++
>>>>>>  include/hw/pci/pci.h             |  1 +
>>>>>>  include/hw/virtio/virtio-iommu.h |  1 +
>>>>>>  qdev-monitor.c                   |  1 +
>>>>>>  5 files changed, 92 insertions(+)
>>>>>>  create mode 100644 hw/virtio/virtio-iommu-pci.c
>>>>>>
>>>>>> diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
>>>>>> index f42e4dd94f..80ca719f1c 100644
>>>>>> --- a/hw/virtio/Makefile.objs
>>>>>> +++ b/hw/virtio/Makefile.objs
>>>>>> @@ -27,6 +27,7 @@ obj-$(CONFIG_VIRTIO_INPUT_HOST) += virtio-input-host-pci.o
>>>>>>  obj-$(CONFIG_VIRTIO_INPUT) += virtio-input-pci.o
>>>>>>  obj-$(CONFIG_VIRTIO_RNG) += virtio-rng-pci.o
>>>>>>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio-balloon-pci.o
>>>>>> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu-pci.o
>>>>>>  obj-$(CONFIG_VIRTIO_9P) += virtio-9p-pci.o
>>>>>>  obj-$(CONFIG_VIRTIO_SCSI) += virtio-scsi-pci.o
>>>>>>  obj-$(CONFIG_VIRTIO_BLK) += virtio-blk-pci.o
>>>>>> diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
>>>>>> new file mode 100644
>>>>>> index 0000000000..f9977096bd
>>>>>> --- /dev/null
>>>>>> +++ b/hw/virtio/virtio-iommu-pci.c
>>>>>> @@ -0,0 +1,88 @@
>>>>>> +/*
>>>>>> + * Virtio IOMMU PCI Bindings
>>>>>> + *
>>>>>> + * Copyright (c) 2019 Red Hat, Inc.
>>>>>> + * Written by Eric Auger
>>>>>> + *
>>>>>> + *  This program is free software; you can redistribute it and/or modify
>>>>>> + *  it under the terms of the GNU General Public License version 2 or
>>>>>> + *  (at your option) any later version.
>>>>>> + */
>>>>>> +
>>>>>> +#include "qemu/osdep.h"
>>>>>> +
>>>>>> +#include "virtio-pci.h"
>>>>>> +#include "hw/virtio/virtio-iommu.h"
>>>>>> +
>>>>>> +typedef struct VirtIOIOMMUPCI VirtIOIOMMUPCI;
>>>>>> +
>>>>>> +/*
>>>>>> + * virtio-iommu-pci: This extends VirtioPCIProxy.
>>>>>> + *
>>>>>> + */
>>>>>> +#define VIRTIO_IOMMU_PCI(obj) \
>>>>>> +        OBJECT_CHECK(VirtIOIOMMUPCI, (obj), TYPE_VIRTIO_IOMMU_PCI)
>>>>>> +
>>>>>> +struct VirtIOIOMMUPCI {
>>>>>> +    VirtIOPCIProxy parent_obj;
>>>>>> +    VirtIOIOMMU vdev;
>>>>>> +};
>>>>>> +
>>>>>> +static Property virtio_iommu_pci_properties[] = {
>>>>>> +    DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>>>>>> +    DEFINE_PROP_BOOL("msi-bypass", VirtIOIOMMUPCI, vdev.msi_bypass, true),
>>>>>> +    DEFINE_PROP_END_OF_LIST(),
>>>>>> +};
>>>>>> +
>>>>>> +static void virtio_iommu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>>>>>> +{
>>>>>> +    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(vpci_dev);
>>>>>> +    DeviceState *vdev = DEVICE(&dev->vdev);
>>>>>> +
>>>>>> +    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
>>>>>> +    object_property_set_link(OBJECT(dev),
>>>>>> +                             OBJECT(pci_get_bus(&vpci_dev->pci_dev)),
>>>>>> +                             "primary-bus", errp);
>>>>>> +    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
>>>>>> +}
>>>>>> +
>>>>>> +static void virtio_iommu_pci_class_init(ObjectClass *klass, void *data)
>>>>>> +{
>>>>>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>>>>>> +    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
>>>>>> +    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
>>>>>> +    k->realize = virtio_iommu_pci_realize;
>>>>>> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>>>>> +    dc->props = virtio_iommu_pci_properties;
>>>>>> +    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
>>>>>> +    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_IOMMU;
>>>>>> +    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
>>>>>> +    pcidev_k->class_id = PCI_CLASS_OTHERS;
>>>>>> +}
>>>>>> +
>>>>>> +static void virtio_iommu_pci_instance_init(Object *obj)
>>>>>> +{
>>>>>> +    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(obj);
>>>>>> +
>>>>>> +    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
>>>>>> +                                TYPE_VIRTIO_IOMMU);
>>>>>> +}
>>>>>> +
>>>>>> +static const VirtioPCIDeviceTypeInfo virtio_iommu_pci_info = {
>>>>>> +    .base_name             = TYPE_VIRTIO_IOMMU_PCI,
>>>>>> +    .generic_name          = "virtio-iommu-pci",
>>>>>> +    .transitional_name     = "virtio-iommu-pci-transitional",
>>>>>> +    .non_transitional_name = "virtio-iommu-pci-non-transitional",
>>>>>> +    .instance_size = sizeof(VirtIOIOMMUPCI),
>>>>>> +    .instance_init = virtio_iommu_pci_instance_init,
>>>>>> +    .class_init    = virtio_iommu_pci_class_init,
>>>>>> +};
>>>>>> +
>>>>>> +static void virtio_iommu_pci_register(void)
>>>>>> +{
>>>>>> +    virtio_pci_types_register(&virtio_iommu_pci_info);
>>>>>> +}
>>>>>> +
>>>>>> +type_init(virtio_iommu_pci_register)
>>>>>> +
>>>>>> +
>>>>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>>>>>> index aaf1b9f70d..492ea7e68d 100644
>>>>>> --- a/include/hw/pci/pci.h
>>>>>> +++ b/include/hw/pci/pci.h
>>>>>> @@ -86,6 +86,7 @@ extern bool pci_available;
>>>>>>  #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
>>>>>>  #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
>>>>>>  #define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
>>>>>> +#define PCI_DEVICE_ID_VIRTIO_IOMMU       0x1014
>>>>>>  
>>>>>>  #define PCI_VENDOR_ID_REDHAT             0x1b36
>>>>>>  #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
>>>>>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
>>>>>> index 56c8b4e57f..893ac65c0b 100644
>>>>>> --- a/include/hw/virtio/virtio-iommu.h
>>>>>> +++ b/include/hw/virtio/virtio-iommu.h
>>>>>> @@ -25,6 +25,7 @@
>>>>>>  #include "hw/pci/pci.h"
>>>>>>  
>>>>>>  #define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
>>>>>> +#define TYPE_VIRTIO_IOMMU_PCI "virtio-iommu-device-base"
>>>>>>  #define VIRTIO_IOMMU(obj) \
>>>>>>          OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
>>>>>>  
>>>>>> diff --git a/qdev-monitor.c b/qdev-monitor.c
>>>>>> index 58222c2211..74cf090c61 100644
>>>>>> --- a/qdev-monitor.c
>>>>>> +++ b/qdev-monitor.c
>>>>>> @@ -63,6 +63,7 @@ static const QDevAlias qdev_alias_table[] = {
>>>>>>      { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_S390X },
>>>>>>      { "virtio-input-host-pci", "virtio-input-host",
>>>>>>              QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
>>>>>> +    { "virtio-iommu-pci", "virtio-iommu", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
>>>>>>      { "virtio-keyboard-ccw", "virtio-keyboard", QEMU_ARCH_S390X },
>>>>>>      { "virtio-keyboard-pci", "virtio-keyboard",
>>>>>>              QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
>>>>>> -- 
>>>>>> 2.20.1
>>>>>
>>>



* Re: [Qemu-devel] [PATCH for-4.2 v10 14/15] virtio-iommu-pci: Add virtio iommu pci support
  2019-09-04 14:19             ` Auger Eric
@ 2019-09-04 21:36               ` Michael S. Tsirkin
  0 siblings, 0 replies; 55+ messages in thread
From: Michael S. Tsirkin @ 2019-09-04 21:36 UTC (permalink / raw)
  To: Auger Eric
  Cc: jean-philippe, kevin.tian, peter.maydell, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

On Wed, Sep 04, 2019 at 04:19:33PM +0200, Auger Eric wrote:
> Hi Michael,
> 
> On 9/1/19 8:40 AM, Michael S. Tsirkin wrote:
> > On Thu, Aug 01, 2019 at 03:49:37PM +0200, Auger Eric wrote:
> >> Hi Michael,
> >>
> >> On 8/1/19 3:06 PM, Michael S. Tsirkin wrote:
> >>> On Thu, Aug 01, 2019 at 02:15:03PM +0200, Auger Eric wrote:
> >>>> Hi Michael,
> >>>>
> >>>> On 7/30/19 9:35 PM, Michael S. Tsirkin wrote:
> >>>>> On Tue, Jul 30, 2019 at 07:21:36PM +0200, Eric Auger wrote:
> >>>>>> This patch adds virtio-iommu-pci, which is the pci proxy for
> >>>>>> the virtio-iommu device.
> >>>>>>
> >>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>>>
> >>>>> This part I'm not sure we should merge just yet. The reason is that I
> >>>>> think we should limit it to mmio, where DT can be used to describe the
> >>>>> iommu topology. For PCI, I don't see why we shouldn't always expose
> >>>>> this in the config space, and I think it's preferable not to
> >>>>> need to support a mix of DT, ACPI and PCI as options.
> >>>>
> >>>> For context, some discussion related to this topic already arose on v7
> >>>> revision of the driver:
> >>>>
> >>>> [1] Re: [PATCH v7 0/7] Add virtio-iommu driver
> >>>> https://lore.kernel.org/linux-pci/87a7ioby9u.fsf@morokweng.localdomain/
> >>>>
> >>>> Some additional thoughts.
> >>>>
> >>>> First considering DT boot.
> >>>>
> >>>> The DT description features an iommu-map property in the
> >>>> pci-host-ecam-generic node that describes which RIDs are handled by the
> >>>> virtio-iommu, and an optional offset/mask applied between the RID
> >>>> and the stream ID at the input of the IOMMU
> >>>> (Documentation/devicetree/bindings/pci/pci-iommu.txt).
> >>>>
> >>>> As far as I understand when a DMA capable device is setup, its DMA
> >>>> configuration is built using that call chain:
> >>>>
> >>>> pci_dma_configure
> >>>> |_ of_dma_configure
> >>>>    |_ of_iommu_configure
> >>>>       |_ of_pci_iommu_init
> >>>>          |_ of_map_rid
> >>>>
> >>>> I understand you would like the iommu-map/iommu-map-mask info to be
> >>>> exposed directly in the config space of the device instead of in
> >>>> the DT or IORT table. Assuming a module is initialized sufficiently
> >>>> early to retrieve this info, we would need the resulting info to be
> >>>> consolidated so that the pci_dma_configure chain works seamlessly. This
> >>>> sounds like a significant impact on the above kernel infrastructure.
> >>>
> >>> I don't really know what consolidated means.
> >>> It is pretty common for IOMMUs to expose config through
> >>> PCI registers. This typically happens as a fixup.
> >> I meant: instead of retrieving the info through the of_* code, you need
> >> to interoperate with the module to retrieve the same info, and detect
> >> when you need to take that path instead of the of_* one.
> > 
> > The way to do it would be with a quirk,
> > and the quirk would not be part of the
> > virtio module - it can poke at the device using
> > virtio_pci_cfg_cap.
> 
> I got this preliminary quirk function working. However, it only works as
> a DECLARE_PCI_FIXUP_ENABLE quirk. In an EARLY quirk, the guest crashes
> on the first ioread that attempts to read the BAR, presumably because
> memory accesses are not enabled yet.

This is why I suggested using virtio_pci_cfg_cap.
This allows BAR access when memory accesses are disabled.
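The VIRTIO_PCI_CAP_PCI_CFG capability turns a BAR access into pure configuration-space accesses. The driver writes cap.bar, cap.offset and cap.length, then reads or writes pci_cfg_data, and the device performs the BAR access on its behalf. The host-side toy model below illustrates the read path only. The structure layout follows the virtio 1.0 spec, while toy_dev, cfg_cap_read() and mmio_read32() are hypothetical. A real kernel quirk would issue the same field writes with config accessors such as pci_write_config_dword() at the capability's offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Capability layout as given in the virtio 1.0 spec (section 4.1.4). */
struct virtio_pci_cap {
    uint8_t  cap_vndr;     /* PCI_CAP_ID_VNDR */
    uint8_t  cap_next;
    uint8_t  cap_len;
    uint8_t  cfg_type;     /* 5 == VIRTIO_PCI_CAP_PCI_CFG */
    uint8_t  bar;
    uint8_t  padding[3];
    uint32_t offset;       /* little-endian in real config space */
    uint32_t length;
};

struct virtio_pci_cfg_cap {
    struct virtio_pci_cap cap;
    uint8_t pci_cfg_data[4];
};

/* Hypothetical toy device: one 64-byte BAR plus the capability.
 * mem_enabled models the Memory Space bit of the PCI command register. */
struct toy_dev {
    uint8_t bar0[64];
    bool mem_enabled;
    struct virtio_pci_cfg_cap cfg_cap;
};

/* Device side: a config read of pci_cfg_data performs a BAR read of
 * cap.length bytes at cap.offset, whether or not memory decoding is on. */
static void device_cfg_data_read(struct toy_dev *d)
{
    const struct virtio_pci_cap *c = &d->cfg_cap.cap;

    assert(c->bar == 0 && c->offset + c->length <= sizeof(d->bar0));
    memset(d->cfg_cap.pci_cfg_data, 0, sizeof(d->cfg_cap.pci_cfg_data));
    memcpy(d->cfg_cap.pci_cfg_data, &d->bar0[c->offset], c->length);
}

/* Driver side: program bar/offset/length through config writes, then
 * read pci_cfg_data. Only configuration cycles are involved. */
uint32_t cfg_cap_read(struct toy_dev *d, uint8_t bar, uint32_t offset,
                      uint32_t len)
{
    uint32_t val = 0;

    assert(len == 1 || len == 2 || len == 4);
    assert(offset % len == 0);            /* accesses must be aligned */
    d->cfg_cap.cap.bar = bar;
    d->cfg_cap.cap.offset = offset;
    d->cfg_cap.cap.length = len;
    device_cfg_data_read(d);
    for (uint32_t i = 0; i < len; i++) {  /* little-endian assembly */
        val |= (uint32_t)d->cfg_cap.pci_cfg_data[i] << (8 * i);
    }
    return val;
}

/* Direct MMIO read of BAR0: refused while memory decoding is disabled,
 * modelling the situation suspected for an EARLY fixup. */
bool mmio_read32(const struct toy_dev *d, uint32_t offset, uint32_t *out)
{
    uint32_t val = 0;

    if (!d->mem_enabled || offset % 4 || offset + 4 > sizeof(d->bar0)) {
        return false;
    }
    for (uint32_t i = 0; i < 4; i++) {
        val |= (uint32_t)d->bar0[offset + i] << (8 * i);
    }
    *out = val;
    return true;
}
```

The point of the model is that cfg_cap_read() succeeds while mmio_read32() is still refused, which is why the capability can be used before memory decoding is enabled.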
 

> So assuming I get the proper system config data in the device
> configuration, the iommu bindings will be set up late.

If it all works with a late quirk, then great.
If not we can fall back to config cycles.

> By the way, I have not yet attempted to generate the iommu bindings from
> the quirk function (the job done in drivers/acpi/arm64/iort.c), which
> does not sound straightforward.
> > 
> >>>
> >>> I would write a tiny driver to do exactly that,
> >>> and run it from the fixup.
> >>>
> >>>
> >>>> This comes in addition to the development of the "small module that
> >>>> loads early and pokes at the IOMMU sufficiently to get the data about
> >>>> which devices use the IOMMU out of it using standard virtio config
> >>>> space" mentioned in [1], plus the definition of the data formats to be
> >>>> put in that cfg space.
> >>>
> >>> That last part is true but that's exactly why I propose we
> >>> wait on this patch a bit.
> >>>
> >>>> With ACPI, I understand we have the same kind of infrastructure:
> >>>> drivers/acpi/arm64/iort.c currently extracts the mapping between RC RIDs
> >>>> and IOMMU stream IDs:
> >>>>
> >>>> pci_dma_configure
> >>>> |_ acpi_dma_configure
> >>>>    |_ iort_iommu_configure
> >>>>       |_ iort_pci_iommu_init
> >>>>          |_ iort_node_map_id
> >>>>             |_ iort_id_map
> >>>>
> >>>> Maybe I fail to see the easy and right way to do the integration at
> >>>> kernel level, but I am a bit frightened by the effort that would be
> >>>> required to follow your suggestion, whereas the DT infra is ready and
> >>>> fully upstreamed to support this use case.
> >>>
> >>> Did you take a look at drivers/pci/quirks.c and how these run?
> >>> I think it's just a question of adding DECLARE_PCI_FIXUP_CLASS_EARLY
> >>> and running your hook from there.
> >> I will do so and trace the code.
> >>>
> >>>
> >>>> For ACPI, I agree that AFAIK IORT was primarily defined by ARM, for ARM,
> >>>> but we prototyped IORT integration on x86 and it worked for the pc
> >>>> machine without major trouble.
> >>>>
> >>>> I sent the kernel and qemu patches prototyping this IORT integration:
> >>>>
> >>>> https://github.com/eauger/linux/tree/virtio-iommu-v0.9-iort-x86
> >>>> https://github.com/eauger/qemu/tree/v3.1.0-rc3-virtio-iommu-v0.9-x86
> >>>>
> >>>> There, ACPI IORT was built for the PC machine and the integration effort
> >>>> at both kernel and QEMU level was low. This work would need to be
> >>>> rebased, though, and depends on kernel ACPI-related patches that are not
> >>>> yet upstreamed.
> >>>>
> >>>> Thanks
> >>>>
> >>>> Eric
> >>>
> >>> In the end it might turn out you are right.  But it does us no harm to
> >>> delay this just a bit, and for now limit things to ARM where it's
> >>> already used and where alternatives exist.
> >> So if my understanding is correct, at the moment you would accept a DT
> >> integration using MMIO. Is that correct? Meanwhile we can prototype your
> >> suggestion.
> >>
> >> Thanks
> >>
> >> Eric
> > 
> > Right.
> 
> Thank you for the confirmation. However, I am not sure Peter will accept
> integrating the device as a virtio-mmio device only to deprecate it later
> in favor of a virtio-pci device, all the more so as work was already done
> on the PCI integration. Peter?
> 
> Thanks
> 
> Eric
> > 
> >>>
> >>>
> >>>>>
> >>>>>> ---
> >>>>>>
> >>>>>> v8 -> v9:
> >>>>>> - add the msi-bypass property
> >>>>>> - create virtio-iommu-pci.c
> >>>>>> ---
> >>>>>>  hw/virtio/Makefile.objs          |  1 +
> >>>>>>  hw/virtio/virtio-iommu-pci.c     | 88 ++++++++++++++++++++++++++++++++
> >>>>>>  include/hw/pci/pci.h             |  1 +
> >>>>>>  include/hw/virtio/virtio-iommu.h |  1 +
> >>>>>>  qdev-monitor.c                   |  1 +
> >>>>>>  5 files changed, 92 insertions(+)
> >>>>>>  create mode 100644 hw/virtio/virtio-iommu-pci.c
> >>>>>>
> >>>>>> diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> >>>>>> index f42e4dd94f..80ca719f1c 100644
> >>>>>> --- a/hw/virtio/Makefile.objs
> >>>>>> +++ b/hw/virtio/Makefile.objs
> >>>>>> @@ -27,6 +27,7 @@ obj-$(CONFIG_VIRTIO_INPUT_HOST) += virtio-input-host-pci.o
> >>>>>>  obj-$(CONFIG_VIRTIO_INPUT) += virtio-input-pci.o
> >>>>>>  obj-$(CONFIG_VIRTIO_RNG) += virtio-rng-pci.o
> >>>>>>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio-balloon-pci.o
> >>>>>> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu-pci.o
> >>>>>>  obj-$(CONFIG_VIRTIO_9P) += virtio-9p-pci.o
> >>>>>>  obj-$(CONFIG_VIRTIO_SCSI) += virtio-scsi-pci.o
> >>>>>>  obj-$(CONFIG_VIRTIO_BLK) += virtio-blk-pci.o
> >>>>>> diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
> >>>>>> new file mode 100644
> >>>>>> index 0000000000..f9977096bd
> >>>>>> --- /dev/null
> >>>>>> +++ b/hw/virtio/virtio-iommu-pci.c
> >>>>>> @@ -0,0 +1,88 @@
> >>>>>> +/*
> >>>>>> + * Virtio IOMMU PCI Bindings
> >>>>>> + *
> >>>>>> + * Copyright (c) 2019 Red Hat, Inc.
> >>>>>> + * Written by Eric Auger
> >>>>>> + *
> >>>>>
> >>>
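The DT iommu-map and the IORT ID mappings discussed in this thread encode the same thing: a range of root complex RIDs mapped onto a range of IOMMU input IDs. The sketch below shows that translation as a standalone function, assuming the pci-iommu.txt semantics (apply iommu-map-mask to the RID, then find the <rid-base iommu iommu-base length> entry that covers it) and the IORT ID mapping definition (the count field holds the number of IDs minus one). It is not the kernel's of_map_rid() or iort_id_map(), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One DT iommu-map entry: <rid-base iommu-phandle iommu-base length>. */
struct dt_iommu_map {
    uint32_t rid_base;
    uint32_t iommu_phandle;  /* which IOMMU handles this range */
    uint32_t iommu_base;     /* first IOMMU input ID of the range */
    uint32_t length;         /* number of RIDs covered */
};

/* DT semantics: mask the RID first, then find the covering entry. */
bool dt_map_rid(const struct dt_iommu_map *map, size_t n, uint32_t mask,
                uint32_t rid, uint32_t *phandle, uint32_t *sid)
{
    uint32_t masked = rid & mask;

    assert(n == 0 || map != NULL);
    for (size_t i = 0; i < n; i++) {
        /* subtract first so rid_base + length cannot overflow */
        if (masked >= map[i].rid_base &&
            masked - map[i].rid_base < map[i].length) {
            *phandle = map[i].iommu_phandle;
            *sid = map[i].iommu_base + (masked - map[i].rid_base);
            return true;
        }
    }
    return false;  /* RID is not translated by any IOMMU in the map */
}

/* One IORT ID mapping; id_count is "number of IDs minus one". */
struct iort_id_mapping {
    uint32_t input_base;
    uint32_t id_count;
    uint32_t output_base;
};

bool iort_map_id(const struct iort_id_mapping *m, uint32_t id, uint32_t *out)
{
    assert(m != NULL);
    if (id < m->input_base || id - m->input_base > m->id_count) {
        return false;  /* range is inclusive of input_base + id_count */
    }
    *out = m->output_base + (id - m->input_base);
    return true;
}
```

As an illustration with made-up values, the two entries {0x0, ph, 0x0, 0x8} and {0x9, ph, 0x9, 0xfff7} with a mask of 0xffff map every RID to the identical stream ID except RID 0x8, which stays untranslated. That is one way to keep an IOMMU's own RID out of the mapping.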



* Re: [PATCH for-4.2 v10 06/15] virtio-iommu: Endpoint and domains structs and helpers
  2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 06/15] virtio-iommu: Endpoint and domains structs and helpers Eric Auger
  2019-08-16  4:17   ` Peter Xu
@ 2019-11-04 18:31   ` Jean-Philippe Brucker
  2019-11-25 13:14     ` Auger Eric
  1 sibling, 1 reply; 55+ messages in thread
From: Jean-Philippe Brucker @ 2019-11-04 18:31 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

Hi Eric,

On Tue, Jul 30, 2019 at 07:21:28PM +0200, Eric Auger wrote:
>  static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
> +
> +    g_tree_destroy(s->domains);
> +    g_tree_destroy(s->endpoints);

virtio_iommu_device_reset() must completely clear the internal state as
well (noticed while testing modprobe/rmmod).

Thanks,
Jean
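Jean's point is that reset, and not only unrealize, has to bring the device back to its just-realized state, so that a driver unloaded and reloaded with rmmod/modprobe does not find stale domains or endpoints. The sketch below is self-contained, so it replaces the device's GTrees with plain linked lists. All names in it are hypothetical. In the GLib-based device, a comparable effect could be obtained by destroying the trees and re-creating them (for instance with g_tree_new_full()) in the reset handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal containers standing in for the device's GTrees. */
struct id_node {
    uint32_t id;
    struct id_node *next;
};

struct toy_iommu {
    struct id_node *domains;    /* domain IDs created on behalf of the guest */
    struct id_node *endpoints;  /* endpoint IDs known to the device */
};

/* Free every node and leave the head NULL, so a second call is a no-op. */
static void id_list_clear(struct id_node **head)
{
    while (*head) {
        struct id_node *n = *head;
        *head = n->next;
        free(n);
    }
}

bool id_list_add(struct id_node **head, uint32_t id)
{
    struct id_node *n = malloc(sizeof(*n));

    if (!n) {
        return false;
    }
    n->id = id;
    n->next = *head;
    *head = n;
    return true;
}

bool id_list_contains(const struct id_node *head, uint32_t id)
{
    for (; head; head = head->next) {
        if (head->id == id) {
            return true;
        }
    }
    return false;
}

/* Reset: drop all guest-created state so a re-probing driver starts clean. */
void toy_iommu_reset(struct toy_iommu *s)
{
    assert(s != NULL);
    id_list_clear(&s->domains);
    id_list_clear(&s->endpoints);
}

/* Unrealize: release whatever is left; safe after a reset. */
void toy_iommu_unrealize(struct toy_iommu *s)
{
    toy_iommu_reset(s);
}
```

Because id_list_clear() leaves the list heads NULL, calling unrealize after a reset cannot double-free anything.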



* Re: [PATCH for-4.2 v10 06/15] virtio-iommu: Endpoint and domains structs and helpers
  2019-11-04 18:31   ` Jean-Philippe Brucker
@ 2019-11-25 13:14     ` Auger Eric
  0 siblings, 0 replies; 55+ messages in thread
From: Auger Eric @ 2019-11-25 13:14 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: peter.maydell, kevin.tian, mst, tn, qemu-devel, peterx,
	alex.williamson, qemu-arm, bharat.bhushan, eric.auger.pro

Hi Jean,
On 11/4/19 7:31 PM, Jean-Philippe Brucker wrote:
> Hi Eric,
> 
> On Tue, Jul 30, 2019 at 07:21:28PM +0200, Eric Auger wrote:
>>  static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
>>  {
>>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>> +    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
>> +
>> +    g_tree_destroy(s->domains);
>> +    g_tree_destroy(s->endpoints);
> 
> virtio_iommu_device_reset() must completely clear the internal state as
> well (noticed while testing modprobe/rmmod).
I just noticed I forgot to take this comment into account in v11.

I will fix that shortly.

Thanks

Eric
> 
> Thanks,
> Jean
> 




end of thread

Thread overview: 55+ messages
2019-07-30 17:21 [Qemu-devel] [PATCH for-4.2 v10 00/15] VIRTIO-IOMMU device Eric Auger
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 01/15] update-linux-headers: Import virtio_iommu.h Eric Auger
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 02/15] linux-headers: update against 5.3-rc2 Eric Auger
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 03/15] virtio-iommu: Add skeleton Eric Auger
2019-08-15 13:54   ` Peter Xu
2019-08-29 12:18     ` Auger Eric
2019-08-30  1:26       ` Peter Xu
2019-08-30  8:12         ` Auger Eric
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 04/15] virtio-iommu: Decode the command payload Eric Auger
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 05/15] virtio-iommu: Add the iommu regions Eric Auger
2019-08-16  4:00   ` Peter Xu
2019-08-29 12:51     ` Auger Eric
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 06/15] virtio-iommu: Endpoint and domains structs and helpers Eric Auger
2019-08-16  4:17   ` Peter Xu
2019-11-04 18:31   ` Jean-Philippe Brucker
2019-11-25 13:14     ` Auger Eric
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 07/15] virtio-iommu: Implement attach/detach command Eric Auger
2019-08-16  4:27   ` Peter Xu
2019-08-29 14:24     ` Auger Eric
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 08/15] virtio-iommu: Implement map/unmap Eric Auger
2019-08-19  8:11   ` Peter Xu
2019-09-03 11:37     ` Auger Eric
2019-09-04  1:44       ` Peter Xu
2019-09-04  4:23         ` Tian, Kevin
2019-09-04  5:37           ` Peter Xu
2019-09-04  5:46             ` Tian, Kevin
2019-09-04  7:54               ` Auger Eric
2019-09-04  8:32                 ` Peter Xu
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 09/15] virtio-iommu: Implement translate Eric Auger
2019-08-19  8:24   ` Peter Xu
2019-09-03 11:45     ` Auger Eric
2019-09-04  1:58       ` Peter Xu
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 10/15] virtio-iommu: Implement probe request Eric Auger
2019-08-19 12:08   ` Peter Xu
2019-09-03 12:23     ` Auger Eric
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 11/15] virtio-iommu: Expose the IOAPIC MSI reserved region when relevant Eric Auger
2019-07-30 19:38   ` Michael S. Tsirkin
2019-07-30 23:20     ` Tian, Kevin
2019-07-31  9:05       ` Auger Eric
2019-07-31 19:25       ` Michael S. Tsirkin
2019-07-31 19:44         ` Auger Eric
2019-07-31 23:23           ` Tian, Kevin
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 12/15] virtio-iommu: Implement fault reporting Eric Auger
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 13/15] virtio_iommu: Handle reserved regions in translation process Eric Auger
2019-08-19 12:44   ` Peter Xu
2019-09-01  6:38   ` Michael S. Tsirkin
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 14/15] virtio-iommu-pci: Add virtio iommu pci support Eric Auger
2019-07-30 19:35   ` Michael S. Tsirkin
2019-08-01 12:15     ` Auger Eric
2019-08-01 13:06       ` Michael S. Tsirkin
2019-08-01 13:49         ` Auger Eric
2019-09-01  6:40           ` Michael S. Tsirkin
2019-09-04 14:19             ` Auger Eric
2019-09-04 21:36               ` Michael S. Tsirkin
2019-07-30 17:21 ` [Qemu-devel] [PATCH for-4.2 v10 15/15] hw/arm/virt: Add the virtio-iommu device tree mappings Eric Auger
