* [PATCH v4 0/6] ASID support in vhost-vdpa net
@ 2022-08-05 16:39 Eugenio Pérez
  2022-08-05 16:39 ` [PATCH v4 1/6] linux-headers: Update kernel headers Eugenio Pérez
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Eugenio Pérez @ 2022-08-05 16:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Eli Cohen, Zhu Lingshan, Michael S. Tsirkin,
	Gautam Dawar, Stefano Garzarella, Parav Pandit, Cindy Lu,
	Gonglei (Arei),
	Jason Wang, Cornelia Huck, kvm, Paolo Bonzini, Liuxiangdong,
	Laurent Vivier, Harpreet Singh Anand

Control VQ is the way net devices send changes to the device state, such as
the number of active queues or the MAC address.

QEMU needs to intercept this queue so it can track these changes and is able to
migrate the device. It has been able to do so since 1576dbb5bbc4 ("vdpa: Add
x-svq to NetdevVhostVDPAOptions"). However, enabling x-svq implies shadowing
all of the VirtIO device's virtqueues, which hurts performance.

This series adds address space isolation, so the device and the guest
communicate directly through the data virtqueues (passthrough) while CVQ
communication is split in two: the guest communicates with QEMU, and QEMU
forwards the commands to the device.

For example, the NIC_RX_FILTER_CHANGED event can now be received in QMP
without the need for the x-svq parameter.
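
At the ioctl level, the isolation this series performs boils down to the flow
sketched below (an illustrative, untested sketch: it assumes the updated
linux/vhost.h from patch 1, an already-open vhost-vdpa device fd, and that CVQ
is the last virtqueue; error reporting is trimmed):

    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    #define CVQ_ASID 1 /* data vqs stay in the passthrough ASID 0 */

    /* Move CVQ to its own ASID if no data vq shares its vq group. */
    static int isolate_cvq(int fd, unsigned cvq_index)
    {
        struct vhost_vring_state cvq = { .index = cvq_index };

        /* Ask the device which vq group CVQ belongs to. */
        if (ioctl(fd, VHOST_VDPA_GET_VRING_GROUP, &cvq) < 0) {
            return -1;
        }
        for (unsigned i = 0; i < cvq_index; ++i) {
            struct vhost_vring_state vq = { .index = i };

            if (ioctl(fd, VHOST_VDPA_GET_VRING_GROUP, &vq) < 0 ||
                vq.num == cvq.num) {
                return -1; /* shared group: fall back to passthrough CVQ */
            }
        }

        /* Attach the now-known-independent CVQ group to ASID 1. */
        struct vhost_vring_state asid = { .index = cvq.num, .num = CVQ_ASID };
        return ioctl(fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
    }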

This series is based on [1] and needs to be applied on top of it. Each patch
adds an isolation feature and could be merged individually once conflicts are
resolved.

Comments are welcome. Thanks!

v4:
- Rebased on the last CVQ start series, which allocated CVQ cmd bufs at load
- Squash vhost_vdpa_cvq_group_is_independent.
- Do not check for the CVQ index in vhost_vdpa_net_prepare; we only have
  that one callback registered in that NetClientInfo.
- Add comment specifying behavior if device does not support _F_ASID
- Update headers to a later Linux commit so as not to remove SETUP_RNG_SEED

v3:
- Do not return an error but just print a warning if vdpa device initialization
  fails while getting the number of ASes or VQ groups
- Delete extra newline

v2:
- As commented on series [1], handle the vhost_net backend through
  NetClientInfo callbacks instead of directly.
- Fix not freeing SVQ properly when device does not support CVQ
- Add missing BIT_ULL when checking the device's backend features for _F_ASID.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-08/msg00349.html

Eugenio Pérez (6):
  linux-headers: Update kernel headers
  vdpa: Use v->shadow_vqs_enabled in vhost_vdpa_svqs_start & stop
  vdpa: Allocate SVQ unconditionally
  vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap
  vdpa: Store x-svq parameter in VhostVDPAState
  vdpa: Always start CVQ in SVQ mode

 include/hw/virtio/vhost-vdpa.h               |   8 +-
 include/standard-headers/asm-x86/bootparam.h |   7 +-
 include/standard-headers/drm/drm_fourcc.h    |  69 ++++++++++
 include/standard-headers/linux/ethtool.h     |   1 +
 include/standard-headers/linux/input.h       |  12 +-
 include/standard-headers/linux/pci_regs.h    |   1 +
 include/standard-headers/linux/vhost_types.h |  11 +-
 include/standard-headers/linux/virtio_ids.h  |  14 +-
 linux-headers/asm-arm64/kvm.h                |  27 ++++
 linux-headers/asm-generic/unistd.h           |   4 +-
 linux-headers/asm-riscv/kvm.h                |  20 +++
 linux-headers/asm-riscv/unistd.h             |   3 +-
 linux-headers/asm-x86/kvm.h                  |  11 +-
 linux-headers/asm-x86/mman.h                 |  14 --
 linux-headers/linux/kvm.h                    |  56 +++++++-
 linux-headers/linux/userfaultfd.h            |  10 +-
 linux-headers/linux/vfio.h                   |   4 +-
 linux-headers/linux/vhost.h                  |  26 +++-
 hw/virtio/vhost-vdpa.c                       |  65 ++++-----
 net/vhost-vdpa.c                             | 134 ++++++++++++++++++-
 hw/virtio/trace-events                       |   4 +-
 21 files changed, 408 insertions(+), 93 deletions(-)

--
2.31.1




* [PATCH v4 1/6] linux-headers: Update kernel headers
  2022-08-05 16:39 [PATCH v4 0/6] ASID support in vhost-vdpa net Eugenio Pérez
@ 2022-08-05 16:39 ` Eugenio Pérez
  2022-08-05 16:39 ` [PATCH v4 2/6] vdpa: Use v->shadow_vqs_enabled in vhost_vdpa_svqs_start & stop Eugenio Pérez
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Eugenio Pérez @ 2022-08-05 16:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Eli Cohen, Zhu Lingshan, Michael S. Tsirkin,
	Gautam Dawar, Stefano Garzarella, Parav Pandit, Cindy Lu,
	Gonglei (Arei),
	Jason Wang, Cornelia Huck, kvm, Paolo Bonzini, Liuxiangdong,
	Laurent Vivier, Harpreet Singh Anand

The main reason is to make the new vhost_vdpa address space ioctls available.

Update kernel headers until
9de1f9c8ca51 ("Merge tag 'irq-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip").
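
For reference, probing the new ioctls from user space looks roughly like the
sketch below (untested; it assumes this patch's linux/vhost.h and mirrors what
a later patch in this series does inside QEMU):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    /* Returns the number of address spaces, or 1 if ASID is unsupported. */
    static unsigned probe_as_num(int vdpa_fd)
    {
        uint64_t features;
        unsigned num_as;

        if (ioctl(vdpa_fd, VHOST_GET_BACKEND_FEATURES, &features) < 0 ||
            !(features & (1ULL << VHOST_BACKEND_F_IOTLB_ASID))) {
            return 1; /* pre-ASID kernel: single address space */
        }
        if (ioctl(vdpa_fd, VHOST_VDPA_GET_AS_NUM, &num_as) < 0) {
            return 1;
        }
        return num_as;
    }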

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/standard-headers/asm-x86/bootparam.h |  7 +-
 include/standard-headers/drm/drm_fourcc.h    | 69 ++++++++++++++++++++
 include/standard-headers/linux/ethtool.h     |  1 +
 include/standard-headers/linux/input.h       | 12 ++--
 include/standard-headers/linux/pci_regs.h    |  1 +
 include/standard-headers/linux/vhost_types.h | 11 +++-
 include/standard-headers/linux/virtio_ids.h  | 14 ++--
 linux-headers/asm-arm64/kvm.h                | 27 ++++++++
 linux-headers/asm-generic/unistd.h           |  4 +-
 linux-headers/asm-riscv/kvm.h                | 20 ++++++
 linux-headers/asm-riscv/unistd.h             |  3 +-
 linux-headers/asm-x86/kvm.h                  | 11 ++--
 linux-headers/asm-x86/mman.h                 | 14 ----
 linux-headers/linux/kvm.h                    | 56 +++++++++++++++-
 linux-headers/linux/userfaultfd.h            | 10 ++-
 linux-headers/linux/vfio.h                   |  4 +-
 linux-headers/linux/vhost.h                  | 26 ++++++--
 17 files changed, 240 insertions(+), 50 deletions(-)

diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h
index b2aaad10e5..0b06d2bff1 100644
--- a/include/standard-headers/asm-x86/bootparam.h
+++ b/include/standard-headers/asm-x86/bootparam.h
@@ -10,12 +10,13 @@
 #define SETUP_EFI			4
 #define SETUP_APPLE_PROPERTIES		5
 #define SETUP_JAILHOUSE			6
+#define SETUP_CC_BLOB			7
+#define SETUP_IMA			8
 #define SETUP_RNG_SEED			9
+#define SETUP_ENUM_MAX			SETUP_RNG_SEED
 
 #define SETUP_INDIRECT			(1<<31)
-
-/* SETUP_INDIRECT | max(SETUP_*) */
-#define SETUP_TYPE_MAX			(SETUP_INDIRECT | SETUP_JAILHOUSE)
+#define SETUP_TYPE_MAX			(SETUP_ENUM_MAX | SETUP_INDIRECT)
 
 /* ram_size flags */
 #define RAMDISK_IMAGE_START_MASK	0x07FF
diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
index 4888f85f69..0b051545d3 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -571,6 +571,53 @@ extern "C" {
  */
 #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
 
+/*
+ * Intel Tile 4 layout
+ *
+ * This is a tiled layout using 4KB tiles in a row-major layout. It has the same
+ * shape as Tile Y at two granularities: 4KB (128B x 32) and 64B (16B x 4). It
+ * only differs from Tile Y at the 256B granularity in between. At this
+ * granularity, Tile Y has a shape of 16B x 32 rows, but this tiling has a shape
+ * of 64B x 8 rows.
+ */
+#define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 9)
+
+/*
+ * Intel color control surfaces (CCS) for DG2 render compression.
+ *
+ * The main surface is Tile 4 and at plane index 0. The CCS data is stored
+ * outside of the GEM object in a reserved memory area dedicated for the
+ * storage of the CCS data for all RC/RC_CC/MC compressible GEM objects. The
+ * main surface pitch is required to be a multiple of four Tile 4 widths.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
+
+/*
+ * Intel color control surfaces (CCS) for DG2 media compression.
+ *
+ * The main surface is Tile 4 and at plane index 0. For semi-planar formats
+ * like NV12, the Y and UV planes are Tile 4 and are located at plane indices
+ * 0 and 1, respectively. The CCS for all planes are stored outside of the
+ * GEM object in a reserved memory area dedicated for the storage of the
+ * CCS data for all RC/RC_CC/MC compressible GEM objects. The main surface
+ * pitch is required to be a multiple of four Tile 4 widths.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
+
+/*
+ * Intel Color Control Surface with Clear Color (CCS) for DG2 render compression.
+ *
+ * The main surface is Tile 4 and at plane index 0. The CCS data is stored
+ * outside of the GEM object in a reserved memory area dedicated for the
+ * storage of the CCS data for all RC/RC_CC/MC compressible GEM objects. The
+ * main surface pitch is required to be a multiple of four Tile 4 widths. The
+ * clear color is stored at plane index 1 and the pitch should be ignored. The
+ * format of the 256 bits of clear color data matches the one used for the
+ * I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC modifier, see its description
+ * for details.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 12)
+
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
  *
@@ -608,6 +655,28 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_QCOM_COMPRESSED	fourcc_mod_code(QCOM, 1)
 
+/*
+ * Qualcomm Tiled Format
+ *
+ * Similar to DRM_FORMAT_MOD_QCOM_COMPRESSED but not compressed.
+ * Implementation may be platform and base-format specific.
+ *
+ * Each macrotile consists of m x n (mostly 4 x 4) tiles.
+ * Pixel data pitch/stride is aligned with macrotile width.
+ * Pixel data height is aligned with macrotile height.
+ * Entire pixel data buffer is aligned with 4k(bytes).
+ */
+#define DRM_FORMAT_MOD_QCOM_TILED3	fourcc_mod_code(QCOM, 3)
+
+/*
+ * Qualcomm Alternate Tiled Format
+ *
+ * Alternate tiled format typically only used within GMEM.
+ * Implementation may be platform and base-format specific.
+ */
+#define DRM_FORMAT_MOD_QCOM_TILED2	fourcc_mod_code(QCOM, 2)
+
+
 /* Vivante framebuffer modifiers */
 
 /*
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
index 38d5a4cd6e..5a99f13471 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -1691,6 +1691,7 @@ enum ethtool_link_mode_bit_indices {
 	ETHTOOL_LINK_MODE_400000baseCR4_Full_BIT	 = 89,
 	ETHTOOL_LINK_MODE_100baseFX_Half_BIT		 = 90,
 	ETHTOOL_LINK_MODE_100baseFX_Full_BIT		 = 91,
+	ETHTOOL_LINK_MODE_10baseT1L_Full_BIT		 = 92,
 	/* must be last entry */
 	__ETHTOOL_LINK_MODE_MASK_NBITS
 };
diff --git a/include/standard-headers/linux/input.h b/include/standard-headers/linux/input.h
index 7822c24178..942ea6aaa9 100644
--- a/include/standard-headers/linux/input.h
+++ b/include/standard-headers/linux/input.h
@@ -75,10 +75,13 @@ struct input_id {
  * Note that input core does not clamp reported values to the
  * [minimum, maximum] limits, such task is left to userspace.
  *
- * The default resolution for main axes (ABS_X, ABS_Y, ABS_Z)
- * is reported in units per millimeter (units/mm), resolution
- * for rotational axes (ABS_RX, ABS_RY, ABS_RZ) is reported
- * in units per radian.
+ * The default resolution for main axes (ABS_X, ABS_Y, ABS_Z,
+ * ABS_MT_POSITION_X, ABS_MT_POSITION_Y) is reported in units
+ * per millimeter (units/mm), resolution for rotational axes
+ * (ABS_RX, ABS_RY, ABS_RZ) is reported in units per radian.
+ * The resolution for the size axes (ABS_MT_TOUCH_MAJOR,
+ * ABS_MT_TOUCH_MINOR, ABS_MT_WIDTH_MAJOR, ABS_MT_WIDTH_MINOR)
+ * is reported in units per millimeter (units/mm).
  * When INPUT_PROP_ACCELEROMETER is set the resolution changes.
  * The main axes (ABS_X, ABS_Y, ABS_Z) are then reported in
  * units per g (units/g) and in units per degree per second
@@ -268,6 +271,7 @@ struct input_mask {
 #define BUS_RMI			0x1D
 #define BUS_CEC			0x1E
 #define BUS_INTEL_ISHTP		0x1F
+#define BUS_AMD_SFH		0x20
 
 /*
  * MT_TOOL types
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index bee1a9ed6e..108f8523fa 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -616,6 +616,7 @@
 #define  PCI_EXP_SLTCTL_PWR_OFF        0x0400 /* Power Off */
 #define  PCI_EXP_SLTCTL_EIC	0x0800	/* Electromechanical Interlock Control */
 #define  PCI_EXP_SLTCTL_DLLSCE	0x1000	/* Data Link Layer State Changed Enable */
+#define  PCI_EXP_SLTCTL_ASPL_DISABLE	0x2000 /* Auto Slot Power Limit Disable */
 #define  PCI_EXP_SLTCTL_IBPD_DISABLE	0x4000 /* In-band PD disable */
 #define PCI_EXP_SLTSTA		0x1a	/* Slot Status */
 #define  PCI_EXP_SLTSTA_ABP	0x0001	/* Attention Button Pressed */
diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
index 0bd2684a2a..ce78551b0f 100644
--- a/include/standard-headers/linux/vhost_types.h
+++ b/include/standard-headers/linux/vhost_types.h
@@ -87,7 +87,7 @@ struct vhost_msg {
 
 struct vhost_msg_v2 {
 	uint32_t type;
-	uint32_t reserved;
+	uint32_t asid;
 	union {
 		struct vhost_iotlb_msg iotlb;
 		uint8_t padding[64];
@@ -153,4 +153,13 @@ struct vhost_vdpa_iova_range {
 /* vhost-net should add virtio_net_hdr for RX, and strip for TX packets. */
 #define VHOST_NET_F_VIRTIO_NET_HDR 27
 
+/* Use message type V2 */
+#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
+/* IOTLB can accept batching hints */
+#define VHOST_BACKEND_F_IOTLB_BATCH  0x2
+/* IOTLB can accept address space identifier through V2 type of IOTLB
+ * message
+ */
+#define VHOST_BACKEND_F_IOTLB_ASID  0x3
+
 #endif
diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
index 80d76b75bc..7aa2eb7662 100644
--- a/include/standard-headers/linux/virtio_ids.h
+++ b/include/standard-headers/linux/virtio_ids.h
@@ -73,12 +73,12 @@
  * Virtio Transitional IDs
  */
 
-#define VIRTIO_TRANS_ID_NET		1000 /* transitional virtio net */
-#define VIRTIO_TRANS_ID_BLOCK		1001 /* transitional virtio block */
-#define VIRTIO_TRANS_ID_BALLOON		1002 /* transitional virtio balloon */
-#define VIRTIO_TRANS_ID_CONSOLE		1003 /* transitional virtio console */
-#define VIRTIO_TRANS_ID_SCSI		1004 /* transitional virtio SCSI */
-#define VIRTIO_TRANS_ID_RNG		1005 /* transitional virtio rng */
-#define VIRTIO_TRANS_ID_9P		1009 /* transitional virtio 9p console */
+#define VIRTIO_TRANS_ID_NET		0x1000 /* transitional virtio net */
+#define VIRTIO_TRANS_ID_BLOCK		0x1001 /* transitional virtio block */
+#define VIRTIO_TRANS_ID_BALLOON		0x1002 /* transitional virtio balloon */
+#define VIRTIO_TRANS_ID_CONSOLE		0x1003 /* transitional virtio console */
+#define VIRTIO_TRANS_ID_SCSI		0x1004 /* transitional virtio SCSI */
+#define VIRTIO_TRANS_ID_RNG		0x1005 /* transitional virtio rng */
+#define VIRTIO_TRANS_ID_9P		0x1009 /* transitional virtio 9p console */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index 5c28a9737a..286668285f 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -139,8 +139,10 @@ struct kvm_guest_debug_arch {
 	__u64 dbg_wvr[KVM_ARM_MAX_DBG_REGS];
 };
 
+#define KVM_DEBUG_ARCH_HSR_HIGH_VALID	(1 << 0)
 struct kvm_debug_exit_arch {
 	__u32 hsr;
+	__u32 hsr_high;	/* ESR_EL2[61:32] */
 	__u64 far;	/* used for watchpoints */
 };
 
@@ -332,6 +334,31 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_ARM64_SVE_VLS_WORDS	\
 	((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)
 
+/* Bitmap feature firmware registers */
+#define KVM_REG_ARM_FW_FEAT_BMAP		(0x0016 << KVM_REG_ARM_COPROC_SHIFT)
+#define KVM_REG_ARM_FW_FEAT_BMAP_REG(r)		(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
+						KVM_REG_ARM_FW_FEAT_BMAP |	\
+						((r) & 0xffff))
+
+#define KVM_REG_ARM_STD_BMAP			KVM_REG_ARM_FW_FEAT_BMAP_REG(0)
+
+enum {
+	KVM_REG_ARM_STD_BIT_TRNG_V1_0	= 0,
+};
+
+#define KVM_REG_ARM_STD_HYP_BMAP		KVM_REG_ARM_FW_FEAT_BMAP_REG(1)
+
+enum {
+	KVM_REG_ARM_STD_HYP_BIT_PV_TIME	= 0,
+};
+
+#define KVM_REG_ARM_VENDOR_HYP_BMAP		KVM_REG_ARM_FW_FEAT_BMAP_REG(2)
+
+enum {
+	KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT	= 0,
+	KVM_REG_ARM_VENDOR_HYP_BIT_PTP		= 1,
+};
+
 /* Device Control API: ARM VGIC */
 #define KVM_DEV_ARM_VGIC_GRP_ADDR	0
 #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS	1
diff --git a/linux-headers/asm-generic/unistd.h b/linux-headers/asm-generic/unistd.h
index 1c48b0ae3b..45fa180cc5 100644
--- a/linux-headers/asm-generic/unistd.h
+++ b/linux-headers/asm-generic/unistd.h
@@ -383,7 +383,7 @@ __SYSCALL(__NR_syslog, sys_syslog)
 
 /* kernel/ptrace.c */
 #define __NR_ptrace 117
-__SYSCALL(__NR_ptrace, sys_ptrace)
+__SC_COMP(__NR_ptrace, sys_ptrace, compat_sys_ptrace)
 
 /* kernel/sched/core.c */
 #define __NR_sched_setparam 118
@@ -779,7 +779,7 @@ __SYSCALL(__NR_rseq, sys_rseq)
 #define __NR_kexec_file_load 294
 __SYSCALL(__NR_kexec_file_load,     sys_kexec_file_load)
 /* 295 through 402 are unassigned to sync up with generic numbers, don't use */
-#if __BITS_PER_LONG == 32
+#if defined(__SYSCALL_COMPAT) || __BITS_PER_LONG == 32
 #define __NR_clock_gettime64 403
 __SYSCALL(__NR_clock_gettime64, sys_clock_gettime)
 #define __NR_clock_settime64 404
diff --git a/linux-headers/asm-riscv/kvm.h b/linux-headers/asm-riscv/kvm.h
index f808ad1ce5..6119368ba6 100644
--- a/linux-headers/asm-riscv/kvm.h
+++ b/linux-headers/asm-riscv/kvm.h
@@ -82,6 +82,23 @@ struct kvm_riscv_timer {
 	__u64 state;
 };
 
+/*
+ * ISA extension IDs specific to KVM. This is not the same as the host ISA
+ * extension IDs as that is internal to the host and should not be exposed
+ * to the guest. This should always be contiguous to keep the mapping simple
+ * in KVM implementation.
+ */
+enum KVM_RISCV_ISA_EXT_ID {
+	KVM_RISCV_ISA_EXT_A = 0,
+	KVM_RISCV_ISA_EXT_C,
+	KVM_RISCV_ISA_EXT_D,
+	KVM_RISCV_ISA_EXT_F,
+	KVM_RISCV_ISA_EXT_H,
+	KVM_RISCV_ISA_EXT_I,
+	KVM_RISCV_ISA_EXT_M,
+	KVM_RISCV_ISA_EXT_MAX,
+};
+
 /* Possible states for kvm_riscv_timer */
 #define KVM_RISCV_TIMER_STATE_OFF	0
 #define KVM_RISCV_TIMER_STATE_ON	1
@@ -123,6 +140,9 @@ struct kvm_riscv_timer {
 #define KVM_REG_RISCV_FP_D_REG(name)	\
 		(offsetof(struct __riscv_d_ext_state, name) / sizeof(__u64))
 
+/* ISA Extension registers are mapped as type 7 */
+#define KVM_REG_RISCV_ISA_EXT		(0x07 << KVM_REG_RISCV_TYPE_SHIFT)
+
 #endif
 
 #endif /* __LINUX_KVM_RISCV_H */
diff --git a/linux-headers/asm-riscv/unistd.h b/linux-headers/asm-riscv/unistd.h
index 8062996c2d..73d7cdd2ec 100644
--- a/linux-headers/asm-riscv/unistd.h
+++ b/linux-headers/asm-riscv/unistd.h
@@ -15,12 +15,13 @@
  * along with this program.  If not, see <https://www.gnu.org/licenses/>.
  */
 
-#ifdef __LP64__
+#if defined(__LP64__) && !defined(__SYSCALL_COMPAT)
 #define __ARCH_WANT_NEW_STAT
 #define __ARCH_WANT_SET_GET_RLIMIT
 #endif /* __LP64__ */
 
 #define __ARCH_WANT_SYS_CLONE3
+#define __ARCH_WANT_MEMFD_SECRET
 
 #include <asm-generic/unistd.h>
 
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index bf6e96011d..21614807a2 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -428,11 +428,12 @@ struct kvm_sync_regs {
 	struct kvm_vcpu_events events;
 };
 
-#define KVM_X86_QUIRK_LINT0_REENABLED	   (1 << 0)
-#define KVM_X86_QUIRK_CD_NW_CLEARED	   (1 << 1)
-#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE	   (1 << 2)
-#define KVM_X86_QUIRK_OUT_7E_INC_RIP	   (1 << 3)
-#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
+#define KVM_X86_QUIRK_LINT0_REENABLED		(1 << 0)
+#define KVM_X86_QUIRK_CD_NW_CLEARED		(1 << 1)
+#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE		(1 << 2)
+#define KVM_X86_QUIRK_OUT_7E_INC_RIP		(1 << 3)
+#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT	(1 << 4)
+#define KVM_X86_QUIRK_FIX_HYPERCALL_INSN	(1 << 5)
 
 #define KVM_STATE_NESTED_FORMAT_VMX	0
 #define KVM_STATE_NESTED_FORMAT_SVM	1
diff --git a/linux-headers/asm-x86/mman.h b/linux-headers/asm-x86/mman.h
index d4a8d0424b..775dbd3aff 100644
--- a/linux-headers/asm-x86/mman.h
+++ b/linux-headers/asm-x86/mman.h
@@ -5,20 +5,6 @@
 #define MAP_32BIT	0x40		/* only give out 32bit addresses */
 
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
-/*
- * Take the 4 protection key bits out of the vma->vm_flags
- * value and turn them in to the bits that we can put in
- * to a pte.
- *
- * Only override these if Protection Keys are available
- * (which is only on 64-bit).
- */
-#define arch_vm_get_page_prot(vm_flags)	__pgprot(	\
-		((vm_flags) & VM_PKEY_BIT0 ? _PAGE_PKEY_BIT0 : 0) |	\
-		((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) |	\
-		((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) |	\
-		((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0))
-
 #define arch_calc_vm_prot_bits(prot, key) (		\
 		((key) & 0x1 ? VM_PKEY_BIT0 : 0) |      \
 		((key) & 0x2 ? VM_PKEY_BIT1 : 0) |      \
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index f089349149..a0dff15c2b 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -444,6 +444,9 @@ struct kvm_run {
 #define KVM_SYSTEM_EVENT_SHUTDOWN       1
 #define KVM_SYSTEM_EVENT_RESET          2
 #define KVM_SYSTEM_EVENT_CRASH          3
+#define KVM_SYSTEM_EVENT_WAKEUP         4
+#define KVM_SYSTEM_EVENT_SUSPEND        5
+#define KVM_SYSTEM_EVENT_SEV_TERM       6
 			__u32 type;
 			__u32 ndata;
 			union {
@@ -644,6 +647,7 @@ struct kvm_vapic_addr {
 #define KVM_MP_STATE_OPERATING         7
 #define KVM_MP_STATE_LOAD              8
 #define KVM_MP_STATE_AP_RESET_HOLD     9
+#define KVM_MP_STATE_SUSPENDED         10
 
 struct kvm_mp_state {
 	__u32 mp_state;
@@ -1148,8 +1152,9 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_MEM_OP_EXTENSION 211
 #define KVM_CAP_PMU_CAPABILITY 212
 #define KVM_CAP_DISABLE_QUIRKS2 213
-/* #define KVM_CAP_VM_TSC_CONTROL 214 */
+#define KVM_CAP_VM_TSC_CONTROL 214
 #define KVM_CAP_SYSTEM_EVENT_DATA 215
+#define KVM_CAP_ARM_SYSTEM_SUSPEND 216
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1238,6 +1243,7 @@ struct kvm_x86_mce {
 #define KVM_XEN_HVM_CONFIG_SHARED_INFO		(1 << 2)
 #define KVM_XEN_HVM_CONFIG_RUNSTATE		(1 << 3)
 #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL	(1 << 4)
+#define KVM_XEN_HVM_CONFIG_EVTCHN_SEND		(1 << 5)
 
 struct kvm_xen_hvm_config {
 	__u32 flags;
@@ -1476,7 +1482,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_PIT2              _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
 /* Available with KVM_CAP_PPC_GET_PVINFO */
 #define KVM_PPC_GET_PVINFO	  _IOW(KVMIO,  0xa1, struct kvm_ppc_pvinfo)
-/* Available with KVM_CAP_TSC_CONTROL */
+/* Available with KVM_CAP_TSC_CONTROL for a vCPU, or with
+*  KVM_CAP_VM_TSC_CONTROL to set defaults for a VM */
 #define KVM_SET_TSC_KHZ           _IO(KVMIO,  0xa2)
 #define KVM_GET_TSC_KHZ           _IO(KVMIO,  0xa3)
 /* Available with KVM_CAP_PCI_2_3 */
@@ -1692,6 +1699,32 @@ struct kvm_xen_hvm_attr {
 		struct {
 			__u64 gfn;
 		} shared_info;
+		struct {
+			__u32 send_port;
+			__u32 type; /* EVTCHNSTAT_ipi / EVTCHNSTAT_interdomain */
+			__u32 flags;
+#define KVM_XEN_EVTCHN_DEASSIGN		(1 << 0)
+#define KVM_XEN_EVTCHN_UPDATE		(1 << 1)
+#define KVM_XEN_EVTCHN_RESET		(1 << 2)
+			/*
+			 * Events sent by the guest are either looped back to
+			 * the guest itself (potentially on a different port#)
+			 * or signalled via an eventfd.
+			 */
+			union {
+				struct {
+					__u32 port;
+					__u32 vcpu;
+					__u32 priority;
+				} port;
+				struct {
+					__u32 port; /* Zero for eventfd */
+					__s32 fd;
+				} eventfd;
+				__u32 padding[4];
+			} deliver;
+		} evtchn;
+		__u32 xen_version;
 		__u64 pad[8];
 	} u;
 };
@@ -1700,11 +1733,17 @@ struct kvm_xen_hvm_attr {
 #define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
 #define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
 #define KVM_XEN_ATTR_TYPE_UPCALL_VECTOR		0x2
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */
+#define KVM_XEN_ATTR_TYPE_EVTCHN		0x3
+#define KVM_XEN_ATTR_TYPE_XEN_VERSION		0x4
 
 /* Per-vCPU Xen attributes */
 #define KVM_XEN_VCPU_GET_ATTR	_IOWR(KVMIO, 0xca, struct kvm_xen_vcpu_attr)
 #define KVM_XEN_VCPU_SET_ATTR	_IOW(KVMIO,  0xcb, struct kvm_xen_vcpu_attr)
 
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */
+#define KVM_XEN_HVM_EVTCHN_SEND	_IOW(KVMIO,  0xd0, struct kvm_irq_routing_xen_evtchn)
+
 #define KVM_GET_SREGS2             _IOR(KVMIO,  0xcc, struct kvm_sregs2)
 #define KVM_SET_SREGS2             _IOW(KVMIO,  0xcd, struct kvm_sregs2)
 
@@ -1722,6 +1761,13 @@ struct kvm_xen_vcpu_attr {
 			__u64 time_blocked;
 			__u64 time_offline;
 		} runstate;
+		__u32 vcpu_id;
+		struct {
+			__u32 port;
+			__u32 priority;
+			__u64 expires_ns;
+		} timer;
+		__u8 vector;
 	} u;
 };
 
@@ -1732,6 +1778,10 @@ struct kvm_xen_vcpu_attr {
 #define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT	0x3
 #define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA	0x4
 #define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST	0x5
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */
+#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID		0x6
+#define KVM_XEN_VCPU_ATTR_TYPE_TIMER		0x7
+#define KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR	0x8
 
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
@@ -2032,7 +2082,7 @@ struct kvm_stats_header {
 #define KVM_STATS_UNIT_SECONDS		(0x2 << KVM_STATS_UNIT_SHIFT)
 #define KVM_STATS_UNIT_CYCLES		(0x3 << KVM_STATS_UNIT_SHIFT)
 #define KVM_STATS_UNIT_BOOLEAN		(0x4 << KVM_STATS_UNIT_SHIFT)
-#define KVM_STATS_UNIT_MAX		KVM_STATS_UNIT_CYCLES
+#define KVM_STATS_UNIT_MAX		KVM_STATS_UNIT_BOOLEAN
 
 #define KVM_STATS_BASE_SHIFT		8
 #define KVM_STATS_BASE_MASK		(0xF << KVM_STATS_BASE_SHIFT)
diff --git a/linux-headers/linux/userfaultfd.h b/linux-headers/linux/userfaultfd.h
index 769b8379e4..a3a377cd44 100644
--- a/linux-headers/linux/userfaultfd.h
+++ b/linux-headers/linux/userfaultfd.h
@@ -33,7 +33,8 @@
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
 			   UFFD_FEATURE_MINOR_SHMEM |		\
-			   UFFD_FEATURE_EXACT_ADDRESS)
+			   UFFD_FEATURE_EXACT_ADDRESS |		\
+			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -47,7 +48,8 @@
 #define UFFD_API_RANGE_IOCTLS_BASIC		\
 	((__u64)1 << _UFFDIO_WAKE |		\
 	 (__u64)1 << _UFFDIO_COPY |		\
-	 (__u64)1 << _UFFDIO_CONTINUE)
+	 (__u64)1 << _UFFDIO_CONTINUE |		\
+	 (__u64)1 << _UFFDIO_WRITEPROTECT)
 
 /*
  * Valid ioctl command number range with this API is from 0x00 to
@@ -194,6 +196,9 @@ struct uffdio_api {
 	 * UFFD_FEATURE_EXACT_ADDRESS indicates that the exact address of page
 	 * faults would be provided and the offset within the page would not be
 	 * masked.
+	 *
+	 * UFFD_FEATURE_WP_HUGETLBFS_SHMEM indicates that userfaultfd
+	 * write-protection mode is supported on both shmem and hugetlbfs.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -207,6 +212,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
 #define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
+#define UFFD_FEATURE_WP_HUGETLBFS_SHMEM		(1<<12)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index e9f7795c39..ede44b5572 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -643,7 +643,7 @@ enum {
 };
 
 /**
- * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IORW(VFIO_TYPE, VFIO_BASE + 12,
+ * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
  *					      struct vfio_pci_hot_reset_info)
  *
  * Return: 0 on success, -errno on failure:
@@ -770,7 +770,7 @@ struct vfio_device_ioeventfd {
 #define VFIO_DEVICE_IOEVENTFD		_IO(VFIO_TYPE, VFIO_BASE + 16)
 
 /**
- * VFIO_DEVICE_FEATURE - _IORW(VFIO_TYPE, VFIO_BASE + 17,
+ * VFIO_DEVICE_FEATURE - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
  *			       struct vfio_device_feature)
  *
  * Get, set, or probe feature data of the device.  The feature is selected
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index 5d99e7c242..cab645d4a6 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -89,11 +89,6 @@
 
 /* Set or get vhost backend capability */
 
-/* Use message type V2 */
-#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
-/* IOTLB can accept batching hints */
-#define VHOST_BACKEND_F_IOTLB_BATCH  0x2
-
 #define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)
 #define VHOST_GET_BACKEND_FEATURES _IOR(VHOST_VIRTIO, 0x26, __u64)
 
@@ -150,11 +145,30 @@
 /* Get the valid iova range */
 #define VHOST_VDPA_GET_IOVA_RANGE	_IOR(VHOST_VIRTIO, 0x78, \
 					     struct vhost_vdpa_iova_range)
-
 /* Get the config size */
 #define VHOST_VDPA_GET_CONFIG_SIZE	_IOR(VHOST_VIRTIO, 0x79, __u32)
 
 /* Get the count of all virtqueues */
 #define VHOST_VDPA_GET_VQS_COUNT	_IOR(VHOST_VIRTIO, 0x80, __u32)
 
+/* Get the number of virtqueue groups. */
+#define VHOST_VDPA_GET_GROUP_NUM	_IOR(VHOST_VIRTIO, 0x81, __u32)
+
+/* Get the number of address spaces. */
+#define VHOST_VDPA_GET_AS_NUM		_IOR(VHOST_VIRTIO, 0x7A, unsigned int)
+
+/* Get the group for a virtqueue: read index, write group in num,
+ * The virtqueue index is stored in the index field of
+ * vhost_vring_state. The group for this specific virtqueue is
+ * returned via num field of vhost_vring_state.
+ */
+#define VHOST_VDPA_GET_VRING_GROUP	_IOWR(VHOST_VIRTIO, 0x7B,	\
+					      struct vhost_vring_state)
+/* Set the ASID for a virtqueue group. The group index is stored in
+ * the index field of vhost_vring_state, the ASID associated with this
+ * group is stored at num field of vhost_vring_state.
+ */
+#define VHOST_VDPA_SET_GROUP_ASID	_IOW(VHOST_VIRTIO, 0x7C, \
+					     struct vhost_vring_state)
+
 #endif
-- 
2.31.1



* [PATCH v4 2/6] vdpa: Use v->shadow_vqs_enabled in vhost_vdpa_svqs_start & stop
  2022-08-05 16:39 [PATCH v4 0/6] ASID support in vhost-vdpa net Eugenio Pérez
  2022-08-05 16:39 ` [PATCH v4 1/6] linux-headers: Update kernel headers Eugenio Pérez
@ 2022-08-05 16:39 ` Eugenio Pérez
  2022-08-05 16:39 ` [PATCH v4 3/6] vdpa: Allocate SVQ unconditionally Eugenio Pérez
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Eugenio Pérez @ 2022-08-05 16:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Eli Cohen, Zhu Lingshan, Michael S. Tsirkin,
	Gautam Dawar, Stefano Garzarella, Parav Pandit, Cindy Lu,
	Gonglei (Arei),
	Jason Wang, Cornelia Huck, kvm, Paolo Bonzini, Liuxiangdong,
	Laurent Vivier, Harpreet Singh Anand

These functions used to rely on v->shadow_vqs != NULL to know whether SVQ
must be started or stopped.

This is not going to be valid anymore, as QEMU is going to allocate SVQs
unconditionally (but only start them conditionally).

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 4b0cfc0f56..cc71ea750e 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1029,7 +1029,7 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
     Error *err = NULL;
     unsigned i;
 
-    if (!v->shadow_vqs) {
+    if (!v->shadow_vqs_enabled) {
         return true;
     }
 
@@ -1082,7 +1082,7 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
 {
     struct vhost_vdpa *v = dev->opaque;
 
-    if (!v->shadow_vqs) {
+    if (!v->shadow_vqs_enabled) {
         return true;
     }
 
-- 
2.31.1



* [PATCH v4 3/6] vdpa: Allocate SVQ unconditionally
  2022-08-05 16:39 [PATCH v4 0/6] ASID support in vhost-vdpa net Eugenio Pérez
  2022-08-05 16:39 ` [PATCH v4 1/6] linux-headers: Update kernel headers Eugenio Pérez
  2022-08-05 16:39 ` [PATCH v4 2/6] vdpa: Use v->shadow_vqs_enabled in vhost_vdpa_svqs_start & stop Eugenio Pérez
@ 2022-08-05 16:39 ` Eugenio Pérez
  2022-08-05 16:39 ` [PATCH v4 4/6] vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap Eugenio Pérez
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Eugenio Pérez @ 2022-08-05 16:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Eli Cohen, Zhu Lingshan, Michael S. Tsirkin,
	Gautam Dawar, Stefano Garzarella, Parav Pandit, Cindy Lu,
	Gonglei (Arei),
	Jason Wang, Cornelia Huck, kvm, Paolo Bonzini, Liuxiangdong,
	Laurent Vivier, Harpreet Singh Anand

SVQ may or may not run in a device depending on runtime conditions (for
example, whether the device can move CVQ to its own group).

Allocate the resources unconditionally, and decide later whether to use them.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 33 +++++++++++++++------------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index cc71ea750e..34922ec20d 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -410,6 +410,21 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
     int r;
     bool ok;
 
+    shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
+    for (unsigned n = 0; n < hdev->nvqs; ++n) {
+        g_autoptr(VhostShadowVirtqueue) svq;
+
+        svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops,
+                            v->shadow_vq_ops_opaque);
+        if (unlikely(!svq)) {
+            error_setg(errp, "Cannot create svq %u", n);
+            return -1;
+        }
+        g_ptr_array_add(shadow_vqs, g_steal_pointer(&svq));
+    }
+
+    v->shadow_vqs = g_steal_pointer(&shadow_vqs);
+
     if (!v->shadow_vqs_enabled) {
         return 0;
     }
@@ -426,20 +441,6 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
         return -1;
     }
 
-    shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
-    for (unsigned n = 0; n < hdev->nvqs; ++n) {
-        g_autoptr(VhostShadowVirtqueue) svq;
-
-        svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops,
-                            v->shadow_vq_ops_opaque);
-        if (unlikely(!svq)) {
-            error_setg(errp, "Cannot create svq %u", n);
-            return -1;
-        }
-        g_ptr_array_add(shadow_vqs, g_steal_pointer(&svq));
-    }
-
-    v->shadow_vqs = g_steal_pointer(&shadow_vqs);
     return 0;
 }
 
@@ -580,10 +581,6 @@ static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
     struct vhost_vdpa *v = dev->opaque;
     size_t idx;
 
-    if (!v->shadow_vqs) {
-        return;
-    }
-
     for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
         vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, idx));
     }
-- 
2.31.1



* [PATCH v4 4/6] vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap
  2022-08-05 16:39 [PATCH v4 0/6] ASID support in vhost-vdpa net Eugenio Pérez
                   ` (2 preceding siblings ...)
  2022-08-05 16:39 ` [PATCH v4 3/6] vdpa: Allocate SVQ unconditionally Eugenio Pérez
@ 2022-08-05 16:39 ` Eugenio Pérez
  2022-08-09  7:21   ` Jason Wang
  2022-08-05 16:39 ` [PATCH v4 5/6] vdpa: Store x-svq parameter in VhostVDPAState Eugenio Pérez
  2022-08-05 16:39 ` [PATCH v4 6/6] vdpa: Always start CVQ in SVQ mode Eugenio Pérez
  5 siblings, 1 reply; 13+ messages in thread
From: Eugenio Pérez @ 2022-08-05 16:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Eli Cohen, Zhu Lingshan, Michael S. Tsirkin,
	Gautam Dawar, Stefano Garzarella, Parav Pandit, Cindy Lu,
	Gonglei (Arei),
	Jason Wang, Cornelia Huck, kvm, Paolo Bonzini, Liuxiangdong,
	Laurent Vivier, Harpreet Singh Anand

So the caller can choose which ASID the mapping is destined for.

No need to update the batch functions, as at the moment they will always be
called from memory listener updates. Memory listener updates will always
target ASID 0, as it is the passthrough ASID.

All vhost devices' ASIDs are 0 at this moment.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v4: Add comment specifying behavior if device does not support _F_ASID

v3: Deleted unneeded space
---
 include/hw/virtio/vhost-vdpa.h |  8 +++++---
 hw/virtio/vhost-vdpa.c         | 25 +++++++++++++++----------
 net/vhost-vdpa.c               |  6 +++---
 hw/virtio/trace-events         |  4 ++--
 4 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 1111d85643..6560bb9d78 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -29,6 +29,7 @@ typedef struct vhost_vdpa {
     int index;
     uint32_t msg_type;
     bool iotlb_batch_begin_sent;
+    uint32_t address_space_id;
     MemoryListener listener;
     struct vhost_vdpa_iova_range iova_range;
     uint64_t acked_features;
@@ -42,8 +43,9 @@ typedef struct vhost_vdpa {
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
 
-int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
-                       void *vaddr, bool readonly);
-int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+                       hwaddr size, void *vaddr, bool readonly);
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+                         hwaddr size);
 
 #endif
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 34922ec20d..3eb67b27b7 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -72,22 +72,24 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
     return false;
 }
 
-int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
-                       void *vaddr, bool readonly)
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+                       hwaddr size, void *vaddr, bool readonly)
 {
     struct vhost_msg_v2 msg = {};
     int fd = v->device_fd;
     int ret = 0;
 
     msg.type = v->msg_type;
+    msg.asid = asid; /* 0 if vdpa device does not support asid */
     msg.iotlb.iova = iova;
     msg.iotlb.size = size;
     msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
     msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
     msg.iotlb.type = VHOST_IOTLB_UPDATE;
 
-   trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
-                            msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
+    trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.asid, msg.iotlb.iova,
+                             msg.iotlb.size, msg.iotlb.uaddr, msg.iotlb.perm,
+                             msg.iotlb.type);
 
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
@@ -98,18 +100,20 @@ int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
     return ret;
 }
 
-int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size)
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
+                         hwaddr size)
 {
     struct vhost_msg_v2 msg = {};
     int fd = v->device_fd;
     int ret = 0;
 
     msg.type = v->msg_type;
+    msg.asid = asid; /* 0 if vdpa device does not support asid */
     msg.iotlb.iova = iova;
     msg.iotlb.size = size;
     msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
 
-    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
+    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.asid, msg.iotlb.iova,
                                msg.iotlb.size, msg.iotlb.type);
 
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
@@ -229,7 +233,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
     }
 
     vhost_vdpa_iotlb_batch_begin_once(v);
-    ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
+    ret = vhost_vdpa_dma_map(v, 0, iova, int128_get64(llsize),
                              vaddr, section->readonly);
     if (ret) {
         error_report("vhost vdpa map fail!");
@@ -303,7 +307,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
         vhost_iova_tree_remove(v->iova_tree, result);
     }
     vhost_vdpa_iotlb_batch_begin_once(v);
-    ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
+    ret = vhost_vdpa_dma_unmap(v, 0, iova, int128_get64(llsize));
     if (ret) {
         error_report("vhost_vdpa dma unmap error!");
     }
@@ -894,7 +898,7 @@ static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
     }
 
     size = ROUND_UP(result->size, qemu_real_host_page_size());
-    r = vhost_vdpa_dma_unmap(v, result->iova, size);
+    r = vhost_vdpa_dma_unmap(v, v->address_space_id, result->iova, size);
     return r == 0;
 }
 
@@ -936,7 +940,8 @@ static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, DMAMap *needle,
         return false;
     }
 
-    r = vhost_vdpa_dma_map(v, needle->iova, needle->size + 1,
+    r = vhost_vdpa_dma_map(v, v->address_space_id, needle->iova,
+                           needle->size + 1,
                            (void *)(uintptr_t)needle->translated_addr,
                            needle->perm == IOMMU_RO);
     if (unlikely(r != 0)) {
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 7c0d600aea..9b932442f0 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -239,7 +239,7 @@ static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
         return;
     }
 
-    r = vhost_vdpa_dma_unmap(v, map->iova, map->size + 1);
+    r = vhost_vdpa_dma_unmap(v, v->address_space_id, map->iova, map->size + 1);
     if (unlikely(r != 0)) {
         error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
     }
@@ -279,8 +279,8 @@ static int vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v, void *buf, size_t size,
         return r;
     }
 
-    r = vhost_vdpa_dma_map(v, map.iova, vhost_vdpa_net_cvq_cmd_page_len(), buf,
-                           !write);
+    r = vhost_vdpa_dma_map(v, v->address_space_id, map.iova,
+                           vhost_vdpa_net_cvq_cmd_page_len(), buf, !write);
     if (unlikely(r < 0)) {
         goto dma_map_err;
     }
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 20af2e7ebd..36e5ae75f6 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -26,8 +26,8 @@ vhost_user_write(uint32_t req, uint32_t flags) "req:%d flags:0x%"PRIx32""
 vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
 
 # vhost-vdpa.c
-vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
-vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
+vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
+vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
 vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
 vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
 vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
-- 
2.31.1



* [PATCH v4 5/6] vdpa: Store x-svq parameter in VhostVDPAState
  2022-08-05 16:39 [PATCH v4 0/6] ASID support in vhost-vdpa net Eugenio Pérez
                   ` (3 preceding siblings ...)
  2022-08-05 16:39 ` [PATCH v4 4/6] vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap Eugenio Pérez
@ 2022-08-05 16:39 ` Eugenio Pérez
  2022-08-05 16:39 ` [PATCH v4 6/6] vdpa: Always start CVQ in SVQ mode Eugenio Pérez
  5 siblings, 0 replies; 13+ messages in thread
From: Eugenio Pérez @ 2022-08-05 16:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Eli Cohen, Zhu Lingshan, Michael S. Tsirkin,
	Gautam Dawar, Stefano Garzarella, Parav Pandit, Cindy Lu,
	Gonglei (Arei),
	Jason Wang, Cornelia Huck, kvm, Paolo Bonzini, Liuxiangdong,
	Laurent Vivier, Harpreet Singh Anand

CVQ can be shadowed in two ways:
- The device has the x-svq=on parameter (the current way)
- The device can isolate CVQ in its own vq group

QEMU needs to check for the second condition dynamically, because the CVQ
index is not known at initialization time. Since this is dynamic, CVQ
isolation could vary under different conditions, making it possible to go
from a "not isolated" group to an "isolated" one.

Save the cmdline parameter in an extra field so we never disable CVQ SVQ if
the device was started with it on the cmdline.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 net/vhost-vdpa.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 9b932442f0..bf78b1332f 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -36,6 +36,9 @@ typedef struct VhostVDPAState {
 
     /* Control commands shadow buffers */
     void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer;
+
+    /* The device always have SVQ enabled */
+    bool always_svq;
     bool started;
 } VhostVDPAState;
 
@@ -546,6 +549,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
+    s->always_svq = svq;
     s->vhost_vdpa.shadow_vqs_enabled = svq;
     s->vhost_vdpa.iova_tree = iova_tree;
     if (!is_datapath) {
-- 
2.31.1



* [PATCH v4 6/6] vdpa: Always start CVQ in SVQ mode
  2022-08-05 16:39 [PATCH v4 0/6] ASID support in vhost-vdpa net Eugenio Pérez
                   ` (4 preceding siblings ...)
  2022-08-05 16:39 ` [PATCH v4 5/6] vdpa: Store x-svq parameter in VhostVDPAState Eugenio Pérez
@ 2022-08-05 16:39 ` Eugenio Pérez
  2022-08-09  7:33   ` Jason Wang
  5 siblings, 1 reply; 13+ messages in thread
From: Eugenio Pérez @ 2022-08-05 16:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Eli Cohen, Zhu Lingshan, Michael S. Tsirkin,
	Gautam Dawar, Stefano Garzarella, Parav Pandit, Cindy Lu,
	Gonglei (Arei),
	Jason Wang, Cornelia Huck, kvm, Paolo Bonzini, Liuxiangdong,
	Laurent Vivier, Harpreet Singh Anand

Isolate the control virtqueue in its own group, allowing QEMU to intercept
control commands while letting the data plane run fully passthrough to the
guest.
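
With this, a guest started without x-svq, e.g. (illustrative invocation, the
vhost-vdpa character device path depends on the host):

    -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0 \
    -device virtio-net-pci,netdev=vhost-vdpa0

gets a shadowed CVQ (so e.g. NIC_RX_FILTER_CHANGED is emitted in QMP) as long
as the device can isolate CVQ in its own group, while the data vqs stay
passthrough.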

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v4:
* Squash vhost_vdpa_cvq_group_is_independent.
* Rebased on the last CVQ start series, which allocated CVQ cmd bufs at load
* Do not check for the CVQ index in vhost_vdpa_net_prepare; we only have
  that one callback registered in that NetClientInfo.

v3:
* Make ASID-related queries print a warning instead of returning an
  error and stopping the start of QEMU.
---
 hw/virtio/vhost-vdpa.c |   3 +-
 net/vhost-vdpa.c       | 124 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 122 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 3eb67b27b7..69f34f4cc5 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -678,7 +678,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
 {
     uint64_t features;
     uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
-        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
+        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
+        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
     int r;
 
     if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index bf78b1332f..c820a5fd9f 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -37,6 +37,9 @@ typedef struct VhostVDPAState {
     /* Control commands shadow buffers */
     void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer;
 
+    /* Number of address spaces supported by the device */
+    unsigned address_space_num;
+
     /* The device always have SVQ enabled */
     bool always_svq;
     bool started;
@@ -100,6 +103,9 @@ static const uint64_t vdpa_svq_device_features =
     BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
     BIT_ULL(VIRTIO_NET_F_STANDBY);
 
+#define VHOST_VDPA_NET_CVQ_PASSTHROUGH 0
+#define VHOST_VDPA_NET_CVQ_ASID 1
+
 VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
@@ -224,6 +230,37 @@ static NetClientInfo net_vhost_vdpa_info = {
         .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+static void vhost_vdpa_get_vring_group(int device_fd,
+                                       struct vhost_vring_state *state)
+{
+    int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, state);
+    if (unlikely(r < 0)) {
+        /*
+         * Assume all groups are 0, the consequences are the same and we will
+         * not abort device creation
+         */
+        state->num = 0;
+    }
+}
+
+static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
+                                           unsigned vq_group,
+                                           unsigned asid_num)
+{
+    struct vhost_vring_state asid = {
+        .index = vq_group,
+        .num = asid_num,
+    };
+    int ret;
+
+    ret = ioctl(v->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
+    if (unlikely(ret < 0)) {
+        warn_report("Can't set vq group %u asid %u, errno=%d (%s)",
+            asid.index, asid.num, errno, g_strerror(errno));
+    }
+    return ret;
+}
+
 static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
 {
     VhostIOVATree *tree = v->iova_tree;
@@ -298,11 +335,55 @@ dma_map_err:
 static int vhost_vdpa_net_cvq_prepare(NetClientState *nc)
 {
     VhostVDPAState *s;
+    struct vhost_vdpa *v;
+    struct vhost_vring_state cvq_group = {};
     int r;
 
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 
     s = DO_UPCAST(VhostVDPAState, nc, nc);
+    v = &s->vhost_vdpa;
+    cvq_group.index = v->dev->vq_index_end - 1;
+
+    /* Default values */
+    v->shadow_vqs_enabled = false;
+    s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_PASSTHROUGH;
+
+    if (s->always_svq) {
+        v->shadow_vqs_enabled = true;
+        goto out;
+    }
+
+    if (s->address_space_num < 2) {
+        return 0;
+    }
+
+    /**
+     * Check if all the virtqueues of the virtio device are in a different vq
+     * than the last vq. VQ group of last group passed in cvq_group.
+     */
+    vhost_vdpa_get_vring_group(v->device_fd, &cvq_group);
+    for (int i = 0; i < (v->dev->vq_index_end - 1); ++i) {
+        struct vhost_vring_state vq_group = {
+            .index = i,
+        };
+
+        vhost_vdpa_get_vring_group(v->device_fd, &vq_group);
+        if (unlikely(vq_group.num == cvq_group.num)) {
+            warn_report("CVQ %u group is the same as VQ %u one (%u)",
+                         cvq_group.index, vq_group.index, cvq_group.num);
+            return 0;
+        }
+    }
+
+    r = vhost_vdpa_set_address_space_id(v, cvq_group.num,
+                                        VHOST_VDPA_NET_CVQ_ASID);
+    if (r == 0) {
+        v->shadow_vqs_enabled = true;
+        s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
+    }
+
+out:
     if (!s->vhost_vdpa.shadow_vqs_enabled) {
         return 0;
     }
@@ -523,12 +604,38 @@ static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
     .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
 };
 
+static uint32_t vhost_vdpa_get_as_num(int vdpa_device_fd)
+{
+    uint64_t features;
+    unsigned num_as;
+    int r;
+
+    r = ioctl(vdpa_device_fd, VHOST_GET_BACKEND_FEATURES, &features);
+    if (unlikely(r < 0)) {
+        warn_report("Cannot get backend features");
+        return 1;
+    }
+
+    if (!(features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID))) {
+        return 1;
+    }
+
+    r = ioctl(vdpa_device_fd, VHOST_VDPA_GET_AS_NUM, &num_as);
+    if (unlikely(r < 0)) {
+        warn_report("Cannot retrieve number of supported ASs");
+        return 1;
+    }
+
+    return num_as;
+}
+
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                            const char *device,
                                            const char *name,
                                            int vdpa_device_fd,
                                            int queue_pair_index,
                                            int nvqs,
+                                           unsigned nas,
                                            bool is_datapath,
                                            bool svq,
                                            VhostIOVATree *iova_tree)
@@ -547,6 +654,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     snprintf(nc->info_str, sizeof(nc->info_str), TYPE_VHOST_VDPA);
     s = DO_UPCAST(VhostVDPAState, nc, nc);
 
+    s->address_space_num = nas;
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
     s->always_svq = svq;
@@ -632,6 +740,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     g_autoptr(VhostIOVATree) iova_tree = NULL;
     NetClientState *nc;
     int queue_pairs, r, i = 0, has_cvq = 0;
+    unsigned num_as = 1;
+    bool svq_cvq;
 
     assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
     opts = &netdev->u.vhost_vdpa;
@@ -656,7 +766,13 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         goto err;
     }
 
-    if (opts->x_svq) {
+    svq_cvq = opts->x_svq;
+    if (has_cvq && !opts->x_svq) {
+        num_as = vhost_vdpa_get_as_num(vdpa_device_fd);
+        svq_cvq = num_as > 1;
+    }
+
+    if (opts->x_svq || svq_cvq) {
         struct vhost_vdpa_iova_range iova_range;
 
         uint64_t invalid_dev_features =
@@ -679,15 +795,15 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 
     for (i = 0; i < queue_pairs; i++) {
         ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                     vdpa_device_fd, i, 2, true, opts->x_svq,
-                                     iova_tree);
+                                     vdpa_device_fd, i, 2, num_as, true,
+                                     opts->x_svq, iova_tree);
         if (!ncs[i])
             goto err;
     }
 
     if (has_cvq) {
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                 vdpa_device_fd, i, 1, false,
+                                 vdpa_device_fd, i, 1, num_as, false,
                                  opts->x_svq, iova_tree);
         if (!nc)
             goto err;
-- 
2.31.1



* Re: [PATCH v4 4/6] vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap
  2022-08-05 16:39 ` [PATCH v4 4/6] vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap Eugenio Pérez
@ 2022-08-09  7:21   ` Jason Wang
  2022-08-09 17:03     ` Eugenio Perez Martin
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2022-08-09  7:21 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: qemu-devel, Stefan Hajnoczi, Eli Cohen, Zhu Lingshan,
	Michael S. Tsirkin, Gautam Dawar, Stefano Garzarella,
	Parav Pandit, Cindy Lu, Gonglei (Arei),
	Cornelia Huck, kvm, Paolo Bonzini, Liuxiangdong, Laurent Vivier,
	Harpreet Singh Anand

On Sat, Aug 6, 2022 at 12:39 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> So the caller can choose which ASID the mapping is destined for.
>
> No need to update the batch functions, as at the moment they will always be
> called from memory listener updates. Memory listener updates will always
> target ASID 0, as it is the passthrough ASID.
>
> All vhost devices' ASIDs are 0 at this moment.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v4: Add comment specifying behavior if device does not support _F_ASID
>
> v3: Deleted unneeded space
> ---
>  include/hw/virtio/vhost-vdpa.h |  8 +++++---
>  hw/virtio/vhost-vdpa.c         | 25 +++++++++++++++----------
>  net/vhost-vdpa.c               |  6 +++---
>  hw/virtio/trace-events         |  4 ++--
>  4 files changed, 25 insertions(+), 18 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 1111d85643..6560bb9d78 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -29,6 +29,7 @@ typedef struct vhost_vdpa {
>      int index;
>      uint32_t msg_type;
>      bool iotlb_batch_begin_sent;
> +    uint32_t address_space_id;
>      MemoryListener listener;
>      struct vhost_vdpa_iova_range iova_range;
>      uint64_t acked_features;
> @@ -42,8 +43,9 @@ typedef struct vhost_vdpa {
>      VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>  } VhostVDPA;
>
> -int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> -                       void *vaddr, bool readonly);
> -int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
> +int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> +                       hwaddr size, void *vaddr, bool readonly);
> +int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> +                         hwaddr size);
>
>  #endif
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 34922ec20d..3eb67b27b7 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -72,22 +72,24 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>      return false;
>  }
>
> -int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> -                       void *vaddr, bool readonly)
> +int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> +                       hwaddr size, void *vaddr, bool readonly)
>  {
>      struct vhost_msg_v2 msg = {};
>      int fd = v->device_fd;
>      int ret = 0;
>
>      msg.type = v->msg_type;
> +    msg.asid = asid; /* 0 if vdpa device does not support asid */

So this comment is still kind of confusing.

Does it mean the caller can guarantee that asid is 0 when ASID is not
supported? Even if this is true, does it silently depend on the
behaviour that the asid field is extended from the reserved field of
the ABI?

(I still wonder if it's better to avoid using msg.asid if the kernel
doesn't support that).

Thanks

>      msg.iotlb.iova = iova;
>      msg.iotlb.size = size;
>      msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
>      msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
>      msg.iotlb.type = VHOST_IOTLB_UPDATE;
>
> -   trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
> -                            msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
> +    trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.asid, msg.iotlb.iova,
> +                             msg.iotlb.size, msg.iotlb.uaddr, msg.iotlb.perm,
> +                             msg.iotlb.type);
>
>      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
>          error_report("failed to write, fd=%d, errno=%d (%s)",
> @@ -98,18 +100,20 @@ int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
>      return ret;
>  }
>
> -int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size)
> +int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> +                         hwaddr size)
>  {
>      struct vhost_msg_v2 msg = {};
>      int fd = v->device_fd;
>      int ret = 0;
>
>      msg.type = v->msg_type;
> +    msg.asid = asid; /* 0 if vdpa device does not support asid */
>      msg.iotlb.iova = iova;
>      msg.iotlb.size = size;
>      msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
>
> -    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
> +    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.asid, msg.iotlb.iova,
>                                 msg.iotlb.size, msg.iotlb.type);
>
>      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> @@ -229,7 +233,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>      }
>
>      vhost_vdpa_iotlb_batch_begin_once(v);
> -    ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> +    ret = vhost_vdpa_dma_map(v, 0, iova, int128_get64(llsize),
>                               vaddr, section->readonly);
>      if (ret) {
>          error_report("vhost vdpa map fail!");
> @@ -303,7 +307,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
>          vhost_iova_tree_remove(v->iova_tree, result);
>      }
>      vhost_vdpa_iotlb_batch_begin_once(v);
> -    ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
> +    ret = vhost_vdpa_dma_unmap(v, 0, iova, int128_get64(llsize));
>      if (ret) {
>          error_report("vhost_vdpa dma unmap error!");
>      }
> @@ -894,7 +898,7 @@ static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
>      }
>
>      size = ROUND_UP(result->size, qemu_real_host_page_size());
> -    r = vhost_vdpa_dma_unmap(v, result->iova, size);
> +    r = vhost_vdpa_dma_unmap(v, v->address_space_id, result->iova, size);
>      return r == 0;
>  }
>
> @@ -936,7 +940,8 @@ static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, DMAMap *needle,
>          return false;
>      }
>
> -    r = vhost_vdpa_dma_map(v, needle->iova, needle->size + 1,
> +    r = vhost_vdpa_dma_map(v, v->address_space_id, needle->iova,
> +                           needle->size + 1,
>                             (void *)(uintptr_t)needle->translated_addr,
>                             needle->perm == IOMMU_RO);
>      if (unlikely(r != 0)) {
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 7c0d600aea..9b932442f0 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -239,7 +239,7 @@ static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
>          return;
>      }
>
> -    r = vhost_vdpa_dma_unmap(v, map->iova, map->size + 1);
> +    r = vhost_vdpa_dma_unmap(v, v->address_space_id, map->iova, map->size + 1);
>      if (unlikely(r != 0)) {
>          error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
>      }
> @@ -279,8 +279,8 @@ static int vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v, void *buf, size_t size,
>          return r;
>      }
>
> -    r = vhost_vdpa_dma_map(v, map.iova, vhost_vdpa_net_cvq_cmd_page_len(), buf,
> -                           !write);
> +    r = vhost_vdpa_dma_map(v, v->address_space_id, map.iova,
> +                           vhost_vdpa_net_cvq_cmd_page_len(), buf, !write);
>      if (unlikely(r < 0)) {
>          goto dma_map_err;
>      }
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 20af2e7ebd..36e5ae75f6 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -26,8 +26,8 @@ vhost_user_write(uint32_t req, uint32_t flags) "req:%d flags:0x%"PRIx32""
>  vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
>
>  # vhost-vdpa.c
> -vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> -vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> +vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> +vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
>  vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
>  vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
>  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
> --
> 2.31.1
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 6/6] vdpa: Always start CVQ in SVQ mode
  2022-08-05 16:39 ` [PATCH v4 6/6] vdpa: Always start CVQ in SVQ mode Eugenio Pérez
@ 2022-08-09  7:33   ` Jason Wang
  2022-08-09  7:53     ` Eugenio Perez Martin
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2022-08-09  7:33 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: qemu-devel, Stefan Hajnoczi, Eli Cohen, Zhu Lingshan,
	Michael S. Tsirkin, Gautam Dawar, Stefano Garzarella,
	Parav Pandit, Cindy Lu, Gonglei (Arei),
	Cornelia Huck, kvm, Paolo Bonzini, Liuxiangdong, Laurent Vivier,
	Harpreet Singh Anand

On Sat, Aug 6, 2022 at 12:39 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Isolate the control virtqueue in its own group, allowing QEMU to intercept
> control commands while letting the dataplane run fully passthrough to the guest.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v4:
> * Squash vhost_vdpa_cvq_group_is_independent.
> * Rebased on last CVQ start series, that allocated CVQ cmd bufs at load
> * Do not check for the cvq index on vhost_vdpa_net_prepare; we only have
>   that callback registered in that one NetClientInfo.
>
> v3:
> * Make asid-related queries print a warning instead of returning an
>   error and aborting QEMU startup.
> ---
>  hw/virtio/vhost-vdpa.c |   3 +-
>  net/vhost-vdpa.c       | 124 +++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 122 insertions(+), 5 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 3eb67b27b7..69f34f4cc5 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -678,7 +678,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>  {
>      uint64_t features;
>      uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
> -        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
> +        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
> +        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
>      int r;
>
>      if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index bf78b1332f..c820a5fd9f 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -37,6 +37,9 @@ typedef struct VhostVDPAState {
>      /* Control commands shadow buffers */
>      void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer;
>
> +    /* Number of address spaces supported by the device */
> +    unsigned address_space_num;
> +
>      /* The device always have SVQ enabled */
>      bool always_svq;
>      bool started;
> @@ -100,6 +103,9 @@ static const uint64_t vdpa_svq_device_features =
>      BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
>      BIT_ULL(VIRTIO_NET_F_STANDBY);
>
> +#define VHOST_VDPA_NET_CVQ_PASSTHROUGH 0

We need a better name for the macro since it's not easy to see it's an asid.

> +#define VHOST_VDPA_NET_CVQ_ASID 1
> +
>  VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
>  {
>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> @@ -224,6 +230,37 @@ static NetClientInfo net_vhost_vdpa_info = {
>          .check_peer_type = vhost_vdpa_check_peer_type,
>  };
>
> +static void vhost_vdpa_get_vring_group(int device_fd,
> +                                       struct vhost_vring_state *state)
> +{
> +    int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, state);

Let's hide vhost_vring_state from the caller and simply accept a vq
index parameter (as the function below does); then we can return the
group id.

This hides the low-level ABI and simplifies the caller's code.

> +    if (unlikely(r < 0)) {
> +        /*
> +         * Assume all groups are 0, the consequences are the same and we will
> +         * not abort device creation
> +         */
> +        state->num = 0;
> +    }
> +}
> +
> +static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
> +                                           unsigned vq_group,
> +                                           unsigned asid_num)
> +{
> +    struct vhost_vring_state asid = {
> +        .index = vq_group,
> +        .num = asid_num,
> +    };
> +    int ret;
> +
> +    ret = ioctl(v->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> +    if (unlikely(ret < 0)) {
> +        warn_report("Can't set vq group %u asid %u, errno=%d (%s)",
> +            asid.index, asid.num, errno, g_strerror(errno));
> +    }
> +    return ret;
> +}
> +
>  static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
>  {
>      VhostIOVATree *tree = v->iova_tree;
> @@ -298,11 +335,55 @@ dma_map_err:
>  static int vhost_vdpa_net_cvq_prepare(NetClientState *nc)
>  {
>      VhostVDPAState *s;
> +    struct vhost_vdpa *v;
> +    struct vhost_vring_state cvq_group = {};
>      int r;
>
>      assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>
>      s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    v = &s->vhost_vdpa;
> +    cvq_group.index = v->dev->vq_index_end - 1;
> +
> +    /* Default values */

Code can explain itself so this comment is probably useless.

> +    v->shadow_vqs_enabled = false;
> +    s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_PASSTHROUGH;
> +
> +    if (s->always_svq) {
> +        v->shadow_vqs_enabled = true;

The name of the variable is suboptimal.

I think we need to differ:

1) shadow all virtqueues

from

2) shadow cvq only

Thanks

> +        goto out;
> +    }
> +
> +    if (s->address_space_num < 2) {
> +        return 0;
> +    }
> +
> +    /**
> +     * Check if all the virtqueues of the virtio device are in a different vq
> +     * group than the last vq. The group of the last vq is passed in cvq_group.
> +     */
> +    vhost_vdpa_get_vring_group(v->device_fd, &cvq_group);
> +    for (int i = 0; i < (v->dev->vq_index_end - 1); ++i) {
> +        struct vhost_vring_state vq_group = {
> +            .index = i,
> +        };
> +
> +        vhost_vdpa_get_vring_group(v->device_fd, &vq_group);
> +        if (unlikely(vq_group.num == cvq_group.num)) {
> +            warn_report("CVQ %u group is the same as VQ %u one (%u)",
> +                         cvq_group.index, vq_group.index, cvq_group.num);
> +            return 0;
> +        }
> +    }
> +
> +    r = vhost_vdpa_set_address_space_id(v, cvq_group.num,
> +                                        VHOST_VDPA_NET_CVQ_ASID);
> +    if (r == 0) {
> +        v->shadow_vqs_enabled = true;
> +        s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
> +    }
> +
> +out:
>      if (!s->vhost_vdpa.shadow_vqs_enabled) {
>          return 0;
>      }
> @@ -523,12 +604,38 @@ static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
>      .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
>  };
>
> +static uint32_t vhost_vdpa_get_as_num(int vdpa_device_fd)
> +{
> +    uint64_t features;
> +    unsigned num_as;
> +    int r;
> +
> +    r = ioctl(vdpa_device_fd, VHOST_GET_BACKEND_FEATURES, &features);
> +    if (unlikely(r < 0)) {
> +        warn_report("Cannot get backend features");
> +        return 1;
> +    }
> +
> +    if (!(features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID))) {
> +        return 1;
> +    }
> +
> +    r = ioctl(vdpa_device_fd, VHOST_VDPA_GET_AS_NUM, &num_as);
> +    if (unlikely(r < 0)) {
> +        warn_report("Cannot retrieve number of supported ASs");
> +        return 1;
> +    }
> +
> +    return num_as;
> +}
> +
>  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>                                             const char *device,
>                                             const char *name,
>                                             int vdpa_device_fd,
>                                             int queue_pair_index,
>                                             int nvqs,
> +                                           unsigned nas,
>                                             bool is_datapath,
>                                             bool svq,
>                                             VhostIOVATree *iova_tree)
> @@ -547,6 +654,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>      snprintf(nc->info_str, sizeof(nc->info_str), TYPE_VHOST_VDPA);
>      s = DO_UPCAST(VhostVDPAState, nc, nc);
>
> +    s->address_space_num = nas;
>      s->vhost_vdpa.device_fd = vdpa_device_fd;
>      s->vhost_vdpa.index = queue_pair_index;
>      s->always_svq = svq;
> @@ -632,6 +740,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>      g_autoptr(VhostIOVATree) iova_tree = NULL;
>      NetClientState *nc;
>      int queue_pairs, r, i = 0, has_cvq = 0;
> +    unsigned num_as = 1;
> +    bool svq_cvq;
>
>      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>      opts = &netdev->u.vhost_vdpa;
> @@ -656,7 +766,13 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>          goto err;
>      }
>
> -    if (opts->x_svq) {
> +    svq_cvq = opts->x_svq;
> +    if (has_cvq && !opts->x_svq) {
> +        num_as = vhost_vdpa_get_as_num(vdpa_device_fd);
> +        svq_cvq = num_as > 1;
> +    }
> +
> +    if (opts->x_svq || svq_cvq) {
>          struct vhost_vdpa_iova_range iova_range;
>
>          uint64_t invalid_dev_features =
> @@ -679,15 +795,15 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>
>      for (i = 0; i < queue_pairs; i++) {
>          ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> -                                     vdpa_device_fd, i, 2, true, opts->x_svq,
> -                                     iova_tree);
> +                                     vdpa_device_fd, i, 2, num_as, true,
> +                                     opts->x_svq, iova_tree);
>          if (!ncs[i])
>              goto err;
>      }
>
>      if (has_cvq) {
>          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> -                                 vdpa_device_fd, i, 1, false,
> +                                 vdpa_device_fd, i, 1, num_as, false,
>                                   opts->x_svq, iova_tree);
>          if (!nc)
>              goto err;
> --
> 2.31.1
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 6/6] vdpa: Always start CVQ in SVQ mode
  2022-08-09  7:33   ` Jason Wang
@ 2022-08-09  7:53     ` Eugenio Perez Martin
  2022-08-19  6:18       ` Jason Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Eugenio Perez Martin @ 2022-08-09  7:53 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefan Hajnoczi, Eli Cohen, Zhu Lingshan,
	Michael S. Tsirkin, Gautam Dawar, Stefano Garzarella,
	Parav Pandit, Cindy Lu, Gonglei (Arei),
	Cornelia Huck, kvm, Paolo Bonzini, Liuxiangdong, Laurent Vivier,
	Harpreet Singh Anand

On Tue, Aug 9, 2022 at 9:36 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Sat, Aug 6, 2022 at 12:39 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > Isolate the control virtqueue in its own group, allowing QEMU to intercept
> > control commands while letting the dataplane run fully passthrough to the guest.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v4:
> > * Squash vhost_vdpa_cvq_group_is_independent.
> > * Rebased on last CVQ start series, that allocated CVQ cmd bufs at load
> > * Do not check for the cvq index on vhost_vdpa_net_prepare; we only have
> >   that callback registered in that one NetClientInfo.
> >
> > v3:
> > * Make asid-related queries print a warning instead of returning an
> >   error and aborting QEMU startup.
> > ---
> >  hw/virtio/vhost-vdpa.c |   3 +-
> >  net/vhost-vdpa.c       | 124 +++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 122 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 3eb67b27b7..69f34f4cc5 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -678,7 +678,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> >  {
> >      uint64_t features;
> >      uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
> > -        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
> > +        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
> > +        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
> >      int r;
> >
> >      if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index bf78b1332f..c820a5fd9f 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -37,6 +37,9 @@ typedef struct VhostVDPAState {
> >      /* Control commands shadow buffers */
> >      void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer;
> >
> > +    /* Number of address spaces supported by the device */
> > +    unsigned address_space_num;
> > +
> >      /* The device always have SVQ enabled */
> >      bool always_svq;
> >      bool started;
> > @@ -100,6 +103,9 @@ static const uint64_t vdpa_svq_device_features =
> >      BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
> >      BIT_ULL(VIRTIO_NET_F_STANDBY);
> >
> > +#define VHOST_VDPA_NET_CVQ_PASSTHROUGH 0
>
> We need a better name for the macro since it's not easy to see it's an asid.
>

VHOST_VDPA_NET_DATA_ASID?

> > +#define VHOST_VDPA_NET_CVQ_ASID 1
> > +
> >  VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
> >  {
> >      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > @@ -224,6 +230,37 @@ static NetClientInfo net_vhost_vdpa_info = {
> >          .check_peer_type = vhost_vdpa_check_peer_type,
> >  };
> >
> > +static void vhost_vdpa_get_vring_group(int device_fd,
> > +                                       struct vhost_vring_state *state)
> > +{
> > +    int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, state);
>
> Let's hide vhost_vring_state from the caller and simply accept a vq
> index parameter (as the function below does); then we can return the
> group id.
>
> This hides the low-level ABI and simplifies the caller's code.
>

We need to return an error.

The example is a little bit pathological, but if we get that CVQ is on
vq group != 0, and all data vqs return an error reading the vq group,
I think we shouldn't assume they belong to asid 0. Some of them
could be on the same vq group as cvq, making all of this fail.

Can we assume the vq group is a uint32_t? Would it work to return an
int64_t with the possibility of errors?

> > +    if (unlikely(r < 0)) {
> > +        /*
> > +         * Assume all groups are 0, the consequences are the same and we will
> > +         * not abort device creation
> > +         */
> > +        state->num = 0;
> > +    }
> > +}
> > +
> > +static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
> > +                                           unsigned vq_group,
> > +                                           unsigned asid_num)
> > +{
> > +    struct vhost_vring_state asid = {
> > +        .index = vq_group,
> > +        .num = asid_num,
> > +    };
> > +    int ret;
> > +
> > +    ret = ioctl(v->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> > +    if (unlikely(ret < 0)) {
> > +        warn_report("Can't set vq group %u asid %u, errno=%d (%s)",
> > +            asid.index, asid.num, errno, g_strerror(errno));
> > +    }
> > +    return ret;
> > +}
> > +
> >  static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
> >  {
> >      VhostIOVATree *tree = v->iova_tree;
> > @@ -298,11 +335,55 @@ dma_map_err:
> >  static int vhost_vdpa_net_cvq_prepare(NetClientState *nc)
> >  {
> >      VhostVDPAState *s;
> > +    struct vhost_vdpa *v;
> > +    struct vhost_vring_state cvq_group = {};
> >      int r;
> >
> >      assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >
> >      s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +    v = &s->vhost_vdpa;
> > +    cvq_group.index = v->dev->vq_index_end - 1;
> > +
> > +    /* Default values */
>
> Code can explain itself so this comment is probably useless.
>

I'll delete.

> > +    v->shadow_vqs_enabled = false;
> > +    s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_PASSTHROUGH;
> > +
> > +    if (s->always_svq) {
> > +        v->shadow_vqs_enabled = true;
>
> The name of the variable is suboptimal.
>
> I think we need to differ:
>
> 1) shadow all virtqueues
>
> from
>
> 2) shadow cvq only
>

Note that shadow_vqs_enabled is per vhost_vdpa, so
v->shadow_vqs_enabled means that vqs of *v* are shadowed.

Data vhost_vdpa can have a different value than CVQ vhost_vdpa.

Having said that, I'm ok with changing the name of the variable, but I
cannot come up with a better one.

Thanks!

> Thanks
>
> > +        goto out;
> > +    }
> > +
> > +    if (s->address_space_num < 2) {
> > +        return 0;
> > +    }
> > +
> > +    /**
> > +     * Check if all the virtqueues of the virtio device are in a different vq
> > +     * group than the last vq. The group of the last vq is passed in cvq_group.
> > +     */
> > +    vhost_vdpa_get_vring_group(v->device_fd, &cvq_group);
> > +    for (int i = 0; i < (v->dev->vq_index_end - 1); ++i) {
> > +        struct vhost_vring_state vq_group = {
> > +            .index = i,
> > +        };
> > +
> > +        vhost_vdpa_get_vring_group(v->device_fd, &vq_group);
> > +        if (unlikely(vq_group.num == cvq_group.num)) {
> > +            warn_report("CVQ %u group is the same as VQ %u one (%u)",
> > +                         cvq_group.index, vq_group.index, cvq_group.num);
> > +            return 0;
> > +        }
> > +    }
> > +
> > +    r = vhost_vdpa_set_address_space_id(v, cvq_group.num,
> > +                                        VHOST_VDPA_NET_CVQ_ASID);
> > +    if (r == 0) {
> > +        v->shadow_vqs_enabled = true;
> > +        s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
> > +    }
> > +
> > +out:
> >      if (!s->vhost_vdpa.shadow_vqs_enabled) {
> >          return 0;
> >      }
> > @@ -523,12 +604,38 @@ static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
> >      .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
> >  };
> >
> > +static uint32_t vhost_vdpa_get_as_num(int vdpa_device_fd)
> > +{
> > +    uint64_t features;
> > +    unsigned num_as;
> > +    int r;
> > +
> > +    r = ioctl(vdpa_device_fd, VHOST_GET_BACKEND_FEATURES, &features);
> > +    if (unlikely(r < 0)) {
> > +        warn_report("Cannot get backend features");
> > +        return 1;
> > +    }
> > +
> > +    if (!(features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID))) {
> > +        return 1;
> > +    }
> > +
> > +    r = ioctl(vdpa_device_fd, VHOST_VDPA_GET_AS_NUM, &num_as);
> > +    if (unlikely(r < 0)) {
> > +        warn_report("Cannot retrieve number of supported ASs");
> > +        return 1;
> > +    }
> > +
> > +    return num_as;
> > +}
> > +
> >  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >                                             const char *device,
> >                                             const char *name,
> >                                             int vdpa_device_fd,
> >                                             int queue_pair_index,
> >                                             int nvqs,
> > +                                           unsigned nas,
> >                                             bool is_datapath,
> >                                             bool svq,
> >                                             VhostIOVATree *iova_tree)
> > @@ -547,6 +654,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >      snprintf(nc->info_str, sizeof(nc->info_str), TYPE_VHOST_VDPA);
> >      s = DO_UPCAST(VhostVDPAState, nc, nc);
> >
> > +    s->address_space_num = nas;
> >      s->vhost_vdpa.device_fd = vdpa_device_fd;
> >      s->vhost_vdpa.index = queue_pair_index;
> >      s->always_svq = svq;
> > @@ -632,6 +740,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >      g_autoptr(VhostIOVATree) iova_tree = NULL;
> >      NetClientState *nc;
> >      int queue_pairs, r, i = 0, has_cvq = 0;
> > +    unsigned num_as = 1;
> > +    bool svq_cvq;
> >
> >      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >      opts = &netdev->u.vhost_vdpa;
> > @@ -656,7 +766,13 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >          goto err;
> >      }
> >
> > -    if (opts->x_svq) {
> > +    svq_cvq = opts->x_svq;
> > +    if (has_cvq && !opts->x_svq) {
> > +        num_as = vhost_vdpa_get_as_num(vdpa_device_fd);
> > +        svq_cvq = num_as > 1;
> > +    }
> > +
> > +    if (opts->x_svq || svq_cvq) {
> >          struct vhost_vdpa_iova_range iova_range;
> >
> >          uint64_t invalid_dev_features =
> > @@ -679,15 +795,15 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >
> >      for (i = 0; i < queue_pairs; i++) {
> >          ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > -                                     vdpa_device_fd, i, 2, true, opts->x_svq,
> > -                                     iova_tree);
> > +                                     vdpa_device_fd, i, 2, num_as, true,
> > +                                     opts->x_svq, iova_tree);
> >          if (!ncs[i])
> >              goto err;
> >      }
> >
> >      if (has_cvq) {
> >          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > -                                 vdpa_device_fd, i, 1, false,
> > +                                 vdpa_device_fd, i, 1, num_as, false,
> >                                   opts->x_svq, iova_tree);
> >          if (!nc)
> >              goto err;
> > --
> > 2.31.1
> >
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 4/6] vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap
  2022-08-09  7:21   ` Jason Wang
@ 2022-08-09 17:03     ` Eugenio Perez Martin
  2022-08-19  4:49       ` Jason Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Eugenio Perez Martin @ 2022-08-09 17:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Stefan Hajnoczi, Eli Cohen, Zhu Lingshan,
	Michael S. Tsirkin, Gautam Dawar, Stefano Garzarella,
	Parav Pandit, Cindy Lu, Gonglei (Arei),
	Cornelia Huck, kvm, Paolo Bonzini, Liuxiangdong, Laurent Vivier,
	Harpreet Singh Anand

On Tue, Aug 9, 2022 at 9:21 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Sat, Aug 6, 2022 at 12:39 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > So the caller can choose the destination ASID.
> >
> > No need to update the batch functions as they will always be called from
> > memory listener updates at the moment. Memory listener updates will
> > always update ASID 0, as it's the passthrough ASID.
> >
> > All vhost devices' ASIDs are 0 at this moment.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v4: Add comment specifying behavior if device does not support _F_ASID
> >
> > v3: Deleted unneeded space
> > ---
> >  include/hw/virtio/vhost-vdpa.h |  8 +++++---
> >  hw/virtio/vhost-vdpa.c         | 25 +++++++++++++++----------
> >  net/vhost-vdpa.c               |  6 +++---
> >  hw/virtio/trace-events         |  4 ++--
> >  4 files changed, 25 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index 1111d85643..6560bb9d78 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -29,6 +29,7 @@ typedef struct vhost_vdpa {
> >      int index;
> >      uint32_t msg_type;
> >      bool iotlb_batch_begin_sent;
> > +    uint32_t address_space_id;
> >      MemoryListener listener;
> >      struct vhost_vdpa_iova_range iova_range;
> >      uint64_t acked_features;
> > @@ -42,8 +43,9 @@ typedef struct vhost_vdpa {
> >      VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> >  } VhostVDPA;
> >
> > -int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> > -                       void *vaddr, bool readonly);
> > -int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
> > +int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > +                       hwaddr size, void *vaddr, bool readonly);
> > +int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > +                         hwaddr size);
> >
> >  #endif
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 34922ec20d..3eb67b27b7 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -72,22 +72,24 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
> >      return false;
> >  }
> >
> > -int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> > -                       void *vaddr, bool readonly)
> > +int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > +                       hwaddr size, void *vaddr, bool readonly)
> >  {
> >      struct vhost_msg_v2 msg = {};
> >      int fd = v->device_fd;
> >      int ret = 0;
> >
> >      msg.type = v->msg_type;
> > +    msg.asid = asid; /* 0 if vdpa device does not support asid */
>
> So this comment is still kind of confusing.
>
> Does it mean the caller can guarantee that asid is 0 when ASID is not
> supported?

That's right.

> Even if this is true, does it silently depend on the
> behaviour that the asid field is extended from the reserved field of
> the ABI?
>

I don't get this part.

Regarding the ABI, the reserved bytes will be there whether the device
supports asid or not, since the actual iotlb message comes after the
reserved field. And they were already zeroed by msg = {} at the top of
the function. So if the caller always sets asid = 0, there is no change
on this part.
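
For reference, this is the layout in question, paraphrased from the
vhost_types.h header this series updates (the asid field takes over a
previously reserved __u32):

struct vhost_msg_v2 {
	__u32 type;
	__u32 asid;	/* was: __u32 reserved */
	union {
		struct vhost_iotlb_msg iotlb;
		__u8 padding[64];
	};
};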

> (I still wonder if it's better to avoid using msg.asid if the kernel
> doesn't support that).
>

We can add a conditional on v->dev->backend_features & _F_ASID.

But that is not the only case where ASID will not be used: the vq
groups may not match the supported configuration (like when CVQ is
not in an independent group). This case is already handled by setting
all vhost_vdpa of the virtio device to asid = 0, so adding that extra
check seems redundant to me.

I'm not against adding it though: it can prevent bugs. Since it would
be a QEMU bug, maybe it's better to add an assertion?

Thanks!

> Thanks
>
> >      msg.iotlb.iova = iova;
> >      msg.iotlb.size = size;
> >      msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
> >      msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
> >      msg.iotlb.type = VHOST_IOTLB_UPDATE;
> >
> > -   trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
> > -                            msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
> > +    trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.asid, msg.iotlb.iova,
> > +                             msg.iotlb.size, msg.iotlb.uaddr, msg.iotlb.perm,
> > +                             msg.iotlb.type);
> >
> >      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> >          error_report("failed to write, fd=%d, errno=%d (%s)",
> > @@ -98,18 +100,20 @@ int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> >      return ret;
> >  }
> >
> > -int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size)
> > +int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > +                         hwaddr size)
> >  {
> >      struct vhost_msg_v2 msg = {};
> >      int fd = v->device_fd;
> >      int ret = 0;
> >
> >      msg.type = v->msg_type;
> > +    msg.asid = asid; /* 0 if vdpa device does not support asid */
> >      msg.iotlb.iova = iova;
> >      msg.iotlb.size = size;
> >      msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
> >
> > -    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
> > +    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.asid, msg.iotlb.iova,
> >                                 msg.iotlb.size, msg.iotlb.type);
> >
> >      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> > @@ -229,7 +233,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> >      }
> >
> >      vhost_vdpa_iotlb_batch_begin_once(v);
> > -    ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> > +    ret = vhost_vdpa_dma_map(v, 0, iova, int128_get64(llsize),
> >                               vaddr, section->readonly);
> >      if (ret) {
> >          error_report("vhost vdpa map fail!");
> > @@ -303,7 +307,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
> >          vhost_iova_tree_remove(v->iova_tree, result);
> >      }
> >      vhost_vdpa_iotlb_batch_begin_once(v);
> > -    ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
> > +    ret = vhost_vdpa_dma_unmap(v, 0, iova, int128_get64(llsize));
> >      if (ret) {
> >          error_report("vhost_vdpa dma unmap error!");
> >      }
> > @@ -894,7 +898,7 @@ static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
> >      }
> >
> >      size = ROUND_UP(result->size, qemu_real_host_page_size());
> > -    r = vhost_vdpa_dma_unmap(v, result->iova, size);
> > +    r = vhost_vdpa_dma_unmap(v, v->address_space_id, result->iova, size);
> >      return r == 0;
> >  }
> >
> > @@ -936,7 +940,8 @@ static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, DMAMap *needle,
> >          return false;
> >      }
> >
> > -    r = vhost_vdpa_dma_map(v, needle->iova, needle->size + 1,
> > +    r = vhost_vdpa_dma_map(v, v->address_space_id, needle->iova,
> > +                           needle->size + 1,
> >                             (void *)(uintptr_t)needle->translated_addr,
> >                             needle->perm == IOMMU_RO);
> >      if (unlikely(r != 0)) {
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 7c0d600aea..9b932442f0 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -239,7 +239,7 @@ static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
> >          return;
> >      }
> >
> > -    r = vhost_vdpa_dma_unmap(v, map->iova, map->size + 1);
> > +    r = vhost_vdpa_dma_unmap(v, v->address_space_id, map->iova, map->size + 1);
> >      if (unlikely(r != 0)) {
> >          error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
> >      }
> > @@ -279,8 +279,8 @@ static int vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v, void *buf, size_t size,
> >          return r;
> >      }
> >
> > -    r = vhost_vdpa_dma_map(v, map.iova, vhost_vdpa_net_cvq_cmd_page_len(), buf,
> > -                           !write);
> > +    r = vhost_vdpa_dma_map(v, v->address_space_id, map.iova,
> > +                           vhost_vdpa_net_cvq_cmd_page_len(), buf, !write);
> >      if (unlikely(r < 0)) {
> >          goto dma_map_err;
> >      }
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index 20af2e7ebd..36e5ae75f6 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -26,8 +26,8 @@ vhost_user_write(uint32_t req, uint32_t flags) "req:%d flags:0x%"PRIx32""
> >  vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
> >
> >  # vhost-vdpa.c
> > -vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> > -vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> > +vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> > +vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> >  vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> >  vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> >  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
> > --
> > 2.31.1
> >
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 4/6] vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap
  2022-08-09 17:03     ` Eugenio Perez Martin
@ 2022-08-19  4:49       ` Jason Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2022-08-19  4:49 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Stefan Hajnoczi, Eli Cohen, Zhu Lingshan,
	Michael S. Tsirkin, Gautam Dawar, Stefano Garzarella,
	Parav Pandit, Cindy Lu, Gonglei (Arei),
	Cornelia Huck, kvm, Paolo Bonzini, Liuxiangdong, Laurent Vivier,
	Harpreet Singh Anand

On Wed, Aug 10, 2022 at 1:04 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Aug 9, 2022 at 9:21 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Sat, Aug 6, 2022 at 12:39 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >
> > > So the caller can choose the destination ASID.
> > >
> > > No need to update the batch functions as they will always be called from
> > > memory listener updates at the moment. Memory listener updates will
> > > always update ASID 0, as it's the passthrough ASID.
> > >
> > > All vhost devices' ASIDs are 0 at this moment.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > > v4: Add comment specifying behavior if device does not support _F_ASID
> > >
> > > v3: Deleted unneeded space
> > > ---
> > >  include/hw/virtio/vhost-vdpa.h |  8 +++++---
> > >  hw/virtio/vhost-vdpa.c         | 25 +++++++++++++++----------
> > >  net/vhost-vdpa.c               |  6 +++---
> > >  hw/virtio/trace-events         |  4 ++--
> > >  4 files changed, 25 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > > index 1111d85643..6560bb9d78 100644
> > > --- a/include/hw/virtio/vhost-vdpa.h
> > > +++ b/include/hw/virtio/vhost-vdpa.h
> > > @@ -29,6 +29,7 @@ typedef struct vhost_vdpa {
> > >      int index;
> > >      uint32_t msg_type;
> > >      bool iotlb_batch_begin_sent;
> > > +    uint32_t address_space_id;
> > >      MemoryListener listener;
> > >      struct vhost_vdpa_iova_range iova_range;
> > >      uint64_t acked_features;
> > > @@ -42,8 +43,9 @@ typedef struct vhost_vdpa {
> > >      VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > >  } VhostVDPA;
> > >
> > > -int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> > > -                       void *vaddr, bool readonly);
> > > -int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
> > > +int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > > +                       hwaddr size, void *vaddr, bool readonly);
> > > +int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > > +                         hwaddr size);
> > >
> > >  #endif
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index 34922ec20d..3eb67b27b7 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -72,22 +72,24 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
> > >      return false;
> > >  }
> > >
> > > -int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> > > -                       void *vaddr, bool readonly)
> > > +int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > > +                       hwaddr size, void *vaddr, bool readonly)
> > >  {
> > >      struct vhost_msg_v2 msg = {};
> > >      int fd = v->device_fd;
> > >      int ret = 0;
> > >
> > >      msg.type = v->msg_type;
> > > +    msg.asid = asid; /* 0 if vdpa device does not support asid */
> >
> > So this comment is still kind of confusing.
> >
> > Does it mean the caller can guarantee that asid is 0 when ASID is not
> > supported?
>
> That's right.
>
> > Even if this is true, does it silently depend on the
> > behaviour that the asid field is extended from the reserved field of
> > the ABI?
> >
>
> I don't get this part.
>
> Regarding the ABI, the reserved bytes will be there whether the device
> supports asid or not, since the actual iotlb message comes after the
> reserved field. And they were already zeroed by msg = {} at the top of
> the function. So if the caller always sets asid = 0, there is no change
> on this part.
>
> > (I still wonder if it's better to avoid using msg.asid if the kernel
> > doesn't support that).
> >
>
> We can add a conditional on v->dev->backend_features & _F_ASID.
>
> But that is not the only case where ASID will not be used: the vq
> groups may not match the supported configuration (like when CVQ is
> not in an independent group). This case is already handled by setting
> all vhost_vdpa of the virtio device to asid = 0, so adding that extra
> check seems redundant to me.

I see.

>
> I'm not against adding it though: it can prevent bugs. Since it would
> be a QEMU bug, maybe it's better to add an assertion?

I'd suggest adding a comment here.
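
Something along these lines, maybe (the wording is only a suggestion):

    msg.type = v->msg_type;
    /*
     * The caller must pass asid == 0 when the device does not offer
     * VHOST_BACKEND_F_IOTLB_ASID; the field then overlaps the formerly
     * reserved bytes of vhost_msg_v2, which pre-ASID kernels ignore.
     */
    msg.asid = asid;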

Thanks

>
> Thanks!
>
> > Thanks
> >
> > >      msg.iotlb.iova = iova;
> > >      msg.iotlb.size = size;
> > >      msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
> > >      msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
> > >      msg.iotlb.type = VHOST_IOTLB_UPDATE;
> > >
> > > -   trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
> > > -                            msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
> > > +    trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.asid, msg.iotlb.iova,
> > > +                             msg.iotlb.size, msg.iotlb.uaddr, msg.iotlb.perm,
> > > +                             msg.iotlb.type);
> > >
> > >      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> > >          error_report("failed to write, fd=%d, errno=%d (%s)",
> > > @@ -98,18 +100,20 @@ int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> > >      return ret;
> > >  }
> > >
> > > -int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size)
> > > +int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > > +                         hwaddr size)
> > >  {
> > >      struct vhost_msg_v2 msg = {};
> > >      int fd = v->device_fd;
> > >      int ret = 0;
> > >
> > >      msg.type = v->msg_type;
> > > +    msg.asid = asid; /* 0 if vdpa device does not support asid */
> > >      msg.iotlb.iova = iova;
> > >      msg.iotlb.size = size;
> > >      msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
> > >
> > > -    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
> > > +    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.asid, msg.iotlb.iova,
> > >                                 msg.iotlb.size, msg.iotlb.type);
> > >
> > >      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> > > @@ -229,7 +233,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > >      }
> > >
> > >      vhost_vdpa_iotlb_batch_begin_once(v);
> > > -    ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> > > +    ret = vhost_vdpa_dma_map(v, 0, iova, int128_get64(llsize),
> > >                               vaddr, section->readonly);
> > >      if (ret) {
> > >          error_report("vhost vdpa map fail!");
> > > @@ -303,7 +307,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
> > >          vhost_iova_tree_remove(v->iova_tree, result);
> > >      }
> > >      vhost_vdpa_iotlb_batch_begin_once(v);
> > > -    ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
> > > +    ret = vhost_vdpa_dma_unmap(v, 0, iova, int128_get64(llsize));
> > >      if (ret) {
> > >          error_report("vhost_vdpa dma unmap error!");
> > >      }
> > > @@ -894,7 +898,7 @@ static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
> > >      }
> > >
> > >      size = ROUND_UP(result->size, qemu_real_host_page_size());
> > > -    r = vhost_vdpa_dma_unmap(v, result->iova, size);
> > > +    r = vhost_vdpa_dma_unmap(v, v->address_space_id, result->iova, size);
> > >      return r == 0;
> > >  }
> > >
> > > @@ -936,7 +940,8 @@ static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, DMAMap *needle,
> > >          return false;
> > >      }
> > >
> > > -    r = vhost_vdpa_dma_map(v, needle->iova, needle->size + 1,
> > > +    r = vhost_vdpa_dma_map(v, v->address_space_id, needle->iova,
> > > +                           needle->size + 1,
> > >                             (void *)(uintptr_t)needle->translated_addr,
> > >                             needle->perm == IOMMU_RO);
> > >      if (unlikely(r != 0)) {
> > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > index 7c0d600aea..9b932442f0 100644
> > > --- a/net/vhost-vdpa.c
> > > +++ b/net/vhost-vdpa.c
> > > @@ -239,7 +239,7 @@ static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
> > >          return;
> > >      }
> > >
> > > -    r = vhost_vdpa_dma_unmap(v, map->iova, map->size + 1);
> > > +    r = vhost_vdpa_dma_unmap(v, v->address_space_id, map->iova, map->size + 1);
> > >      if (unlikely(r != 0)) {
> > >          error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
> > >      }
> > > @@ -279,8 +279,8 @@ static int vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v, void *buf, size_t size,
> > >          return r;
> > >      }
> > >
> > > -    r = vhost_vdpa_dma_map(v, map.iova, vhost_vdpa_net_cvq_cmd_page_len(), buf,
> > > -                           !write);
> > > +    r = vhost_vdpa_dma_map(v, v->address_space_id, map.iova,
> > > +                           vhost_vdpa_net_cvq_cmd_page_len(), buf, !write);
> > >      if (unlikely(r < 0)) {
> > >          goto dma_map_err;
> > >      }
> > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > index 20af2e7ebd..36e5ae75f6 100644
> > > --- a/hw/virtio/trace-events
> > > +++ b/hw/virtio/trace-events
> > > @@ -26,8 +26,8 @@ vhost_user_write(uint32_t req, uint32_t flags) "req:%d flags:0x%"PRIx32""
> > >  vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
> > >
> > >  # vhost-vdpa.c
> > > -vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> > > -vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> > > +vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> > > +vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> > >  vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > >  vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > >  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
> > > --
> > > 2.31.1
> > >
> >
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 6/6] vdpa: Always start CVQ in SVQ mode
  2022-08-09  7:53     ` Eugenio Perez Martin
@ 2022-08-19  6:18       ` Jason Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2022-08-19  6:18 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Stefan Hajnoczi, Eli Cohen, Zhu Lingshan,
	Michael S. Tsirkin, Gautam Dawar, Stefano Garzarella,
	Parav Pandit, Cindy Lu, Gonglei (Arei),
	Cornelia Huck, kvm, Paolo Bonzini, Liuxiangdong, Laurent Vivier,
	Harpreet Singh Anand

On Tue, Aug 9, 2022 at 3:54 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Tue, Aug 9, 2022 at 9:36 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Sat, Aug 6, 2022 at 12:39 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >
> > > Isolate the control virtqueue in its own group, allowing QEMU to intercept
> > > control commands while letting the dataplane run fully passthrough to the guest.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > > v4:
> > > * Squash vhost_vdpa_cvq_group_is_independent.
> > > * Rebased on last CVQ start series, that allocated CVQ cmd bufs at load
> > > * Do not check for the cvq index on vhost_vdpa_net_prepare; we only have
> > >   that callback registered in that one NetClientInfo.
> > >
> > > v3:
> > > * Make asid-related queries print a warning instead of returning an
> > >   error and aborting QEMU startup.
> > > ---
> > >  hw/virtio/vhost-vdpa.c |   3 +-
> > >  net/vhost-vdpa.c       | 124 +++++++++++++++++++++++++++++++++++++++--
> > >  2 files changed, 122 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index 3eb67b27b7..69f34f4cc5 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -678,7 +678,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> > >  {
> > >      uint64_t features;
> > >      uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
> > > -        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
> > > +        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
> > > +        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
> > >      int r;
> > >
> > >      if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
> > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > index bf78b1332f..c820a5fd9f 100644
> > > --- a/net/vhost-vdpa.c
> > > +++ b/net/vhost-vdpa.c
> > > @@ -37,6 +37,9 @@ typedef struct VhostVDPAState {
> > >      /* Control commands shadow buffers */
> > >      void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer;
> > >
> > > +    /* Number of address spaces supported by the device */
> > > +    unsigned address_space_num;
> > > +
> > >      /* The device always have SVQ enabled */
> > >      bool always_svq;
> > >      bool started;
> > > @@ -100,6 +103,9 @@ static const uint64_t vdpa_svq_device_features =
> > >      BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
> > >      BIT_ULL(VIRTIO_NET_F_STANDBY);
> > >
> > > +#define VHOST_VDPA_NET_CVQ_PASSTHROUGH 0
> >
> > We need a better name for the macro since it's not easy to see it's an asid.
> >
>
> VHOST_VDPA_NET_DATA_ASID?

Looks fine.
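
(I.e., something like this, so that both macros obviously name ASIDs:

    /* The guest-visible address space, shared with the memory listener */
    #define VHOST_VDPA_NET_DATA_ASID 0
    /* A separate address space for the shadowed CVQ, when available */
    #define VHOST_VDPA_NET_CVQ_ASID  1

The comments are just a suggestion.)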

>
> > > +#define VHOST_VDPA_NET_CVQ_ASID 1
> > > +
> > >  VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
> > >  {
> > >      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > > @@ -224,6 +230,37 @@ static NetClientInfo net_vhost_vdpa_info = {
> > >          .check_peer_type = vhost_vdpa_check_peer_type,
> > >  };
> > >
> > > +static void vhost_vdpa_get_vring_group(int device_fd,
> > > +                                       struct vhost_vring_state *state)
> > > +{
> > > +    int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, state);
> >
> > Let's hide vhost_vring_state from the caller and simply accept a vq
> > index parameter (as the function below does); then we can return the
> > group id.
> >
> > This hides the low-level ABI and simplifies the caller's code.
> >
>
> We need to return an error.
>
> The example is a little bit pathological, but if we get that CVQ is on
> vq group != 0, and all data vqs return an error reading the vq group,
> I think we shouldn't assume they belong to asid 0. Some of them
> could be on the same vq group as cvq, making all of this fail.
>
> Can we assume the vq group is a uint32_t? Would it work to return an
> int64_t with the possibility of errors?

Yes, or I think a simple int should be fine; I can't imagine we will
make use of the full 32 bits in the near future.
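
A minimal sketch of that shape, just to illustrate (names and the
negative-errno convention are only a suggestion):

static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index)
{
    struct vhost_vring_state state = {
        .index = vq_index,
    };
    int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, &state);

    if (unlikely(r < 0)) {
        /* Propagate the error instead of silently assuming group 0 */
        return -errno;
    }

    return state.num;
}

That way the caller decides whether a failure means "fall back to asid 0"
or "do not shadow CVQ at all".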

>
> > > +    if (unlikely(r < 0)) {
> > > +        /*
> > > +         * Assume all groups are 0, the consequences are the same and we will
> > > +         * not abort device creation
> > > +         */
> > > +        state->num = 0;
> > > +    }
> > > +}
> > > +
> > > +static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
> > > +                                           unsigned vq_group,
> > > +                                           unsigned asid_num)
> > > +{
> > > +    struct vhost_vring_state asid = {
> > > +        .index = vq_group,
> > > +        .num = asid_num,
> > > +    };
> > > +    int ret;
> > > +
> > > +    ret = ioctl(v->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> > > +    if (unlikely(ret < 0)) {
> > > +        warn_report("Can't set vq group %u asid %u, errno=%d (%s)",
> > > +            asid.index, asid.num, errno, g_strerror(errno));
> > > +    }
> > > +    return ret;
> > > +}
> > > +
> > >  static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
> > >  {
> > >      VhostIOVATree *tree = v->iova_tree;
> > > @@ -298,11 +335,55 @@ dma_map_err:
> > >  static int vhost_vdpa_net_cvq_prepare(NetClientState *nc)
> > >  {
> > >      VhostVDPAState *s;
> > > +    struct vhost_vdpa *v;
> > > +    struct vhost_vring_state cvq_group = {};
> > >      int r;
> > >
> > >      assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > >
> > >      s = DO_UPCAST(VhostVDPAState, nc, nc);
> > > +    v = &s->vhost_vdpa;
> > > +    cvq_group.index = v->dev->vq_index_end - 1;
> > > +
> > > +    /* Default values */
> >
> > The code explains itself, so this comment is probably unnecessary.
> >
>
> I'll delete.
>
> > > +    v->shadow_vqs_enabled = false;
> > > +    s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_PASSTHROUGH;
> > > +
> > > +    if (s->always_svq) {
> > > +        v->shadow_vqs_enabled = true;
> >
> > The name of the variable is suboptimal.
> >
> > I think we need to differentiate:
> >
> > 1) shadow all virtqueues
> >
> > from
> >
> > 2) shadow cvq only
> >
>
> Note that shadow_vqs_enabled is per vhost_vdpa, so
> v->shadow_vqs_enabled means that the vqs of *v* are shadowed.
>
> The data vhost_vdpa can have a different value than the CVQ one.
>
> Having said that, I'm OK with changing the name of the variable, but I
> cannot come up with a better one.

Ok, let's leave it as is.
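
Though a comment along these lines (just a sketch) could make the
per-device scope explicit for future readers:

    /*
     * True if the virtqueues of this vhost_vdpa device are shadowed.
     * The data vhost_vdpa and the CVQ vhost_vdpa can have different
     * values.
     */
    bool shadow_vqs_enabled;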

Thanks

>
> Thanks!
>
> > Thanks
> >
> > > +        goto out;
> > > +    }
> > > +
> > > +    if (s->address_space_num < 2) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    /*
> > > +     * Check that all virtqueues of the device are in a different vq group
> > > +     * than the last vq (CVQ), whose group is returned in cvq_group.
> > > +     */
> > > +    vhost_vdpa_get_vring_group(v->device_fd, &cvq_group);
> > > +    for (int i = 0; i < (v->dev->vq_index_end - 1); ++i) {
> > > +        struct vhost_vring_state vq_group = {
> > > +            .index = i,
> > > +        };
> > > +
> > > +        vhost_vdpa_get_vring_group(v->device_fd, &vq_group);
> > > +        if (unlikely(vq_group.num == cvq_group.num)) {
> > > +            warn_report("CVQ %u group is the same as VQ %u one (%u)",
> > > +                         cvq_group.index, vq_group.index, cvq_group.num);
> > > +            return 0;
> > > +        }
> > > +    }
> > > +
> > > +    r = vhost_vdpa_set_address_space_id(v, cvq_group.num,
> > > +                                        VHOST_VDPA_NET_CVQ_ASID);
> > > +    if (r == 0) {
> > > +        v->shadow_vqs_enabled = true;
> > > +        s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
> > > +    }
> > > +
> > > +out:
> > >      if (!s->vhost_vdpa.shadow_vqs_enabled) {
> > >          return 0;
> > >      }
> > > @@ -523,12 +604,38 @@ static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
> > >      .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
> > >  };
> > >
> > > +static uint32_t vhost_vdpa_get_as_num(int vdpa_device_fd)
> > > +{
> > > +    uint64_t features;
> > > +    unsigned num_as;
> > > +    int r;
> > > +
> > > +    r = ioctl(vdpa_device_fd, VHOST_GET_BACKEND_FEATURES, &features);
> > > +    if (unlikely(r < 0)) {
> > > +        warn_report("Cannot get backend features");
> > > +        return 1;
> > > +    }
> > > +
> > > +    if (!(features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID))) {
> > > +        return 1;
> > > +    }
> > > +
> > > +    r = ioctl(vdpa_device_fd, VHOST_VDPA_GET_AS_NUM, &num_as);
> > > +    if (unlikely(r < 0)) {
> > > +        warn_report("Cannot retrieve number of supported ASs");
> > > +        return 1;
> > > +    }
> > > +
> > > +    return num_as;
> > > +}
> > > +
> > >  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > >                                             const char *device,
> > >                                             const char *name,
> > >                                             int vdpa_device_fd,
> > >                                             int queue_pair_index,
> > >                                             int nvqs,
> > > +                                           unsigned nas,
> > >                                             bool is_datapath,
> > >                                             bool svq,
> > >                                             VhostIOVATree *iova_tree)
> > > @@ -547,6 +654,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > >      snprintf(nc->info_str, sizeof(nc->info_str), TYPE_VHOST_VDPA);
> > >      s = DO_UPCAST(VhostVDPAState, nc, nc);
> > >
> > > +    s->address_space_num = nas;
> > >      s->vhost_vdpa.device_fd = vdpa_device_fd;
> > >      s->vhost_vdpa.index = queue_pair_index;
> > >      s->always_svq = svq;
> > > @@ -632,6 +740,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >      g_autoptr(VhostIOVATree) iova_tree = NULL;
> > >      NetClientState *nc;
> > >      int queue_pairs, r, i = 0, has_cvq = 0;
> > > +    unsigned num_as = 1;
> > > +    bool svq_cvq;
> > >
> > >      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > >      opts = &netdev->u.vhost_vdpa;
> > > @@ -656,7 +766,13 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >          goto err;
> > >      }
> > >
> > > -    if (opts->x_svq) {
> > > +    svq_cvq = opts->x_svq;
> > > +    if (has_cvq && !opts->x_svq) {
> > > +        num_as = vhost_vdpa_get_as_num(vdpa_device_fd);
> > > +        svq_cvq = num_as > 1;
> > > +    }
> > > +
> > > +    if (opts->x_svq || svq_cvq) {
> > >          struct vhost_vdpa_iova_range iova_range;
> > >
> > >          uint64_t invalid_dev_features =
> > > @@ -679,15 +795,15 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >
> > >      for (i = 0; i < queue_pairs; i++) {
> > >          ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > -                                     vdpa_device_fd, i, 2, true, opts->x_svq,
> > > -                                     iova_tree);
> > > +                                     vdpa_device_fd, i, 2, num_as, true,
> > > +                                     opts->x_svq, iova_tree);
> > >          if (!ncs[i])
> > >              goto err;
> > >      }
> > >
> > >      if (has_cvq) {
> > >          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > -                                 vdpa_device_fd, i, 1, false,
> > > +                                 vdpa_device_fd, i, 1, num_as, false,
> > >                                   opts->x_svq, iova_tree);
> > >          if (!nc)
> > >              goto err;
> > > --
> > > 2.31.1
> > >
> >
>



Thread overview: 13+ messages
2022-08-05 16:39 [PATCH v4 0/6] ASID support in vhost-vdpa net Eugenio Pérez
2022-08-05 16:39 ` [PATCH v4 1/6] linux-headers: Update kernel headers Eugenio Pérez
2022-08-05 16:39 ` [PATCH v4 2/6] vdpa: Use v->shadow_vqs_enabled in vhost_vdpa_svqs_start & stop Eugenio Pérez
2022-08-05 16:39 ` [PATCH v4 3/6] vdpa: Allocate SVQ unconditionally Eugenio Pérez
2022-08-05 16:39 ` [PATCH v4 4/6] vdpa: Add asid parameter to vhost_vdpa_dma_map/unmap Eugenio Pérez
2022-08-09  7:21   ` Jason Wang
2022-08-09 17:03     ` Eugenio Perez Martin
2022-08-19  4:49       ` Jason Wang
2022-08-05 16:39 ` [PATCH v4 5/6] vdpa: Store x-svq parameter in VhostVDPAState Eugenio Pérez
2022-08-05 16:39 ` [PATCH v4 6/6] vdpa: Always start CVQ in SVQ mode Eugenio Pérez
2022-08-09  7:33   ` Jason Wang
2022-08-09  7:53     ` Eugenio Perez Martin
2022-08-19  6:18       ` Jason Wang
