* [PATCH] *** Vhost-pci RFC v2 ***
@ 2016-06-19 14:14 ` Wei Wang
  0 siblings, 0 replies; 41+ messages in thread
From: Wei Wang @ 2016-06-19 14:14 UTC (permalink / raw)
  To: kvm, qemu-devel, virtio-comment, mst, stefanha, pbonzini; +Cc: Wei Wang

This RFC proposes a design of vhost-pci, which is a new virtio device type.
The vhost-pci device is used for inter-VM communication.

Changes in v2:
1. changed the vhost-pci driver to use a controlq to send acknowledgement
   messages to the vhost-pci server rather than writing to the device
   configuration space;

2. re-organized all the data structures and the description layout;

3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message, which is redundant;

4. added a message sequence number to the msg info structure to identify socket
   messages, so that the socket message exchange no longer needs to be blocking;

5. changed to use a uuid to identify each VM rather than the QEMU process
   id

Wei Wang (1):
  Vhost-pci RFC v2: a new virtio device for inter-VM communication

 vhost-pci.patch | 341 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 341 insertions(+)
 create mode 100755 vhost-pci.patch

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Vhost-pci RFC v2: a new virtio device for inter-VM communication
  2016-06-19 14:14 ` [Qemu-devel] " Wei Wang
@ 2016-06-19 14:14   ` Wei Wang
  -1 siblings, 0 replies; 41+ messages in thread
From: Wei Wang @ 2016-06-19 14:14 UTC (permalink / raw)
  To: kvm, qemu-devel, virtio-comment, mst, stefanha, pbonzini; +Cc: Wei Wang

We introduce the vhost-pci design in the virtio specification format.
To follow the naming conventions in the virtio specification, we call
the VM that sends packets to the destination VM the device VM, and the
VM that provides the vring and receives packets the driver VM.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
---
 vhost-pci.patch | 341 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 341 insertions(+)
 create mode 100755 vhost-pci.patch

diff --git a/vhost-pci.patch b/vhost-pci.patch
new file mode 100755
index 0000000..341ba07
--- /dev/null
+++ b/vhost-pci.patch
@@ -0,0 +1,341 @@
+1. Vhost-pci Device
+
+1.1 Device ID
+TBD
+
+1.2 Virtqueues
+0 control receiveq
+1 control transmitq
+
+1.3 Feature Bits
+
+1.3.1 Local Feature Bits
+Currently no local feature bits are defined, so the standard virtio feature
+bit negotiation will always be successful and complete.
+
+1.3.2 Remote Feature Bits
+The remote feature bits are obtained from the frontend device and negotiated
+with the vhost-pci driver via the control transmitq. The negotiation steps
+are described in 1.5 Device Initialization.
+
+1.4 Device Configuration Layout
+None currently defined
+
+1.5 Device Initialization
+When a device VM boots, it creates a vhost-pci server socket.
+
+When a virtio device on the driver VM is created and specifies a vhost-pci
+device as its backend, a client socket is created and connected to the server
+for message exchanges.
+
+The server and client communicate via socket messages. The server and the
+vhost-pci driver communicate via controlq messages. The server updates the
+driver via a control transmitq. The driver acknowledges the server via a
+control receiveq.
+
+Both the socket message and controlq message headers can be constructed using
+the following message info structure:
+struct vhost_pci_msg_info {
+#define VHOST_PCI_MSG_TYPE_MEMORY_INFO 0
+#define VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK 1
+#define VHOST_PCI_MSG_TYPE_DEVICE_INFO 2
+#define VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK 3
+#define VHOST_PCI_MSG_TYPE_FEATURE_BITS 4
+#define VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK 5
+	u16 msg_type;
+	u16 msg_version;
+	u32 msg_len;
+	u64 msg_seq;
+};
+The msg_seq field stores the message sequence number. Each client maintains
+its own message sequence number.
+
+The socket messages are preceded by the following header:
+struct vhost_pci_socket_hdr {
+	struct vhost_pci_msg_info msg_info;
+	u64 client_uuid;
+};
+The client_uuid field is generated by the client for client identification
+purposes.
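+
+As a non-normative illustration, a client might assemble a socket message as a
+vhost_pci_socket_hdr followed by the message payload. The send_socket_msg()
+helper below, the use of write() on the connected socket, and the
+interpretation of msg_len as the payload length are assumptions of this
+sketch, not requirements of the specification:
+
+/* Illustrative sketch only (not part of this specification).
+ * Assumes a connected Unix-domain socket, the structures defined above,
+ * and the u16/u32/u64 types used throughout this document.
+ * Requires <unistd.h> for write().
+ */
+static int send_socket_msg(int sock_fd, u64 client_uuid, u16 msg_type,
+			   u64 *msg_seq, const void *payload, u32 payload_len)
+{
+	struct vhost_pci_socket_hdr hdr;
+
+	hdr.msg_info.msg_type    = msg_type;
+	hdr.msg_info.msg_version = 1;		/* assumed message version */
+	hdr.msg_info.msg_len     = payload_len;	/* assumed: payload length only */
+	hdr.msg_info.msg_seq     = (*msg_seq)++;	/* per-client sequence number */
+	hdr.client_uuid          = client_uuid;
+
+	if (write(sock_fd, &hdr, sizeof(hdr)) != (ssize_t)sizeof(hdr))
+		return -1;
+	if (payload_len &&
+	    write(sock_fd, payload, payload_len) != (ssize_t)payload_len)
+		return -1;
+	return 0;
+}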
+
+The controlq messages are preceded by the following header:
+struct vhost_pci_controlq_hdr {
+	struct vhost_pci_msg_info msg_info;
+#define VHOST_PCI_FRONTEND_DEVICE_NET 1
+#define VHOST_PCI_FRONTEND_DEVICE_BLK 2
+#define VHOST_PCI_FRONTEND_DEVICE_CONSOLE 3
+#define VHOST_PCI_FRONTEND_DEVICE_ENTROPY 4
+#define VHOST_PCI_FRONTEND_DEVICE_BALLOON 5
+#define VHOST_PCI_FRONTEND_DEVICE_SCSI 8
+	u32 device_type;
+	u64 device_id;
+};
+The device_type and device_id fields identify the frontend device (client).
+
+The payload of a VHOST_PCI_MSG_TYPE_MEMORY_INFO socket message can be
+constructed using the following structure:
+/* socket message: VHOST_PCI_MSG_TYPE_MEMORY_INFO */
+struct vhost_pci_socket_memory_info {
+#define VHOST_PCI_ADD_MEMORY 0
+#define VHOST_PCI_DEL_MEMORY 1
+	u16 ops;
+	u32 nregions;
+	struct vhost_pci_memory_region {
+		int fd;
+		u64 guest_phys_addr;
+		u64 memory_size;
+		u64 mmap_offset;
+	} regions[VHOST_PCI_MAX_NREGIONS];
+};
+
+The payload of a VHOST_PCI_MSG_TYPE_MEMORY_INFO controlq message can be
+constructed using the following structure:
+/* controlq message: VHOST_PCI_MSG_TYPE_MEMORY_INFO */
+struct vhost_pci_controlq_memory_info {
+#define VHOST_PCI_ADD_MEMORY 0
+#define VHOST_PCI_DEL_MEMORY 1
+	u16  ops;
+	u32 nregion;
+	struct exotic_memory_region {
+		u64   region_base_xgpa;
+		u64   size;
+		u64   offset_in_bar_area;
+	} region[VHOST_PCI_MAX_NREGIONS];
+};
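+
+As a non-normative illustration of how a server implementation might derive
+the controlq view of the memory from the socket view, the sketch below lays
+the received regions out contiguously in the device bar; the contiguous layout
+and the helper name are assumptions of this sketch:
+
+/* Illustrative sketch only: translate the socket-level memory info into the
+ * controlq-level view. Laying the regions out back to back from bar offset 0
+ * is an assumption of this sketch, not something mandated here.
+ */
+static void fill_controlq_memory_info(
+		const struct vhost_pci_socket_memory_info *in,
+		struct vhost_pci_controlq_memory_info *out)
+{
+	u64 bar_offset = 0;
+	u32 i;
+
+	out->ops     = in->ops;
+	out->nregion = in->nregions;
+
+	for (i = 0; i < in->nregions; i++) {
+		out->region[i].region_base_xgpa   = in->regions[i].guest_phys_addr;
+		out->region[i].size               = in->regions[i].memory_size;
+		out->region[i].offset_in_bar_area = bar_offset;
+		bar_offset += in->regions[i].memory_size;
+	}
+}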
+
+The payload of VHOST_PCI_MSG_TYPE_DEVICE_INFO and
+VHOST_PCI_MSG_TYPE_FEATURE_BITS socket/controlq messages can be constructed
+using the following vhost_pci_device_info structure and
+the vhost_pci_feature_bits structure respectively.
+
+/* socket/controlq message: VHOST_PCI_MSG_TYPE_DEVICE_INFO */
+struct vhost_pci_device_info {
+#define VHOST_PCI_ADD_FRONTEND_DEVICE 0
+#define VHOST_PCI_DEL_FRONTEND_DEVICE 1
+	u16    ops;
+	u32    nvirtq;
+	u32    device_type;
+	u64    device_id;
+	struct virtq exotic_virtq[VHOST_PCI_MAX_NVIRTQ];
+};
+
+/* socket/controlq message: VHOST_PCI_MSG_TYPE_FEATURE_BITS */
+struct vhost_pci_feature_bits {
+	u64 feature_bits;
+};
+
+The payload of all the ACK socket/controlq messages can be constructed using
+the following structure:
+/* socket/controlq message: ACK messages */
+struct vhost_pci_ack {
+	union {
+#define VHOST_PCI_ACK_ADD_DONE 0
+#define VHOST_PCI_ACK_ADD_FAIL 1
+#define VHOST_PCI_ACK_DEL_DONE 2
+#define VHOST_PCI_ACK_DEL_FAIL 3
+		u64 ack_memory_info;
+		u64 ack_device_info;
+		u64 ack_feature_bits;
+	};
+};
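+
+As a small non-normative example, a driver acknowledging a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO controlq message might fill the payload as
+follows; how the buffer is placed on the control receiveq is left to the
+implementation:
+
+/* Illustrative sketch only: fill an ACK payload for a memory info message.
+ * Queueing the buffer on the control receiveq is omitted.
+ */
+static void fill_memory_info_ack(struct vhost_pci_ack *ack, int success)
+{
+	ack->ack_memory_info = success ? VHOST_PCI_ACK_ADD_DONE
+				       : VHOST_PCI_ACK_ADD_FAIL;
+}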
+
+1.5.1 Device Requirements: Device Initialization
+
+1.5.1.1	The Frontend Device (Client)
+The vhost-pci server socket path SHOULD be provided to a virtio client socket
+for the connection.
+
+The client SHOULD send three socket messages,
+VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD),
+VHOST_PCI_MSG_TYPE_FEATURE_BITS(FeatureBits)
+and VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD),
+to the server, and wait until receiving the corresponding three ACK
+messages from the server.
+
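+A non-normative sketch of this client-side handshake follows;
+send_socket_msg() is the hypothetical helper shown earlier, and
+wait_for_acks() is likewise an assumption of the sketch:
+
+/* Illustrative sketch only: the three-message client handshake.
+ * send_socket_msg() is the hypothetical helper sketched earlier; the
+ * wait_for_acks() helper below is likewise assumed, and would match ACK
+ * messages against the msg_seq numbers of the three requests.
+ */
+int wait_for_acks(int sock_fd, unsigned int nr_acks);	/* hypothetical */
+
+static int client_handshake(int sock_fd, u64 uuid, u64 *seq,
+			    const struct vhost_pci_socket_memory_info *mem,
+			    const struct vhost_pci_feature_bits *feat,
+			    const struct vhost_pci_device_info *dev)
+{
+	if (send_socket_msg(sock_fd, uuid, VHOST_PCI_MSG_TYPE_MEMORY_INFO,
+			    seq, mem, sizeof(*mem)) ||
+	    send_socket_msg(sock_fd, uuid, VHOST_PCI_MSG_TYPE_FEATURE_BITS,
+			    seq, feat, sizeof(*feat)) ||
+	    send_socket_msg(sock_fd, uuid, VHOST_PCI_MSG_TYPE_DEVICE_INFO,
+			    seq, dev, sizeof(*dev)))
+		return -1;
+
+	/* Block until the three corresponding ACK socket messages arrive. */
+	return wait_for_acks(sock_fd, 3);
+}
+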
+The client may receive the following ACK socket messages from the server:
+1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the device
+VM has successfully mapped the memory, and a vhost-pci device is created on
+the device VM for the driver VM.
+2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the device
+VM fails to map the memory. Receiving this message results in the failure of
+setting up the vhost-pci based inter-VM communication support for the driver
+VM.
+3. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the device
+VM has successfully initialized the related interfaces to communicate with the
+frontend device.
+4. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the device
+VM fails to initialize the related interfaces to communicate with the frontend
+device.
+5. VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS): The payload of
+this message contains the feature bits accepted by the vhost-pci device and
+driver. If the accepted feature bits are not equal to the feature bits sent by
+the client, the client MUST reset the device to go into backwards compatibility
+mode, re-negotiate the received ACCEPTED_FEATURE_BITS with its driver, and
+send back a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS) socket
+message to the vhost-pci server. Otherwise, no
+VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS) socket message is
+sent back to the server.
+
+1.5.1.2	The Vhost-pci Device (Server)
+To let a VM be capable of creating vhost-pci devices, a vhost-pci server MUST
+be created when the VM boots.
+
+When the vhost-pci server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD)
+socket message, it SHOULD check if a vhost-pci device has been created for the
+requesting VM. If the client_uuid contained in the socket message is not new
+to the server, the server SHOULD simply forward the received message to the
+vhost-pci driver via the control transmitq. Otherwise, the server SHOULD
+create a new vhost-pci device, and continue the following memory mapping
+related initialization steps.
+
+The vhost-pci server SHOULD add up all the memory region sizes, and use a
+64-bit device bar for the mapping of all the memory regions obtained from the
+socket message. To better support memory hot-plug in the driver VM, the bar
+SHOULD be configured to be double the size of the driver VM's memory. The
+server SHOULD map the received memory via the QEMU MemoryRegion mechanism, and
+the newly created vhost-pci device SHOULD then be hot-plugged into the VM.
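+
+A non-normative sketch of the sizing step: sum the received region sizes and
+size the 64-bit bar to twice the driver VM's memory. Rounding the bar up to a
+power of two is an implementation detail assumed by this sketch:
+
+/* Illustrative sketch only: derive the 64-bit bar size from the received
+ * memory regions; the specification only asks for double the driver VM's
+ * memory size, the power-of-two rounding is an assumption of this sketch.
+ */
+static u64 vhost_pci_bar_size(const struct vhost_pci_socket_memory_info *mem)
+{
+	u64 total = 0, bar = 1;
+	u32 i;
+
+	for (i = 0; i < mem->nregions; i++)
+		total += mem->regions[i].memory_size;
+
+	while (bar < total * 2)		/* double the driver VM's memory */
+		bar <<= 1;
+	return bar;
+}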
+
+When the device status is updated with DRIVER_OK, a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) message SHOULD be put on the control
+transmitq, and a controlq interrupt SHOULD be injected to the VM. The server
+may receive the following ACK messages from the driver via the control
+receiveq:
+1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the driver
+has successfully added the memory info. The server SHOULD send
+a VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE) socket message to the client.
+2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the driver
+fails to add the memory info. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL) socket message to the client.
+
+When the vhost-pci server receives a
+VHOST_PCI_MSG_TYPE_FEATURE_BITS(feature bits) socket message, it SHOULD put a
+VHOST_PCI_MSG_TYPE_FEATURE_BITS(feature bits) message on the control transmitq,
+and inject a controlq interrupt to the VM. When the server receives a
+VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted feature bits) controlq message
+from the VM, it SHOULD send a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted
+feature bits) socket message to the client. If the accepted feature bits sent
+to the client do not equal the ones that it received, the server SHOULD
+wait until receiving a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted feature
+bits) socket message from the client, which indicates that the frontend device
+has finished the re-negotiation of the accepted feature bits.
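+
+A non-normative sketch of this server-side flow follows; the helpers are
+hypothetical, and queue handling, framing and error paths are omitted:
+
+/* Illustrative sketch only: server-side relay of the feature-bit negotiation.
+ * The helpers below are hypothetical (prototypes only); controlq framing and
+ * error handling are omitted.
+ */
+int post_controlq_msg(u16 msg_type, const void *payload, u32 len);
+int wait_controlq_ack(u16 msg_type, void *payload, u32 len);
+int wait_socket_msg(int sock_fd, u16 msg_type);
+
+static void negotiate_features(int sock_fd, u64 uuid, u64 *seq, u64 offered)
+{
+	struct vhost_pci_feature_bits bits = { .feature_bits = offered };
+	struct vhost_pci_feature_bits acked;
+
+	/* Relay the client's offered bits to the vhost-pci driver and wait
+	 * for the accepted bits on the control receiveq. */
+	post_controlq_msg(VHOST_PCI_MSG_TYPE_FEATURE_BITS, &bits, sizeof(bits));
+	wait_controlq_ack(VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK, &acked,
+			  sizeof(acked));
+
+	/* Report the accepted bits back to the client (hypothetical helper
+	 * sketched earlier). */
+	send_socket_msg(sock_fd, uuid, VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK,
+			seq, &acked, sizeof(acked));
+
+	/* If the driver trimmed the offered bits, wait for the client to
+	 * confirm its own re-negotiation with another FEATURE_BITS_ACK. */
+	if (acked.feature_bits != offered)
+		wait_socket_msg(sock_fd, VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK);
+}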
+
+When the vhost-pci server receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) socket
+message, it SHOULD put a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) message on the
+control transmitq, and inject a controlq interrupt to the VM. The server may
+receive the following ACK messages from the driver:
+1. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the
+vhost-pci driver has successfully added the frontend device to its support
+list. The server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE)
+socket message to the corresponding client.
+2. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the
+vhost-pci driver fails to add the frontend device to its support list. The
+server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL) socket
+message to the corresponding client.
+
+1.5.2 Driver Requirements: Device Initialization
+The vhost-pci driver SHOULD acknowledge to the vhost-pci device, via the
+control receiveq, whether or not it succeeded in handling the received controlq
+message. The vhost-pci driver MUST NOT accept any feature bits that are not
+offered in the remote feature bits.
+
+When the vhost-pci driver receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD)
+controlq message, it MUST initialize the corresponding driver interfaces of
+the device type if they are not initialized, and add the device id to the
+support list that records all the frontend devices being supported by
+vhost-pci for inter-VM communications.
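+
+A non-normative sketch of this driver-side handling; the support-list
+representation and the helper functions are assumptions of this sketch:
+
+/* Illustrative sketch only: driver-side handling of DEVICE_INFO(ADD).
+ * The helpers below are hypothetical; sending the ADD_DONE/ADD_FAIL ack
+ * back over the control receiveq is omitted.
+ */
+int frontend_type_initialized(u32 device_type);
+int init_frontend_interfaces(u32 device_type);
+int support_list_add(u32 device_type, u64 device_id);
+
+static int handle_device_info_add(const struct vhost_pci_device_info *info)
+{
+	if (!frontend_type_initialized(info->device_type) &&
+	    init_frontend_interfaces(info->device_type))
+		return -1;	/* results in an ADD_FAIL acknowledgement */
+
+	return support_list_add(info->device_type, info->device_id);
+}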
+
+1.6 Device Operation
+1.6.1 Device Requirements: Device Operation
+1.6.1.1 The Frontend Device (Client)
+When the frontend device changes any info (e.g. device_id, virtq address)
+that it has sent to the vhost-pci device, it MUST send a
+VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) socket message to the server. The
+vhost-pci device SHOULD put a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) controlq
+message to the control transmitq, and inject a controlq interrupt to the VM.
+
+When the frontend virtio device is removed (e.g. being hot-plugged out), the
+client SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message to the
+server.
+
+Before the driver VM is destroyed or migrated, all the clients connected to
+the server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message to
+the server. The destroy or migration operation MUST wait until all the
+VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE) socket messages are received.
+
+When the driver VM hot-adds or hot-removes memory, it SHOULD send a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) socket message or
+VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) socket message to the server.
+
+1.6.1.2 The Vhost-pci Device (Server)
+When the server receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) or
+VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message, it SHOULD put a
+VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) or
+VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) message to the control transmitq,
+and inject a controlq interrupt to the VM. It may receive the following ACK
+controlq messages from the driver:
+1. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the driver
+has successfully updated the device info. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE) socket message to the
+corresponding client.
+2. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the driver
+fails to update the device info. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL) socket message to the
+corresponding client.
+3. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE): It indicates that the driver
+has successfully removed the vhost-pci support for the frontend device. The
+server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE) socket
+message to the corresponding client.
+4. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_FAIL): It indicates that the driver
+fails to remove the vhost-pci support for the frontend device. The server
+SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_FAIL) socket message to
+the corresponding client.
+
+When no client of a driver VM remains connected to the vhost-pci device,
+the server SHOULD destroy the vhost-pci device for that driver VM.
+
+When the server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) socket message,
+it calculates the total size of the received memory. If the new memory size
+plus the mapped memory size is smaller than the address space size reserved by
+the bar, the server SHOULD map the new memory and expose it to the VM via the
+QEMU MemoryRegion mechanism. Then it SHOULD put a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) controlq message to the control transmitq,
+and inject a controlq interrupt to the VM.
+
+If the new memory size plus the mapped memory size is larger than the address
+space size reserved by the bar, the server SHOULD
+1. clone a new vhost-pci device;
+2. configure the bar size to be double the current memory size; and
+3. hot-unplug the old vhost-pci device, and hot-plug the new vhost-pci
+device into the VM.
+
+The initialization steps SHOULD follow 1.5 Device Initialization, except the
+interaction messages between the server and client are not needed.
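+
+A non-normative sketch of the hot-add decision described above; struct
+vhost_pci_dev and the helper functions are assumptions of this sketch:
+
+/* Illustrative sketch only: decide between mapping into the existing bar and
+ * cloning the device with a doubled bar. struct vhost_pci_dev and the helpers
+ * below are hypothetical.
+ */
+struct vhost_pci_dev {
+	u64 bar_size;		/* address space reserved by the bar */
+	u64 mapped_size;	/* memory already mapped into the bar */
+};
+
+void map_into_bar(struct vhost_pci_dev *dev,
+		  const struct vhost_pci_socket_memory_info *mem);
+void notify_driver_memory_add(struct vhost_pci_dev *dev,
+			      const struct vhost_pci_socket_memory_info *mem);
+void clone_and_replace_device(struct vhost_pci_dev *dev, u64 new_bar_size);
+
+static void handle_memory_add(struct vhost_pci_dev *dev,
+			      const struct vhost_pci_socket_memory_info *mem)
+{
+	u64 new_size = 0;
+	u32 i;
+
+	for (i = 0; i < mem->nregions; i++)
+		new_size += mem->regions[i].memory_size;
+
+	if (dev->mapped_size + new_size <= dev->bar_size) {
+		/* Fits: map the new regions, then post a MEMORY_INFO(ADD)
+		 * controlq message and inject a controlq interrupt. */
+		map_into_bar(dev, mem);
+		notify_driver_memory_add(dev, mem);
+		dev->mapped_size += new_size;
+	} else {
+		/* Does not fit: clone a device whose bar is double the new
+		 * total memory size and hot-swap it into the VM. */
+		clone_and_replace_device(dev,
+					 (dev->mapped_size + new_size) * 2);
+	}
+}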
+
+The server may receive the following two memory info add related ACK controlq
+messages from the driver:
+1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the driver
+has successfully added the new memory info. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE) socket message to the corresponding
+client.
+2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the driver
+fails to add the new memory info. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL) socket message to the corresponding
+client.
+
+When the server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) socket message,
+it SHOULD put a VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) controlq message to the
+control transmitq, and inject a controlq interrupt to the VM. The server may
+receive the following two memory ACK controlq messages from the driver:
+1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_DONE): It indicates that the driver
+has successfully deleted the memory info. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_DONE) socket message to the
+corresponding client.
+2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_FAIL): It indicates that the driver
+fails to delete the memory info. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_FAIL) message to the corresponding
+client.
+
+1.6.2 Driver Requirements: Device Operation
+The vhost-pci driver SHOULD ensure that all the CPUs are notified of the
+VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) and VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL)
+controlq messages before acknowledging the server.
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* RE: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-06-19 14:14 ` [Qemu-devel] " Wei Wang
@ 2016-06-27  2:01   ` Wang, Wei W
  -1 siblings, 0 replies; 41+ messages in thread
From: Wang, Wei W @ 2016-06-27  2:01 UTC (permalink / raw)
  To: Wang, Wei W, kvm, qemu-devel, virtio-comment, mst, stefanha, pbonzini

On Sun 6/19/2016 10:14 PM, Wei Wang wrote:
> This RFC proposes a design of vhost-pci, which is a new virtio device type.
> The vhost-pci device is used for inter-VM communication.
> 
> Changes in v2:
> 1. changed the vhost-pci driver to use a controlq to send acknowledgement
>    messages to the vhost-pci server rather than writing to the device
>    configuration space;
> 
> 2. re-organized all the data structures and the description layout;
> 
> 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message, which
> is redundant;
> 
> 4. added a message sequence number to the msg info structure to identify
> socket
>    messages, and the socket message exchange does not need to be blocking;
> 
> 5. changed to used uuid to identify each VM rather than using the QEMU process
>    id
> 

One more point that should be added is that the server needs to send periodic socket messages to check if the driver VM is still alive. I will add this message support in the next version. (*v2-AR1*)

> Wei Wang (1):
>   Vhost-pci RFC v2: a new virtio device for inter-VM communication
> 
>  vhost-pci.patch | 341
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 341 insertions(+)
>  create mode 100755 vhost-pci.patch
> 

Hi Michael,

Would you be able to look into the design? Thanks.

Best,
Wei

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-06-27  2:01   ` [Qemu-devel] " Wang, Wei W
@ 2016-08-29 15:24     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 41+ messages in thread
From: Stefan Hajnoczi @ 2016-08-29 15:24 UTC (permalink / raw)
  To: Wang, Wei W; +Cc: kvm, qemu-devel, virtio-comment, mst, pbonzini

On Mon, Jun 27, 2016 at 02:01:24AM +0000, Wang, Wei W wrote:
> On Sun 6/19/2016 10:14 PM, Wei Wang wrote:
> > This RFC proposes a design of vhost-pci, which is a new virtio device type.
> > The vhost-pci device is used for inter-VM communication.
> > 
> > Changes in v2:
> > 1. changed the vhost-pci driver to use a controlq to send acknowledgement
> >    messages to the vhost-pci server rather than writing to the device
> >    configuration space;
> > 
> > 2. re-organized all the data structures and the description layout;
> > 
> > 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message, which
> > is redundant;
> > 
> > 4. added a message sequence number to the msg info structure to identify
> > socket
> >    messages, and the socket message exchange does not need to be blocking;
> > 
> > 5. changed to used uuid to identify each VM rather than using the QEMU process
> >    id
> > 
> 
> One more point should be added is that the server needs to send periodic socket messages to check if the driver VM is still alive. I will add this message support in next version.  (*v2-AR1*)

Either the driver VM could go down or the device VM (server) could go
down.  In both cases there must be a way to handle the situation.

If the server VM goes down it should be possible for the driver VM to
resume either via hotplug of a new device or through messages
reinitializing the dead device when the server VM restarts.

Stefan

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [virtio-comment] Re: [PATCH] Vhost-pci RFC v2: a new virtio device for inter-VM communication
  2016-06-19 14:14   ` [Qemu-devel] " Wei Wang
@ 2016-08-29 15:27     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 41+ messages in thread
From: Stefan Hajnoczi @ 2016-08-29 15:27 UTC (permalink / raw)
  To: Wei Wang
  Cc: kvm, qemu-devel, virtio-comment, mst, pbonzini, Marc-André Lureau

On Sun, Jun 19, 2016 at 10:14:09PM +0800, Wei Wang wrote:
> We introduce the vhost-pci design in the virtio specification format.
> To follow the naming conventions in the virtio specification, we call
> the VM who sends packets to the destination VM the device VM, and the
> VM who provides the vring and receives packets the driver VM.
> 
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> ---
>  vhost-pci.patch | 341 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 341 insertions(+)
>  create mode 100755 vhost-pci.patch

Adding Marc-André on CC because vhost-pci has a natural parallel to
vhost-user.  Instead of terminating the virtio device in a host
userspace process it terminates the device in a VM.  The design lessons
from vhost-user still apply though.

Marc-André: Do you have time to review this proposal?

> diff --git a/vhost-pci.patch b/vhost-pci.patch
> new file mode 100755
> index 0000000..341ba07
> --- /dev/null
> +++ b/vhost-pci.patch
> @@ -0,0 +1,341 @@
> +1. Vhost-pci Device
> +
> +1.1 Device ID
> +TBD
> +
> +1.2 Virtqueues
> +0 control receiveq
> +1 control transmitq
> +
> +1.3 Feature Bits
> +
> +1.3.1 Local Feature Bits
> +Currently no local feature bits are defined, so the standard virtio feature
> +bits negation will always be successful and complete.
> +
> +1.3.2 Remote Feature Bits
> +The remote feature bits are obtained from the frontend device and negotiated
> +with the vhost-pci driver via the control transmitq. The negotiation steps
> +are described in 1.5 Device Initialization.
> +
> +1.4 Device Configuration Layout
> +None currently defined
> +
> +1.5 Device Initialization
> +When a device VM boots, it creates a vhost-pci server socket.
> +
> +When a virtio device on the driver VM is created with specifying the use of
> +a vhost-pci device as a backend, a client socket is created and connected to
> +the server for message exchanges.
> +
> +The server and client communicate via socket messages. The server and the
> +vhost-pci driver communicate via controlq messages. The server updates the
> +driver via a control transmitq. The driver acknowledges the server via a
> +control receiveq.
> +
> +Both the socket message and controlq message headers can be constructed using
> +the following message info structure:
> +struct vhost_pci_msg_info {
> +#define VHOST_PCI_MSG_TYPE_MEMORY_INFO 0
> +#define VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK 1
> +#define VHOST_PCI_MSG_TYPE_DEVICE_INFO 2
> +#define VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK 3
> +#define VHOST_PCI_MSG_TYPE_FEATURE_BITS 4
> +#define VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK 5
> +	u16 msg_type;
> +	u16 msg_version;
> +	u32 msg_len;
> +	u64 msg_seq;
> +};
> +The msg_seq field stores the message sequence number. Each client maintains
> +its own message sequence number.
> +
> +The socket messages are preceded by the following header:
> +struct vhost_pci_socket_hdr {
> +	struct vhost_pci_msg_info msg_info;
> +	u64 client_uuid;
> +};
> +The client_uuid field is generated by the client for the client identification
> +purpose.
> +
> +The controlq messages are preceded by the following header:
> +struct vhost_pci_controlq_hdr {
> +	struct vhost_pci_msg_info msg_info;
> +#define VHOST_PCI_FRONTEND_DEVICE_NET 1
> +#define VHOST_PCI_FRONTEND_DEVICE_BLK 2
> +#define VHOST_PCI_FRONTEND_DEVICE_CONSOLE 3
> +#define VHOST_PCI_FRONTEND_DEVICE_ENTROPY 4
> +#define VHOST_PCI_FRONTEND_DEVICE_BALLOON 5
> +#define VHOST_PCI_FRONTEND_DEVICE_SCSI 8
> +	u32 device_type;
> +	u64 device_id;
> +};
> +The device_type and device_id fields identify the frontend device (client).
> +
> +The payload of a VHOST_PCI_MSG_TYPE_MEMORY_INFO socket message can be
> +constructed using the following structure:
> +/* socket message: VHOST_PCI_MSG_TYPE_MEMORY_INFO */
> +struct vhost_pci_socket_memory_info {
> +#define VHOST_PCI_ADD_MEMORY 0
> +#define VHOST_PCI_DEL_MEMORY 1
> +	u16 ops;
> +	u32 nregions;
> +	struct vhost_pci_memory_region {
> +		int fd;
> +		u64 guest_phys_addr;
> +		u64 memory_size;
> +		u64 mmap_offset;
> +	} regions[VHOST_PCI_MAX_NREGIONS];
> +};
> +
> +The payload of a VHOST_PCI_MSG_TYPE_MEMORY_INFO controlq message can be
> +constructed using the following structure:
> +/* controlq message: VHOST_PCI_MSG_TYPE_MEMORY_INFO */
> +struct vhost_pci_controlq_memory_info {
> +#define VHOST_PCI_ADD_MEMORY 0
> +#define VHOST_PCI_DEL_MEMORY 1
> +	u16  ops;
> +	u32 nregion;
> +	struct exotic_memory_region {
> +		u64   region_base_xgpa;
> +		u64   size;
> +		u64   offset_in_bar_area;
> +	} region[VHOST_PCI_MAX_NREGIONS];
> +};
> +
> +The payload of VHOST_PCI_MSG_TYPE_DEVICE_INFO and
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS socket/controlq messages can be constructed
> +using the following vhost_pci_device_info structure and
> +the vhost_pci_feature_bits structure respectively.
> +
> +/* socket/controlq message: VHOST_PCI_DEVICE_INFO */
> +struct vhost_pci_device_info {
> +#define VHOST_PCI_ADD_FRONTEND_DEVICE 0
> +#define VHOST_PCI_DEL_FRONTEND_DEVICE 1
> +	u16    ops;
> +	u32    nvirtq;
> +	u32    device_type;
> +	u64    device_id;
> +	struct virtq exotic_virtq[VHOST_PCI_MAX_NVIRTQ];
> +};
> +
> +/* socket/controlq message: VHOST_PCI_MSG_TYPE_FEATURE_BITS */
> +struct vhost_pci_feature_bits {
> +	u64 feature_bits;
> +};
> +
> +The payload of all the ACK socket/controlq messages can be constructed using
> +the following structure:
> +/* socket/controlq message: ACK messages */
> +struct vhost_pci_ack {
> +	union ack_msg {
> +#define VHOST_PCI_ACK_ADD_DONE 0
> +#define VHOST_PCI_ACK_ADD_FAIL 1
> +#define VHOST_PCI_ACK_DEL_DONE 2
> +#define VHOST_PCI_ACK_DEL_FAIL 3
> +	u64 ack_memory_info;		
> +	u64 ack_device_info;
> +	u64 ack_feature_bits;
> +	};
> +};
> +
> +1.5.1 Device Requirements: Device Initialization
> +
> +1.5.1.1	The Frontend Device (Client)
> +The vhost-pci server socket path SHOULD be provided to a virtio client socket
> +for the connection.
> +
> +The client SHOULD send three socket messages,
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD),
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS(FeatureBits)
> +and VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD),
> +to the server, and wait until receiving the corresponding three ACK
> +messages from the server.
> +
> +The client may receive the following ACK socket messages from the server:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the device
> +VM has successfully mapped the memory, and a vhost-pci device is created on
> +the device VM for the driver VM.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the device
> +VM fails to map the memory. Receiving this message results in the failure of
> +setting up the vhost-pci based inter-VM communication support for the driver
> +VM.
> +3. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the device
> +VM has successfully initialized the related interfaces to communicate to the
> +fronted device.
> +4. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the device
> +VM fails to  initialize the related interfaces to communicate to the fronted
> +device.
> +5. VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS): The payload of
> +this message contains the feature bits accepted by the vhost-pci device and
> +driver. If the accepted feature bits are not equal to the feature bits sent by
> +the client, the client MUST reset the device to go into backwards capability
> +mode, re-negotiate the received ACCEPTED_FEATURE_BITS with its driver, and
> +send back a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS) socket
> +message to the vhost-pci server. Otherwise, no
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS) socket message is
> +sent back to the server.
> +
> +1.5.1.2	The Vhost-pci Device (Server)
> +To let a VM be capable of creating vhost-pci devices, a vhost-pci server MUST
> +be created when it boots.
> +
> +When the vhost-pci server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD)
> +socket message, it SHOULD check if a vhost-pci device has been created for the
> +requesting VM. If the client_uuid contained in the socket message is not new
> +to the server, the server SHOULD simply update the received message to the
> +vhost-pci driver via the control transmitq. Otherwise, the server SHOULD
> +create a new vhost-pci device, and continue the following memory mapping
> +related initialization steps.
> +
> +The vhost-pci server SHOULD add up all the memory region size, and use a
> +64-bit device bar for the mapping of all the memory regions obtained from the
> +socket message. To better support the driver VM to hot-plug memory, the bar
> +SHOULD be configured with a double size of the driver VM's memory. The server
> +SHOULD map the received memory info via the QEMU MemoryRegion mechanism, and
> +then the new created vhost-pci device SHOULD be hot-plugged to the VM.
> +
> +When the device status is updated with DRIVER_OK, a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) message SHOULD be put on the control
> +transmitq, and a controlq interrupt SHOULD be injected to the VM. The server
> +may receive the following ACK messages from the driver via the control
> +receiveq:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the driver
> +has successfully added the memory info to its support. The server SHOULD send
> +a VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE) socket message to the client.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the driver
> +fails to add the memory info to its support. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL) socket message to the client.
> +
> +When the vhost-pci server receives a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS(feature bits) socket message, it SHOULD put a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS(feature bits) message on the control transmitq,
> +and inject a controlq interrupt to the VM. When the server receives a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted feature bits) controlq message
> +from the VM, it SHOULD send a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted
> +feature bits) socket message to the client. If the accepted feature bits sent
> +to the client does not equal to the one that it received, the server SHOULD
> +wait until receiving a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted feature
> +bits) socket message from the client, which indicates that the frontend device
> +has finished the re-negotiation of the accepted feature bits.
> +
> +When the vhost-pci server receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) socket
> +message, it SHOULD put a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) message on the
> +control transmitq, and inject a controlq interrupt to the VM. The server may
> +receive the following ACK messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the
> +vhost-pci driver has successfully added the frontend device to its support
> +list. The server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE)
> +socket message to the corresponding client.
> +2. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the
> +vhost-pci driver fails to add the frontend device to its support list. The
> +server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL) socket
> +message to the corresponding client.
> +
> +1.5.2 Driver Requirements: Device Initialization
> +The vhost-pci driver SHOULD acknowledge the vhost-pci device via the control
> +receiveq if it succeeds to handle the received controlq message or not.
> +The vhost-pci driver MUST NOT accept any feature bits that are not offered
> +by the remote feature bits.
> +
> +When the vhost-pci driver receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD)
> +controlq message, it MUST initialize the corresponding driver interfaces of
> +the device type if they are not initialized, and add the device id to the
> +support list that records all the frontend devices being supported by
> +vhost-pci for inter-VM communications.
> +
> +1.6 Device Operation
> +1.6.1 Device Requirements: Device Operation
> +1.6.1.1 The Frontend Device (Client)
> +When the frontend device changes any info (e.g. device_id, virtq address)
> +that it has sent to the vhost-pci device, it MUST send a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) socket message to the server. The
> +vhost-pci device SHOULD put a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) controlq
> +message on the control transmitq, and inject a controlq interrupt to the VM.
> +
> +When the frontend virtio device is removed (e.g. being hot-plugged out), the
> +client SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message to the
> +server.
> +
> +Before the driver VM is destroyed or migrated, all the clients connected to
> +the server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message to
> +the server. The destroy or migration operation MUST wait until all the
> +corresponding VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE) socket messages
> +are received.
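
Illustratively (not spec text), the ordering requirement above amounts to a
barrier like the following sketch, where the two helpers are hypothetical
wrappers around the DEVICE_INFO(DEL) socket message and its DEL_DONE
acknowledgement:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical wrappers around the socket messages described above. */
    bool send_device_info_del(uint64_t device_id);
    bool wait_device_info_del_done(uint64_t device_id);

    /* Returns true only when every frontend device has been torn down and
     * acknowledged, i.e. when it is safe to destroy or migrate the VM. */
    static bool teardown_all_frontends(const uint64_t *device_ids, int n)
    {
        for (int i = 0; i < n; i++)
            if (!send_device_info_del(device_ids[i]))
                return false;

        for (int i = 0; i < n; i++)
            if (!wait_device_info_del_done(device_ids[i]))
                return false;   /* destroy/migrate must not proceed yet */

        return true;
    }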
> +
> +When the driver VM hot-adds or hot-removes memory, it SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) socket message or
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) socket message to the server.
> +
> +1.6.1.2 The Vhost-pci Device (Server)
> +When the server receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) or
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message, it SHOULD put a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) or
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) message on the control transmitq,
> +and inject a controlq interrupt to the VM. It may receive the following ACK
> +controlq messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the driver
> +has successfully updated the device info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE) socket message to the
> +corresponding client.
> +2. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the driver
> +fails to update the device info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL) socket message to the
> +corresponding client.
> +3. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE): It indicates that the driver
> +has successfully removed the vhost-pci support for the frontend device. The
> +server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE) socket
> +message to the corresponding client.
> +4. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_FAIL): It indicates that the driver
> +fails to remove the vhost-pci support for the frontend device. The server
> +SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_FAIL) socket message to
> +the corresponding client.
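
One way to read the four cases above is as a plain relay of the controlq ACK
onto the socket. A sketch, with forward_ack_to_client() as a hypothetical
helper and the constants taken from the structures earlier in this patch:

    #include <stdint.h>

    #define VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK 3

    #define VHOST_PCI_ACK_ADD_DONE 0
    #define VHOST_PCI_ACK_ADD_FAIL 1
    #define VHOST_PCI_ACK_DEL_DONE 2
    #define VHOST_PCI_ACK_DEL_FAIL 3

    /* Hypothetical helper: send a socket message to the client identified
     * by client_uuid. */
    void forward_ack_to_client(uint64_t client_uuid, uint16_t msg_type,
                               uint64_t ack);

    /* Whatever DEVICE_INFO ACK the driver reports on the control receiveq
     * is relayed unchanged to the corresponding client. */
    static void server_handle_device_info_ack(uint64_t client_uuid,
                                              uint64_t ack)
    {
        switch (ack) {
        case VHOST_PCI_ACK_ADD_DONE:
        case VHOST_PCI_ACK_ADD_FAIL:
        case VHOST_PCI_ACK_DEL_DONE:
        case VHOST_PCI_ACK_DEL_FAIL:
            forward_ack_to_client(client_uuid,
                                  VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK, ack);
            break;
        default:
            break;  /* unknown ack values are ignored in this sketch */
        }
    }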
> +
> +When no clients of a driver VM remain connected to the vhost-pci device,
> +the server SHOULD destroy the vhost-pci device for that driver VM.
> +
> +When the server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) socket message,
> +it calculates the total size of the received memory regions. If the new memory
> +size plus the already mapped memory size is smaller than the address space
> +size reserved by the bar, the server SHOULD map the new memory and expose it
> +to the VM via the QEMU MemoryRegion mechanism. Then it SHOULD put a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) controlq message on the control transmitq,
> +and inject a controlq interrupt to the VM.
> +
> +If the new memory size plus the mapped memory size is larger than the address
> +space size reserved by the bar, the server SHOULD
> +1. clone a new vhost-pci device;
> +2. configure its bar size to be double the current memory size; and
> +3. hot-plug out the old vhost-pci device, and hot-plug the new vhost-pci
> +device into the VM.
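
A sketch of the capacity check and the clone path described above;
map_new_regions() and clone_device_with_bar() stand in for the QEMU-side work
and are hypothetical, not existing QEMU APIs:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical stand-ins for the QEMU-side operations. */
    bool map_new_regions(uint64_t new_size);
    bool clone_device_with_bar(uint64_t bar_size);

    /* Decide whether the incoming memory still fits in the bar already
     * exposed to the VM, or whether a replacement device with a larger bar
     * has to be cloned and hot-plugged in. */
    static bool server_handle_memory_add(uint64_t mapped_size,
                                         uint64_t new_size,
                                         uint64_t bar_size)
    {
        if (mapped_size + new_size <= bar_size)
            return map_new_regions(new_size);

        /* Too small: clone a device whose bar is double the new total. */
        return clone_device_with_bar(2 * (mapped_size + new_size));
    }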
> +
> +The initialization steps SHOULD follow 1.5 Device Initialization, except that
> +the interaction messages between the server and the client are not needed.
> +
> +The server may receive the following two ACK controlq messages, related to
> +the memory info addition, from the driver:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the driver
> +has successfully added the new memory info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE) socket message to the corresponding
> +client.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the driver
> +fails to add the new memory info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL) socket message to the corresponding
> +client.
> +
> +When the server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) socket message,
> +it SHOULD put a VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) controlq message on the
> +control transmitq, and inject a controlq interrupt to the VM. The server may
> +receive the following two memory info ACK controlq messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_DONE): It indicates that the driver
> +has successfully deleted the memory info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_DONE) socket message to the
> +corresponding client.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_FAIL): It indicates that the driver
> +fails to delete the memory info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_FAIL) socket message to the
> +corresponding client.
> +
> +1.6.2 Driver Requirements: Device Operation
> +The vhost-pci driver SHOULD ensure that all the CPUs are notified of the
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) and VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL)
> +controlq messages before acknowledging the server.
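
As an illustration of the ordering only, with notify_all_cpus_and_wait()
standing in for whatever per-OS mechanism (IPIs, RCU-style synchronization,
...) the driver uses, and controlq_send_ack() for putting the ACK on the
control receiveq; the constants are from the structures earlier in this patch:

    #include <stdint.h>

    #define VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK 1
    #define VHOST_PCI_ACK_DEL_DONE 2

    /* Hypothetical: returns once every CPU has seen the removal and has
     * stopped touching the affected memory regions and virtqueues. */
    void notify_all_cpus_and_wait(void);

    /* Hypothetical: put an ACK message on the control receiveq. */
    void controlq_send_ack(uint16_t msg_type, uint64_t ack);

    static void driver_handle_memory_info_del(void)
    {
        /* 1. Make sure no CPU still uses the memory being deleted. */
        notify_all_cpus_and_wait();

        /* 2. Only then acknowledge the server. */
        controlq_send_ack(VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK,
                          VHOST_PCI_ACK_DEL_DONE);
    }
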
> -- 
> 1.8.3.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-06-27  2:01   ` [Qemu-devel] " Wang, Wei W
@ 2016-08-29 15:41     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2016-08-29 15:41 UTC (permalink / raw)
  To: Wang, Wei W; +Cc: kvm, qemu-devel, virtio-comment, stefanha, pbonzini

On Mon, Jun 27, 2016 at 02:01:24AM +0000, Wang, Wei W wrote:
> On Sun 6/19/2016 10:14 PM, Wei Wang wrote:
> > This RFC proposes a design of vhost-pci, which is a new virtio device type.
> > The vhost-pci device is used for inter-VM communication.
> > 
> > Changes in v2:
> > 1. changed the vhost-pci driver to use a controlq to send acknowledgement
> >    messages to the vhost-pci server rather than writing to the device
> >    configuration space;
> > 
> > 2. re-organized all the data structures and the description layout;
> > 
> > 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message, which
> > is redundant;
> > 
> > 4. added a message sequence number to the msg info structure to identify
> > socket
> >    messages, and the socket message exchange does not need to be blocking;
> > 
> > 5. changed to used uuid to identify each VM rather than using the QEMU process
> >    id
> > 
> 
> One more point should be added is that the server needs to send periodic socket messages to check if the driver VM is still alive. I will add this message support in next version.  (*v2-AR1*)

Question would be, does it mean guest is alive or QEMU/vhost
thread running is alive? And how do you distinguish a guest that
crashed from guest that is scheduled out?

Hypervisors generally have ways to detect and handle crashed
and stuck guests. It is likely a better idea to have a single
device to detect this than have each device send keep-alive
interrupts, interfering with the guest.

Given this is not a networking
transport, isn't it enough to handle this simply as a guest reset?
You have to handle it anyway.


> > Wei Wang (1):
> >   Vhost-pci RFC v2: a new virtio device for inter-VM communication
> > 
> >  vhost-pci.patch | 341
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 341 insertions(+)
> >  create mode 100755 vhost-pci.patch
> > 
> 
> Hi Michael,
> 
> Would you be able to look into the design? Thanks.
> 
> Best,
> Wei

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-08-29 15:24     ` [Qemu-devel] " Stefan Hajnoczi
@ 2016-08-29 15:42       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2016-08-29 15:42 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Wang, Wei W, kvm, qemu-devel, virtio-comment, pbonzini

On Mon, Aug 29, 2016 at 11:24:51AM -0400, Stefan Hajnoczi wrote:
> On Mon, Jun 27, 2016 at 02:01:24AM +0000, Wang, Wei W wrote:
> > On Sun 6/19/2016 10:14 PM, Wei Wang wrote:
> > > This RFC proposes a design of vhost-pci, which is a new virtio device type.
> > > The vhost-pci device is used for inter-VM communication.
> > > 
> > > Changes in v2:
> > > 1. changed the vhost-pci driver to use a controlq to send acknowledgement
> > >    messages to the vhost-pci server rather than writing to the device
> > >    configuration space;
> > > 
> > > 2. re-organized all the data structures and the description layout;
> > > 
> > > 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message, which
> > > is redundant;
> > > 
> > > 4. added a message sequence number to the msg info structure to identify
> > > socket
> > >    messages, and the socket message exchange does not need to be blocking;
> > > 
> > > 5. changed to used uuid to identify each VM rather than using the QEMU process
> > >    id
> > > 
> > 
> > One more point should be added is that the server needs to send periodic socket messages to check if the driver VM is still alive. I will add this message support in next version.  (*v2-AR1*)
> 
> Either the driver VM could go down or the device VM (server) could go
> down.  In both cases there must be a way to handle the situation.
> 
> If the server VM goes down it should be possible for the driver VM to
> resume either via hotplug of a new device or through messages
> reinitializing the dead device when the server VM restarts.
> 
> Stefan


I agree. And I do not think you need to send messages just to detect
this, leave this to the hypervisor.

-- 
MST

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-08-29 15:41     ` [Qemu-devel] " Michael S. Tsirkin
@ 2016-08-30 10:07       ` Wang, Wei W
  -1 siblings, 0 replies; 41+ messages in thread
From: Wang, Wei W @ 2016-08-30 10:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, qemu-devel, virtio-comment, stefanha, pbonzini

On Monday, August 29, 2016 11:42 PM, Michael S. Tsirkin wrote:
> On Mon, Jun 27, 2016 at 02:01:24AM +0000, Wang, Wei W wrote:
> > On Sun 6/19/2016 10:14 PM, Wei Wang wrote:
> > > This RFC proposes a design of vhost-pci, which is a new virtio device type.
> > > The vhost-pci device is used for inter-VM communication.
> > >
> > > Changes in v2:
> > > 1. changed the vhost-pci driver to use a controlq to send acknowledgement
> > >    messages to the vhost-pci server rather than writing to the device
> > >    configuration space;
> > >
> > > 2. re-organized all the data structures and the description layout;
> > >
> > > 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message,
> which
> > > is redundant;
> > >
> > > 4. added a message sequence number to the msg info structure to
> > > identify socket
> > >    messages, and the socket message exchange does not need to be
> > > blocking;
> > >
> > > 5. changed to used uuid to identify each VM rather than using the QEMU
> process
> > >    id
> > >
> >
> > One more point should be added is that the server needs to send
> > periodic socket messages to check if the driver VM is still alive. I
> > will add this message support in next version.  (*v2-AR1*)
> 
> Question would be, does it mean guest is alive or QEMU/vhost thread running is
> alive? And how do you distinguish a guest that crashed from guest that is
> scheduled out?
> 
> Hypervisors generally have ways to detect and handle crashed and stuck guests.
> It is likely a better idea to have a single device to detect this than have each
> device send keep-alive interrupts, interfering with the guest.

Agree that QEMU can detect the guest crash. 
I think we can just handle the guest powering off case, because a crashed guest will be powered off or rebooted anyway. Please check the other email about how to handle the situation when a guest powers off. Thanks.

> Given this is not a networking
> transport, isn't it enough to handle this simply as a guest reset?
> you have to handle it anyway.

I think handling this case is not related to the transport type (btw, the vhost-pci device can offer multiple transports (net, scsi, console etc.) at the same time). 

Best,
Wei


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-08-29 15:24     ` [Qemu-devel] " Stefan Hajnoczi
@ 2016-08-30 10:08       ` Wang, Wei W
  -1 siblings, 0 replies; 41+ messages in thread
From: Wang, Wei W @ 2016-08-30 10:08 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kvm, qemu-devel, virtio-comment, mst, pbonzini

On Monday, August 29, 2016 11:25 PM, Stefan Hajnoczi wrote:
> To: Wang, Wei W <wei.w.wang@intel.com>
> Cc: kvm@vger.kernel.org; qemu-devel@nongnu.org; virtio- 
> comment@lists.oasis-open.org; mst@redhat.com; pbonzini@redhat.com
> Subject: Re: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
> 
> On Mon, Jun 27, 2016 at 02:01:24AM +0000, Wang, Wei W wrote:
> > On Sun 6/19/2016 10:14 PM, Wei Wang wrote:
> > > This RFC proposes a design of vhost-pci, which is a new virtio device type.
> > > The vhost-pci device is used for inter-VM communication.
> > >
> > > Changes in v2:
> > > 1. changed the vhost-pci driver to use a controlq to send acknowledgement
> > >    messages to the vhost-pci server rather than writing to the device
> > >    configuration space;
> > >
> > > 2. re-organized all the data structures and the description 
> > > layout;
> > >
> > > 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message,
> which
> > > is redundant;
> > >
> > > 4. added a message sequence number to the msg info structure to 
> > > identify socket
> > >    messages, and the socket message exchange does not need to be 
> > > blocking;
> > >
> > > 5. changed to used uuid to identify each VM rather than using the 
> > > QEMU
> process
> > >    id
> > >
> >
> > One more point should be added is that the server needs to send 
> > periodic socket messages to check if the driver VM is still alive. I 
> > will add this message support in next version.  (*v2-AR1*)
> 
> Either the driver VM could go down or the device VM (server) could go 
> down.  In both cases there must be a way to handle the situation.
> 
> If the server VM goes down it should be possible for the driver VM to 
> resume either via hotplug of a new device or through messages 
> reinitializing the dead device when the server VM restarts.

I got feedback from people that the names "device VM" and "driver VM" are difficult to remember. Can we use client (or frontend) VM and server (or backend) VM in the discussion? I think that would sound more straightforward :)

Here are the two cases:
 
Case 1: When the client VM powers off, the server VM will notice that the connection is closed (the client calls the socket close() function, which notifies the server about the disconnection). Then the server will need to remove the vhost-pci device for that client VM. When the client VM boots up and connects to the server again, the server VM re-establishes the inter-VM communication channel (i.e. creating a new vhost-pci device and hot-plugging it to the server VM).

Case 2: When the server VM powers off, the client doesn't need to do anything. We can provide a way in QEMU monitor to re-establish the connection. So, when the server boots up again, the admin can let a client connect to the server via the client side QEMU monitor.
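
(For illustration, a rough C sketch of Case 1 on the server side; on_client_disconnect() would be wired to the socket close event, and the hotplug_* helpers are hypothetical wrappers, not existing QEMU APIs.)

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical wrappers around vhost-pci device creation/removal. */
    void hotplug_remove_vhost_pci(uint64_t client_uuid);
    bool hotplug_add_vhost_pci(uint64_t client_uuid);

    /* Case 1: the client VM went away; tear down its vhost-pci device. */
    void on_client_disconnect(uint64_t client_uuid)
    {
        hotplug_remove_vhost_pci(client_uuid);
    }

    /* Case 1, later: the client VM reconnected; rebuild the channel by
     * creating and hot-plugging a fresh vhost-pci device. */
    bool on_client_reconnect(uint64_t client_uuid)
    {
        return hotplug_add_vhost_pci(client_uuid);
    }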

Best,
Wei





^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-08-30 10:08       ` [Qemu-devel] " Wang, Wei W
@ 2016-08-30 11:10         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2016-08-30 11:10 UTC (permalink / raw)
  To: Wang, Wei W; +Cc: Stefan Hajnoczi, kvm, qemu-devel, virtio-comment, pbonzini

On Tue, Aug 30, 2016 at 10:08:01AM +0000, Wang, Wei W wrote:
> On Monday, August 29, 2016 11:25 PM, Stefan Hajnoczi wrote:
> > To: Wang, Wei W <wei.w.wang@intel.com>
> > Cc: kvm@vger.kernel.org; qemu-devel@nongnu.org; virtio- 
> > comment@lists.oasis-open.org; mst@redhat.com; pbonzini@redhat.com
> > Subject: Re: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
> > 
> > On Mon, Jun 27, 2016 at 02:01:24AM +0000, Wang, Wei W wrote:
> > > On Sun 6/19/2016 10:14 PM, Wei Wang wrote:
> > > > This RFC proposes a design of vhost-pci, which is a new virtio device type.
> > > > The vhost-pci device is used for inter-VM communication.
> > > >
> > > > Changes in v2:
> > > > 1. changed the vhost-pci driver to use a controlq to send acknowledgement
> > > >    messages to the vhost-pci server rather than writing to the device
> > > >    configuration space;
> > > >
> > > > 2. re-organized all the data structures and the description 
> > > > layout;
> > > >
> > > > 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message,
> > which
> > > > is redundant;
> > > >
> > > > 4. added a message sequence number to the msg info structure to 
> > > > identify socket
> > > >    messages, and the socket message exchange does not need to be 
> > > > blocking;
> > > >
> > > > 5. changed to used uuid to identify each VM rather than using the 
> > > > QEMU
> > process
> > > >    id
> > > >
> > >
> > > One more point should be added is that the server needs to send 
> > > periodic socket messages to check if the driver VM is still alive. I 
> > > will add this message support in next version.  (*v2-AR1*)
> > 
> > Either the driver VM could go down or the device VM (server) could go 
> > down.  In both cases there must be a way to handle the situation.
> > 
> > If the server VM goes down it should be possible for the driver VM to 
> > resume either via hotplug of a new device or through messages 
> > reinitializing the dead device when the server VM restarts.
> 
> I got feedbacks from people that the name of device VM and driver VM
> are difficult to remember. Can we use client (or frontend) VM and
> server (or backend) VM in the discussion? I think that would sound
> more straightforward :)

So server is the device VM?

Sounds even more confusing to me :)

frontend/backend is kind of ok if you really
prefer it, but let's add some text that explains how this translates to
device/driver that rest of text uses.

> 
> Here are the two cases:
>  
> Case 1: When the client VM powers off, the server VM will notice that
> the connection is closed (the client calls the socket close()
> function, which notifies the server about the disconnection). Then the
> server will need to remove the vhost-pci device for that client VM.
> When the client VM boots up and connects to the server again, the
> server VM re-establishes the inter-VM communication channel (i.e.
> creating a new vhost-pci device and hot-plugging it to the server VM).

So on reset you really must wait for backend to stop
doing things before you proceed. Closing socket won't
do this, it's asynchronous.


> Case 2: When the server VM powers off, the client doesn't need to do
> anything. We can provide a way in QEMU monitor to re-establish the
> connection. So, when the server boots up again, the admin can let a
> client connect to the server via the client side QEMU monitor.
> 
> Best,
> Wei
> 
> 

You need server to be careful though.
If it leaves the rings in an inconsistent state, there's a problem.

-- 
MST

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-08-30 11:10         ` [Qemu-devel] " Michael S. Tsirkin
@ 2016-08-30 12:59           ` Wang, Wei W
  -1 siblings, 0 replies; 41+ messages in thread
From: Wang, Wei W @ 2016-08-30 12:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefan Hajnoczi, kvm, qemu-devel, virtio-comment, pbonzini

On Tuesday, August 30, 2016 7:11 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 30, 2016 at 10:08:01AM +0000, Wang, Wei W wrote:
> > On Monday, August 29, 2016 11:25 PM, Stefan Hajnoczi wrote:
> > > To: Wang, Wei W <wei.w.wang@intel.com>
> > > Cc: kvm@vger.kernel.org; qemu-devel@nongnu.org; virtio-
> > > comment@lists.oasis-open.org; mst@redhat.com; pbonzini@redhat.com
> > > Subject: Re: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
> > >
> > > On Mon, Jun 27, 2016 at 02:01:24AM +0000, Wang, Wei W wrote:
> > > > On Sun 6/19/2016 10:14 PM, Wei Wang wrote:
> > > > > This RFC proposes a design of vhost-pci, which is a new virtio device type.
> > > > > The vhost-pci device is used for inter-VM communication.
> > > > >
> > > > > Changes in v2:
> > > > > 1. changed the vhost-pci driver to use a controlq to send
> acknowledgement
> > > > >    messages to the vhost-pci server rather than writing to the device
> > > > >    configuration space;
> > > > >
> > > > > 2. re-organized all the data structures and the description
> > > > > layout;
> > > > >
> > > > > 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message,
> > > which
> > > > > is redundant;
> > > > >
> > > > > 4. added a message sequence number to the msg info structure to
> > > > > identify socket
> > > > >    messages, and the socket message exchange does not need to be
> > > > > blocking;
> > > > >
> > > > > 5. changed to used uuid to identify each VM rather than using
> > > > > the QEMU
> > > process
> > > > >    id
> > > > >
> > > >
> > > > One more point should be added is that the server needs to send
> > > > periodic socket messages to check if the driver VM is still alive.
> > > > I will add this message support in next version.  (*v2-AR1*)
> > >
> > > Either the driver VM could go down or the device VM (server) could
> > > go down.  In both cases there must be a way to handle the situation.
> > >
> > > If the server VM goes down it should be possible for the driver VM
> > > to resume either via hotplug of a new device or through messages
> > > reinitializing the dead device when the server VM restarts.
> >
> > I got feedbacks from people that the name of device VM and driver VM
> > are difficult to remember. Can we use client (or frontend) VM and
> > server (or backend) VM in the discussion? I think that would sound
> > more straightforward :)
> 
> So server is the device VM?

Yes. 

> Sounds even more confusing to me :)
> 
> frontend/backend is kind of ok if you really prefer it, but let's add some text that
> explains how this translates to device/driver that rest of text uses.

OK. I guess most people are more comfortable with frontend and backend :)

> >
> > Here are the two cases:
> >
> > Case 1: When the client VM powers off, the server VM will notice that
> > the connection is closed (the client calls the socket close()
> > function, which notifies the server about the disconnection). Then the
> > server will need to remove the vhost-pci device for that client VM.
> > When the client VM boots up and connects to the server again, the
> > server VM re-establishes the inter-VM communication channel (i.e.
> > creating a new vhost-pci device and hot-plugging it to the server VM).
> 
> So on reset you really must wait for backend to stop doing things before you
> proceed. Closing socket won't do this, it's asynchronous.

Agree.

From the logical point of view, I think we can state the following in the spec:

Before the frontend VM is destroyed or migrated, every client that connects to
the server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message to
the server. The destroy or migration operation MUST wait until all the
corresponding VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE) socket messages have been received.


From the implementation point of view, I think we can implement it like this:

Add a new virtio device_status value: VIRTIO_CONFIG_S_DRIVER_DEL_OK.
On reset, the virtio driver's .remove() function will be invoked. At the beginning of that function, we can add the following two lines of code:

vdev->config->set_status(vdev, VIRTIO_CONFIG_S_DRIVER_DEL_OK);  // this is supposed to be a request, "OK?"
while (vdev->config->get_status(vdev) != VIRTIO_CONFIG_S_DRIVER_DEL_OK);  // this is supposed to be an ack, "OK!"

The first call traps to QEMU, which uses the client socket to send a DEVICE_INFO(DEL) socket message to the server and returns without setting the status to "OK!". The frontend driver then waits in the while() loop until it is "OK!" to do the removal.

Once the server receives that DEVICE_INFO(DEL) message, it stops the corresponding vhost-pci driver and sends back a DEVICE_INFO(DEL_DONE) socket message. Upon receiving that message, the client sets the device status to "OK!", and the driver's .remove() function exits the while() loop and continues its removal work.
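
To make this more concrete, here is a rough sketch of the frontend driver side (just an illustration from my side: the status value, the bit-style encoding, the busy-wait and the function name are all made up and still open for discussion):

#include <linux/virtio.h>
#include <linux/virtio_config.h>
#include <linux/delay.h>

/* Hypothetical new status value; the exact encoding is not decided yet. */
#define VIRTIO_CONFIG_S_DRIVER_DEL_OK	0x20

static void vhost_pci_frontend_remove(struct virtio_device *vdev)
{
	/* "OK?": traps to QEMU, which sends DEVICE_INFO(DEL) to the server
	 * over the socket and does not latch the new status bit yet. */
	vdev->config->set_status(vdev, vdev->config->get_status(vdev) |
				       VIRTIO_CONFIG_S_DRIVER_DEL_OK);

	/* "OK!": QEMU latches the bit only after it has received
	 * DEVICE_INFO(DEL_DONE) from the server. */
	while (!(vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_DRIVER_DEL_OK))
		msleep(1);

	/* Now it is safe to continue the normal removal work. */
	vdev->config->del_vqs(vdev);
}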

Best,
Wei


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [virtio-comment] Re: [Qemu-devel] [PATCH] *** Vhost-pci RFC v2 ***
  2016-06-19 14:14 ` [Qemu-devel] " Wei Wang
@ 2016-08-31 12:30   ` Marc-André Lureau
  -1 siblings, 0 replies; 41+ messages in thread
From: Marc-André Lureau @ 2016-08-31 12:30 UTC (permalink / raw)
  To: Wei Wang, kvm, qemu-devel, virtio-comment, mst, stefanha, pbonzini

Hi

On Sun, Jun 19, 2016 at 10:19 AM Wei Wang <wei.w.wang@intel.com> wrote:

> This RFC proposes a design of vhost-pci, which is a new virtio device type.
> The vhost-pci device is used for inter-VM communication.
>
>
Before I send a more complete review of the spec, I have a few overall
questions:

- this patch is for the virtio spec? Why not patch the spec directly
(https://tools.oasis-open.org/version-control/browse/wsvn/virtio/trunk/)? I
expect several RFC iterations, so perhaps it's easier as a plain text file
for now (as a qemu patch to doc/specs). BTW, I would limit the audience to
qemu-devel for now.
- I think the virtio spec should limit itself to the hw device description
and the virtq messages, not the backend implementation (the IPC details,
client/server, etc.).
- If it could be made not pci-specific, a better name for the device could
be simply "driver": the driver of a virtio device. Or the "slave" in
vhost-user terminology - consumer of virtq. I think you prefer to call it
"backend" in general, but I find it more confusing.
- regarding the socket protocol, why not reuse vhost-user? it seems to me
it supports most of what you need and more (like interrupt, migrations,
protocol features, start/stop queues). Some of the extensions, like uuid,
could be beneficial to vhost-user too.
- Why is it required or beneficial to support multiple "frontend" devices
over the same "vhost-pci" device? It could simplify things if it was a
single device. If necessary, that could also be interesting as a vhost-user
extension.
- no interrupt support, I suppose you mainly looked at poll-based net
devices
- when do you expect to share a wip/rfc implementation?

thanks

Changes in v2:
> 1. changed the vhost-pci driver to use a controlq to send acknowledgement
>    messages to the vhost-pci server rather than writing to the device
>    configuration space;
>
> 2. re-organized all the data structures and the description layout;
>
> 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message, which is
> redundant;
>
> 4. added a message sequence number to the msg info structure to identify
> socket
>    messages, and the socket message exchange does not need to be blocking;
>
> 5. changed to used uuid to identify each VM rather than using the QEMU
> process
>    id
>
> Wei Wang (1):
>   Vhost-pci RFC v2: a new virtio device for inter-VM communication
>
>  vhost-pci.patch | 341
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 341 insertions(+)
>  create mode 100755 vhost-pci.patch
>
> --
> 1.8.3.1
>
>
> --
Marc-André Lureau

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-08-30 10:08       ` [Qemu-devel] " Wang, Wei W
  (?)
  (?)
@ 2016-08-31 16:07       ` Stefan Hajnoczi
  2016-09-01 16:27           ` Wei Wang
  -1 siblings, 1 reply; 41+ messages in thread
From: Stefan Hajnoczi @ 2016-08-31 16:07 UTC (permalink / raw)
  To: Wang, Wei W
  Cc: Stefan Hajnoczi, pbonzini, mst, qemu-devel, kvm, virtio-comment

On Tue, Aug 30, 2016 at 10:08:01AM +0000, Wang, Wei W wrote:
> On Monday, August 29, 2016 11:25 PM, Stefan Hajnoczi wrote:
> > To: Wang, Wei W <wei.w.wang@intel.com>
> > Cc: kvm@vger.kernel.org; qemu-devel@nongnu.org; virtio- 
> > comment@lists.oasis-open.org; mst@redhat.com; pbonzini@redhat.com
> > Subject: Re: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
> > 
> > On Mon, Jun 27, 2016 at 02:01:24AM +0000, Wang, Wei W wrote:
> > > On Sun 6/19/2016 10:14 PM, Wei Wang wrote:
> > > > This RFC proposes a design of vhost-pci, which is a new virtio device type.
> > > > The vhost-pci device is used for inter-VM communication.
> > > >
> > > > Changes in v2:
> > > > 1. changed the vhost-pci driver to use a controlq to send acknowledgement
> > > >    messages to the vhost-pci server rather than writing to the device
> > > >    configuration space;
> > > >
> > > > 2. re-organized all the data structures and the description 
> > > > layout;
> > > >
> > > > 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message,
> > which
> > > > is redundant;
> > > >
> > > > 4. added a message sequence number to the msg info structure to 
> > > > identify socket
> > > >    messages, and the socket message exchange does not need to be 
> > > > blocking;
> > > >
> > > > 5. changed to used uuid to identify each VM rather than using the 
> > > > QEMU
> > process
> > > >    id
> > > >
> > >
> > > One more point should be added is that the server needs to send 
> > > periodic socket messages to check if the driver VM is still alive. I 
> > > will add this message support in next version.  (*v2-AR1*)
> > 
> > Either the driver VM could go down or the device VM (server) could go 
> > down.  In both cases there must be a way to handle the situation.
> > 
> > If the server VM goes down it should be possible for the driver VM to 
> > resume either via hotplug of a new device or through messages 
> > reinitializing the dead device when the server VM restarts.
> 
> I got feedbacks from people that the name of device VM and driver VM are difficult to remember. Can we use client (or frontend) VM and server (or backend) VM in the discussion? I think that would sound more straightforward :)

We discussed this in a previous email thread.

Device and driver are the terms used by the virtio spec.  Anyone dealing
with vhost-pci design must be familiar with the virtio spec.

I don't see how using the terminology consistently can be confusing,
unless these people haven't looked at the virtio spec.  In that case
they have no business working on vhost-pci because virtio is a
prerequisite :).

Stefan


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH] *** Vhost-pci RFC v2 ***
  2016-09-01 16:26   ` Wei Wang
@ 2016-09-01  8:49     ` Marc-André Lureau
  2016-09-01 12:13       ` [Qemu-devel] [virtio-comment] " Wei Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Marc-André Lureau @ 2016-09-01  8:49 UTC (permalink / raw)
  To: Wei Wang; +Cc: qemu-devel, virtio-comment, mst, stefanha, pbonzini

Hi

On Thu, Sep 1, 2016 at 12:19 PM Wei Wang <wei.w.wang@intel.com> wrote:

> On 08/31/2016 08:30 PM, Marc-André Lureau wrote:
>
> - If it could be made not pci-specific, a better name for the device could
> be simply "driver": the driver of a virtio device. Or the "slave" in
> vhost-user terminology - consumer of virtq. I think you prefer to call it
> "backend" in general, but I find it more confusing.
>
>
> Not really. A virtio device has it own driver (e.g. a virtio-net driver
> for a virtio-net device). A vhost-pci device plays the role of a backend
> (just like vhost_net, vhost_user) for a virtio device. If we use the
> "device/driver" naming convention, the vhost-pci device is part of the
> "device". But I actually prefer to use "frontend/backend" :) If we check
> the QEMU's doc/specs/vhost-user.txt, it also uses "backend" to describe.
>
>
Yes, but it uses "backend" freely, without any definition, and possibly to
name different things. (At least "slave" is defined as the consumer of the
virtq, but I think some people don't like to use that word.)

Have you thought about making the device not pci-specific? I don't know
much about mmio devices or s/390, but if devices can hotplug their own
memory (I believe mmio can), then it should be possible to define a
generic enough device.

- regarding the socket protocol, why not reuse vhost-user? it seems to me
> it supports most of what you need and more (like interrupt, migrations,
> protocol features, start/stop queues). Some of the extensions, like uuid,
> could be beneficial to vhost-user too.
>
>
> Right. We recently changed the plan - trying to make it (the vhost-pci
> protocol) an extension of the vhost-user protocol.
>
>
> Great!


> - Why is it required or beneficial to support multiple "frontend" devices
> over the same "vhost-pci" device? It could simplify things if it was a
> single device. If necessary, that could also be interesting as a vhost-user
> extension.
>
>
> We call it "multiple backend functionalities" (e.g. vhost-pci-net,
> vhost-pci-scsi..). A vhost-pci driver contains multiple such backend
> functionalities, because in this way they can reuse (share) the same memory
> mapping. To be more precise, a vhost-pci device supplies the memory of a
> frontend VM, and all the backend functionalities need to access the same
> frontend VM memory, so we consolidate them into one vhost-pci driver to use
> one vhost-pci device.
>
>
That's what I imagined. Do you have a use case for that?

Given that it's in a VM (no caching issues?), how is it a problem to map
the same memory multiple times? Is there a memory limit?

- no interrupt support, I suppose you mainly looked at poll-based net
> devices
>
>
> Yes. But I think it's also possible to add the interrupt support. For
> example, we can use ioeventfd (or hypercall) to inject interrupts to the
> fontend VM after transmitting packets.
>
> I guess it would be a good idea to have this in the spec from the
beginning, not as an afterthought

>
> - when do you expect to share a wip/rfc implementation?
>
> Probably in October (next month). I think it also depends on the
> discussions here :)
>

thanks
-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [virtio-comment] Re: [PATCH] *** Vhost-pci RFC v2 ***
  2016-09-01  8:49     ` Marc-André Lureau
@ 2016-09-01 12:13       ` Wei Wang
  2016-09-01 13:05         ` Marc-André Lureau
  0 siblings, 1 reply; 41+ messages in thread
From: Wei Wang @ 2016-09-01 12:13 UTC (permalink / raw)
  To: virtio-comment, stefanha, Marc-André Lureau
  Cc: qemu-devel, pbonzini, mst

On 09/01/2016 04:49 PM, Marc-André Lureau wrote:
> Hi
>
> On Thu, Sep 1, 2016 at 12:19 PM Wei Wang <wei.w.wang@intel.com 
> <mailto:wei.w.wang@intel.com>> wrote:
>
>     On 08/31/2016 08:30 PM, Marc-André Lureau wrote:
>
>>     - If it could be made not pci-specific, a better name for the
>>     device could be simply "driver": the driver of a virtio device.
>>     Or the "slave" in vhost-user terminology - consumer of virtq. I
>>     think you prefer to call it "backend" in general, but I find it
>>     more confusing.
>
>     Not really. A virtio device has it own driver (e.g. a virtio-net
>     driver for a virtio-net device). A vhost-pci device plays the role
>     of a backend (just like vhost_net, vhost_user) for a virtio
>     device. If we use the "device/driver" naming convention, the
>     vhost-pci device is part of the "device". But I actually prefer to
>     use "frontend/backend" :) If we check the QEMU's
>     doc/specs/vhost-user.txt, it also uses "backend" to describe.
>
>
> Yes, but it uses "backend" freely without any definition and to name 
> eventually different things. (at least "slave" is being defined as the 
> consumer of virtq, but I think some people don't like to use that word).
>

I think most people know the concept of backend/frontend; that's 
probably why it usually isn't explicitly explained in a doc. If you guys 
don't have an objection, I suggest we use it in the discussion :)  The 
goal here is to get the design finalized first. When it comes to the 
final spec wording phase, we can decide which description is more 
appropriate.

> Have you thought about making the device not pci specific? I don't 
> know much about mmio devices nor s/390, but if devices can hotplug 
> their own memory (I believe mmio can), then it should be possible to 
> define a device generic enough.

Not yet. I think the main difference would be the way to map the 
frontend VM's memory (in our case, we use a BAR). Other things should be 
generic.

>
>>     - Why is it required or beneficial to support multiple "frontend"
>>     devices over the same "vhost-pci" device? It could simplify
>>     things if it was a single device. If necessary, that could also
>>     be interesting as a vhost-user extension.
>
>     We call it "multiple backend functionalities" (e.g. vhost-pci-net,
>     vhost-pci-scsi..). A vhost-pci driver contains multiple such
>     backend functionalities, because in this way they can reuse
>     (share) the same memory mapping. To be more precise, a vhost-pci
>     device supplies the memory of a frontend VM, and all the backend
>     functionalities need to access the same frontend VM memory, so we
>     consolidate them into one vhost-pci driver to use one vhost-pci
>     device.
>
>
> That's what I imagined. Do you have a use case for that?

Currently, we only have the network use cases. I think we can design it 
that way (multiple backend functionalities), which is more generic (not 
just limited to network usages). When implementing it, we can first have 
the network backend functionality (i.e. vhost-pci-net) implemented. In 
the future, if people are interested in other backend functionalities, I 
think it should be easy to add them.

>
> Given that it's in a VM (no caching issues?), how is it a problem to 
> map the same memory multiple times? Is there a memory limit?
>

I need to explain this a little bit more :)  - the backend VM doesn't 
need to map the same memory multiple times. It maps the entire memory of 
a frontend VM using a vhost-pci device (it's a one-time mapping that 
happens at the setup phase). Those backend functionalities reside in 
the same vhost-pci driver, so the BAR is ioremap()-ed only once, by the 
vhost-pci driver. The backend functionalities are not created together 
in the driver probe() function; a backend functionality is created when 
the vhost-pci driver receives a controlq message asking to create one 
(the message indicates the type - net, scsi, console, etc.).

I haven't seen any caching issues so far.

IIRC, the memory mapping has a limit (512GB or 1T), but that should be 
enough (a guest usually has a much smaller memory size).
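
To illustrate the flow above, here is a rough consumer-side sketch (purely illustrative from my side - the structure, constant and function names below are made up and are not part of the spec):

#include <linux/errno.h>
#include <linux/io.h>
#include <linux/types.h>

enum vhost_pci_func_type {
	VHOST_PCI_FUNC_NET  = 1,
	VHOST_PCI_FUNC_SCSI = 2,
	/* console, ... */
};

struct vhost_pci_ctrlq_msg {
	u16 func_type;	/* which backend functionality to create */
	/* per-functionality parameters (vring addresses, etc.) follow */
};

/* Implemented elsewhere in the (hypothetical) driver. */
int vhost_pci_net_create(void __iomem *mem, const struct vhost_pci_ctrlq_msg *msg);
int vhost_pci_scsi_create(void __iomem *mem, const struct vhost_pci_ctrlq_msg *msg);

/* The frontend VM's memory, ioremap()-ed exactly once at setup time. */
static void __iomem *frontend_mem;

static int vhost_pci_map_frontend(phys_addr_t bar_base, resource_size_t len)
{
	frontend_mem = ioremap_cache(bar_base, len);
	return frontend_mem ? 0 : -ENOMEM;
}

/* Each controlq CREATE message instantiates one backend functionality;
 * they all share the single mapping above instead of remapping it. */
static int vhost_pci_handle_ctrlq_msg(const struct vhost_pci_ctrlq_msg *msg)
{
	switch (msg->func_type) {
	case VHOST_PCI_FUNC_NET:
		return vhost_pci_net_create(frontend_mem, msg);
	case VHOST_PCI_FUNC_SCSI:
		return vhost_pci_scsi_create(frontend_mem, msg);
	default:
		return -EINVAL;
	}
}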

>>     - no interrupt support, I suppose you mainly looked at poll-based
>>     net devices
>
>     Yes. But I think it's also possible to add the interrupt support.
>     For example, we can use ioeventfd (or hypercall) to inject
>     interrupts to the fontend VM after transmitting packets.
>
> I guess it would be a good idea to have this in the spec from the 
> beginning, not as an afterthought

OK, will add it.

Best,
Wei

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [virtio-comment] Re: [PATCH] *** Vhost-pci RFC v2 ***
  2016-09-01 12:13       ` [Qemu-devel] [virtio-comment] " Wei Wang
@ 2016-09-01 13:05         ` Marc-André Lureau
  2016-09-02  1:29           ` Wei Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Marc-André Lureau @ 2016-09-01 13:05 UTC (permalink / raw)
  To: Wei Wang, virtio-comment, stefanha; +Cc: qemu-devel, pbonzini, mst

Hi

On Thu, Sep 1, 2016 at 4:13 PM Wei Wang <wei.w.wang@intel.com> wrote:

> On 09/01/2016 04:49 PM, Marc-André Lureau wrote:
> > Hi
> >
> > On Thu, Sep 1, 2016 at 12:19 PM Wei Wang <wei.w.wang@intel.com
> > <mailto:wei.w.wang@intel.com>> wrote:
> >
> >     On 08/31/2016 08:30 PM, Marc-André Lureau wrote:
> >
> >>     - If it could be made not pci-specific, a better name for the
> >>     device could be simply "driver": the driver of a virtio device.
> >>     Or the "slave" in vhost-user terminology - consumer of virtq. I
> >>     think you prefer to call it "backend" in general, but I find it
> >>     more confusing.
> >
> >     Not really. A virtio device has it own driver (e.g. a virtio-net
> >     driver for a virtio-net device). A vhost-pci device plays the role
> >     of a backend (just like vhost_net, vhost_user) for a virtio
> >     device. If we use the "device/driver" naming convention, the
> >     vhost-pci device is part of the "device". But I actually prefer to
> >     use "frontend/backend" :) If we check the QEMU's
> >     doc/specs/vhost-user.txt, it also uses "backend" to describe.
> >
> >
> > Yes, but it uses "backend" freely without any definition and to name
> > eventually different things. (at least "slave" is being defined as the
> > consumer of virtq, but I think some people don't like to use that word).
> >
>
> I think most people know the concept of backend/frontend, that's
> probably the reason why they usually don't explicitly explain it in a
> doc. If you guys don't have an objection, I suggest to use it in the
> discussion :)  The goal here is to get the design finalized first. When
> it comes to the final spec wording phase, we can decide which
> description is more proper.
>

"backend" is too broad for me. Instead I would stick to something closer to
what we want to name and define. If it's the consumer of virtq, then why
not call it that way.


> > Have you thought about making the device not pci specific? I don't
> > know much about mmio devices nor s/390, but if devices can hotplug
> > their own memory (I believe mmio can), then it should be possible to
> > define a device generic enough.
>
> Not yet. I think the main difference would be the way to map the
> frontend VM's memory (in our case, we use a BAR). Other things should be
> generic.
>

I hope some more knowledgeable people will chime in.


>
> >
> >>     - Why is it required or beneficial to support multiple "frontend"
> >>     devices over the same "vhost-pci" device? It could simplify
> >>     things if it was a single device. If necessary, that could also
> >>     be interesting as a vhost-user extension.
> >
> >     We call it "multiple backend functionalities" (e.g. vhost-pci-net,
> >     vhost-pci-scsi..). A vhost-pci driver contains multiple such
> >     backend functionalities, because in this way they can reuse
> >     (share) the same memory mapping. To be more precise, a vhost-pci
> >     device supplies the memory of a frontend VM, and all the backend
> >     functionalities need to access the same frontend VM memory, so we
> >     consolidate them into one vhost-pci driver to use one vhost-pci
> >     device.
> >
> >
> > That's what I imagined. Do you have a use case for that?
>
> Currently, we only have the network use cases. I think we can design it
> that way (multple backend functionalities), which is more generic (not
> just limited to network usages). When implementing it, we can first have
> the network backend functionality (i.e. vhost-pci-net) implemented. In
> the future, if people are interested in other backend functionalities, I
> think it should be easy to add them.
>

My question is not about the support of various kinds of devices (that is
clearly a worthy goal to me) but about supporting several frontend/provider
devices simultaneously on the same vhost-pci device: is this required or
necessary? I think it would simplify things if it were 1-1 instead; I would
like to understand why you propose a different design.


>
> >
> > Given that it's in a VM (no caching issues?), how is it a problem to
> > map the same memory multiple times? Is there a memory limit?
> >
>
> I need to explain this a little bit more :)  - the backend VM doesn't
> need to map the same memory multiple times. It maps the entire memory of
> a frontend VM using a vhost-pci device (it's a one-time mapping
> happening at the setup phase). Those backend functionalities reside in
> the same vhost-pci driver, so the bar is ioremap()-ed only once, by the
> vhost-pci driver. The backend functionalities are not created together
> in the driver probe() function. A backend functionality is created when
> the vhost-pci driver receives a controlq message asking for creating one
> (the message indicates the type - net, scsi, console etc.).
>
> I haven't seen any caching issues so far.
>
> IIRC, the memory mapping has a limit (512GB or 1T), but that should be
> enough (a guest usually has a much smaller memory size).
>
> >>     - no interrupt support, I suppose you mainly looked at poll-based
> >>     net devices
> >
> >     Yes. But I think it's also possible to add the interrupt support.
> >     For example, we can use ioeventfd (or hypercall) to inject
> >     interrupts to the fontend VM after transmitting packets.
> >
> > I guess it would be a good idea to have this in the spec from the
> > beginning, not as an afterthought
>
> OK, will add it.
>
>
thanks
-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH] *** Vhost-pci RFC v2 ***
  2016-08-31 12:30   ` Marc-André Lureau
  (?)
@ 2016-09-01 16:26   ` Wei Wang
  2016-09-01  8:49     ` Marc-André Lureau
  -1 siblings, 1 reply; 41+ messages in thread
From: Wei Wang @ 2016-09-01 16:26 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: qemu-devel, virtio-comment, mst, stefanha, pbonzini

On 08/31/2016 08:30 PM, Marc-André Lureau wrote:
> Hi
>
> On Sun, Jun 19, 2016 at 10:19 AM Wei Wang <wei.w.wang@intel.com 
> <mailto:wei.w.wang@intel.com>> wrote:
>
>     This RFC proposes a design of vhost-pci, which is a new virtio
>     device type.
>     The vhost-pci device is used for inter-VM communication.
>
>
> Before I send a more complete review of the spec, I have a few overall 
> questions:
>
Hi Marc-André, thanks for joining the reviewing process :)

> - this patch is for the virtio spec? Why not patch the spec directly 
> (https://tools.oasis-open.org/version-control/browse/wsvn/virtio/trunk/) 
> I expect several rfc iterations, so perhaps it's easier as plain text 
> file for now (as a qemu patch to doc/specs). btw, I would limit the 
> audience at qemu-devel for now.

Yes. Part of the patch is for the virtio spec. I will separate the 
patches (please see the next response).
I have kept the qemu-devel and virtio mailing lists on CC here.

> - I think the virtio spec should limit itself to the hw device 
> description, and virtioq messages. Not the backend implementation (the 
> ipc details, client/server etc).

Agree. I will separate the device spec description from the protocol 
description. The device description will be made a virtio spec patch, 
and the protocol description will be made a qemu patch to doc/specs.

> - If it could be made not pci-specific, a better name for the device 
> could be simply "driver": the driver of a virtio device. Or the 
> "slave" in vhost-user terminology - consumer of virtq. I think you 
> prefer to call it "backend" in general, but I find it more confusing.

Not really. A virtio device has its own driver (e.g. a virtio-net driver 
for a virtio-net device). A vhost-pci device plays the role of a backend 
(just like vhost_net, vhost_user) for a virtio device. If we use the 
"device/driver" naming convention, the vhost-pci device is part of the 
"device". But I actually prefer to use "frontend/backend" :) If we check 
QEMU's doc/specs/vhost-user.txt, it also uses "backend" in its descriptions.

> - regarding the socket protocol, why not reuse vhost-user? it seems to 
> me it supports most of what you need and more (like interrupt, 
> migrations, protocol features, start/stop queues). Some of the 
> extensions, like uuid, could be beneficial to vhost-user too.

Right. We recently changed the plan - trying to make it (the vhost-pci 
protocol) an extension of the vhost-user protocol.

> - Why is it required or beneficial to support multiple "frontend" 
> devices over the same "vhost-pci" device? It could simplify things if 
> it was a single device. If necessary, that could also be interesting 
> as a vhost-user extension.

We call it "multiple backend functionalities" (e.g. vhost-pci-net, 
vhost-pci-scsi..). A vhost-pci driver contains multiple such backend 
functionalities, because in this way they can reuse (share) the same 
memory mapping. To be more precise, a vhost-pci device supplies the 
memory of a frontend VM, and all the backend functionalities need to 
access the same frontend VM memory, so we consolidate them into one 
vhost-pci driver to use one vhost-pci device.

> - no interrupt support, I suppose you mainly looked at poll-based net 
> devices

Yes. But I think it's also possible to add interrupt support. For 
example, we can use ioeventfd (or a hypercall) to inject interrupts into 
the frontend VM after transmitting packets.
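
As an illustration, the host side could look roughly like this (only a sketch of one possible mechanism from my side, using KVM's irqfd to deliver an eventfd signal as a guest interrupt; the function names are made up):

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Frontend-side QEMU: route an eventfd to one of the guest's interrupt
 * lines (gsi), then pass the fd to the backend over the socket. */
static int vhost_pci_bind_call_fd(int vm_fd, uint32_t gsi)
{
	struct kvm_irqfd req;
	int call_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);

	memset(&req, 0, sizeof(req));
	req.fd = call_fd;
	req.gsi = gsi;
	if (ioctl(vm_fd, KVM_IRQFD, &req) < 0) {
		close(call_fd);
		return -1;
	}
	return call_fd;
}

/* Backend side: after placing packets in the frontend VM's vring,
 * signal the eventfd to inject the interrupt into the frontend VM. */
static void vhost_pci_notify_frontend(int call_fd)
{
	uint64_t one = 1;

	write(call_fd, &one, sizeof(one));
}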

> - when do you expect to share a wip/rfc implementation?
Probably in October (next month). I think it also depends on the 
discussions here :)

Best,
Wei

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [virtio-comment] Re: [Qemu-devel] [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-08-31 16:07       ` Stefan Hajnoczi
@ 2016-09-01 16:27           ` Wei Wang
  0 siblings, 0 replies; 41+ messages in thread
From: Wei Wang @ 2016-09-01 16:27 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefan Hajnoczi, pbonzini, mst, qemu-devel, kvm, virtio-comment

On 09/01/2016 12:07 AM, Stefan Hajnoczi wrote:
> On Tue, Aug 30, 2016 at 10:08:01AM +0000, Wang, Wei W wrote:
>> On Monday, August 29, 2016 11:25 PM, Stefan Hajnoczi wrote:
>>> To: Wang, Wei W <wei.w.wang@intel.com>
>>> Cc: kvm@vger.kernel.org; qemu-devel@nongnu.org; virtio-
>>> comment@lists.oasis-open.org; mst@redhat.com; pbonzini@redhat.com
>>> Subject: Re: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
>>>
>>> On Mon, Jun 27, 2016 at 02:01:24AM +0000, Wang, Wei W wrote:
>>>> On Sun 6/19/2016 10:14 PM, Wei Wang wrote:
>>>>> This RFC proposes a design of vhost-pci, which is a new virtio device type.
>>>>> The vhost-pci device is used for inter-VM communication.
>>>>>
>>>>> Changes in v2:
>>>>> 1. changed the vhost-pci driver to use a controlq to send acknowledgement
>>>>>     messages to the vhost-pci server rather than writing to the device
>>>>>     configuration space;
>>>>>
>>>>> 2. re-organized all the data structures and the description
>>>>> layout;
>>>>>
>>>>> 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message,
>>> which
>>>>> is redundant;
>>>>>
>>>>> 4. added a message sequence number to the msg info structure to
>>>>> identify socket
>>>>>     messages, and the socket message exchange does not need to be
>>>>> blocking;
>>>>>
>>>>> 5. changed to used uuid to identify each VM rather than using the
>>>>> QEMU
>>> process
>>>>>     id
>>>>>
>>>> One more point should be added is that the server needs to send
>>>> periodic socket messages to check if the driver VM is still alive. I
>>>> will add this message support in next version.  (*v2-AR1*)
>>> Either the driver VM could go down or the device VM (server) could go
>>> down.  In both cases there must be a way to handle the situation.
>>>
>>> If the server VM goes down it should be possible for the driver VM to
>>> resume either via hotplug of a new device or through messages
>>> reinitializing the dead device when the server VM restarts.
>> I got feedbacks from people that the name of device VM and driver VM are difficult to remember. Can we use client (or frontend) VM and server (or backend) VM in the discussion? I think that would sound more straightforward :)
> We discussed this in a previous email thread.
>
> Device and driver are the terms used by the virtio spec.  Anyone dealing
> with vhost-pci design must be familiar with the virtio spec.
>
> I don't see how using the terminology consistently can be confusing,
> unless these people haven't looked at the virtio spec.  In that case
> they have no business with working on vhost-pci because virtio is a
> prerequisite :).
>
> Stefan
I don't disagree :)
But "frontend/backend" is also commonly used in descriptions in virtio 
related stuff, and it seems that more people like it. It's also easier 
to describe some components in the design (e.g. a backend functionality 
like vhost-pci-net). I am not sure if you guys are also OK with it.

Best,
Wei

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [virtio-comment] Re: [PATCH] *** Vhost-pci RFC v2 ***
  2016-09-01 13:05         ` Marc-André Lureau
@ 2016-09-02  1:29           ` Wei Wang
  2016-09-02  8:15             ` Marc-André Lureau
  0 siblings, 1 reply; 41+ messages in thread
From: Wei Wang @ 2016-09-02  1:29 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: virtio-comment, stefanha, qemu-devel, pbonzini, mst

On 09/01/2016 09:05 PM, Marc-André Lureau wrote:
> On Thu, Sep 1, 2016 at 4:13 PM Wei Wang <wei.w.wang@intel.com 
> <mailto:wei.w.wang@intel.com>> wrote:
>
>     On 09/01/2016 04:49 PM, Marc-André Lureau wrote:
>     > Hi
>     >
>     > On Thu, Sep 1, 2016 at 12:19 PM Wei Wang <wei.w.wang@intel.com
>     <mailto:wei.w.wang@intel.com>
>     > <mailto:wei.w.wang@intel.com <mailto:wei.w.wang@intel.com>>> wrote:
>     >
>     >     On 08/31/2016 08:30 PM, Marc-André Lureau wrote:
>     >
>     >>     - If it could be made not pci-specific, a better name for the
>     >>     device could be simply "driver": the driver of a virtio device.
>     >>     Or the "slave" in vhost-user terminology - consumer of virtq. I
>     >>     think you prefer to call it "backend" in general, but I find it
>     >>     more confusing.
>     >
>     >     Not really. A virtio device has it own driver (e.g. a virtio-net
>     >     driver for a virtio-net device). A vhost-pci device plays
>     the role
>     >     of a backend (just like vhost_net, vhost_user) for a virtio
>     >     device. If we use the "device/driver" naming convention, the
>     >     vhost-pci device is part of the "device". But I actually
>     prefer to
>     >     use "frontend/backend" :) If we check the QEMU's
>     >     doc/specs/vhost-user.txt, it also uses "backend" to describe.
>     >
>     >
>     > Yes, but it uses "backend" freely without any definition and to name
>     > eventually different things. (at least "slave" is being defined
>     as the
>     > consumer of virtq, but I think some people don't like to use
>     that word).
>     >
>
>     I think most people know the concept of backend/frontend, that's
>     probably the reason why they usually don't explicitly explain it in a
>     doc. If you guys don't have an objection, I suggest to use it in the
>     discussion :)  The goal here is to get the design finalized first.
>     When
>     it comes to the final spec wording phase, we can decide which
>     description is more proper.
>
>
> "backend" is too broad for me. Instead I would stick to something 
> closer to what we want to name and define. If it's the consumer of 
> virtq, then why not call it that way.

OK. Let me get used to it (provider VM - frontend, consumer VM - 
backend).

>
>     > Have you thought about making the device not pci specific? I don't
>     > know much about mmio devices nor s/390, but if devices can hotplug
>     > their own memory (I believe mmio can), then it should be possible to
>     > define a device generic enough.
>
>     Not yet. I think the main difference would be the way to map the
>     frontend VM's memory (in our case, we use a BAR). Other things
>     should be
>     generic.
>
>
> I hope some more knowledgeable people will chime in.

That would be great.

>
>
>     >
>     >>     - Why is it required or beneficial to support multiple
>     "frontend"
>     >>     devices over the same "vhost-pci" device? It could simplify
>     >>     things if it was a single device. If necessary, that could also
>     >>     be interesting as a vhost-user extension.
>     >
>     >     We call it "multiple backend functionalities" (e.g.
>     vhost-pci-net,
>     >     vhost-pci-scsi..). A vhost-pci driver contains multiple such
>     >     backend functionalities, because in this way they can reuse
>     >     (share) the same memory mapping. To be more precise, a vhost-pci
>     >     device supplies the memory of a frontend VM, and all the backend
>     >     functionalities need to access the same frontend VM memory,
>     so we
>     >     consolidate them into one vhost-pci driver to use one vhost-pci
>     >     device.
>     >
>     >
>     > That's what I imagined. Do you have a use case for that?
>
>     Currently, we only have the network use cases. I think we can
>     design it
>     that way (multple backend functionalities), which is more generic (not
>     just limited to network usages). When implementing it, we can
>     first have
>     the network backend functionality (i.e. vhost-pci-net) implemented. In
>     the future, if people are interested in other backend
>     functionalities, I
>     think it should be easy to add them.
>
>
> My question is not about the support of various kind of devices (that 
> is clearly a worthy goal to me) but to have support simultaneously of 
> several frontend/provider devices on the same vhost-pci device: is 
> this required or necessary? I think it would simplify things if it was 
> 1-1 instead, I would like to understand why you propose a different 
> design.

It is not strictly required, but I think it is beneficial. As mentioned 
above, those consumer-side functionalities basically access the same 
provider VM's memory, so one vhost-pci device is enough to hold that 
memory. When it comes to the consumer guest kernel, we only need to 
ioremap that memory once. Also, a single pair of controlqs is enough to 
handle the control path messages between all those functionalities and 
QEMU. I think the design also looks more compact this way. What do you 
think?

If we make it an N-N model (each functionality has its own vhost-pci 
device), then QEMU and the guest kernel need to repeat that memory setup 
N times.

Best,
Wei

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [virtio-comment] Re: [PATCH] *** Vhost-pci RFC v2 ***
  2016-09-02  1:29           ` Wei Wang
@ 2016-09-02  8:15             ` Marc-André Lureau
  0 siblings, 0 replies; 41+ messages in thread
From: Marc-André Lureau @ 2016-09-02  8:15 UTC (permalink / raw)
  To: Wei Wang; +Cc: virtio-comment, stefanha, qemu-devel, pbonzini, mst

Hi

On Fri, Sep 2, 2016 at 5:30 AM Wei Wang <wei.w.wang@intel.com> wrote:

> On 09/01/2016 09:05 PM, Marc-André Lureau wrote:
> > On Thu, Sep 1, 2016 at 4:13 PM Wei Wang <wei.w.wang@intel.com
> > <mailto:wei.w.wang@intel.com>> wrote:
> > My question is not about the support of various kind of devices (that
> > is clearly a worthy goal to me) but to have support simultaneously of
> > several frontend/provider devices on the same vhost-pci device: is
> > this required or necessary? I think it would simplify things if it was
> > 1-1 instead, I would like to understand why you propose a different
> > design.
>
> It is not required, but necessary, I think. As mentioned above, those
> consumer-side functionalities basically access the same provider VM's
> memory, so I think one vhost-pci device is enough to hold that memory.
> When it comes to the consumer guest kernel, we only need to ioremap that
> memory once. Also, a pair of controlq-s is enough to handle the control
> path messages between all those functionalities and the QEMU. I think
> the design also looks compact in this way. what do you think?
>
>
If it's not required, I would propose to stick to a 1-1 design for now.
(1-n support could be added later, even if it means a new device kind)
Michael, what do you think?

-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-09-01 16:27           ` Wei Wang
  (?)
@ 2016-09-02 13:26           ` Stefan Hajnoczi
  2016-09-03 13:36               ` Wang, Wei W
  -1 siblings, 1 reply; 41+ messages in thread
From: Stefan Hajnoczi @ 2016-09-02 13:26 UTC (permalink / raw)
  To: Wei Wang; +Cc: Stefan Hajnoczi, pbonzini, mst, qemu-devel, kvm, virtio-comment

On Fri, Sep 02, 2016 at 12:27:25AM +0800, Wei Wang wrote:
> On 09/01/2016 12:07 AM, Stefan Hajnoczi wrote:
> > On Tue, Aug 30, 2016 at 10:08:01AM +0000, Wang, Wei W wrote:
> > > On Monday, August 29, 2016 11:25 PM, Stefan Hajnoczi wrote:
> > > > To: Wang, Wei W <wei.w.wang@intel.com>
> > > > Cc: kvm@vger.kernel.org; qemu-devel@nongnu.org; virtio-
> > > > comment@lists.oasis-open.org; mst@redhat.com; pbonzini@redhat.com
> > > > Subject: Re: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
> > > > 
> > > > On Mon, Jun 27, 2016 at 02:01:24AM +0000, Wang, Wei W wrote:
> > > > > On Sun 6/19/2016 10:14 PM, Wei Wang wrote:
> > > > > > This RFC proposes a design of vhost-pci, which is a new virtio device type.
> > > > > > The vhost-pci device is used for inter-VM communication.
> > > > > > 
> > > > > > Changes in v2:
> > > > > > 1. changed the vhost-pci driver to use a controlq to send acknowledgement
> > > > > >     messages to the vhost-pci server rather than writing to the device
> > > > > >     configuration space;
> > > > > > 
> > > > > > 2. re-organized all the data structures and the description
> > > > > > layout;
> > > > > > 
> > > > > > 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket message,
> > > > which
> > > > > > is redundant;
> > > > > > 
> > > > > > 4. added a message sequence number to the msg info structure to
> > > > > > identify socket
> > > > > >     messages, and the socket message exchange does not need to be
> > > > > > blocking;
> > > > > > 
> > > > > > 5. changed to used uuid to identify each VM rather than using the
> > > > > > QEMU
> > > > process
> > > > > >     id
> > > > > > 
> > > > > One more point should be added is that the server needs to send
> > > > > periodic socket messages to check if the driver VM is still alive. I
> > > > > will add this message support in next version.  (*v2-AR1*)
> > > > Either the driver VM could go down or the device VM (server) could go
> > > > down.  In both cases there must be a way to handle the situation.
> > > > 
> > > > If the server VM goes down it should be possible for the driver VM to
> > > > resume either via hotplug of a new device or through messages
> > > > reinitializing the dead device when the server VM restarts.
> > > I got feedbacks from people that the name of device VM and driver VM are difficult to remember. Can we use client (or frontend) VM and server (or backend) VM in the discussion? I think that would sound more straightforward :)
> > We discussed this in a previous email thread.
> > 
> > Device and driver are the terms used by the virtio spec.  Anyone dealing
> > with vhost-pci design must be familiar with the virtio spec.
> > 
> > I don't see how using the terminology consistently can be confusing,
> > unless these people haven't looked at the virtio spec.  In that case
> > they have no business with working on vhost-pci because virtio is a
> > prerequisite :).
> > 
> > Stefan
> I don't disagree :)
> But "frontend/backend" is also commonly used in descriptions in virtio
> related stuff, and it seems that more people like it. It's also easier to
> describe some components in the design (e.g. a backend functionality like
> vhost-pci-net). I am not sure if you guys are also OK with it.

If you want to use frontend/backend I don't mind.  It seems clear to me.

Stefan


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [Qemu-devel] [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-09-02 13:26           ` Stefan Hajnoczi
@ 2016-09-03 13:36               ` Wang, Wei W
  0 siblings, 0 replies; 41+ messages in thread
From: Wang, Wei W @ 2016-09-03 13:36 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefan Hajnoczi, Marc-André Lureau, pbonzini, mst,
	qemu-devel, kvm, virtio-comment


On 09/02/2016 09:27 PM, Stefan Hajnoczi wrote:
> On Fri, Sep 02, 2016 at 12:27:25AM +0800, Wei Wang wrote:
> > On 09/01/2016 12:07 AM, Stefan Hajnoczi wrote:
> > > On Tue, Aug 30, 2016 at 10:08:01AM +0000, Wang, Wei W wrote:
> > > > On Monday, August 29, 2016 11:25 PM, Stefan Hajnoczi wrote:
> > > > > To: Wang, Wei W <wei.w.wang@intel.com>
> > > > > Cc: kvm@vger.kernel.org; qemu-devel@nongnu.org; virtio- 
> > > > > comment@lists.oasis-open.org; mst@redhat.com; 
> > > > > pbonzini@redhat.com
> > > > > Subject: Re: [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
> > > > >
> > > > > On Mon, Jun 27, 2016 at 02:01:24AM +0000, Wang, Wei W wrote:
> > > > > > On Sun 6/19/2016 10:14 PM, Wei Wang wrote:
> > > > > > > This RFC proposes a design of vhost-pci, which is a new 
> > > > > > > virtio device
> type.
> > > > > > > The vhost-pci device is used for inter-VM communication.
> > > > > > >
> > > > > > > Changes in v2:
> > > > > > > 1. changed the vhost-pci driver to use a controlq to send
> acknowledgement
> > > > > > >     messages to the vhost-pci server rather than writing to the device
> > > > > > >     configuration space;
> > > > > > >
> > > > > > > 2. re-organized all the data structures and the 
> > > > > > > description layout;
> > > > > > >
> > > > > > > 3. removed the VHOST_PCI_CONTROLQ_UPDATE_DONE socket 
> > > > > > > message,
> > > > > which
> > > > > > > is redundant;
> > > > > > >
> > > > > > > 4. added a message sequence number to the msg info 
> > > > > > > structure to identify socket
> > > > > > >     messages, and the socket message exchange does not 
> > > > > > > need to be blocking;
> > > > > > >
> > > > > > > 5. changed to used uuid to identify each VM rather than 
> > > > > > > using the QEMU
> > > > > process
> > > > > > >     id
> > > > > > >
> > > > > > One more point should be added is that the server needs to 
> > > > > > send periodic socket messages to check if the driver VM is 
> > > > > > still alive. I will add this message support in next version.
> > > > > > (*v2-AR1*)
> > > > > Either the driver VM could go down or the device VM (server) 
> > > > > could go down.  In both cases there must be a way to handle 
> > > > > the
> situation.
> > > > >
> > > > > If the server VM goes down it should be possible for the 
> > > > > driver VM to resume either via hotplug of a new device or 
> > > > > through messages reinitializing the dead device when the server VM restarts.
> > > > I got feedbacks from people that the name of device VM and 
> > > > driver VM are difficult to remember. Can we use client (or 
> > > > frontend) VM and server (or backend) VM in the discussion? I 
> > > > think that would sound more straightforward :)
> > > We discussed this in a previous email thread.
> > >
> > > Device and driver are the terms used by the virtio spec.  Anyone 
> > > dealing with vhost-pci design must be familiar with the virtio spec.
> > >
> > > I don't see how using the terminology consistently can be 
> > > confusing, unless these people haven't looked at the virtio spec.  
> > > In that case they have no business with working on vhost-pci 
> > > because virtio is a prerequisite :).
> > >
> > > Stefan
> > I don't disagree :)
> > But "frontend/backend" is also commonly used in descriptions in 
> > virtio related stuff, and it seems that more people like it. It's 
> > also easier to describe some components in the design (e.g. a 
> > backend functionality like vhost-pci-net). I am not sure if you guys are also OK with it.
> 
> If you want to use frontend/backend I don't mind.  It seems clear to me.

Thanks Stefan. 

Marc-André and I have different thoughts about a design direction. I prefer to have all the frontend virtio devices (net, scsi, console, etc.) from the same VM supported by one backend vhost-pci device (N-1), while Marc-André prefers to have each frontend virtio device supported by its own backend vhost-pci device (N-N).

If possible, I hope you, Michael, and others can also join our review and discussion to finalize the design. Thanks.

Best,
Wei

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [virtio-comment] Re: [Qemu-devel] [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-09-03 13:36               ` Wang, Wei W
@ 2016-09-05  8:56                 ` Marc-André Lureau
  -1 siblings, 0 replies; 41+ messages in thread
From: Marc-André Lureau @ 2016-09-05  8:56 UTC (permalink / raw)
  To: Wang, Wei W, Stefan Hajnoczi
  Cc: Stefan Hajnoczi, pbonzini, mst, qemu-devel, kvm, virtio-comment

Hi

On Sat, Sep 3, 2016 at 5:36 PM Wang, Wei W <wei.w.wang@intel.com> wrote:

> Marc-André and I just got different thoughts about a design direction. I
> prefer to have all the frontend virtio devices (net, scsi, console etc.)
> from the same VM to be supported by one backend vhost-pci device (N-1),
> while Marc-André prefers to have each frontend virtio device be supported
> by a backend vhost-pci device (N-N).
>

I suggested 1-1 (not N-N, but you can have several 1-1 pairs), unless you
have a good reason to do otherwise, starting from the use case (is there
a case that requires several backends/consumers in the same VM? If so, the
1-1 design could still fit). If the goal is just to save guest memory space,
that may not be a strong enough reason, but I don't clearly see the
implications.

-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-09-05  8:56                 ` Marc-André Lureau
  (?)
@ 2016-09-06 17:16                 ` Stefan Hajnoczi
  2016-09-07 12:27                     ` Wang, Wei W
  -1 siblings, 1 reply; 41+ messages in thread
From: Stefan Hajnoczi @ 2016-09-06 17:16 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Wang, Wei W, Stefan Hajnoczi, pbonzini, mst, qemu-devel, kvm,
	virtio-comment

On Mon, Sep 05, 2016 at 08:56:14AM +0000, Marc-André Lureau wrote:
> Hi
> 
> On Sat, Sep 3, 2016 at 5:36 PM Wang, Wei W <wei.w.wang@intel.com> wrote:
> 
> > Marc-André and I just got different thoughts about a design direction. I
> > prefer to have all the frontend virtio devices (net, scsi, console etc.)
> > from the same VM to be supported by one backend vhost-pci device (N-1),
> > while Marc-André prefers to have each frontend virtio device be supported
> > by a backend vhost-pci device (N-N).
> >
> 
> I suggested 1-1 (not n-n, but you can have several 1-1 pairs), unless you
> have a good reason to do differently, starting from the use case (is there
> a case that requires several backends/consumers in the same VM? if yes, 1-1
> design could still fit). If it's to save guest memory space, it may not be
> a good enough reason, but I don't see clearly the implications.

N-1 saves address space but is probably a poor fit for modern PCI
devices that are geared towards IOMMUs.

Each virtio device should be isolated in terms of memory space and
hotplug/reset life cycle.  This ensures they are robust against driver
bugs and can be safely delegated/passed through to different
applications or nested VMs that don't trust each other.

Isolation between virtio device instances sharing a single vhost-pci
device (N-1) will be harder to achieve.  I would aim for the simpler 1-1
design instead, where each device is isolated.

Are there specific reasons for wanting an N-1 design?

Stefan
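
For illustration, below is a minimal C sketch of the per-device state that
the 1-1 layout implies. All structure and field names are hypothetical and
not taken from the RFC; the point is only that each backend vhost-pci device
pairs with exactly one frontend virtio device, owns its own memory mapping
and controlq pair, and can be reset or hot-unplugged without touching any
other pairing.

/* Minimal 1-1 sketch: every name here is hypothetical, not from the RFC. */
#include <stdbool.h>
#include <stdint.h>

/* A region of the frontend VM's memory mapped into the backend VM,
 * covering the vrings of exactly one frontend virtio device. */
struct vhost_pci_mem_region {
    uint64_t guest_phys_addr;   /* address in the frontend VM */
    uint64_t size;
    uint64_t bar_offset;        /* where it appears in this device's BAR */
};

/* Each backend vhost-pci device carries the state of a single frontend
 * device, so hot-unplug, reset, or a driver bug affects only this mapping
 * and this controlq pair. */
struct vhost_pci_dev {
    uint8_t  frontend_vm_uuid[16];      /* which frontend VM it pairs with */
    uint16_t frontend_virtio_id;        /* net, scsi, console, ... */
    struct vhost_pci_mem_region region; /* memory of that one frontend device */
    void    *controlq_rx;               /* per-device controlq pair */
    void    *controlq_tx;
    bool     frontend_alive;            /* updated by the periodic keepalive */
};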

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [Qemu-devel] [virtio-comment] [PATCH] *** Vhost-pci RFC v2 ***
  2016-09-06 17:16                 ` Stefan Hajnoczi
@ 2016-09-07 12:27                     ` Wang, Wei W
  0 siblings, 0 replies; 41+ messages in thread
From: Wang, Wei W @ 2016-09-07 12:27 UTC (permalink / raw)
  To: Stefan Hajnoczi, Marc-André Lureau
  Cc: Stefan Hajnoczi, pbonzini, mst, qemu-devel, kvm, virtio-comment

On 09/07/2016 01:17 PM, Stefan Hajnoczi wrote:
> On Mon, Sep 05, 2016 at 08:56:14AM +0000, Marc-André Lureau wrote:
> > Hi
> >
> > On Sat, Sep 3, 2016 at 5:36 PM Wang, Wei W <wei.w.wang@intel.com> wrote:
> >
> > > Marc-André and I just got different thoughts about a design
> > > direction. I prefer to have all the frontend virtio devices (net,
> > > scsi, console etc.) from the same VM to be supported by one backend
> > > vhost-pci device (N-1), while Marc-André prefers to have each
> > > frontend virtio device be supported by a backend vhost-pci device (N-N).
> > >
> >
> > I suggested 1-1 (not n-n, but you can have several 1-1 pairs), unless
> > you have a good reason to do differently, starting from the use case
> > (is there a case that requires several backends/consumers in the same
> > VM? if yes, 1-1 design could still fit). If it's to save guest memory
> > space, it may not be a good enough reason, but I don't see clearly the
> implications.
> 
> N-1 saves address space but is probably a poor fit for modern PCI devices that
> are geared towards IOMMUs.
> 
> Each virtio device should be isolated in terms of memory space and
> hotplug/reset life cycle.  This ensures they are robust against driver bugs and can
> be safely delegated/passed through to different applications or nested VMs that
> don't trust each other.
> 
> Isolation between virtio device instances sharing a single vhost-pci device (N-1)
> will be harder to achieve.  I would aim for the simpler 1-1 design instead, where
> each device is isolated.
> 
> Are there specific reasons for wanting an N-1 design?

I chose the N-1 design mainly for resource reusability (sharing one pair of controlqs, the common initialization code, and so on). But it looks like that consolidation is not so important here - I will take your suggestion and start from the simpler 1-1 design. Thanks, Stefan and Marc-André.

Best,
Wei
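
For contrast with the 1-1 sketch earlier in the thread, here is a rough
sketch of what the N-1 consolidation mentioned above would look like, again
with hypothetical names that are not part of the RFC. The saving is one
shared controlq pair and one shared initialization path; the cost is the
per-device isolation and independent reset life cycle discussed earlier,
since every control message now has to identify which frontend it refers to.

/* Minimal N-1 sketch: one backend device aggregates all frontend virtio
 * devices of one VM. All names are hypothetical, not from the RFC. */
#include <stdint.h>

#define VHOST_PCI_MAX_FRONTENDS 8

struct vhost_pci_ctrl_msg {
    uint32_t seq;            /* message sequence number (RFC v2 change 4) */
    uint16_t frontend_idx;   /* index into frontends[] below */
    uint16_t opcode;
    uint8_t  payload[64];
};

struct vhost_pci_aggregate_dev {
    uint8_t  frontend_vm_uuid[16];
    uint16_t num_frontends;
    struct {
        uint16_t virtio_device_id;   /* net, scsi, console, ... */
        uint64_t mem_base;           /* that frontend's vring memory */
        uint64_t mem_size;
    } frontends[VHOST_PCI_MAX_FRONTENDS];
    void    *controlq_rx;            /* shared by all frontends */
    void    *controlq_tx;
};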

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2016-09-07 12:27 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-19 14:14 [PATCH] *** Vhost-pci RFC v2 *** Wei Wang
2016-06-19 14:14 ` [Qemu-devel] " Wei Wang
2016-06-19 14:14 ` [PATCH] Vhost-pci RFC v2: a new virtio device for inter-VM communication Wei Wang
2016-06-19 14:14   ` [Qemu-devel] " Wei Wang
2016-08-29 15:27   ` [virtio-comment] " Stefan Hajnoczi
2016-08-29 15:27     ` [Qemu-devel] " Stefan Hajnoczi
2016-06-27  2:01 ` [virtio-comment] [PATCH] *** Vhost-pci RFC v2 *** Wang, Wei W
2016-06-27  2:01   ` [Qemu-devel] " Wang, Wei W
2016-08-29 15:24   ` Stefan Hajnoczi
2016-08-29 15:24     ` [Qemu-devel] " Stefan Hajnoczi
2016-08-29 15:42     ` Michael S. Tsirkin
2016-08-29 15:42       ` [Qemu-devel] " Michael S. Tsirkin
2016-08-30 10:08     ` Wang, Wei W
2016-08-30 10:08       ` [Qemu-devel] " Wang, Wei W
2016-08-30 11:10       ` Michael S. Tsirkin
2016-08-30 11:10         ` [Qemu-devel] " Michael S. Tsirkin
2016-08-30 12:59         ` Wang, Wei W
2016-08-30 12:59           ` [Qemu-devel] " Wang, Wei W
2016-08-31 16:07       ` Stefan Hajnoczi
2016-09-01 16:27         ` [virtio-comment] " Wei Wang
2016-09-01 16:27           ` Wei Wang
2016-09-02 13:26           ` Stefan Hajnoczi
2016-09-03 13:36             ` Wang, Wei W
2016-09-03 13:36               ` Wang, Wei W
2016-09-05  8:56               ` [virtio-comment] " Marc-André Lureau
2016-09-05  8:56                 ` Marc-André Lureau
2016-09-06 17:16                 ` Stefan Hajnoczi
2016-09-07 12:27                   ` Wang, Wei W
2016-09-07 12:27                     ` Wang, Wei W
2016-08-29 15:41   ` Michael S. Tsirkin
2016-08-29 15:41     ` [Qemu-devel] " Michael S. Tsirkin
2016-08-30 10:07     ` Wang, Wei W
2016-08-30 10:07       ` [Qemu-devel] " Wang, Wei W
2016-08-31 12:30 ` [virtio-comment] Re: [Qemu-devel] " Marc-André Lureau
2016-08-31 12:30   ` Marc-André Lureau
2016-09-01 16:26   ` Wei Wang
2016-09-01  8:49     ` Marc-André Lureau
2016-09-01 12:13       ` [Qemu-devel] [virtio-comment] " Wei Wang
2016-09-01 13:05         ` Marc-André Lureau
2016-09-02  1:29           ` Wei Wang
2016-09-02  8:15             ` Marc-André Lureau
