All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/17] vhost: add postcopy live-migration support
@ 2018-10-02  9:36 Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 01/17] vhost: fix messages error checks Maxime Coquelin
                   ` (16 more replies)
  0 siblings, 17 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

In this v2:
- Rebase on top of Nikolay message handling series. It
requires passing an extra parameter to message handlers (
the fd of the socket as set_mem_table needs to send an
intermediate reply).
- Preliminary patches to fix issues with message handling
rework not handling replies properly.
- Handle userfaultfd region registration errors properly.
- Don't build postcopy by default as userfaultd only
landed in v4.3 kernel. It gets automatically enabled with
Meson if headers are present.

With classic live-migration, the VM runs on source while its
content is being migrated to destination. When pages already
migrated to destination are dirtied by the source, they get
copied until both source and destination memory converge.
At that time, the source is stopped and destination is
started.

With postcopy live-migration, the VM is started on destination
before all the memory has been migrated. When the VM tries to
access a page that haven't been migrated yet, a pagefault is
triggered, handled by userfaultfd which pauses the thread.
A Qemu thread in charge of postcopy request the source for
the missing page. Once received and mapped, the paused thread
gets resumed.

Userfaultfd supports handling faults from a different process,
and Qemu supports postcopy with vhost-user backends since
v2.12.

One problem encountered with classic live-migration for VMs
relying on vhost-user backends is that when the traffic is
high (e.g. PVP), it happens that it never converges as
pages gets dirtied at a faster rate than they are copied
to the destination.
It is expected this problem sould be solved with using
postcopy, as rings memory and buffers will be copied once,
when destination will pagefault on them.

Note that it will certainly require a rebase to apply on top
of Nikolay's vhost-user message handling rework.

Steps to test postcopy:
1. Run DPDK's Testpmd application on source:
./install/bin/testpmd -m 512 --file-prefix=src -l 0,2 -n 4 \
  --vdev 'net_vhost0,iface=/tmp/vu-src' -- --portmask=1 -i \
  --rxq=1 --txq=1 --nb-cores=1 --eth-peer=0,52:54:00:11:22:12 \
  --no-mlockall

2. Run DPDK's Testpmd application on destination:
./install/bin/testpmd -m 512 --file-prefix=dst -l 0,2 -n 4 \
  --vdev 'net_vhost0,iface=/tmp/vu-dst,postcopy-support=1' -- --portmask=1 -i \
  --rxq=1 --txq=1 --nb-cores=1 --eth-peer=0,52:54:00:11:22:12 \
  --no-mlockall

3. Launch VM on source:
./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 3G -smp 2 -cpu host \
  -object memory-backend-file,id=mem,size=3G,mem-path=/dev/shm,share=on \
  -numa node,memdev=mem -mem-prealloc \
  -chardev socket,id=char0,path=/tmp/vu-src \
  -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
  -device virtio-net-pci,netdev=mynet1 /home/virt/rhel7.6-1-clone.qcow2 \
  -net none -vnc :0 -monitor stdio

4. Launch VM on destination:
./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 3G -smp 2 -cpu host \
  -object memory-backend-file,id=mem,size=3G,mem-path=/dev/shm,share=on \
  -numa node,memdev=mem -mem-prealloc \
  -chardev socket,id=char0,path=/tmp/vu-dst \
  -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
  -device virtio-net-pci,netdev=mynet1 /home/virt/rhel7.6-1-clone.qcow2 \
  -net none -vnc :1 -monitor stdio -incoming tcp::8888

5. In both testpmd prompts, start flooding the virtio-net device:
testpmd> set fwd txonly
testpmd> start

6. In destination's Qemu monitor, enable postcopy:
(qemu) migrate_set_capability postcopy-ram on

7. In source's Qemu monitor, enable postcopy and launch migration:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:0:8888
(qemu) migrate_start_postcopy

Maxime Coquelin (17):
  vhost: fix messages error checks
  vhost: fix return code of messages requiring replies
  vhost: fix error handling when mem table gets updated
  vhost: define postcopy protocol flag
  vhost: add number of fds to vhost-user messages and use it
  vhost: pass socket fd to message handling callbacks
  vhost: enable fds passing when sending vhost-user messages
  vhost: add config flag for postcopy feature
  vhost: introduce postcopy's advise message
  vhost: add support for postcopy's listen message
  vhost: register new regions with userfaultfd
  vhost: avoid useless VhostUserMemory copy
  vhost: send userfault range addresses back to qemu
  vhost: add support to postcopy's end request
  vhost: enable postcopy protocol feature
  vhost: add flag to enable postcopy live-migration
  net/vhost: add parameter to enable postcopy support

 config/common_linuxapp              |   1 +
 doc/guides/nics/vhost.rst           |   5 +
 doc/guides/prog_guide/vhost_lib.rst |   8 +
 drivers/net/vhost/rte_eth_vhost.c   |  13 ++
 lib/librte_vhost/meson.build        |   2 +
 lib/librte_vhost/rte_vhost.h        |   5 +
 lib/librte_vhost/socket.c           |  40 +++-
 lib/librte_vhost/vhost.h            |   3 +
 lib/librte_vhost/vhost_user.c       | 304 +++++++++++++++++++++++-----
 lib/librte_vhost/vhost_user.h       |  12 +-
 10 files changed, 327 insertions(+), 66 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 01/17] vhost: fix messages error checks
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02 14:15   ` Ilya Maximets
  2018-10-02  9:36 ` [PATCH v2 02/17] vhost: fix return code of messages requiring replies Maxime Coquelin
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

Return of message handling has now changed to an enum that can
take non-negative value that is not zero in case a reply is
needed. But the code checking the variable afterwards has not
been updated, leading to success messages handling being
treated as errors.

Fixes: 4e601952cae6 ("vhost: message handling implemented as a callback array")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 7ef3fb4a4..060b41893 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1783,7 +1783,7 @@ vhost_user_msg_handler(int vid, int fd)
 	}
 
 skip_to_post_handle:
-	if (!ret && dev->extern_ops.post_msg_handle) {
+	if (ret != VH_RESULT_ERR && dev->extern_ops.post_msg_handle) {
 		uint32_t need_reply;
 
 		ret = (*dev->extern_ops.post_msg_handle)(
@@ -1800,10 +1800,10 @@ vhost_user_msg_handler(int vid, int fd)
 		vhost_user_unlock_all_queue_pairs(dev);
 
 	if (msg.flags & VHOST_USER_NEED_REPLY) {
-		msg.payload.u64 = !!ret;
+		msg.payload.u64 = ret == VH_RESULT_ERR;
 		msg.size = sizeof(msg.payload.u64);
 		send_vhost_reply(fd, &msg);
-	} else if (ret) {
+	} else if (ret == VH_RESULT_ERR) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"vhost message handling failed.\n");
 		return -1;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 02/17] vhost: fix return code of messages requiring replies
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 01/17] vhost: fix messages error checks Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-03 13:26   ` Ilya Maximets
  2018-10-02  9:36 ` [PATCH v2 03/17] vhost: fix error handling when mem table gets updated Maxime Coquelin
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

VHOST_USER_GET_PROTOCOL_FEATURES, VHOST_USER_GET_VRING_BASE
and VHOST_USER_SET_LOG_BASE require replies, so their handlers
should return VH_RESULT_REPLY, not VH_RESULT_OK.

Fixes: 2cfbbb86c62a ("vhost: unify message handling function signature")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 060b41893..ce0ac0098 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1161,7 +1161,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
 
 	msg->size = sizeof(msg->payload.state);
 
-	return VH_RESULT_OK;
+	return VH_RESULT_REPLY;
 }
 
 /*
@@ -1218,7 +1218,7 @@ vhost_user_get_protocol_features(struct virtio_net **pdev,
 	msg->payload.u64 = protocol_features;
 	msg->size = sizeof(msg->payload.u64);
 
-	return VH_RESULT_OK;
+	return VH_RESULT_REPLY;
 }
 
 static int
@@ -1298,7 +1298,7 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg)
 
 	msg->size = sizeof(msg->payload.u64);
 
-	return VH_RESULT_OK;
+	return VH_RESULT_REPLY;
 }
 
 static int vhost_user_set_log_fd(struct virtio_net **pdev __rte_unused,
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 03/17] vhost: fix error handling when mem table gets updated
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 01/17] vhost: fix messages error checks Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 02/17] vhost: fix return code of messages requiring replies Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 04/17] vhost: define postcopy protocol flag Maxime Coquelin
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

When the memory table gets updated, the rings addresses need
to be translated again. If it fails, we need to exit cleanly
by unmapping memory regions.

Fixes: d5022533c20a ("vhost: retranslate vring addr when memory table changes")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ce0ac0098..c669d3c0a 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -964,7 +964,8 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg)
 
 			dev = translate_ring_addresses(dev, i);
 			if (!dev)
-				return VH_RESULT_ERR;
+				goto err_mmap;
+
 
 			*pdev = dev;
 		}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 04/17] vhost: define postcopy protocol flag
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (2 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 03/17] vhost: fix error handling when mem table gets updated Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 05/17] vhost: add number of fds to vhost-user messages and use it Maxime Coquelin
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/rte_vhost.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index b02673d4a..b3cc6990d 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -66,6 +66,10 @@ extern "C" {
 #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
 #endif
 
+#ifndef VHOST_USER_PROTOCOL_F_PAGEFAULT
+#define VHOST_USER_PROTOCOL_F_PAGEFAULT 8
+#endif
+
 /** Indicate whether protocol features negotiation is supported. */
 #ifndef VHOST_USER_F_PROTOCOL_FEATURES
 #define VHOST_USER_F_PROTOCOL_FEATURES	30
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 05/17] vhost: add number of fds to vhost-user messages and use it
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (3 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 04/17] vhost: define postcopy protocol flag Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 06/17] vhost: pass socket fd to message handling callbacks Maxime Coquelin
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

As soons as some anciliarry datai (fds) are received, it is copied
without checking its length.

This patch adds adds the number of fds received to the message,
which is set in read_vhost_message().

This is preliminary work to support sending fds to Qemu.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/socket.c     | 21 ++++++++++++++++-----
 lib/librte_vhost/vhost_user.c |  2 +-
 lib/librte_vhost/vhost_user.h |  4 +++-
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index d63031747..c04d3d305 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -94,18 +94,23 @@ static struct vhost_user vhost_user = {
 	.mutex = PTHREAD_MUTEX_INITIALIZER,
 };
 
-/* return bytes# of read on success or negative val on failure. */
+/*
+ * return bytes# of read on success or negative val on failure. Update fdnum
+ * with number of fds read.
+ */
 int
-read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
+read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds,
+		int *fd_num)
 {
 	struct iovec iov;
 	struct msghdr msgh;
-	size_t fdsize = fd_num * sizeof(int);
-	char control[CMSG_SPACE(fdsize)];
+	char control[CMSG_SPACE(max_fds * sizeof(int))];
 	struct cmsghdr *cmsg;
 	int got_fds = 0;
 	int ret;
 
+	*fd_num = 0;
+
 	memset(&msgh, 0, sizeof(msgh));
 	iov.iov_base = buf;
 	iov.iov_len  = buflen;
@@ -131,13 +136,19 @@ read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
 		if ((cmsg->cmsg_level == SOL_SOCKET) &&
 			(cmsg->cmsg_type == SCM_RIGHTS)) {
 			got_fds = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
+			if (got_fds > max_fds) {
+				RTE_LOG(ERR, VHOST_CONFIG,
+						"Received msg contains more fds than supported\n");
+				return -1;
+			}
+			*fd_num = got_fds;
 			memcpy(fds, CMSG_DATA(cmsg), got_fds * sizeof(int));
 			break;
 		}
 	}
 
 	/* Clear out unused file descriptors */
-	while (got_fds < fd_num)
+	while (got_fds < max_fds)
 		fds[got_fds++] = -1;
 
 	return ret;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index c669d3c0a..608d2f3e4 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1514,7 +1514,7 @@ read_vhost_message(int sockfd, struct VhostUserMsg *msg)
 	int ret;
 
 	ret = read_fd_message(sockfd, (char *)msg, VHOST_USER_HDR_SIZE,
-		msg->fds, VHOST_MEMORY_MAX_NREGIONS);
+		msg->fds, VHOST_MEMORY_MAX_NREGIONS, &msg->fd_num);
 	if (ret <= 0)
 		return ret;
 
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 42166adf2..dd0262f8f 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -132,6 +132,7 @@ typedef struct VhostUserMsg {
 		VhostUserVringArea area;
 	} payload;
 	int fds[VHOST_MEMORY_MAX_NREGIONS];
+	int fd_num;
 } __attribute((packed)) VhostUserMsg;
 
 #define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
@@ -146,7 +147,8 @@ int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 int vhost_user_host_notifier_ctrl(int vid, bool enable);
 
 /* socket.c */
-int read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num);
+int read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds,
+		int *fd_num);
 int send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num);
 
 #endif
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 06/17] vhost: pass socket fd to message handling callbacks
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (4 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 05/17] vhost: add number of fds to vhost-user messages and use it Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 07/17] vhost: enable fds passing when sending vhost-user messages Maxime Coquelin
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

This is not used for now, but will be needed for the
special handling of VHOST_USER_SET_MEM_TABLE message
once postcopy will be supported.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 71 +++++++++++++++++++++++------------
 1 file changed, 47 insertions(+), 24 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 608d2f3e4..050fc8bf9 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -138,14 +138,16 @@ vhost_backend_cleanup(struct virtio_net *dev)
  */
 static int
 vhost_user_set_owner(struct virtio_net **pdev __rte_unused,
-		struct VhostUserMsg *msg __rte_unused)
+			struct VhostUserMsg *msg __rte_unused,
+			int main_fd __rte_unused)
 {
 	return VH_RESULT_OK;
 }
 
 static int
 vhost_user_reset_owner(struct virtio_net **pdev,
-		struct VhostUserMsg *msg __rte_unused)
+			struct VhostUserMsg *msg __rte_unused,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	vhost_destroy_device_notify(dev);
@@ -159,7 +161,8 @@ vhost_user_reset_owner(struct virtio_net **pdev,
  * The features that we support are requested.
  */
 static int
-vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	uint64_t features = 0;
@@ -176,7 +179,8 @@ vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg)
  * The queue number that we support are requested.
  */
 static int
-vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	uint32_t queue_num = 0;
@@ -193,7 +197,8 @@ vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg)
  * We receive the negotiated features supported by us and the virtio device.
  */
 static int
-vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	uint64_t features = msg->payload.u64;
@@ -275,7 +280,8 @@ vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg)
  */
 static int
 vhost_user_set_vring_num(struct virtio_net **pdev,
-			 struct VhostUserMsg *msg)
+			struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	struct vhost_virtqueue *vq = dev->virtqueue[msg->payload.state.index];
@@ -637,7 +643,8 @@ translate_ring_addresses(struct virtio_net *dev, int vq_index)
  * This function then converts these to our address space.
  */
 static int
-vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	struct vhost_virtqueue *vq;
@@ -674,7 +681,8 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg)
  */
 static int
 vhost_user_set_vring_base(struct virtio_net **pdev,
-			  struct VhostUserMsg *msg)
+			struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	dev->virtqueue[msg->payload.state.index]->last_used_idx  =
@@ -807,7 +815,8 @@ vhost_memory_changed(struct VhostUserMemory *new,
 }
 
 static int
-vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	struct VhostUserMemory memory = msg->payload.memory;
@@ -1022,7 +1031,8 @@ virtio_is_ready(struct virtio_net *dev)
 }
 
 static int
-vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	struct vhost_vring_file file;
@@ -1046,7 +1056,8 @@ vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg)
 }
 
 static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
-			struct VhostUserMsg *msg)
+			struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	if (!(msg->payload.u64 & VHOST_USER_VRING_NOFD_MASK))
 		close(msg->fds[0]);
@@ -1056,7 +1067,8 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
 }
 
 static int
-vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	struct vhost_vring_file file;
@@ -1115,7 +1127,8 @@ free_zmbufs(struct vhost_virtqueue *vq)
  */
 static int
 vhost_user_get_vring_base(struct virtio_net **pdev,
-			  struct VhostUserMsg *msg)
+			struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	struct vhost_virtqueue *vq = dev->virtqueue[msg->payload.state.index];
@@ -1171,7 +1184,8 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
  */
 static int
 vhost_user_set_vring_enable(struct virtio_net **pdev,
-			    struct VhostUserMsg *msg)
+			struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	int enable = (int)msg->payload.state.num;
@@ -1199,7 +1213,8 @@ vhost_user_set_vring_enable(struct virtio_net **pdev,
 
 static int
 vhost_user_get_protocol_features(struct virtio_net **pdev,
-				 struct VhostUserMsg *msg)
+			struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	uint64_t features, protocol_features;
@@ -1224,7 +1239,8 @@ vhost_user_get_protocol_features(struct virtio_net **pdev,
 
 static int
 vhost_user_set_protocol_features(struct virtio_net **pdev,
-				 struct VhostUserMsg *msg)
+			struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	uint64_t protocol_features = msg->payload.u64;
@@ -1241,7 +1257,8 @@ vhost_user_set_protocol_features(struct virtio_net **pdev,
 }
 
 static int
-vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	int fd = msg->fds[0];
@@ -1303,7 +1320,8 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg)
 }
 
 static int vhost_user_set_log_fd(struct virtio_net **pdev __rte_unused,
-			struct VhostUserMsg *msg)
+			struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	close(msg->fds[0]);
 	RTE_LOG(INFO, VHOST_CONFIG, "not implemented.\n");
@@ -1320,7 +1338,8 @@ static int vhost_user_set_log_fd(struct virtio_net **pdev __rte_unused,
  * a flag 'broadcast_rarp' to let rte_vhost_dequeue_burst() inject it.
  */
 static int
-vhost_user_send_rarp(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_send_rarp(struct virtio_net **pdev, struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	uint8_t *mac = (uint8_t *)&msg->payload.u64;
@@ -1350,7 +1369,8 @@ vhost_user_send_rarp(struct virtio_net **pdev, struct VhostUserMsg *msg)
 }
 
 static int
-vhost_user_net_set_mtu(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_net_set_mtu(struct virtio_net **pdev, struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	if (msg->payload.u64 < VIRTIO_MIN_MTU ||
@@ -1367,7 +1387,8 @@ vhost_user_net_set_mtu(struct virtio_net **pdev, struct VhostUserMsg *msg)
 }
 
 static int
-vhost_user_set_req_fd(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_set_req_fd(struct virtio_net **pdev, struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	int fd = msg->fds[0];
@@ -1434,7 +1455,8 @@ is_vring_iotlb_invalidate(struct vhost_virtqueue *vq,
 }
 
 static int
-vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg)
+vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	struct vhost_iotlb_msg *imsg = &msg->payload.iotlb;
@@ -1479,7 +1501,8 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg)
 }
 
 typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
-					struct VhostUserMsg *msg);
+					struct VhostUserMsg *msg,
+					int main_fd);
 static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
 	[VHOST_USER_NONE] = NULL,
 	[VHOST_USER_GET_FEATURES] = vhost_user_get_features,
@@ -1757,7 +1780,7 @@ vhost_user_msg_handler(int vid, int fd)
 	if (request > VHOST_USER_NONE && request < VHOST_USER_MAX) {
 		if (!vhost_message_handlers[request])
 			goto skip_to_post_handle;
-		ret = vhost_message_handlers[request](&dev, &msg);
+		ret = vhost_message_handlers[request](&dev, &msg, fd);
 
 		switch (ret) {
 		case VH_RESULT_ERR:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 07/17] vhost: enable fds passing when sending vhost-user messages
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (5 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 06/17] vhost: pass socket fd to message handling callbacks Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 08/17] vhost: add config flag for postcopy feature Maxime Coquelin
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

Passing userfault fds to Qemu will be required for postcopy
live-migration feature.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 050fc8bf9..436ab7bf5 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -171,6 +171,7 @@ vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg,
 
 	msg->payload.u64 = features;
 	msg->size = sizeof(msg->payload.u64);
+	msg->fd_num = 0;
 
 	return VH_RESULT_REPLY;
 }
@@ -189,6 +190,7 @@ vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg,
 
 	msg->payload.u64 = (uint64_t)queue_num;
 	msg->size = sizeof(msg->payload.u64);
+	msg->fd_num = 0;
 
 	return VH_RESULT_REPLY;
 }
@@ -1174,6 +1176,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
 	vq->batch_copy_elems = NULL;
 
 	msg->size = sizeof(msg->payload.state);
+	msg->fd_num = 0;
 
 	return VH_RESULT_REPLY;
 }
@@ -1233,6 +1236,7 @@ vhost_user_get_protocol_features(struct virtio_net **pdev,
 
 	msg->payload.u64 = protocol_features;
 	msg->size = sizeof(msg->payload.u64);
+	msg->fd_num = 0;
 
 	return VH_RESULT_REPLY;
 }
@@ -1315,6 +1319,7 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg,
 	dev->log_size = size;
 
 	msg->size = sizeof(msg->payload.u64);
+	msg->fd_num = 0;
 
 	return VH_RESULT_REPLY;
 }
@@ -1561,13 +1566,13 @@ read_vhost_message(int sockfd, struct VhostUserMsg *msg)
 }
 
 static int
-send_vhost_message(int sockfd, struct VhostUserMsg *msg, int *fds, int fd_num)
+send_vhost_message(int sockfd, struct VhostUserMsg *msg)
 {
 	if (!msg)
 		return 0;
 
 	return send_fd_message(sockfd, (char *)msg,
-		VHOST_USER_HDR_SIZE + msg->size, fds, fd_num);
+		VHOST_USER_HDR_SIZE + msg->size, msg->fds, msg->fd_num);
 }
 
 static int
@@ -1581,19 +1586,18 @@ send_vhost_reply(int sockfd, struct VhostUserMsg *msg)
 	msg->flags |= VHOST_USER_VERSION;
 	msg->flags |= VHOST_USER_REPLY_MASK;
 
-	return send_vhost_message(sockfd, msg, NULL, 0);
+	return send_vhost_message(sockfd, msg);
 }
 
 static int
-send_vhost_slave_message(struct virtio_net *dev, struct VhostUserMsg *msg,
-			 int *fds, int fd_num)
+send_vhost_slave_message(struct virtio_net *dev, struct VhostUserMsg *msg)
 {
 	int ret;
 
 	if (msg->flags & VHOST_USER_NEED_REPLY)
 		rte_spinlock_lock(&dev->slave_req_lock);
 
-	ret = send_vhost_message(dev->slave_req_fd, msg, fds, fd_num);
+	ret = send_vhost_message(dev->slave_req_fd, msg);
 	if (ret < 0 && (msg->flags & VHOST_USER_NEED_REPLY))
 		rte_spinlock_unlock(&dev->slave_req_lock);
 
@@ -1826,6 +1830,7 @@ vhost_user_msg_handler(int vid, int fd)
 	if (msg.flags & VHOST_USER_NEED_REPLY) {
 		msg.payload.u64 = ret == VH_RESULT_ERR;
 		msg.size = sizeof(msg.payload.u64);
+		msg.fd_num = 0;
 		send_vhost_reply(fd, &msg);
 	} else if (ret == VH_RESULT_ERR) {
 		RTE_LOG(ERR, VHOST_CONFIG,
@@ -1909,7 +1914,7 @@ vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
 		},
 	};
 
-	ret = send_vhost_message(dev->slave_req_fd, &msg, NULL, 0);
+	ret = send_vhost_message(dev->slave_req_fd, &msg);
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 				"Failed to send IOTLB miss message (%d)\n",
@@ -1925,8 +1930,6 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
 						    uint64_t offset,
 						    uint64_t size)
 {
-	int *fdp = NULL;
-	size_t fd_num = 0;
 	int ret;
 	struct VhostUserMsg msg = {
 		.request.slave = VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG,
@@ -1942,11 +1945,11 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
 	if (fd < 0)
 		msg.payload.area.u64 |= VHOST_USER_VRING_NOFD_MASK;
 	else {
-		fdp = &fd;
-		fd_num = 1;
+		msg.fds[0] = fd;
+		msg.fd_num = 1;
 	}
 
-	ret = send_vhost_slave_message(dev, &msg, fdp, fd_num);
+	ret = send_vhost_slave_message(dev, &msg);
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"Failed to set host notifier (%d)\n", ret);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 08/17] vhost: add config flag for postcopy feature
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (6 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 07/17] vhost: enable fds passing when sending vhost-user messages Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 09/17] vhost: introduce postcopy's advise message Maxime Coquelin
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

Postcopy live-migration features relies on userfaultfd,
which was only introduced in kernel v4.3.

This patch introduces a new define to allow building vhost
library on kernels not supporting userfaultfd.

With legacy build system, user has to explicitly set
CONFIG_RTE_LIBRTE_VHOST_POSTCOPY to 'y'.

With Meson build system, RTE_LIBRTE_VHOST_POSTCOPY gets
automatically defined if userfaultfd kernel header is
present.

Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 config/common_linuxapp       | 1 +
 lib/librte_vhost/meson.build | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 9c5ea9d89..dc43dcc36 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -14,6 +14,7 @@ CONFIG_RTE_LIBRTE_KNI=y
 CONFIG_RTE_LIBRTE_PMD_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_VHOST_NUMA=y
+CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_IFC_PMD=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
index 9d25b4d88..e33e6fc16 100644
--- a/lib/librte_vhost/meson.build
+++ b/lib/librte_vhost/meson.build
@@ -7,6 +7,8 @@ endif
 if has_libnuma == 1
 	dpdk_conf.set10('RTE_LIBRTE_VHOST_NUMA', true)
 endif
+dpdk_conf.set('RTE_LIBRTE_VHOST_POSTCOPY',
+	      cc.has_header('linux/userfaultfd.h'))
 version = 4
 allow_experimental_apis = true
 cflags += '-fno-strict-aliasing'
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 09/17] vhost: introduce postcopy's advise message
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (7 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 08/17] vhost: add config flag for postcopy feature Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 10/17] vhost: add support for postcopy's listen message Maxime Coquelin
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

This patch opens a userfaultfd and sends it back to Qemu's
VHOST_USER_POSTCOPY_ADVISE request.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.h      |  2 ++
 lib/librte_vhost/vhost_user.c | 44 +++++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost_user.h |  3 ++-
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 25ffd7614..21722d8a8 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -363,6 +363,8 @@ struct virtio_net {
 	int			slave_req_fd;
 	rte_spinlock_t		slave_req_lock;
 
+	int			postcopy_ufd;
+
 	/*
 	 * Device id to identify a specific backend device.
 	 * It's set to -1 for the default software implementation.
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 436ab7bf5..71721edc7 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -24,13 +24,19 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <sys/syscall.h>
 #include <assert.h>
 #ifdef RTE_LIBRTE_VHOST_NUMA
 #include <numaif.h>
 #endif
+#ifdef RTE_LIBRTE_VHOST_POSTCOPY
+#include <linux/userfaultfd.h>
+#endif
 
 #include <rte_common.h>
 #include <rte_malloc.h>
@@ -69,6 +75,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
 	[VHOST_USER_CRYPTO_CREATE_SESS] = "VHOST_USER_CRYPTO_CREATE_SESS",
 	[VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS",
+	[VHOST_USER_POSTCOPY_ADVISE]  = "VHOST_USER_POSTCOPY_ADVISE",
 };
 
 /* The possible results of a message handling function */
@@ -1505,6 +1512,42 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg,
 	return VH_RESULT_OK;
 }
 
+static int
+vhost_user_set_postcopy_advise(struct virtio_net **pdev,
+			struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
+{
+	struct virtio_net *dev = *pdev;
+#ifdef RTE_LIBRTE_VHOST_POSTCOPY
+	struct uffdio_api api_struct;
+
+	dev->postcopy_ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+
+	if (dev->postcopy_ufd == -1) {
+		RTE_LOG(ERR, VHOST_CONFIG, "Userfaultfd not available: %s\n",
+				strerror(errno));
+		return VH_RESULT_ERR;
+	}
+	api_struct.api = UFFD_API;
+	api_struct.features = 0;
+	if (ioctl(dev->postcopy_ufd, UFFDIO_API, &api_struct)) {
+		RTE_LOG(ERR, VHOST_CONFIG, "UFFDIO_API ioctl failure: %s\n",
+				strerror(errno));
+		close(dev->postcopy_ufd);
+		return VH_RESULT_ERR;
+	}
+	msg->fds[0] = dev->postcopy_ufd;
+	msg->fd_num = 1;
+
+	return VH_RESULT_REPLY;
+#else
+	dev->postcopy_ufd = -1;
+	msg->fd_num = 0;
+
+	return VH_RESULT_ERR;
+#endif
+}
+
 typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 					struct VhostUserMsg *msg,
 					int main_fd);
@@ -1532,6 +1575,7 @@ static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
 	[VHOST_USER_NET_SET_MTU] = vhost_user_net_set_mtu,
 	[VHOST_USER_SET_SLAVE_REQ_FD] = vhost_user_set_req_fd,
 	[VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg,
+	[VHOST_USER_POSTCOPY_ADVISE] = vhost_user_set_postcopy_advise,
 };
 
 
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index dd0262f8f..2030b40a5 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -50,7 +50,8 @@ typedef enum VhostUserRequest {
 	VHOST_USER_IOTLB_MSG = 22,
 	VHOST_USER_CRYPTO_CREATE_SESS = 26,
 	VHOST_USER_CRYPTO_CLOSE_SESS = 27,
-	VHOST_USER_MAX = 28
+	VHOST_USER_POSTCOPY_ADVISE = 28,
+	VHOST_USER_MAX = 29
 } VhostUserRequest;
 
 typedef enum VhostUserSlaveRequest {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 10/17] vhost: add support for postcopy's listen message
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (8 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 09/17] vhost: introduce postcopy's advise message Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02 14:11   ` Ilya Maximets
  2018-10-02  9:36 ` [PATCH v2 11/17] vhost: register new regions with userfaultfd Maxime Coquelin
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.h      |  1 +
 lib/librte_vhost/vhost_user.c | 19 +++++++++++++++++++
 lib/librte_vhost/vhost_user.h |  4 +++-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 21722d8a8..9453cb28d 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -364,6 +364,7 @@ struct virtio_net {
 	rte_spinlock_t		slave_req_lock;
 
 	int			postcopy_ufd;
+	int			postcopy_listening;
 
 	/*
 	 * Device id to identify a specific backend device.
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 71721edc7..bd468ca12 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -76,6 +76,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_CRYPTO_CREATE_SESS] = "VHOST_USER_CRYPTO_CREATE_SESS",
 	[VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS",
 	[VHOST_USER_POSTCOPY_ADVISE]  = "VHOST_USER_POSTCOPY_ADVISE",
+	[VHOST_USER_POSTCOPY_LISTEN]  = "VHOST_USER_POSTCOPY_LISTEN",
 };
 
 /* The possible results of a message handling function */
@@ -1548,6 +1549,23 @@ vhost_user_set_postcopy_advise(struct virtio_net **pdev,
 #endif
 }
 
+static int
+vhost_user_set_postcopy_listen(struct virtio_net **pdev,
+			struct VhostUserMsg *msg __rte_unused,
+			int main_fd __rte_unused)
+{
+	struct virtio_net *dev = *pdev;
+
+	if (dev->mem && dev->mem->nregions) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"Regions already registered at postcopy-listen\n");
+		return VH_RESULT_ERR;
+	}
+	dev->postcopy_listening = 1;
+
+	return VH_RESULT_OK;
+}
+
 typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 					struct VhostUserMsg *msg,
 					int main_fd);
@@ -1576,6 +1594,7 @@ static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
 	[VHOST_USER_SET_SLAVE_REQ_FD] = vhost_user_set_req_fd,
 	[VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg,
 	[VHOST_USER_POSTCOPY_ADVISE] = vhost_user_set_postcopy_advise,
+	[VHOST_USER_POSTCOPY_LISTEN] = vhost_user_set_postcopy_listen,
 };
 
 
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 2030b40a5..73b1fe2b9 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -51,7 +51,9 @@ typedef enum VhostUserRequest {
 	VHOST_USER_CRYPTO_CREATE_SESS = 26,
 	VHOST_USER_CRYPTO_CLOSE_SESS = 27,
 	VHOST_USER_POSTCOPY_ADVISE = 28,
-	VHOST_USER_MAX = 29
+	VHOST_USER_POSTCOPY_LISTEN = 29,
+	VHOST_USER_POSTCOPY_END = 30,
+	VHOST_USER_MAX = 31
 } VhostUserRequest;
 
 typedef enum VhostUserSlaveRequest {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 11/17] vhost: register new regions with userfaultfd
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (9 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 10/17] vhost: add support for postcopy's listen message Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 12/17] vhost: avoid useless VhostUserMemory copy Maxime Coquelin
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index bd468ca12..2f681d291 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -968,6 +968,32 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 			mmap_size,
 			alignment,
 			mmap_offset);
+
+		if (dev->postcopy_listening) {
+#ifdef RTE_LIBRTE_VHOST_POSTCOPY
+			struct uffdio_register reg_struct;
+
+			reg_struct.range.start = (uint64_t)(uintptr_t)mmap_addr;
+			reg_struct.range.len = mmap_size;
+			reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+
+			if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER,
+						&reg_struct)) {
+				RTE_LOG(ERR, VHOST_CONFIG,
+						"Failed to register ufd for region %d: (ufd = %d) %s\n",
+						i, dev->postcopy_ufd,
+						strerror(errno));
+				goto err_ufd;
+			}
+			RTE_LOG(INFO, VHOST_CONFIG,
+					"\t userfaultfd registered for range : %llx - %llx\n",
+					reg_struct.range.start,
+					reg_struct.range.start +
+					reg_struct.range.len - 1);
+#else
+			goto err_ufd;
+#endif
+		}
 	}
 
 	for (i = 0; i < dev->nr_vring; i++) {
@@ -983,7 +1009,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 
 			dev = translate_ring_addresses(dev, i);
 			if (!dev)
-				goto err_mmap;
+				goto err_ufd;
 
 
 			*pdev = dev;
@@ -994,6 +1020,11 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 
 	return VH_RESULT_OK;
 
+err_ufd:
+	if (dev->postcopy_ufd >= 0) {
+		close(dev->postcopy_ufd);
+		dev->postcopy_ufd = -1;
+	}
 err_mmap:
 	free_mem_region(dev);
 	rte_free(dev->mem);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 12/17] vhost: avoid useless VhostUserMemory copy
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (10 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 11/17] vhost: register new regions with userfaultfd Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 13/17] vhost: send userfault range addresses back to qemu Maxime Coquelin
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

The VHOST_USER_SET_MEM_TABLE payload is copied when handled,
whereas it could directly be referenced.

This is not very important, but next, we'll need to update the
payload and send it back to Qemu.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 2f681d291..515d3c61c 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -829,7 +829,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
-	struct VhostUserMemory memory = msg->payload.memory;
+	struct VhostUserMemory *memory = &msg->payload.memory;
 	struct rte_vhost_mem_region *reg;
 	void *mmap_addr;
 	uint64_t mmap_size;
@@ -839,17 +839,17 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 	int populate;
 	int fd;
 
-	if (memory.nregions > VHOST_MEMORY_MAX_NREGIONS) {
+	if (memory->nregions > VHOST_MEMORY_MAX_NREGIONS) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"too many memory regions (%u)\n", memory.nregions);
+			"too many memory regions (%u)\n", memory->nregions);
 		return VH_RESULT_ERR;
 	}
 
-	if (dev->mem && !vhost_memory_changed(&memory, dev->mem)) {
+	if (dev->mem && !vhost_memory_changed(memory, dev->mem)) {
 		RTE_LOG(INFO, VHOST_CONFIG,
 			"(%d) memory regions not changed\n", dev->vid);
 
-		for (i = 0; i < memory.nregions; i++)
+		for (i = 0; i < memory->nregions; i++)
 			close(msg->fds[i]);
 
 		return VH_RESULT_OK;
@@ -881,25 +881,25 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 	}
 
 	dev->mem = rte_zmalloc("vhost-mem-table", sizeof(struct rte_vhost_memory) +
-		sizeof(struct rte_vhost_mem_region) * memory.nregions, 0);
+		sizeof(struct rte_vhost_mem_region) * memory->nregions, 0);
 	if (dev->mem == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to allocate memory for dev->mem\n",
 			dev->vid);
 		return VH_RESULT_ERR;
 	}
-	dev->mem->nregions = memory.nregions;
+	dev->mem->nregions = memory->nregions;
 
-	for (i = 0; i < memory.nregions; i++) {
+	for (i = 0; i < memory->nregions; i++) {
 		fd  = msg->fds[i];
 		reg = &dev->mem->regions[i];
 
-		reg->guest_phys_addr = memory.regions[i].guest_phys_addr;
-		reg->guest_user_addr = memory.regions[i].userspace_addr;
-		reg->size            = memory.regions[i].memory_size;
+		reg->guest_phys_addr = memory->regions[i].guest_phys_addr;
+		reg->guest_user_addr = memory->regions[i].userspace_addr;
+		reg->size            = memory->regions[i].memory_size;
 		reg->fd              = fd;
 
-		mmap_offset = memory.regions[i].mmap_offset;
+		mmap_offset = memory->regions[i].mmap_offset;
 
 		/* Check for memory_size + mmap_offset overflow */
 		if (mmap_offset >= -reg->size) {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 13/17] vhost: send userfault range addresses back to qemu
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (11 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 12/17] vhost: avoid useless VhostUserMemory copy Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 14/17] vhost: add support to postcopy's end request Maxime Coquelin
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 49 ++++++++++++++++++++++++++++++++---
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 515d3c61c..b207de6e0 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -89,6 +89,11 @@ enum vh_result {
 	VH_RESULT_REPLY =  1,
 };
 
+static int
+send_vhost_reply(int sockfd, struct VhostUserMsg *msg);
+static int
+read_vhost_message(int sockfd, struct VhostUserMsg *msg);
+
 static uint64_t
 get_blk_size(int fd)
 {
@@ -826,7 +831,7 @@ vhost_memory_changed(struct VhostUserMemory *new,
 
 static int
 vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+			int main_fd)
 {
 	struct virtio_net *dev = *pdev;
 	struct VhostUserMemory *memory = &msg->payload.memory;
@@ -970,11 +975,49 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 			mmap_offset);
 
 		if (dev->postcopy_listening) {
+			/*
+			 * We haven't a better way right now than sharing
+			 * DPDK's virtual address with Qemu, so that Qemu can
+			 * retreive the region offset when handling userfaults.
+			 */
+			memory->regions[i].userspace_addr =
+				reg->host_user_addr;
+		}
+	}
+	if (dev->postcopy_listening) {
+		/* Send the addresses back to qemu */
+		msg->fd_num = 0;
+		send_vhost_reply(main_fd, msg);
+
+		/* Wait for qemu to acknolwedge it's got the addresses
+		 * we've got to wait before we're allowed to generate faults.
+		 */
+		VhostUserMsg ack_msg;
+		if (read_vhost_message(main_fd, &ack_msg) <= 0) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+					"Failed to read qemu ack on postcopy set-mem-table\n");
+			goto err_mmap;
+		}
+		if (ack_msg.request.master != VHOST_USER_SET_MEM_TABLE) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+					"Bad qemu ack on postcopy set-mem-table (%d)\n",
+					ack_msg.request.master);
+			goto err_mmap;
+		}
+
+		/* Now userfault register and we can use the memory */
+		for (i = 0; i < memory->nregions; i++) {
 #ifdef RTE_LIBRTE_VHOST_POSTCOPY
+			reg = &dev->mem->regions[i];
 			struct uffdio_register reg_struct;
 
-			reg_struct.range.start = (uint64_t)(uintptr_t)mmap_addr;
-			reg_struct.range.len = mmap_size;
+			/*
+			 * Let's register all the mmap'ed area to ensure
+			 * alignement on page boundary.
+			 */
+			reg_struct.range.start =
+				(uint64_t)(uintptr_t)reg->mmap_addr;
+			reg_struct.range.len = reg->mmap_size;
 			reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
 
 			if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER,
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 14/17] vhost: add support to postcopy's end request
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (12 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 13/17] vhost: send userfault range addresses back to qemu Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02 14:18   ` Ilya Maximets
  2018-10-02  9:36 ` [PATCH v2 15/17] vhost: enable postcopy protocol feature Maxime Coquelin
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

The master sends this message before stopping handling
userfaults, so that the backend closes the userfaultfd.

The master waits for the slave to acknowledge the request
with an empty 64bits payload for synchronization purpose.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index b207de6e0..ee7337ac8 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -77,6 +77,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS",
 	[VHOST_USER_POSTCOPY_ADVISE]  = "VHOST_USER_POSTCOPY_ADVISE",
 	[VHOST_USER_POSTCOPY_LISTEN]  = "VHOST_USER_POSTCOPY_LISTEN",
+	[VHOST_USER_POSTCOPY_END]  = "VHOST_USER_POSTCOPY_END",
 };
 
 /* The possible results of a message handling function */
@@ -1640,6 +1641,25 @@ vhost_user_set_postcopy_listen(struct virtio_net **pdev,
 	return VH_RESULT_OK;
 }
 
+static int
+vhost_user_postcopy_end(struct virtio_net **pdev, struct VhostUserMsg *msg,
+			int main_fd __rte_unused)
+{
+	struct virtio_net *dev = *pdev;
+
+	dev->postcopy_listening = 0;
+	if (dev->postcopy_ufd >= 0) {
+		close(dev->postcopy_ufd);
+		dev->postcopy_ufd = -1;
+	}
+
+	msg->payload.u64 = 0;
+	msg->size = sizeof(msg->payload.u64);
+	msg->fd_num = 0;
+
+	return 0;
+}
+
 typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 					struct VhostUserMsg *msg,
 					int main_fd);
@@ -1669,6 +1689,7 @@ static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
 	[VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg,
 	[VHOST_USER_POSTCOPY_ADVISE] = vhost_user_set_postcopy_advise,
 	[VHOST_USER_POSTCOPY_LISTEN] = vhost_user_set_postcopy_listen,
+	[VHOST_USER_POSTCOPY_END] = vhost_user_postcopy_end,
 };
 
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 15/17] vhost: enable postcopy protocol feature
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (13 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 14/17] vhost: add support to postcopy's end request Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 16/17] vhost: add flag to enable postcopy live-migration Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 17/17] net/vhost: add parameter to enable postcopy support Maxime Coquelin
  16 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

Enable postcopy protocol feature except if dequeue
zero-copy is enabled. In this case, guest memory requires
to be populated, which is not compatible with userfaultfd.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 7 +++++++
 lib/librte_vhost/vhost_user.h | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ee7337ac8..9d08f4af0 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1317,6 +1317,13 @@ vhost_user_get_protocol_features(struct virtio_net **pdev,
 	if (!(features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
 		protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK);
 
+	/*
+	 * If dequeue zerocopy is enabled, guest memory requires to be
+	 * populated, which is not compatible with postcopy.
+	 */
+	if (dev->dequeue_zero_copy)
+		protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT);
+
 	msg->payload.u64 = protocol_features;
 	msg->size = sizeof(msg->payload.u64);
 	msg->fd_num = 0;
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 73b1fe2b9..dc97be843 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -22,7 +22,8 @@
 					 (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_CRYPTO_SESSION) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \
-					 (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER))
+					 (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
+					 (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT))
 
 typedef enum VhostUserRequest {
 	VHOST_USER_NONE = 0,
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 16/17] vhost: add flag to enable postcopy live-migration
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (14 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 15/17] vhost: enable postcopy protocol feature Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  2018-10-02  9:36 ` [PATCH v2 17/17] net/vhost: add parameter to enable postcopy support Maxime Coquelin
  16 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

Postcopy live-migration feature require the application to
not populate the guest memory. As the vhost library cannot
prevent the application to that (e.g. preventing the
application to call mlockall()), the feature is disabled by
default.

The application should only enable the feature if it does not
force the guest memory to be populated.

In case the user passes the RTE_VHOST_USER_POSTCOPY_SUPPORT
flag at registration but the feature was not compiled,
registration fails.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/prog_guide/vhost_lib.rst |  8 ++++++++
 lib/librte_vhost/rte_vhost.h        |  1 +
 lib/librte_vhost/socket.c           | 19 +++++++++++++++++--
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 77af4d775..c77df338f 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -106,6 +106,14 @@ The following is an overview of some key Vhost API functions:
     Enabling this flag with these Qemu version results in Qemu being blocked
     when multiple queue pairs are declared.
 
+  - ``RTE_VHOST_USER_POSTCOPY_SUPPORT``
+
+    Postcopy live-migration support will be enabled when this flag is set.
+    It is disabled by default.
+
+    Enabling this flag should only be done when the calling application does
+    not pre-fault the guest shared memory, otherwise migration would fail.
+
 * ``rte_vhost_driver_set_features(path, features)``
 
   This function sets the feature bits the vhost-user driver supports. The
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index b3cc6990d..b26afbffa 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -28,6 +28,7 @@ extern "C" {
 #define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
 #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
 #define RTE_VHOST_USER_IOMMU_SUPPORT	(1ULL << 3)
+#define RTE_VHOST_USER_POSTCOPY_SUPPORT		(1ULL << 4)
 
 /** Protocol features. */
 #ifndef VHOST_USER_PROTOCOL_F_MQ
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index c04d3d305..3df303be8 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -51,6 +51,8 @@ struct vhost_user_socket {
 	uint64_t supported_features;
 	uint64_t features;
 
+	uint64_t protocol_features;
+
 	/*
 	 * Device id to identify a specific backend device.
 	 * It's set to -1 for the default software implementation.
@@ -731,7 +733,7 @@ rte_vhost_driver_get_protocol_features(const char *path,
 	did = vsocket->vdpa_dev_id;
 	vdpa_dev = rte_vdpa_get_device(did);
 	if (!vdpa_dev || !vdpa_dev->ops->get_protocol_features) {
-		*protocol_features = VHOST_USER_PROTOCOL_FEATURES;
+		*protocol_features = vsocket->protocol_features;
 		goto unlock_exit;
 	}
 
@@ -744,7 +746,7 @@ rte_vhost_driver_get_protocol_features(const char *path,
 		goto unlock_exit;
 	}
 
-	*protocol_features = VHOST_USER_PROTOCOL_FEATURES
+	*protocol_features = vsocket->protocol_features
 		& vdpa_protocol_features;
 
 unlock_exit:
@@ -863,6 +865,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 	vsocket->use_builtin_virtio_net = true;
 	vsocket->supported_features = VIRTIO_NET_SUPPORTED_FEATURES;
 	vsocket->features           = VIRTIO_NET_SUPPORTED_FEATURES;
+	vsocket->protocol_features  = VHOST_USER_PROTOCOL_FEATURES;
 
 	/* Dequeue zero copy can't assure descriptors returned in order */
 	if (vsocket->dequeue_zero_copy) {
@@ -875,6 +878,18 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 		vsocket->features &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
 	}
 
+	if (!(flags & RTE_VHOST_USER_POSTCOPY_SUPPORT)) {
+		vsocket->protocol_features &=
+			~(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT);
+	} else {
+#ifndef RTE_LIBRTE_VHOST_POSTCOPY
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Postcopy requested but not compiled\n");
+		ret = -1;
+		goto out_mutex;
+#endif
+	}
+
 	if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
 		vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
 		if (vsocket->reconnect && reconn_tid == 0) {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 17/17] net/vhost: add parameter to enable postcopy support
  2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
                   ` (15 preceding siblings ...)
  2018-10-02  9:36 ` [PATCH v2 16/17] vhost: add flag to enable postcopy live-migration Maxime Coquelin
@ 2018-10-02  9:36 ` Maxime Coquelin
  16 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-02  9:36 UTC (permalink / raw)
  To: dev, tiwei.bie, zhihong.wang, jfreimann, nicknickolaev,
	i.maximets, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Maxime Coquelin

Introduce a new postcopy-support parameter to Vhost PMD that
passes the RTE_VHOST_USER_POSTCOPY_SUPPORT flag at vhost
device register time.

Flag should only be set if application does not prefault guest
memory using, for example, mlockall() syscall.

Default value is 0, meaning that postcopy support is disabled
unless specified explicitly.

Example to enable postcopy support for a given device:

--vdev 'net_vhost0,iface=/tmp/vhost-user1,postcopy-support=1'

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/nics/vhost.rst         |  5 +++++
 drivers/net/vhost/rte_eth_vhost.c | 13 +++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst
index 4f7ae8990..23f2e87aa 100644
--- a/doc/guides/nics/vhost.rst
+++ b/doc/guides/nics/vhost.rst
@@ -71,6 +71,11 @@ The user can specify below arguments in `--vdev` option.
     It is used to enable iommu support in vhost library.
     (Default: 0 (disabled))
 
+#.  ``postcopy-support``:
+
+    It is used to enable postcopy live-migration support in vhost library.
+    (Default: 0 (disabled))
+
 Vhost PMD event handling
 ------------------------
 
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index aa6052221..1330f06ba 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -30,6 +30,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 #define ETH_VHOST_CLIENT_ARG		"client"
 #define ETH_VHOST_DEQUEUE_ZERO_COPY	"dequeue-zero-copy"
 #define ETH_VHOST_IOMMU_SUPPORT		"iommu-support"
+#define ETH_VHOST_POSTCOPY_SUPPORT	"postcopy-support"
 #define VHOST_MAX_PKT_BURST 32
 
 static const char *valid_arguments[] = {
@@ -38,6 +39,7 @@ static const char *valid_arguments[] = {
 	ETH_VHOST_CLIENT_ARG,
 	ETH_VHOST_DEQUEUE_ZERO_COPY,
 	ETH_VHOST_IOMMU_SUPPORT,
+	ETH_VHOST_POSTCOPY_SUPPORT,
 	NULL
 };
 
@@ -1339,6 +1341,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 	int client_mode = 0;
 	int dequeue_zero_copy = 0;
 	int iommu_support = 0;
+	int postcopy_support = 0;
 	struct rte_eth_dev *eth_dev;
 	const char *name = rte_vdev_device_name(dev);
 
@@ -1411,6 +1414,16 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 			flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
 	}
 
+	if (rte_kvargs_count(kvlist, ETH_VHOST_POSTCOPY_SUPPORT) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_POSTCOPY_SUPPORT,
+					 &open_int, &postcopy_support);
+		if (ret < 0)
+			goto out_free;
+
+		if (postcopy_support)
+			flags |= RTE_VHOST_USER_POSTCOPY_SUPPORT;
+	}
+
 	if (dev->device.numa_node == SOCKET_ID_ANY)
 		dev->device.numa_node = rte_socket_id();
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 10/17] vhost: add support for postcopy's listen message
  2018-10-02  9:36 ` [PATCH v2 10/17] vhost: add support for postcopy's listen message Maxime Coquelin
@ 2018-10-02 14:11   ` Ilya Maximets
  2018-10-03  7:46     ` Maxime Coquelin
  0 siblings, 1 reply; 32+ messages in thread
From: Ilya Maximets @ 2018-10-02 14:11 UTC (permalink / raw)
  To: Maxime Coquelin, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable

On 02.10.2018 12:36, Maxime Coquelin wrote:
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_vhost/vhost.h      |  1 +
>  lib/librte_vhost/vhost_user.c | 19 +++++++++++++++++++
>  lib/librte_vhost/vhost_user.h |  4 +++-
>  3 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index 21722d8a8..9453cb28d 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -364,6 +364,7 @@ struct virtio_net {
>  	rte_spinlock_t		slave_req_lock;
>  
>  	int			postcopy_ufd;
> +	int			postcopy_listening;
>  
>  	/*
>  	 * Device id to identify a specific backend device.
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 71721edc7..bd468ca12 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -76,6 +76,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
>  	[VHOST_USER_CRYPTO_CREATE_SESS] = "VHOST_USER_CRYPTO_CREATE_SESS",
>  	[VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS",
>  	[VHOST_USER_POSTCOPY_ADVISE]  = "VHOST_USER_POSTCOPY_ADVISE",
> +	[VHOST_USER_POSTCOPY_LISTEN]  = "VHOST_USER_POSTCOPY_LISTEN",
>  };
>  
>  /* The possible results of a message handling function */
> @@ -1548,6 +1549,23 @@ vhost_user_set_postcopy_advise(struct virtio_net **pdev,
>  #endif
>  }
>  
> +static int
> +vhost_user_set_postcopy_listen(struct virtio_net **pdev,
> +			struct VhostUserMsg *msg __rte_unused,
> +			int main_fd __rte_unused)
> +{
> +	struct virtio_net *dev = *pdev;
> +
> +	if (dev->mem && dev->mem->nregions) {
> +		RTE_LOG(ERR, VHOST_CONFIG,
> +				"Regions already registered at postcopy-listen\n");
> +		return VH_RESULT_ERR;
> +	}
> +	dev->postcopy_listening = 1;
> +
> +	return VH_RESULT_OK;
> +}
> +
>  typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
>  					struct VhostUserMsg *msg,
>  					int main_fd);
> @@ -1576,6 +1594,7 @@ static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
>  	[VHOST_USER_SET_SLAVE_REQ_FD] = vhost_user_set_req_fd,
>  	[VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg,
>  	[VHOST_USER_POSTCOPY_ADVISE] = vhost_user_set_postcopy_advise,
> +	[VHOST_USER_POSTCOPY_LISTEN] = vhost_user_set_postcopy_listen,
>  };
>  
>  
> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> index 2030b40a5..73b1fe2b9 100644
> --- a/lib/librte_vhost/vhost_user.h
> +++ b/lib/librte_vhost/vhost_user.h
> @@ -51,7 +51,9 @@ typedef enum VhostUserRequest {
>  	VHOST_USER_CRYPTO_CREATE_SESS = 26,
>  	VHOST_USER_CRYPTO_CLOSE_SESS = 27,
>  	VHOST_USER_POSTCOPY_ADVISE = 28,
> -	VHOST_USER_MAX = 29
> +	VHOST_USER_POSTCOPY_LISTEN = 29,
> +	VHOST_USER_POSTCOPY_END = 30,

I think, this should be part of patch 14 for consistency.

> +	VHOST_USER_MAX = 31
>  } VhostUserRequest;
>  
>  typedef enum VhostUserSlaveRequest {
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/17] vhost: fix messages error checks
  2018-10-02  9:36 ` [PATCH v2 01/17] vhost: fix messages error checks Maxime Coquelin
@ 2018-10-02 14:15   ` Ilya Maximets
  2018-10-03  7:50     ` Maxime Coquelin
  0 siblings, 1 reply; 32+ messages in thread
From: Ilya Maximets @ 2018-10-02 14:15 UTC (permalink / raw)
  To: Maxime Coquelin, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable

On 02.10.2018 12:36, Maxime Coquelin wrote:
> Return of message handling has now changed to an enum that can
> take non-negative value that is not zero in case a reply is
> needed. But the code checking the variable afterwards has not
> been updated, leading to success messages handling being
> treated as errors.
> 
> Fixes: 4e601952cae6 ("vhost: message handling implemented as a callback array")
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_vhost/vhost_user.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 7ef3fb4a4..060b41893 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1783,7 +1783,7 @@ vhost_user_msg_handler(int vid, int fd)
>  	}
>  
>  skip_to_post_handle:
> -	if (!ret && dev->extern_ops.post_msg_handle) {
> +	if (ret != VH_RESULT_ERR && dev->extern_ops.post_msg_handle) {
>  		uint32_t need_reply;
>  
>  		ret = (*dev->extern_ops.post_msg_handle)(
> @@ -1800,10 +1800,10 @@ vhost_user_msg_handler(int vid, int fd)
>  		vhost_user_unlock_all_queue_pairs(dev);
>  
>  	if (msg.flags & VHOST_USER_NEED_REPLY) {

Maybe we need to reply here only if we didn't reply
already (not VH_RESULT_REPLY) ? Otherwise, we could
reply twice (with payload and with return code).

> -		msg.payload.u64 = !!ret;
> +		msg.payload.u64 = ret == VH_RESULT_ERR;
>  		msg.size = sizeof(msg.payload.u64);
>  		send_vhost_reply(fd, &msg);
> -	} else if (ret) {
> +	} else if (ret == VH_RESULT_ERR) {
>  		RTE_LOG(ERR, VHOST_CONFIG,
>  			"vhost message handling failed.\n");
>  		return -1;
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 14/17] vhost: add support to postcopy's end request
  2018-10-02  9:36 ` [PATCH v2 14/17] vhost: add support to postcopy's end request Maxime Coquelin
@ 2018-10-02 14:18   ` Ilya Maximets
  0 siblings, 0 replies; 32+ messages in thread
From: Ilya Maximets @ 2018-10-02 14:18 UTC (permalink / raw)
  To: Maxime Coquelin, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable

On 02.10.2018 12:36, Maxime Coquelin wrote:
> The master sends this message before stopping handling
> userfaults, so that the backend closes the userfaultfd.
> 
> The master waits for the slave to acknowledge the request
> with an empty 64bits payload for synchronization purpose.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_vhost/vhost_user.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index b207de6e0..ee7337ac8 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -77,6 +77,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
>  	[VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS",
>  	[VHOST_USER_POSTCOPY_ADVISE]  = "VHOST_USER_POSTCOPY_ADVISE",
>  	[VHOST_USER_POSTCOPY_LISTEN]  = "VHOST_USER_POSTCOPY_LISTEN",
> +	[VHOST_USER_POSTCOPY_END]  = "VHOST_USER_POSTCOPY_END",
>  };
>  
>  /* The possible results of a message handling function */
> @@ -1640,6 +1641,25 @@ vhost_user_set_postcopy_listen(struct virtio_net **pdev,
>  	return VH_RESULT_OK;
>  }
>  
> +static int
> +vhost_user_postcopy_end(struct virtio_net **pdev, struct VhostUserMsg *msg,
> +			int main_fd __rte_unused)
> +{
> +	struct virtio_net *dev = *pdev;
> +
> +	dev->postcopy_listening = 0;
> +	if (dev->postcopy_ufd >= 0) {
> +		close(dev->postcopy_ufd);
> +		dev->postcopy_ufd = -1;
> +	}
> +
> +	msg->payload.u64 = 0;
> +	msg->size = sizeof(msg->payload.u64);
> +	msg->fd_num = 0;
> +
> +	return 0;

I think, it should return VH_RESULT_REPLY as the reply is
mandatory according to specification. In pair with change
in patch #1 this will not produce double replies.

> +}
> +
>  typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
>  					struct VhostUserMsg *msg,
>  					int main_fd);
> @@ -1669,6 +1689,7 @@ static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
>  	[VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg,
>  	[VHOST_USER_POSTCOPY_ADVISE] = vhost_user_set_postcopy_advise,
>  	[VHOST_USER_POSTCOPY_LISTEN] = vhost_user_set_postcopy_listen,
> +	[VHOST_USER_POSTCOPY_END] = vhost_user_postcopy_end,
>  };
>  
>  
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 10/17] vhost: add support for postcopy's listen message
  2018-10-02 14:11   ` Ilya Maximets
@ 2018-10-03  7:46     ` Maxime Coquelin
  0 siblings, 0 replies; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-03  7:46 UTC (permalink / raw)
  To: Ilya Maximets, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable



On 10/02/2018 04:11 PM, Ilya Maximets wrote:
> On 02.10.2018 12:36, Maxime Coquelin wrote:
>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/librte_vhost/vhost.h      |  1 +
>>   lib/librte_vhost/vhost_user.c | 19 +++++++++++++++++++
>>   lib/librte_vhost/vhost_user.h |  4 +++-
>>   3 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
>> index 21722d8a8..9453cb28d 100644
>> --- a/lib/librte_vhost/vhost.h
>> +++ b/lib/librte_vhost/vhost.h
>> @@ -364,6 +364,7 @@ struct virtio_net {
>>   	rte_spinlock_t		slave_req_lock;
>>   
>>   	int			postcopy_ufd;
>> +	int			postcopy_listening;
>>   
>>   	/*
>>   	 * Device id to identify a specific backend device.
>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>> index 71721edc7..bd468ca12 100644
>> --- a/lib/librte_vhost/vhost_user.c
>> +++ b/lib/librte_vhost/vhost_user.c
>> @@ -76,6 +76,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
>>   	[VHOST_USER_CRYPTO_CREATE_SESS] = "VHOST_USER_CRYPTO_CREATE_SESS",
>>   	[VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS",
>>   	[VHOST_USER_POSTCOPY_ADVISE]  = "VHOST_USER_POSTCOPY_ADVISE",
>> +	[VHOST_USER_POSTCOPY_LISTEN]  = "VHOST_USER_POSTCOPY_LISTEN",
>>   };
>>   
>>   /* The possible results of a message handling function */
>> @@ -1548,6 +1549,23 @@ vhost_user_set_postcopy_advise(struct virtio_net **pdev,
>>   #endif
>>   }
>>   
>> +static int
>> +vhost_user_set_postcopy_listen(struct virtio_net **pdev,
>> +			struct VhostUserMsg *msg __rte_unused,
>> +			int main_fd __rte_unused)
>> +{
>> +	struct virtio_net *dev = *pdev;
>> +
>> +	if (dev->mem && dev->mem->nregions) {
>> +		RTE_LOG(ERR, VHOST_CONFIG,
>> +				"Regions already registered at postcopy-listen\n");
>> +		return VH_RESULT_ERR;
>> +	}
>> +	dev->postcopy_listening = 1;
>> +
>> +	return VH_RESULT_OK;
>> +}
>> +
>>   typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
>>   					struct VhostUserMsg *msg,
>>   					int main_fd);
>> @@ -1576,6 +1594,7 @@ static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
>>   	[VHOST_USER_SET_SLAVE_REQ_FD] = vhost_user_set_req_fd,
>>   	[VHOST_USER_IOTLB_MSG] = vhost_user_iotlb_msg,
>>   	[VHOST_USER_POSTCOPY_ADVISE] = vhost_user_set_postcopy_advise,
>> +	[VHOST_USER_POSTCOPY_LISTEN] = vhost_user_set_postcopy_listen,
>>   };
>>   
>>   
>> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
>> index 2030b40a5..73b1fe2b9 100644
>> --- a/lib/librte_vhost/vhost_user.h
>> +++ b/lib/librte_vhost/vhost_user.h
>> @@ -51,7 +51,9 @@ typedef enum VhostUserRequest {
>>   	VHOST_USER_CRYPTO_CREATE_SESS = 26,
>>   	VHOST_USER_CRYPTO_CLOSE_SESS = 27,
>>   	VHOST_USER_POSTCOPY_ADVISE = 28,
>> -	VHOST_USER_MAX = 29
>> +	VHOST_USER_POSTCOPY_LISTEN = 29,
>> +	VHOST_USER_POSTCOPY_END = 30,
> 
> I think, this should be part of patch 14 for consistency.

Indeed, it has been fixed-up in wrong commit I guess.

Thanks for spotting it.
Maxime

>> +	VHOST_USER_MAX = 31
>>   } VhostUserRequest;
>>   
>>   typedef enum VhostUserSlaveRequest {
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/17] vhost: fix messages error checks
  2018-10-02 14:15   ` Ilya Maximets
@ 2018-10-03  7:50     ` Maxime Coquelin
  2018-10-03  7:57       ` Ilya Maximets
  0 siblings, 1 reply; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-03  7:50 UTC (permalink / raw)
  To: Ilya Maximets, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable



On 10/02/2018 04:15 PM, Ilya Maximets wrote:
> On 02.10.2018 12:36, Maxime Coquelin wrote:
>> Return of message handling has now changed to an enum that can
>> take non-negative value that is not zero in case a reply is
>> needed. But the code checking the variable afterwards has not
>> been updated, leading to success messages handling being
>> treated as errors.
>>
>> Fixes: 4e601952cae6 ("vhost: message handling implemented as a callback array")
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/librte_vhost/vhost_user.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>> index 7ef3fb4a4..060b41893 100644
>> --- a/lib/librte_vhost/vhost_user.c
>> +++ b/lib/librte_vhost/vhost_user.c
>> @@ -1783,7 +1783,7 @@ vhost_user_msg_handler(int vid, int fd)
>>   	}
>>   
>>   skip_to_post_handle:
>> -	if (!ret && dev->extern_ops.post_msg_handle) {
>> +	if (ret != VH_RESULT_ERR && dev->extern_ops.post_msg_handle) {
>>   		uint32_t need_reply;
>>   
>>   		ret = (*dev->extern_ops.post_msg_handle)(
>> @@ -1800,10 +1800,10 @@ vhost_user_msg_handler(int vid, int fd)
>>   		vhost_user_unlock_all_queue_pairs(dev);
>>   
>>   	if (msg.flags & VHOST_USER_NEED_REPLY) {
> 
> Maybe we need to reply here only if we didn't reply
> already (not VH_RESULT_REPLY) ? Otherwise, we could
> reply twice (with payload and with return code).

Well, if the master sets this bit, it means it is waiting for
a "reply-ack", so not sending it would cause the master to wait
forever.

It is the master responsibility to not set this bit for requests
already expecting a non "reply-ack" reply (as you fixed it for
postcopy's set mem table case).

>> -		msg.payload.u64 = !!ret;
>> +		msg.payload.u64 = ret == VH_RESULT_ERR;
>>   		msg.size = sizeof(msg.payload.u64);
>>   		send_vhost_reply(fd, &msg);
>> -	} else if (ret) {
>> +	} else if (ret == VH_RESULT_ERR) {
>>   		RTE_LOG(ERR, VHOST_CONFIG,
>>   			"vhost message handling failed.\n");
>>   		return -1;
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/17] vhost: fix messages error checks
  2018-10-03  7:50     ` Maxime Coquelin
@ 2018-10-03  7:57       ` Ilya Maximets
  2018-10-03  8:02         ` Maxime Coquelin
  0 siblings, 1 reply; 32+ messages in thread
From: Ilya Maximets @ 2018-10-03  7:57 UTC (permalink / raw)
  To: Maxime Coquelin, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Michael S. Tsirkin

On 03.10.2018 10:50, Maxime Coquelin wrote:
> 
> 
> On 10/02/2018 04:15 PM, Ilya Maximets wrote:
>> On 02.10.2018 12:36, Maxime Coquelin wrote:
>>> Return of message handling has now changed to an enum that can
>>> take non-negative value that is not zero in case a reply is
>>> needed. But the code checking the variable afterwards has not
>>> been updated, leading to success messages handling being
>>> treated as errors.
>>>
>>> Fixes: 4e601952cae6 ("vhost: message handling implemented as a callback array")
>>>
>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> ---
>>>   lib/librte_vhost/vhost_user.c | 6 +++---
>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>> index 7ef3fb4a4..060b41893 100644
>>> --- a/lib/librte_vhost/vhost_user.c
>>> +++ b/lib/librte_vhost/vhost_user.c
>>> @@ -1783,7 +1783,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>       }
>>>     skip_to_post_handle:
>>> -    if (!ret && dev->extern_ops.post_msg_handle) {
>>> +    if (ret != VH_RESULT_ERR && dev->extern_ops.post_msg_handle) {
>>>           uint32_t need_reply;
>>>             ret = (*dev->extern_ops.post_msg_handle)(
>>> @@ -1800,10 +1800,10 @@ vhost_user_msg_handler(int vid, int fd)
>>>           vhost_user_unlock_all_queue_pairs(dev);
>>>         if (msg.flags & VHOST_USER_NEED_REPLY) {
>>
>> Maybe we need to reply here only if we didn't reply
>> already (not VH_RESULT_REPLY) ? Otherwise, we could
>> reply twice (with payload and with return code).
> 
> Well, if the master sets this bit, it means it is waiting for
> a "reply-ack", so not sending it would cause the master to wait
> forever.
> 
> It is the master responsibility to not set this bit for requests
> already expecting a non "reply-ack" reply (as you fixed it for
> postcopy's set mem table case).

vhost-user docs in QEMU says:
"
For the message types that already solicit a reply from the client, the
presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
no behavioural change.
"
i.e. even if QEMU sets the need_reply flag, vhost should not reply twice.
Am I missing something?

> 
>>> -        msg.payload.u64 = !!ret;
>>> +        msg.payload.u64 = ret == VH_RESULT_ERR;
>>>           msg.size = sizeof(msg.payload.u64);
>>>           send_vhost_reply(fd, &msg);
>>> -    } else if (ret) {
>>> +    } else if (ret == VH_RESULT_ERR) {
>>>           RTE_LOG(ERR, VHOST_CONFIG,
>>>               "vhost message handling failed.\n");
>>>           return -1;
>>>
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/17] vhost: fix messages error checks
  2018-10-03  7:57       ` Ilya Maximets
@ 2018-10-03  8:02         ` Maxime Coquelin
  2018-10-03  8:32           ` Ilya Maximets
  0 siblings, 1 reply; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-03  8:02 UTC (permalink / raw)
  To: Ilya Maximets, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Michael S. Tsirkin



On 10/03/2018 09:57 AM, Ilya Maximets wrote:
> On 03.10.2018 10:50, Maxime Coquelin wrote:
>>
>>
>> On 10/02/2018 04:15 PM, Ilya Maximets wrote:
>>> On 02.10.2018 12:36, Maxime Coquelin wrote:
>>>> Return of message handling has now changed to an enum that can
>>>> take non-negative value that is not zero in case a reply is
>>>> needed. But the code checking the variable afterwards has not
>>>> been updated, leading to success messages handling being
>>>> treated as errors.
>>>>
>>>> Fixes: 4e601952cae6 ("vhost: message handling implemented as a callback array")
>>>>
>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> ---
>>>>    lib/librte_vhost/vhost_user.c | 6 +++---
>>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>>> index 7ef3fb4a4..060b41893 100644
>>>> --- a/lib/librte_vhost/vhost_user.c
>>>> +++ b/lib/librte_vhost/vhost_user.c
>>>> @@ -1783,7 +1783,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>>        }
>>>>      skip_to_post_handle:
>>>> -    if (!ret && dev->extern_ops.post_msg_handle) {
>>>> +    if (ret != VH_RESULT_ERR && dev->extern_ops.post_msg_handle) {
>>>>            uint32_t need_reply;
>>>>              ret = (*dev->extern_ops.post_msg_handle)(
>>>> @@ -1800,10 +1800,10 @@ vhost_user_msg_handler(int vid, int fd)
>>>>            vhost_user_unlock_all_queue_pairs(dev);
>>>>          if (msg.flags & VHOST_USER_NEED_REPLY) {
>>>
>>> Maybe we need to reply here only if we didn't reply
>>> already (not VH_RESULT_REPLY) ? Otherwise, we could
>>> reply twice (with payload and with return code).
>>
>> Well, if the master sets this bit, it means it is waiting for
>> a "reply-ack", so not sending it would cause the master to wait
>> forever.
>>
>> It is the master responsibility to not set this bit for requests
>> already expecting a non "reply-ack" reply (as you fixed it for
>> postcopy's set mem table case).
> 
> vhost-user docs in QEMU says:
> "
> For the message types that already solicit a reply from the client, the
> presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
> no behavioural change.
> "
> i.e. even if QEMU sets the need_reply flag, vhost should not reply twice.
> Am I missing something?

Oh, right. Thanks for pointing it out.

So coming back to the DPDK implementation, I just had a look again, and 
it seems that we don't send a reply twice, as send_vhost_reply takes
care of clearing the VHOST_USER_NEED_REPLY flag.
Do you confirm my understanding is correct?

>>
>>>> -        msg.payload.u64 = !!ret;
>>>> +        msg.payload.u64 = ret == VH_RESULT_ERR;
>>>>            msg.size = sizeof(msg.payload.u64);
>>>>            send_vhost_reply(fd, &msg);
>>>> -    } else if (ret) {
>>>> +    } else if (ret == VH_RESULT_ERR) {
>>>>            RTE_LOG(ERR, VHOST_CONFIG,
>>>>                "vhost message handling failed.\n");
>>>>            return -1;
>>>>
>>
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/17] vhost: fix messages error checks
  2018-10-03  8:02         ` Maxime Coquelin
@ 2018-10-03  8:32           ` Ilya Maximets
  2018-10-03  9:07             ` Ilya Maximets
  0 siblings, 1 reply; 32+ messages in thread
From: Ilya Maximets @ 2018-10-03  8:32 UTC (permalink / raw)
  To: Maxime Coquelin, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Michael S. Tsirkin

On 03.10.2018 11:02, Maxime Coquelin wrote:
> 
> 
> On 10/03/2018 09:57 AM, Ilya Maximets wrote:
>> On 03.10.2018 10:50, Maxime Coquelin wrote:
>>>
>>>
>>> On 10/02/2018 04:15 PM, Ilya Maximets wrote:
>>>> On 02.10.2018 12:36, Maxime Coquelin wrote:
>>>>> Return of message handling has now changed to an enum that can
>>>>> take non-negative value that is not zero in case a reply is
>>>>> needed. But the code checking the variable afterwards has not
>>>>> been updated, leading to success messages handling being
>>>>> treated as errors.
>>>>>
>>>>> Fixes: 4e601952cae6 ("vhost: message handling implemented as a callback array")
>>>>>
>>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>> ---
>>>>>    lib/librte_vhost/vhost_user.c | 6 +++---
>>>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>>>> index 7ef3fb4a4..060b41893 100644
>>>>> --- a/lib/librte_vhost/vhost_user.c
>>>>> +++ b/lib/librte_vhost/vhost_user.c
>>>>> @@ -1783,7 +1783,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>        }
>>>>>      skip_to_post_handle:
>>>>> -    if (!ret && dev->extern_ops.post_msg_handle) {
>>>>> +    if (ret != VH_RESULT_ERR && dev->extern_ops.post_msg_handle) {
>>>>>            uint32_t need_reply;
>>>>>              ret = (*dev->extern_ops.post_msg_handle)(
>>>>> @@ -1800,10 +1800,10 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>            vhost_user_unlock_all_queue_pairs(dev);
>>>>>          if (msg.flags & VHOST_USER_NEED_REPLY) {
>>>>
>>>> Maybe we need to reply here only if we didn't reply
>>>> already (not VH_RESULT_REPLY) ? Otherwise, we could
>>>> reply twice (with payload and with return code).
>>>
>>> Well, if the master sets this bit, it means it is waiting for
>>> a "reply-ack", so not sending it would cause the master to wait
>>> forever.
>>>
>>> It is the master responsibility to not set this bit for requests
>>> already expecting a non "reply-ack" reply (as you fixed it for
>>> postcopy's set mem table case).
>>
>> vhost-user docs in QEMU says:
>> "
>> For the message types that already solicit a reply from the client, the
>> presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
>> no behavioural change.
>> "
>> i.e. even if QEMU sets the need_reply flag, vhost should not reply twice.
>> Am I missing something?
> 
> Oh, right. Thanks for pointing it out.
> 
> So coming back to the DPDK implementation, I just had a look again, and it seems that we don't send a reply twice, as send_vhost_reply takes
> care of clearing the VHOST_USER_NEED_REPLY flag.
> Do you confirm my understanding is correct?

Hmm. Yes, you're right. send_vhost_reply clears the VHOST_USER_NEED_REPLY
flag and vhost doesn't send replies twice.
Maybe some comment with clarifications needed here, or some more
refactoring to make this aspect more clear.

So, we have a situation where only one of the message processing stages
is able to reply even if it's not the last stage for the message:
1. extern_ops.pre_msg_handle
2. "master"
3. extern_ops.post_msg_handle
4. result i.e. (!!ret)

extern_ops API is a bit confusing from this point of view. It has
serious restrictions for replies which are not described anywhere.
I mean that pre and post processing stages are able to request the
reply, but the post processing reply will be just dropped
(or "master" reply will be dropped).
This is, actually, not an issue until we have only one user of them,
which uses only one of the callbacks. But maybe this API should be
described more in comments or docs.

> 
>>>
>>>>> -        msg.payload.u64 = !!ret;
>>>>> +        msg.payload.u64 = ret == VH_RESULT_ERR;
>>>>>            msg.size = sizeof(msg.payload.u64);
>>>>>            send_vhost_reply(fd, &msg);
>>>>> -    } else if (ret) {
>>>>> +    } else if (ret == VH_RESULT_ERR) {
>>>>>            RTE_LOG(ERR, VHOST_CONFIG,
>>>>>                "vhost message handling failed.\n");
>>>>>            return -1;
>>>>>
>>>
>>>
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/17] vhost: fix messages error checks
  2018-10-03  8:32           ` Ilya Maximets
@ 2018-10-03  9:07             ` Ilya Maximets
  2018-10-03 14:39               ` Maxime Coquelin
  0 siblings, 1 reply; 32+ messages in thread
From: Ilya Maximets @ 2018-10-03  9:07 UTC (permalink / raw)
  To: Maxime Coquelin, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Michael S. Tsirkin

On 03.10.2018 11:32, Ilya Maximets wrote:
> On 03.10.2018 11:02, Maxime Coquelin wrote:
>>
>>
>> On 10/03/2018 09:57 AM, Ilya Maximets wrote:
>>> On 03.10.2018 10:50, Maxime Coquelin wrote:
>>>>
>>>>
>>>> On 10/02/2018 04:15 PM, Ilya Maximets wrote:
>>>>> On 02.10.2018 12:36, Maxime Coquelin wrote:
>>>>>> Return of message handling has now changed to an enum that can
>>>>>> take non-negative value that is not zero in case a reply is
>>>>>> needed. But the code checking the variable afterwards has not
>>>>>> been updated, leading to success messages handling being
>>>>>> treated as errors.
>>>>>>
>>>>>> Fixes: 4e601952cae6 ("vhost: message handling implemented as a callback array")
>>>>>>
>>>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>> ---
>>>>>>    lib/librte_vhost/vhost_user.c | 6 +++---
>>>>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>>>>> index 7ef3fb4a4..060b41893 100644
>>>>>> --- a/lib/librte_vhost/vhost_user.c
>>>>>> +++ b/lib/librte_vhost/vhost_user.c
>>>>>> @@ -1783,7 +1783,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>>        }
>>>>>>      skip_to_post_handle:
>>>>>> -    if (!ret && dev->extern_ops.post_msg_handle) {
>>>>>> +    if (ret != VH_RESULT_ERR && dev->extern_ops.post_msg_handle) {
>>>>>>            uint32_t need_reply;
>>>>>>              ret = (*dev->extern_ops.post_msg_handle)(
>>>>>> @@ -1800,10 +1800,10 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>>            vhost_user_unlock_all_queue_pairs(dev);
>>>>>>          if (msg.flags & VHOST_USER_NEED_REPLY) {
>>>>>
>>>>> Maybe we need to reply here only if we didn't reply
>>>>> already (not VH_RESULT_REPLY) ? Otherwise, we could
>>>>> reply twice (with payload and with return code).
>>>>
>>>> Well, if the master sets this bit, it means it is waiting for
>>>> a "reply-ack", so not sending it would cause the master to wait
>>>> forever.
>>>>
>>>> It is the master responsibility to not set this bit for requests
>>>> already expecting a non "reply-ack" reply (as you fixed it for
>>>> postcopy's set mem table case).
>>>
>>> vhost-user docs in QEMU says:
>>> "
>>> For the message types that already solicit a reply from the client, the
>>> presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
>>> no behavioural change.
>>> "
>>> i.e. even if QEMU sets the need_reply flag, vhost should not reply twice.
>>> Am I missing something?
>>
>> Oh, right. Thanks for pointing it out.
>>
>> So coming back to the DPDK implementation, I just had a look again, and it seems that we don't send a reply twice, as send_vhost_reply takes
>> care of clearing the VHOST_USER_NEED_REPLY flag.
>> Do you confirm my understanding is correct?
> 
> Hmm. Yes, you're right. send_vhost_reply clears the VHOST_USER_NEED_REPLY
> flag and vhost doesn't send replies twice.
> Maybe some comment with clarifications needed here, or some more
> refactoring to make this aspect more clear.
> 

Sorry. Please, ignore the text below.

> So, we have a situation where only one of the message processing stages
> is able to reply even if it's not the last stage for the message:
> 1. extern_ops.pre_msg_handle
> 2. "master"
> 3. extern_ops.post_msg_handle
> 4. result i.e. (!!ret)
> 
> extern_ops API is a bit confusing from this point of view. It has
> serious restrictions for replies which are not described anywhere.
> I mean that pre and post processing stages are able to request the
> reply, but the post processing reply will be just dropped
> (or "master" reply will be dropped).
> This is, actually, not an issue until we have only one user of them,
> which uses only one of the callbacks. But maybe this API should be
> described more in comments or docs.
> 
>>
>>>>
>>>>>> -        msg.payload.u64 = !!ret;
>>>>>> +        msg.payload.u64 = ret == VH_RESULT_ERR;
>>>>>>            msg.size = sizeof(msg.payload.u64);
>>>>>>            send_vhost_reply(fd, &msg);
>>>>>> -    } else if (ret) {
>>>>>> +    } else if (ret == VH_RESULT_ERR) {
>>>>>>            RTE_LOG(ERR, VHOST_CONFIG,
>>>>>>                "vhost message handling failed.\n");
>>>>>>            return -1;
>>>>>>
>>>>
>>>>
>>
>>
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/17] vhost: fix return code of messages requiring replies
  2018-10-02  9:36 ` [PATCH v2 02/17] vhost: fix return code of messages requiring replies Maxime Coquelin
@ 2018-10-03 13:26   ` Ilya Maximets
  2018-10-03 14:49     ` Maxime Coquelin
  0 siblings, 1 reply; 32+ messages in thread
From: Ilya Maximets @ 2018-10-03 13:26 UTC (permalink / raw)
  To: Maxime Coquelin, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable

On 02.10.2018 12:36, Maxime Coquelin wrote:
> VHOST_USER_GET_PROTOCOL_FEATURES, VHOST_USER_GET_VRING_BASE
> and VHOST_USER_SET_LOG_BASE require replies, so their handlers
> should return VH_RESULT_REPLY, not VH_RESULT_OK.
> 
> Fixes: 2cfbbb86c62a ("vhost: unify message handling function signature")
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_vhost/vhost_user.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 060b41893..ce0ac0098 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1161,7 +1161,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
>  
>  	msg->size = sizeof(msg->payload.state);
>  
> -	return VH_RESULT_OK;
> +	return VH_RESULT_REPLY;
>  }
>  
>  /*
> @@ -1218,7 +1218,7 @@ vhost_user_get_protocol_features(struct virtio_net **pdev,
>  	msg->payload.u64 = protocol_features;
>  	msg->size = sizeof(msg->payload.u64);
>  
> -	return VH_RESULT_OK;
> +	return VH_RESULT_REPLY;
>  }
>  
>  static int
> @@ -1298,7 +1298,7 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg)
>  
>  	msg->size = sizeof(msg->payload.u64);

Maybe we need to set size to zero? This message should not have
"Slave payload" according to docs and QEMU does not check it.

>  
> -	return VH_RESULT_OK;
> +	return VH_RESULT_REPLY;
>  }
>  
>  static int vhost_user_set_log_fd(struct virtio_net **pdev __rte_unused,
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/17] vhost: fix messages error checks
  2018-10-03  9:07             ` Ilya Maximets
@ 2018-10-03 14:39               ` Maxime Coquelin
  2018-10-04  5:42                 ` Ilya Maximets
  0 siblings, 1 reply; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-03 14:39 UTC (permalink / raw)
  To: Ilya Maximets, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Michael S. Tsirkin



On 10/03/2018 11:07 AM, Ilya Maximets wrote:
> On 03.10.2018 11:32, Ilya Maximets wrote:
>> On 03.10.2018 11:02, Maxime Coquelin wrote:
>>>
>>>
>>> On 10/03/2018 09:57 AM, Ilya Maximets wrote:
>>>> On 03.10.2018 10:50, Maxime Coquelin wrote:
>>>>>
>>>>>
>>>>> On 10/02/2018 04:15 PM, Ilya Maximets wrote:
>>>>>> On 02.10.2018 12:36, Maxime Coquelin wrote:
>>>>>>> Return of message handling has now changed to an enum that can
>>>>>>> take non-negative value that is not zero in case a reply is
>>>>>>> needed. But the code checking the variable afterwards has not
>>>>>>> been updated, leading to success messages handling being
>>>>>>> treated as errors.
>>>>>>>
>>>>>>> Fixes: 4e601952cae6 ("vhost: message handling implemented as a callback array")
>>>>>>>
>>>>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>> ---
>>>>>>>     lib/librte_vhost/vhost_user.c | 6 +++---
>>>>>>>     1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>>>>>> index 7ef3fb4a4..060b41893 100644
>>>>>>> --- a/lib/librte_vhost/vhost_user.c
>>>>>>> +++ b/lib/librte_vhost/vhost_user.c
>>>>>>> @@ -1783,7 +1783,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>>>         }
>>>>>>>       skip_to_post_handle:
>>>>>>> -    if (!ret && dev->extern_ops.post_msg_handle) {
>>>>>>> +    if (ret != VH_RESULT_ERR && dev->extern_ops.post_msg_handle) {
>>>>>>>             uint32_t need_reply;
>>>>>>>               ret = (*dev->extern_ops.post_msg_handle)(
>>>>>>> @@ -1800,10 +1800,10 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>>>             vhost_user_unlock_all_queue_pairs(dev);
>>>>>>>           if (msg.flags & VHOST_USER_NEED_REPLY) {
>>>>>>
>>>>>> Maybe we need to reply here only if we didn't reply
>>>>>> already (not VH_RESULT_REPLY) ? Otherwise, we could
>>>>>> reply twice (with payload and with return code).
>>>>>
>>>>> Well, if the master sets this bit, it means it is waiting for
>>>>> a "reply-ack", so not sending it would cause the master to wait
>>>>> forever.
>>>>>
>>>>> It is the master responsibility to not set this bit for requests
>>>>> already expecting a non "reply-ack" reply (as you fixed it for
>>>>> postcopy's set mem table case).
>>>>
>>>> vhost-user docs in QEMU says:
>>>> "
>>>> For the message types that already solicit a reply from the client, the
>>>> presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
>>>> no behavioural change.
>>>> "
>>>> i.e. even if QEMU sets the need_reply flag, vhost should not reply twice.
>>>> Am I missing something?
>>>
>>> Oh, right. Thanks for pointing it out.
>>>
>>> So coming back to the DPDK implementation, I just had a look again, and it seems that we don't send a reply twice, as send_vhost_reply takes
>>> care of clearing the VHOST_USER_NEED_REPLY flag.
>>> Do you confirm my understanding is correct?
>>
>> Hmm. Yes, you're right. send_vhost_reply clears the VHOST_USER_NEED_REPLY
>> flag and vhost doesn't send replies twice.
>> Maybe some comment with clarifications needed here, or some more
>> refactoring to make this aspect more clear.
>>

Agree.
I'm adding a comment, I don't think a refactoring is required, and I
would be reluctant to add one more refactoring so close to the
integration deadline.

Does it work for you?

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/17] vhost: fix return code of messages requiring replies
  2018-10-03 13:26   ` Ilya Maximets
@ 2018-10-03 14:49     ` Maxime Coquelin
  2018-10-04  5:49       ` Ilya Maximets
  0 siblings, 1 reply; 32+ messages in thread
From: Maxime Coquelin @ 2018-10-03 14:49 UTC (permalink / raw)
  To: Ilya Maximets, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable



On 10/03/2018 03:26 PM, Ilya Maximets wrote:
> On 02.10.2018 12:36, Maxime Coquelin wrote:
>> VHOST_USER_GET_PROTOCOL_FEATURES, VHOST_USER_GET_VRING_BASE
>> and VHOST_USER_SET_LOG_BASE require replies, so their handlers
>> should return VH_RESULT_REPLY, not VH_RESULT_OK.
>>
>> Fixes: 2cfbbb86c62a ("vhost: unify message handling function signature")
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/librte_vhost/vhost_user.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>> index 060b41893..ce0ac0098 100644
>> --- a/lib/librte_vhost/vhost_user.c
>> +++ b/lib/librte_vhost/vhost_user.c
>> @@ -1161,7 +1161,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
>>   
>>   	msg->size = sizeof(msg->payload.state);
>>   
>> -	return VH_RESULT_OK;
>> +	return VH_RESULT_REPLY;
>>   }
>>   
>>   /*
>> @@ -1218,7 +1218,7 @@ vhost_user_get_protocol_features(struct virtio_net **pdev,
>>   	msg->payload.u64 = protocol_features;
>>   	msg->size = sizeof(msg->payload.u64);
>>   
>> -	return VH_RESULT_OK;
>> +	return VH_RESULT_REPLY;
>>   }
>>   
>>   static int
>> @@ -1298,7 +1298,7 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg)
>>   
>>   	msg->size = sizeof(msg->payload.u64);
> 
> Maybe we need to set size to zero? This message should not have
> "Slave payload" according to docs and QEMU does not check it.

I agree with the fix, but it's here since the beginning, not related to
this series. I will add it at the beginning of the series though.

I think the spec could also be updated, to clarify what payload is
expected when VHOST_USER_PROTOCOL_F_LOG_SHMFD is negotiated.

> 
>>   
>> -	return VH_RESULT_OK;
>> +	return VH_RESULT_REPLY;
>>   }
>>   
>>   static int vhost_user_set_log_fd(struct virtio_net **pdev __rte_unused,
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/17] vhost: fix messages error checks
  2018-10-03 14:39               ` Maxime Coquelin
@ 2018-10-04  5:42                 ` Ilya Maximets
  0 siblings, 0 replies; 32+ messages in thread
From: Ilya Maximets @ 2018-10-04  5:42 UTC (permalink / raw)
  To: Maxime Coquelin, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable, Michael S. Tsirkin

On 03.10.2018 17:39, Maxime Coquelin wrote:
> 
> 
> On 10/03/2018 11:07 AM, Ilya Maximets wrote:
>> On 03.10.2018 11:32, Ilya Maximets wrote:
>>> On 03.10.2018 11:02, Maxime Coquelin wrote:
>>>>
>>>>
>>>> On 10/03/2018 09:57 AM, Ilya Maximets wrote:
>>>>> On 03.10.2018 10:50, Maxime Coquelin wrote:
>>>>>>
>>>>>>
>>>>>> On 10/02/2018 04:15 PM, Ilya Maximets wrote:
>>>>>>> On 02.10.2018 12:36, Maxime Coquelin wrote:
>>>>>>>> Return of message handling has now changed to an enum that can
>>>>>>>> take non-negative value that is not zero in case a reply is
>>>>>>>> needed. But the code checking the variable afterwards has not
>>>>>>>> been updated, leading to success messages handling being
>>>>>>>> treated as errors.
>>>>>>>>
>>>>>>>> Fixes: 4e601952cae6 ("vhost: message handling implemented as a callback array")
>>>>>>>>
>>>>>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>> ---
>>>>>>>>     lib/librte_vhost/vhost_user.c | 6 +++---
>>>>>>>>     1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>>>>>>> index 7ef3fb4a4..060b41893 100644
>>>>>>>> --- a/lib/librte_vhost/vhost_user.c
>>>>>>>> +++ b/lib/librte_vhost/vhost_user.c
>>>>>>>> @@ -1783,7 +1783,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>>>>         }
>>>>>>>>       skip_to_post_handle:
>>>>>>>> -    if (!ret && dev->extern_ops.post_msg_handle) {
>>>>>>>> +    if (ret != VH_RESULT_ERR && dev->extern_ops.post_msg_handle) {
>>>>>>>>             uint32_t need_reply;
>>>>>>>>               ret = (*dev->extern_ops.post_msg_handle)(
>>>>>>>> @@ -1800,10 +1800,10 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>>>>             vhost_user_unlock_all_queue_pairs(dev);
>>>>>>>>           if (msg.flags & VHOST_USER_NEED_REPLY) {
>>>>>>>
>>>>>>> Maybe we need to reply here only if we didn't reply
>>>>>>> already (not VH_RESULT_REPLY) ? Otherwise, we could
>>>>>>> reply twice (with payload and with return code).
>>>>>>
>>>>>> Well, if the master sets this bit, it means it is waiting for
>>>>>> a "reply-ack", so not sending it would cause the master to wait
>>>>>> forever.
>>>>>>
>>>>>> It is the master responsibility to not set this bit for requests
>>>>>> already expecting a non "reply-ack" reply (as you fixed it for
>>>>>> postcopy's set mem table case).
>>>>>
>>>>> vhost-user docs in QEMU says:
>>>>> "
>>>>> For the message types that already solicit a reply from the client, the
>>>>> presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
>>>>> no behavioural change.
>>>>> "
>>>>> i.e. even if QEMU sets the need_reply flag, vhost should not reply twice.
>>>>> Am I missing something?
>>>>
>>>> Oh, right. Thanks for pointing it out.
>>>>
>>>> So coming back to the DPDK implementation, I just had a look again, and it seems that we don't send a reply twice, as send_vhost_reply takes
>>>> care of clearing the VHOST_USER_NEED_REPLY flag.
>>>> Do you confirm my understanding is correct?
>>>
>>> Hmm. Yes, you're right. send_vhost_reply clears the VHOST_USER_NEED_REPLY
>>> flag and vhost doesn't send replies twice.
>>> Maybe some comment with clarifications needed here, or some more
>>> refactoring to make this aspect more clear.
>>>
> 
> Agree.
> I'm adding a comment, I don't think a refactoring is required, and I
> would be reluctant to add one more refactoring so close to the
> integration deadline.
> 
> Does it work for you?

Sure. Thanks. I agree that it's not the right time for refactoring now.

> 
> Thanks,
> Maxime
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/17] vhost: fix return code of messages requiring replies
  2018-10-03 14:49     ` Maxime Coquelin
@ 2018-10-04  5:49       ` Ilya Maximets
  0 siblings, 0 replies; 32+ messages in thread
From: Ilya Maximets @ 2018-10-04  5:49 UTC (permalink / raw)
  To: Maxime Coquelin, dev, tiwei.bie, zhihong.wang, jfreimann,
	nicknickolaev, bruce.richardson, alejandro.lucero
  Cc: dgilbert, stable

On 03.10.2018 17:49, Maxime Coquelin wrote:
> 
> 
> On 10/03/2018 03:26 PM, Ilya Maximets wrote:
>> On 02.10.2018 12:36, Maxime Coquelin wrote:
>>> VHOST_USER_GET_PROTOCOL_FEATURES, VHOST_USER_GET_VRING_BASE
>>> and VHOST_USER_SET_LOG_BASE require replies, so their handlers
>>> should return VH_RESULT_REPLY, not VH_RESULT_OK.
>>>
>>> Fixes: 2cfbbb86c62a ("vhost: unify message handling function signature")
>>>
>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> ---
>>>   lib/librte_vhost/vhost_user.c | 6 +++---
>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>> index 060b41893..ce0ac0098 100644
>>> --- a/lib/librte_vhost/vhost_user.c
>>> +++ b/lib/librte_vhost/vhost_user.c
>>> @@ -1161,7 +1161,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
>>>         msg->size = sizeof(msg->payload.state);
>>>   -    return VH_RESULT_OK;
>>> +    return VH_RESULT_REPLY;
>>>   }
>>>     /*
>>> @@ -1218,7 +1218,7 @@ vhost_user_get_protocol_features(struct virtio_net **pdev,
>>>       msg->payload.u64 = protocol_features;
>>>       msg->size = sizeof(msg->payload.u64);
>>>   -    return VH_RESULT_OK;
>>> +    return VH_RESULT_REPLY;
>>>   }
>>>     static int
>>> @@ -1298,7 +1298,7 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg)
>>>         msg->size = sizeof(msg->payload.u64);
>>
>> Maybe we need to set size to zero? This message should not have
>> "Slave payload" according to docs and QEMU does not check it.
> 
> I agree with the fix, but it's here since the beginning, not related to
> this series. I will add it at the beginning of the series though.

OK.

> 
> I think the spec could also be updated, to clarify what payload is
> expected when VHOST_USER_PROTOCOL_F_LOG_SHMFD is negotiated.

Yeah. There was an attempt few years ago, but it wasn't merged because of
code changes, I guess:
    https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg03526.html

> 
>>
>>>   -    return VH_RESULT_OK;
>>> +    return VH_RESULT_REPLY;
>>>   }
>>>     static int vhost_user_set_log_fd(struct virtio_net **pdev __rte_unused,
>>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-10-04  5:46 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-02  9:36 [PATCH v2 00/17] vhost: add postcopy live-migration support Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 01/17] vhost: fix messages error checks Maxime Coquelin
2018-10-02 14:15   ` Ilya Maximets
2018-10-03  7:50     ` Maxime Coquelin
2018-10-03  7:57       ` Ilya Maximets
2018-10-03  8:02         ` Maxime Coquelin
2018-10-03  8:32           ` Ilya Maximets
2018-10-03  9:07             ` Ilya Maximets
2018-10-03 14:39               ` Maxime Coquelin
2018-10-04  5:42                 ` Ilya Maximets
2018-10-02  9:36 ` [PATCH v2 02/17] vhost: fix return code of messages requiring replies Maxime Coquelin
2018-10-03 13:26   ` Ilya Maximets
2018-10-03 14:49     ` Maxime Coquelin
2018-10-04  5:49       ` Ilya Maximets
2018-10-02  9:36 ` [PATCH v2 03/17] vhost: fix error handling when mem table gets updated Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 04/17] vhost: define postcopy protocol flag Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 05/17] vhost: add number of fds to vhost-user messages and use it Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 06/17] vhost: pass socket fd to message handling callbacks Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 07/17] vhost: enable fds passing when sending vhost-user messages Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 08/17] vhost: add config flag for postcopy feature Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 09/17] vhost: introduce postcopy's advise message Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 10/17] vhost: add support for postcopy's listen message Maxime Coquelin
2018-10-02 14:11   ` Ilya Maximets
2018-10-03  7:46     ` Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 11/17] vhost: register new regions with userfaultfd Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 12/17] vhost: avoid useless VhostUserMemory copy Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 13/17] vhost: send userfault range addresses back to qemu Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 14/17] vhost: add support to postcopy's end request Maxime Coquelin
2018-10-02 14:18   ` Ilya Maximets
2018-10-02  9:36 ` [PATCH v2 15/17] vhost: enable postcopy protocol feature Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 16/17] vhost: add flag to enable postcopy live-migration Maxime Coquelin
2018-10-02  9:36 ` [PATCH v2 17/17] net/vhost: add parameter to enable postcopy support Maxime Coquelin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.