linux-kernel.vger.kernel.org archive mirror
* [PATCH net 0/4] Fix various issue of vhost
@ 2018-12-10  9:44 Jason Wang
  2018-12-10  9:44 ` [PATCH net 1/4] vhost: make sure used idx is seen before log in vhost_add_used_n() Jason Wang
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Jason Wang @ 2018-12-10  9:44 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev, linux-kernel

Hi:

This series tries to fix various issues in vhost:

- Patch 1 adds a missing write barrier between the used idx update and
  its logging.
- Patches 2-3 bring back the protection of the device IOTLB through
  the vq mutexes; this fixes a possible use-after-free of device IOTLB
  entries.
- Patch 4 fixes dirty page logging when device IOTLB is enabled.
  Logging should be done through GPA instead of GIOVA; this is achieved
  by logging through the iovec and traversing the GPA->HVA mappings to
  find the GPA.

Please consider them for -stable.

Thanks

Jason Wang (4):
  vhost: make sure used idx is seen before log in vhost_add_used_n()
  vhost_net: rework on the lock ordering for busy polling
  Revert "net: vhost: lock the vqs one by one"
  vhost: log dirty page correctly

 drivers/vhost/net.c   |  21 +++++++--
 drivers/vhost/vhost.c | 101 ++++++++++++++++++++++++++++++++++--------
 drivers/vhost/vhost.h |   3 +-
 3 files changed, 102 insertions(+), 23 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net 1/4] vhost: make sure used idx is seen before log in vhost_add_used_n()
  2018-12-10  9:44 [PATCH net 0/4] Fix various issue of vhost Jason Wang
@ 2018-12-10  9:44 ` Jason Wang
  2018-12-10  9:44 ` [PATCH net 2/4] vhost_net: rework on the lock ordering for busy polling Jason Wang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2018-12-10  9:44 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev, linux-kernel

We miss a write barrier that guarantees the used idx is updated and
seen before the log. Without it, userspace may sync and copy the used
ring before the used idx is updated. Fix this by adding a barrier
before log_write().

Fixes: 8dd014adfea6f ("vhost-net: mergeable buffers support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 6b98d8e3a5bf..5915f240275a 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2220,6 +2220,8 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 		return -EFAULT;
 	}
 	if (unlikely(vq->log_used)) {
+		/* Make sure used idx is seen before log. */
+		smp_wmb();
 		/* Log used index update. */
 		log_write(vq->log_base,
 			  vq->log_addr + offsetof(struct vring_used, idx),
-- 
2.17.1



* [PATCH net 2/4] vhost_net: rework on the lock ordering for busy polling
  2018-12-10  9:44 [PATCH net 0/4] Fix various issue of vhost Jason Wang
  2018-12-10  9:44 ` [PATCH net 1/4] vhost: make sure used idx is seen before log in vhost_add_used_n() Jason Wang
@ 2018-12-10  9:44 ` Jason Wang
  2018-12-11  1:34   ` Michael S. Tsirkin
  2018-12-10  9:44 ` [PATCH net 3/4] Revert "net: vhost: lock the vqs one by one" Jason Wang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2018-12-10  9:44 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev, linux-kernel; +Cc: Tonghao Zhang

Commit 441abde4cd84 ("net: vhost: add rx busy polling in tx path")
added rx busy polling to the tx path, taking the rx vq mutex after the
tx vq mutex was already held. This could lead to a deadlock, so commit
78139c94dc8c ("net: vhost: lock the vqs one by one") switched to
locking the vqs one by one. That avoided the deadlock under the
assumption that handle_rx() and handle_tx() run in the same process,
but it also removed the protection for IOTLB updating, which requires
the mutex of each vq to be held.

To solve this issue, the first step is to establish a single,
consistent lock ordering for vhost_net. This is done through:

- For handle_rx(), if busy polling is enabled, lock the tx vq
  immediately after the rx vq.
- For handle_tx(), always lock the rx vq before the tx vq, and unlock
  it right away if busy polling is not enabled.
- Remove the tricky locking code in busy polling.

With this, vhost_net has a single, consistent lock ordering, which
allows us to safely revert commit 78139c94dc8c ("net: vhost: lock the
vqs one by one") in the next patch.

The patch adds two more atomic operations on the tx path during each
round of handle_tx(); a 1-byte TCP_RR test shows no noticeable
overhead.

Fixes: 78139c94dc8c ("net: vhost: lock the vqs one by one")
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index ab11b2bee273..5f272ab4d5b4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -513,7 +513,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 	struct socket *sock;
 	struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
 
-	mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
 	vhost_disable_notify(&net->dev, vq);
 	sock = rvq->private_data;
 
@@ -543,8 +542,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 		vhost_net_busy_poll_try_queue(net, vq);
 	else if (!poll_rx) /* On tx here, sock has no rx data. */
 		vhost_enable_notify(&net->dev, rvq);
-
-	mutex_unlock(&vq->mutex);
 }
 
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
@@ -913,10 +910,16 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 static void handle_tx(struct vhost_net *net)
 {
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
+	struct vhost_net_virtqueue *nvq_rx = &net->vqs[VHOST_NET_VQ_RX];
 	struct vhost_virtqueue *vq = &nvq->vq;
+	struct vhost_virtqueue *vq_rx = &nvq_rx->vq;
 	struct socket *sock;
 
+	mutex_lock_nested(&vq_rx->mutex, VHOST_NET_VQ_RX);
 	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
+	if (!vq->busyloop_timeout)
+		mutex_unlock(&vq_rx->mutex);
+
 	sock = vq->private_data;
 	if (!sock)
 		goto out;
@@ -933,6 +936,8 @@ static void handle_tx(struct vhost_net *net)
 		handle_tx_copy(net, sock);
 
 out:
+	if (vq->busyloop_timeout)
+		mutex_unlock(&vq_rx->mutex);
 	mutex_unlock(&vq->mutex);
 }
 
@@ -1060,7 +1065,9 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 static void handle_rx(struct vhost_net *net)
 {
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_RX];
+	struct vhost_net_virtqueue *nvq_tx = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *vq = &nvq->vq;
+	struct vhost_virtqueue *vq_tx = &nvq_tx->vq;
 	unsigned uninitialized_var(in), log;
 	struct vhost_log *vq_log;
 	struct msghdr msg = {
@@ -1086,6 +1093,9 @@ static void handle_rx(struct vhost_net *net)
 	int recv_pkts = 0;
 
 	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_RX);
+	if (vq->busyloop_timeout)
+		mutex_lock_nested(&vq_tx->mutex, VHOST_NET_VQ_TX);
+
 	sock = vq->private_data;
 	if (!sock)
 		goto out;
@@ -1200,6 +1210,8 @@ static void handle_rx(struct vhost_net *net)
 out:
 	vhost_net_signal_used(nvq);
 	mutex_unlock(&vq->mutex);
+	if (vq->busyloop_timeout)
+		mutex_unlock(&vq_tx->mutex);
 }
 
 static void handle_tx_kick(struct vhost_work *work)
-- 
2.17.1



* [PATCH net 3/4] Revert "net: vhost: lock the vqs one by one"
  2018-12-10  9:44 [PATCH net 0/4] Fix various issue of vhost Jason Wang
  2018-12-10  9:44 ` [PATCH net 1/4] vhost: make sure used idx is seen before log in vhost_add_used_n() Jason Wang
  2018-12-10  9:44 ` [PATCH net 2/4] vhost_net: rework on the lock ordering for busy polling Jason Wang
@ 2018-12-10  9:44 ` Jason Wang
  2018-12-10  9:44 ` [PATCH net 4/4] vhost: log dirty page correctly Jason Wang
  2018-12-10 19:47 ` [PATCH net 0/4] Fix various issue of vhost David Miller
  4 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2018-12-10  9:44 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev, linux-kernel; +Cc: Tonghao Zhang

This reverts commit 78139c94dc8c96a478e67dab3bee84dc6eccb5fd. With it
applied, the device IOTLB is not protected by the vq mutex, which can
lead to e.g. a use-after-free of device IOTLB entries. Since the
previous patch established exactly the same lock ordering, it is safe
to revert it without introducing a deadlock.

Fixes: 78139c94dc8c ("net: vhost: lock the vqs one by one")
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5915f240275a..55e5aa662ad5 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -295,11 +295,8 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
 {
 	int i;
 
-	for (i = 0; i < d->nvqs; ++i) {
-		mutex_lock(&d->vqs[i]->mutex);
+	for (i = 0; i < d->nvqs; ++i)
 		__vhost_vq_meta_reset(d->vqs[i]);
-		mutex_unlock(&d->vqs[i]->mutex);
-	}
 }
 
 static void vhost_vq_reset(struct vhost_dev *dev,
@@ -895,6 +892,20 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 #define vhost_get_used(vq, x, ptr) \
 	vhost_get_user(vq, x, ptr, VHOST_ADDR_USED)
 
+static void vhost_dev_lock_vqs(struct vhost_dev *d)
+{
+	int i = 0;
+	for (i = 0; i < d->nvqs; ++i)
+		mutex_lock_nested(&d->vqs[i]->mutex, i);
+}
+
+static void vhost_dev_unlock_vqs(struct vhost_dev *d)
+{
+	int i = 0;
+	for (i = 0; i < d->nvqs; ++i)
+		mutex_unlock(&d->vqs[i]->mutex);
+}
+
 static int vhost_new_umem_range(struct vhost_umem *umem,
 				u64 start, u64 size, u64 end,
 				u64 userspace_addr, int perm)
@@ -976,6 +987,7 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 	int ret = 0;
 
 	mutex_lock(&dev->mutex);
+	vhost_dev_lock_vqs(dev);
 	switch (msg->type) {
 	case VHOST_IOTLB_UPDATE:
 		if (!dev->iotlb) {
@@ -1009,6 +1021,7 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 		break;
 	}
 
+	vhost_dev_unlock_vqs(dev);
 	mutex_unlock(&dev->mutex);
 
 	return ret;
-- 
2.17.1



* [PATCH net 4/4] vhost: log dirty page correctly
  2018-12-10  9:44 [PATCH net 0/4] Fix various issue of vhost Jason Wang
                   ` (2 preceding siblings ...)
  2018-12-10  9:44 ` [PATCH net 3/4] Revert "net: vhost: lock the vqs one by one" Jason Wang
@ 2018-12-10  9:44 ` Jason Wang
  2018-12-10 15:14   ` kbuild test robot
  2018-12-19 17:29   ` kbuild test robot
  2018-12-10 19:47 ` [PATCH net 0/4] Fix various issue of vhost David Miller
  4 siblings, 2 replies; 15+ messages in thread
From: Jason Wang @ 2018-12-10  9:44 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev, linux-kernel; +Cc: Jintack Lim

Vhost's dirty page logging API is designed to sync through GPA, but we
log the GIOVA when device IOTLB is enabled. This is wrong and may lead
to missing data after migration.

To solve this issue, when logging with device IOTLB enabled, we:

1) reuse the device IOTLB's GIOVA->HVA translation result to get the
   HVA: for writable descriptors, take the HVA from the iovec; for
   used ring updates, translate the GIOVA to an HVA;
2) traverse the GPA->HVA mappings to find the possible GPAs and log
   through GPA. Note that this reverse mapping is not guaranteed to be
   unique, so we must log each possible GPA in this case.

This fixes the failure of scp to the guest during migration. In -next,
we will probably support passing GIOVA->GPA mappings instead of
GIOVA->HVA.

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Reported-by: Jintack Lim <jintack@cs.columbia.edu>
Cc: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c   |  3 +-
 drivers/vhost/vhost.c | 78 +++++++++++++++++++++++++++++++++++--------
 drivers/vhost/vhost.h |  3 +-
 3 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5f272ab4d5b4..754ca22efb43 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1196,7 +1196,8 @@ static void handle_rx(struct vhost_net *net)
 		if (nvq->done_idx > VHOST_NET_BATCH)
 			vhost_net_signal_used(nvq);
 		if (unlikely(vq_log))
-			vhost_log_write(vq, vq_log, log, vhost_len);
+			vhost_log_write(vq, vq_log, log, vhost_len,
+					vq->iov, in);
 		total_len += vhost_len;
 		if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
 			vhost_poll_queue(&vq->poll);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 55e5aa662ad5..8ab279720a2b 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1733,11 +1733,66 @@ static int log_write(void __user *log_base,
 	return r;
 }
 
+static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
+{
+	struct vhost_umem *umem = vq->umem;
+	struct vhost_umem_node *u;
+	u64 gpa;
+	int r;
+	bool hit = false;
+
+	list_for_each_entry(u, &umem->umem_list, link) {
+		if (u->userspace_addr < hva &&
+		    u->userspace_addr + u->size >=
+		    hva + len) {
+			gpa = u->start + hva - u->userspace_addr;
+			r = log_write(vq->log_base, gpa, len);
+			if (r < 0)
+				return r;
+			hit = true;
+		}
+	}
+
+	/* No reverse mapping, should be a bug */
+	WARN_ON(!hit);
+	return 0;
+}
+
+static void log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
+{
+	struct iovec iov[64];
+	int i, ret;
+
+	if (!vq->iotlb) {
+		log_write(vq->log_base, vq->log_addr + used_offset, len);
+		return;
+	}
+
+	ret = translate_desc(vq, (u64)vq->used + used_offset, len, iov, 64,
+			     VHOST_ACCESS_WO);
+	WARN_ON(ret < 0);
+
+	for (i = 0; i < ret; i++) {
+		ret = log_write_hva(vq, (u64)iov[i].iov_base, iov[i].iov_len);
+		WARN_ON(ret);
+	}
+}
+
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
-		    unsigned int log_num, u64 len)
+		    unsigned int log_num, u64 len, struct iovec *iov, int count)
 {
 	int i, r;
 
+	if (vq->iotlb) {
+		for (i = 0; i < count; i++) {
+			r = log_write_hva(vq, (u64)iov[i].iov_base,
+					  iov[i].iov_len);
+			if (r < 0)
+				return r;
+		}
+		return 0;
+	}
+
 	/* Make sure data written is seen before log. */
 	smp_wmb();
 	for (i = 0; i < log_num; ++i) {
@@ -1769,9 +1824,8 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
 		smp_wmb();
 		/* Log used flag write. */
 		used = &vq->used->flags;
-		log_write(vq->log_base, vq->log_addr +
-			  (used - (void __user *)vq->used),
-			  sizeof vq->used->flags);
+		log_used(vq, (used - (void __user *)vq->used),
+			 sizeof vq->used->flags);
 		if (vq->log_ctx)
 			eventfd_signal(vq->log_ctx, 1);
 	}
@@ -1789,9 +1843,8 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
 		smp_wmb();
 		/* Log avail event write */
 		used = vhost_avail_event(vq);
-		log_write(vq->log_base, vq->log_addr +
-			  (used - (void __user *)vq->used),
-			  sizeof *vhost_avail_event(vq));
+		log_used(vq, (used - (void __user *)vq->used),
+			 sizeof *vhost_avail_event(vq));
 		if (vq->log_ctx)
 			eventfd_signal(vq->log_ctx, 1);
 	}
@@ -2191,10 +2244,8 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 		/* Make sure data is seen before log. */
 		smp_wmb();
 		/* Log used ring entry write. */
-		log_write(vq->log_base,
-			  vq->log_addr +
-			   ((void __user *)used - (void __user *)vq->used),
-			  count * sizeof *used);
+		log_used(vq, ((void __user *)used - (void __user *)vq->used),
+			 count * sizeof *used);
 	}
 	old = vq->last_used_idx;
 	new = (vq->last_used_idx += count);
@@ -2236,9 +2287,8 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 		/* Make sure used idx is seen before log. */
 		smp_wmb();
 		/* Log used index update. */
-		log_write(vq->log_base,
-			  vq->log_addr + offsetof(struct vring_used, idx),
-			  sizeof vq->used->idx);
+		log_used(vq, offsetof(struct vring_used, idx),
+			 sizeof vq->used->idx);
 		if (vq->log_ctx)
 			eventfd_signal(vq->log_ctx, 1);
 	}
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 466ef7542291..1b675dad5e05 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -205,7 +205,8 @@ bool vhost_vq_avail_empty(struct vhost_dev *, struct vhost_virtqueue *);
 bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
-		    unsigned int log_num, u64 len);
+		    unsigned int log_num, u64 len,
+		    struct iovec *iov, int count);
 int vq_iotlb_prefetch(struct vhost_virtqueue *vq);
 
 struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type);
-- 
2.17.1



* Re: [PATCH net 4/4] vhost: log dirty page correctly
  2018-12-10  9:44 ` [PATCH net 4/4] vhost: log dirty page correctly Jason Wang
@ 2018-12-10 15:14   ` kbuild test robot
  2018-12-11  1:30     ` Michael S. Tsirkin
  2018-12-19 17:29   ` kbuild test robot
  1 sibling, 1 reply; 15+ messages in thread
From: kbuild test robot @ 2018-12-10 15:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: kbuild-all, mst, jasowang, kvm, virtualization, netdev,
	linux-kernel, Jintack Lim

[-- Attachment #1: Type: text/plain, Size: 8376 bytes --]

Hi Jason,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net/master]

url:    https://github.com/0day-ci/linux/commits/Jason-Wang/Fix-various-issue-of-vhost/20181210-223236
config: i386-randconfig-x072-201849 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers//vhost/vhost.c: In function 'log_used':
>> drivers//vhost/vhost.c:1771:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     ret = translate_desc(vq, (u64)vq->used + used_offset, len, iov, 64,
                              ^
   drivers//vhost/vhost.c:1776:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      ret = log_write_hva(vq, (u64)iov[i].iov_base, iov[i].iov_len);
                              ^
   drivers//vhost/vhost.c: In function 'vhost_log_write':
   drivers//vhost/vhost.c:1788:26: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
       r = log_write_hva(vq, (u64)iov[i].iov_base,
                             ^
   Cyclomatic Complexity 5 include/linux/compiler.h:__read_once_size
   Cyclomatic Complexity 5 include/linux/compiler.h:__write_once_size
   Cyclomatic Complexity 1 arch/x86/include/asm/barrier.h:array_index_mask_nospec
   Cyclomatic Complexity 1 include/linux/kasan-checks.h:kasan_check_read
   Cyclomatic Complexity 1 include/linux/kasan-checks.h:kasan_check_write
   Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:set_bit
   Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:clear_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:test_and_set_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:constant_test_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:variable_test_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls
   Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u32
   Cyclomatic Complexity 1 include/linux/list.h:INIT_LIST_HEAD
   Cyclomatic Complexity 1 include/linux/list.h:__list_del
   Cyclomatic Complexity 1 include/linux/list.h:list_empty
   Cyclomatic Complexity 1 arch/x86/include/asm/current.h:get_current
   Cyclomatic Complexity 3 include/linux/string.h:memset
   Cyclomatic Complexity 5 include/linux/string.h:memcpy
   Cyclomatic Complexity 1 include/asm-generic/getorder.h:__get_order
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_dec_and_test
   Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_dec_and_test
   Cyclomatic Complexity 1 include/linux/err.h:PTR_ERR
   Cyclomatic Complexity 1 include/linux/thread_info.h:set_ti_thread_flag
   Cyclomatic Complexity 1 include/linux/thread_info.h:check_object_size
   Cyclomatic Complexity 5 include/linux/thread_info.h:check_copy_size
   Cyclomatic Complexity 1 arch/x86/include/asm/preempt.h:preempt_count
   Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_lock
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock
   Cyclomatic Complexity 1 include/linux/wait.h:init_waitqueue_func_entry
   Cyclomatic Complexity 1 include/linux/llist.h:init_llist_head
   Cyclomatic Complexity 1 include/linux/llist.h:llist_empty
   Cyclomatic Complexity 1 include/linux/llist.h:llist_del_all
   Cyclomatic Complexity 1 include/linux/rbtree.h:rb_link_node
   Cyclomatic Complexity 3 include/linux/overflow.h:__ab_c_size
   Cyclomatic Complexity 1 include/linux/page_ref.h:page_ref_dec_and_test
   Cyclomatic Complexity 1 include/linux/sched.h:task_thread_info
   Cyclomatic Complexity 1 include/linux/sched.h:need_resched
   Cyclomatic Complexity 1 include/linux/mm.h:put_page_testzero
   Cyclomatic Complexity 1 include/linux/mm.h:put_devmap_managed_page
   Cyclomatic Complexity 1 include/uapi/linux/virtio_ring.h:vring_need_event
   Cyclomatic Complexity 1 include/linux/virtio_byteorder.h:virtio_legacy_is_little_endian
   Cyclomatic Complexity 2 include/linux/uio.h:copy_to_iter
   Cyclomatic Complexity 2 include/linux/uio.h:copy_from_iter
   Cyclomatic Complexity 2 include/linux/uio.h:copy_from_iter_full
   Cyclomatic Complexity 1 include/linux/uio.h:iov_iter_count
   Cyclomatic Complexity 1 include/linux/slab.h:kmalloc_type
   Cyclomatic Complexity 28 include/linux/slab.h:kmalloc_index
   Cyclomatic Complexity 67 include/linux/slab.h:kmalloc_large
   Cyclomatic Complexity 4 include/linux/slab.h:kmalloc
   Cyclomatic Complexity 1 arch/x86/include/asm/smap.h:clac
   Cyclomatic Complexity 1 arch/x86/include/asm/smap.h:stac
   Cyclomatic Complexity 1 arch/x86/include/asm/uaccess.h:set_fs
   Cyclomatic Complexity 1 arch/x86/include/asm/uaccess_32.h:raw_copy_to_user
   Cyclomatic Complexity 5 arch/x86/include/asm/uaccess_32.h:raw_copy_from_user
   Cyclomatic Complexity 1 include/linux/uaccess.h:__copy_from_user
   Cyclomatic Complexity 1 include/linux/uaccess.h:__copy_to_user
   Cyclomatic Complexity 2 include/linux/uaccess.h:copy_from_user
   Cyclomatic Complexity 2 include/linux/uaccess.h:copy_to_user
   Cyclomatic Complexity 4 include/linux/poll.h:poll_wait
   Cyclomatic Complexity 1 include/linux/poll.h:init_poll_funcptr
   Cyclomatic Complexity 1 include/linux/rbtree_augmented.h:rb_set_parent
   Cyclomatic Complexity 1 include/linux/rbtree_augmented.h:rb_set_parent_color
   Cyclomatic Complexity 3 include/linux/rbtree_augmented.h:__rb_change_child
   Cyclomatic Complexity 11 include/linux/rbtree_augmented.h:__rb_erase_augmented
   Cyclomatic Complexity 2 include/linux/rbtree_augmented.h:rb_erase_augmented_cached
   Cyclomatic Complexity 1 drivers//vhost/vhost.h:vhost_has_feature
   Cyclomatic Complexity 1 drivers//vhost/vhost.h:vhost_backend_has_feature
   Cyclomatic Complexity 1 drivers//vhost/vhost.h:vhost_is_little_endian
   Cyclomatic Complexity 5 drivers//vhost/vhost.c:vhost_umem_interval_tree_compute_subtree_last
   Cyclomatic Complexity 3 drivers//vhost/vhost.c:vhost_umem_interval_tree_augment_propagate
   Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_umem_interval_tree_augment_copy
   Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_umem_interval_tree_augment_rotate
   Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_umem_interval_tree_remove
   Cyclomatic Complexity 7 drivers//vhost/vhost.c:vhost_umem_interval_tree_subtree_search
   Cyclomatic Complexity 4 drivers//vhost/vhost.c:vhost_umem_interval_tree_iter_first
   Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_disable_cross_endian
   Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_enable_cross_endian_big
   Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_enable_cross_endian_little
   Cyclomatic Complexity 5 drivers//vhost/vhost.c:vhost_set_vring_endian
   Cyclomatic Complexity 2 drivers//vhost/vhost.c:vhost_get_vring_endian
   Cyclomatic Complexity 3 drivers//vhost/vhost.c:vhost_init_is_le
   Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_reset_is_le
   Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_work_init
   Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_poll_init
   Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_has_work
   Cyclomatic Complexity 2 drivers//vhost/vhost.c:__vhost_vq_meta_reset
   Cyclomatic Complexity 2 drivers//vhost/vhost.c:vhost_vq_meta_reset
   Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_vq_reset
   Cyclomatic Complexity 2 drivers//vhost/vhost.c:vhost_dev_check_owner
   Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_dev_has_owner

vim +1771 drivers//vhost/vhost.c

  1760	
  1761	static void log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
  1762	{
  1763		struct iovec iov[64];
  1764		int i, ret;
  1765	
  1766		if (!vq->iotlb) {
  1767			log_write(vq->log_base, vq->log_addr + used_offset, len);
  1768			return;
  1769		}
  1770	
> 1771		ret = translate_desc(vq, (u64)vq->used + used_offset, len, iov, 64,
  1772				     VHOST_ACCESS_WO);
  1773		WARN_ON(ret < 0);
  1774	
  1775		for (i = 0; i < ret; i++) {
  1776			ret = log_write_hva(vq, (u64)iov[i].iov_base, iov[i].iov_len);
  1777			WARN_ON(ret);
  1778		}
  1779	}
  1780	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23759 bytes --]


* Re: [PATCH net 0/4] Fix various issue of vhost
  2018-12-10  9:44 [PATCH net 0/4] Fix various issue of vhost Jason Wang
                   ` (3 preceding siblings ...)
  2018-12-10  9:44 ` [PATCH net 4/4] vhost: log dirty page correctly Jason Wang
@ 2018-12-10 19:47 ` David Miller
  2018-12-11  3:01   ` Jason Wang
  4 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2018-12-10 19:47 UTC (permalink / raw)
  To: jasowang; +Cc: mst, kvm, virtualization, netdev, linux-kernel

From: Jason Wang <jasowang@redhat.com>
Date: Mon, 10 Dec 2018 17:44:50 +0800

> This series tries to fix various issues in vhost:
> 
> - Patch 1 adds a missing write barrier between the used idx update and
>   its logging.
> - Patches 2-3 bring back the protection of the device IOTLB through
>   the vq mutexes; this fixes a possible use-after-free of device IOTLB
>   entries.
> - Patch 4 fixes dirty page logging when device IOTLB is enabled.
>   Logging should be done through GPA instead of GIOVA; this is achieved
>   by logging through the iovec and traversing the GPA->HVA mappings to
>   find the GPA.
> 
> Please consider them for -stable.

Looks like the kbuild robot found some problems.

->used is a pointer (which might be 32-bit) and you're casting it to
a u64 in the translate_desc() calls of patch #4.

Please make sure that you don't actually require the full domain of
a u64 in these values, as obviously if vq->used is a pointer you will
only get a 32-bit domain on 32-bit architectures.



* Re: [PATCH net 4/4] vhost: log dirty page correctly
  2018-12-10 15:14   ` kbuild test robot
@ 2018-12-11  1:30     ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2018-12-11  1:30 UTC (permalink / raw)
  To: kbuild test robot
  Cc: Jason Wang, kbuild-all, kvm, virtualization, netdev,
	linux-kernel, Jintack Lim

On Mon, Dec 10, 2018 at 11:14:41PM +0800, kbuild test robot wrote:
> Hi Jason,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on net/master]
> 
> url:    https://github.com/0day-ci/linux/commits/Jason-Wang/Fix-various-issue-of-vhost/20181210-223236
> config: i386-randconfig-x072-201849 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386 
> 
> All warnings (new ones prefixed by >>):
> 
>    drivers//vhost/vhost.c: In function 'log_used':
> >> drivers//vhost/vhost.c:1771:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
>      ret = translate_desc(vq, (u64)vq->used + used_offset, len, iov, 64,
>                               ^
>    drivers//vhost/vhost.c:1776:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
>       ret = log_write_hva(vq, (u64)iov[i].iov_base, iov[i].iov_len);
>                               ^
>    drivers//vhost/vhost.c: In function 'vhost_log_write':
>    drivers//vhost/vhost.c:1788:26: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
>        r = log_write_hva(vq, (u64)iov[i].iov_base,
>                              ^

It's a technicality: cast to unsigned long and the warning will go
away. Dunno why gcc bothers with these warnings; nothing is wrong
unless the size of a pointer is greater than the size of the integer
it is cast to.

> [...]
>    Cyclomatic Complexity 1 drivers//vhost/vhost.h:vhost_has_feature
>    Cyclomatic Complexity 1 drivers//vhost/vhost.h:vhost_backend_has_feature
>    Cyclomatic Complexity 1 drivers//vhost/vhost.h:vhost_is_little_endian
>    Cyclomatic Complexity 5 drivers//vhost/vhost.c:vhost_umem_interval_tree_compute_subtree_last
>    Cyclomatic Complexity 3 drivers//vhost/vhost.c:vhost_umem_interval_tree_augment_propagate
>    Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_umem_interval_tree_augment_copy
>    Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_umem_interval_tree_augment_rotate
>    Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_umem_interval_tree_remove
>    Cyclomatic Complexity 7 drivers//vhost/vhost.c:vhost_umem_interval_tree_subtree_search
>    Cyclomatic Complexity 4 drivers//vhost/vhost.c:vhost_umem_interval_tree_iter_first
>    Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_disable_cross_endian
>    Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_enable_cross_endian_big
>    Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_enable_cross_endian_little
>    Cyclomatic Complexity 5 drivers//vhost/vhost.c:vhost_set_vring_endian
>    Cyclomatic Complexity 2 drivers//vhost/vhost.c:vhost_get_vring_endian
>    Cyclomatic Complexity 3 drivers//vhost/vhost.c:vhost_init_is_le
>    Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_reset_is_le
>    Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_work_init
>    Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_poll_init
>    Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_has_work
>    Cyclomatic Complexity 2 drivers//vhost/vhost.c:__vhost_vq_meta_reset
>    Cyclomatic Complexity 2 drivers//vhost/vhost.c:vhost_vq_meta_reset
>    Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_vq_reset
>    Cyclomatic Complexity 2 drivers//vhost/vhost.c:vhost_dev_check_owner
>    Cyclomatic Complexity 1 drivers//vhost/vhost.c:vhost_dev_has_owner
> 
> vim +1771 drivers//vhost/vhost.c
> 
>   1760	
>   1761	static void log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
>   1762	{
>   1763		struct iovec iov[64];
>   1764		int i, ret;
>   1765	
>   1766		if (!vq->iotlb) {
>   1767			log_write(vq->log_base, vq->log_addr + used_offset, len);
>   1768			return;
>   1769		}
>   1770	
> > 1771		ret = translate_desc(vq, (u64)vq->used + used_offset, len, iov, 64,
>   1772				     VHOST_ACCESS_WO);
>   1773		WARN_ON(ret < 0);
>   1774	
>   1775		for (i = 0; i < ret; i++) {
>   1776			ret = log_write_hva(vq, (u64)iov[i].iov_base, iov[i].iov_len);
>   1777			WARN_ON(ret);
>   1778		}
>   1779	}
>   1780	
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 2/4] vhost_net: rework on the lock ordering for busy polling
  2018-12-10  9:44 ` [PATCH net 2/4] vhost_net: rework on the lock ordering for busy polling Jason Wang
@ 2018-12-11  1:34   ` Michael S. Tsirkin
  2018-12-11  3:06     ` Jason Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2018-12-11  1:34 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel, Tonghao Zhang

On Mon, Dec 10, 2018 at 05:44:52PM +0800, Jason Wang wrote:
> When we try to do rx busy polling in the tx path in commit 441abde4cd84
> ("net: vhost: add rx busy polling in tx path"), we lock the rx vq mutex
> after the tx vq mutex is held. This may lead to deadlock, so we try to
> lock the vqs one by one in commit 78139c94dc8c ("net: vhost: lock the
> vqs one by one"). With this commit, we avoid the deadlock under the
> assumption that handle_rx() and handle_tx() run in the same process. But
> that commit removes the protection for IOTLB updates, which requires the
> mutex of each vq to be held.
> 
> To solve this issue, the first step is to have the exact same lock
> ordering for vhost_net. This is done through:
> 
> - For handle_rx(), if busy polling is enabled, lock the tx vq immediately.
> - For handle_tx(), always lock the rx vq before the tx vq, and unlock it
>   if busy polling is not enabled.
> - Remove the tricky locking code in busy polling.
> 
> With this, we have the exact same lock ordering for vhost_net, which
> allows us to safely revert commit 78139c94dc8c ("net: vhost: lock the
> vqs one by one") in the next patch.
> 
> The patch will add two more atomic operations on the tx path during
> each round of handle_tx(). A 1-byte TCP_RR test does not notice such
> overhead.
> 
> Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one")
> Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/net.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index ab11b2bee273..5f272ab4d5b4 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -513,7 +513,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
>  	struct socket *sock;
>  	struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
>  
> -	mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
>  	vhost_disable_notify(&net->dev, vq);
>  	sock = rvq->private_data;
>  
> @@ -543,8 +542,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
>  		vhost_net_busy_poll_try_queue(net, vq);
>  	else if (!poll_rx) /* On tx here, sock has no rx data. */
>  		vhost_enable_notify(&net->dev, rvq);
> -
> -	mutex_unlock(&vq->mutex);
>  }
>  
>  static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
> @@ -913,10 +910,16 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>  static void handle_tx(struct vhost_net *net)
>  {
>  	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> +	struct vhost_net_virtqueue *nvq_rx = &net->vqs[VHOST_NET_VQ_RX];
>  	struct vhost_virtqueue *vq = &nvq->vq;
> +	struct vhost_virtqueue *vq_rx = &nvq_rx->vq;
>  	struct socket *sock;
>  
> +	mutex_lock_nested(&vq_rx->mutex, VHOST_NET_VQ_RX);
>  	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
> +	if (!vq->busyloop_timeout)
> +		mutex_unlock(&vq_rx->mutex);
> +
>  	sock = vq->private_data;
>  	if (!sock)
>  		goto out;
> @@ -933,6 +936,8 @@ static void handle_tx(struct vhost_net *net)
>  		handle_tx_copy(net, sock);
>  
>  out:
> +	if (vq->busyloop_timeout)
> +		mutex_unlock(&vq_rx->mutex);
>  	mutex_unlock(&vq->mutex);
>  }
>  


So the rx mutex is taken on the tx path now, and the tx mutex is on the
rx path ... This is just messed up. Why can't tx polling drop the rx
lock before getting the tx lock, and vice versa?

Or if we really wanted to force everything to be locked at
all times, let's just use a single mutex.



> @@ -1060,7 +1065,9 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
>  static void handle_rx(struct vhost_net *net)
>  {
>  	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_RX];
> +	struct vhost_net_virtqueue *nvq_tx = &net->vqs[VHOST_NET_VQ_TX];
>  	struct vhost_virtqueue *vq = &nvq->vq;
> +	struct vhost_virtqueue *vq_tx = &nvq_tx->vq;
>  	unsigned uninitialized_var(in), log;
>  	struct vhost_log *vq_log;
>  	struct msghdr msg = {
> @@ -1086,6 +1093,9 @@ static void handle_rx(struct vhost_net *net)
>  	int recv_pkts = 0;
>  
>  	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_RX);
> +	if (vq->busyloop_timeout)
> +		mutex_lock_nested(&vq_tx->mutex, VHOST_NET_VQ_TX);
> +
>  	sock = vq->private_data;
>  	if (!sock)
>  		goto out;
> @@ -1200,6 +1210,8 @@ static void handle_rx(struct vhost_net *net)
>  out:
>  	vhost_net_signal_used(nvq);
>  	mutex_unlock(&vq->mutex);
> +	if (vq->busyloop_timeout)
> +		mutex_unlock(&vq_tx->mutex);
>  }
>  
>  static void handle_tx_kick(struct vhost_work *work)
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 0/4] Fix various issue of vhost
  2018-12-10 19:47 ` [PATCH net 0/4] Fix various issue of vhost David Miller
@ 2018-12-11  3:01   ` Jason Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2018-12-11  3:01 UTC (permalink / raw)
  To: David Miller; +Cc: mst, kvm, virtualization, netdev, linux-kernel


On 2018/12/11 上午3:47, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Mon, 10 Dec 2018 17:44:50 +0800
>
>> This series tries to fix various issues of vhost:
>>
>> - Patch 1 adds a missing write barrier between used idx updating and
>>    logging.
>> - Patches 2-3 bring back the protection of the device IOTLB through the
>>    vq mutex; this fixes a possible use-after-free of device IOTLB
>>    entries.
>> - Patch 4 fixes dirty page logging when the device IOTLB is
>>    enabled. Logging should be done through the GPA instead of the GIOVA;
>>    this is done by logging through the iovec and traversing the GPA->HPA
>>    list for each GPA.
>>
>> Please consider them for -stable.
> Looks like the kbuild robot found some problems.
>
> ->used is a pointer (which might be 32-bit) and you're casting it to
> a u64 in the translate_desc() calls of patch #4.
>
> Please make sure that you don't actually require the full domain of
> a u64 in these values, as obviously if vq->used is a pointer you will
> only get a 32-bit domain on 32-bit architectures.


It seems the reason is that I cast from a plain void pointer directly. Let 
me cast it to uintptr_t first.

Thanks


>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 2/4] vhost_net: rework on the lock ordering for busy polling
  2018-12-11  1:34   ` Michael S. Tsirkin
@ 2018-12-11  3:06     ` Jason Wang
  2018-12-11  4:04       ` Michael S. Tsirkin
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2018-12-11  3:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, virtualization, netdev, linux-kernel, Tonghao Zhang


On 2018/12/11 上午9:34, Michael S. Tsirkin wrote:
> On Mon, Dec 10, 2018 at 05:44:52PM +0800, Jason Wang wrote:
>> When we try to do rx busy polling in the tx path in commit 441abde4cd84
>> ("net: vhost: add rx busy polling in tx path"), we lock the rx vq mutex
>> after the tx vq mutex is held. This may lead to deadlock, so we try to
>> lock the vqs one by one in commit 78139c94dc8c ("net: vhost: lock the
>> vqs one by one"). With this commit, we avoid the deadlock under the
>> assumption that handle_rx() and handle_tx() run in the same process. But
>> that commit removes the protection for IOTLB updates, which requires the
>> mutex of each vq to be held.
>>
>> To solve this issue, the first step is to have the exact same lock
>> ordering for vhost_net. This is done through:
>>
>> - For handle_rx(), if busy polling is enabled, lock the tx vq immediately.
>> - For handle_tx(), always lock the rx vq before the tx vq, and unlock it
>>    if busy polling is not enabled.
>> - Remove the tricky locking code in busy polling.
>>
>> With this, we have the exact same lock ordering for vhost_net, which
>> allows us to safely revert commit 78139c94dc8c ("net: vhost: lock the
>> vqs one by one") in the next patch.
>>
>> The patch will add two more atomic operations on the tx path during
>> each round of handle_tx(). A 1-byte TCP_RR test does not notice such
>> overhead.
>>
>> Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one")
>> Cc: Tonghao Zhang<xiangxia.m.yue@gmail.com>
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>> ---
>>   drivers/vhost/net.c | 18 +++++++++++++++---
>>   1 file changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index ab11b2bee273..5f272ab4d5b4 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -513,7 +513,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
>>   	struct socket *sock;
>>   	struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
>>   
>> -	mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
>>   	vhost_disable_notify(&net->dev, vq);
>>   	sock = rvq->private_data;
>>   
>> @@ -543,8 +542,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
>>   		vhost_net_busy_poll_try_queue(net, vq);
>>   	else if (!poll_rx) /* On tx here, sock has no rx data. */
>>   		vhost_enable_notify(&net->dev, rvq);
>> -
>> -	mutex_unlock(&vq->mutex);
>>   }
>>   
>>   static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
>> @@ -913,10 +910,16 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>>   static void handle_tx(struct vhost_net *net)
>>   {
>>   	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
>> +	struct vhost_net_virtqueue *nvq_rx = &net->vqs[VHOST_NET_VQ_RX];
>>   	struct vhost_virtqueue *vq = &nvq->vq;
>> +	struct vhost_virtqueue *vq_rx = &nvq_rx->vq;
>>   	struct socket *sock;
>>   
>> +	mutex_lock_nested(&vq_rx->mutex, VHOST_NET_VQ_RX);
>>   	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
>> +	if (!vq->busyloop_timeout)
>> +		mutex_unlock(&vq_rx->mutex);
>> +
>>   	sock = vq->private_data;
>>   	if (!sock)
>>   		goto out;
>> @@ -933,6 +936,8 @@ static void handle_tx(struct vhost_net *net)
>>   		handle_tx_copy(net, sock);
>>   
>>   out:
>> +	if (vq->busyloop_timeout)
>> +		mutex_unlock(&vq_rx->mutex);
>>   	mutex_unlock(&vq->mutex);
>>   }
>>   
> So the rx mutex is taken on the tx path now, and the tx mutex is on the
> rx path ... This is just messed up. Why can't tx polling drop the rx
> lock before getting the tx lock, and vice versa?


Because we want to poll both the tx and rx virtqueues at the same time 
(vhost_net_busy_poll()).

     while (vhost_can_busy_poll(endtime)) {
         if (vhost_has_work(&net->dev)) {
             *busyloop_intr = true;
             break;
         }

         if ((sock_has_rx_data(sock) &&
              !vhost_vq_avail_empty(&net->dev, rvq)) ||
             !vhost_vq_avail_empty(&net->dev, tvq))
             break;

         cpu_relax();

     }


And we disable kicks and notification for better performance.


>
> Or if we really wanted to force everything to be locked at
> all times, let's just use a single mutex.
>
>
>

We could, but it might require more changes, which could be done for 
-next I believe.


Thanks


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 2/4] vhost_net: rework on the lock ordering for busy polling
  2018-12-11  3:06     ` Jason Wang
@ 2018-12-11  4:04       ` Michael S. Tsirkin
  2018-12-12  3:03         ` Jason Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2018-12-11  4:04 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel, Tonghao Zhang

On Tue, Dec 11, 2018 at 11:06:43AM +0800, Jason Wang wrote:
> 
> On 2018/12/11 上午9:34, Michael S. Tsirkin wrote:
> > On Mon, Dec 10, 2018 at 05:44:52PM +0800, Jason Wang wrote:
> > > When we try to do rx busy polling in the tx path in commit 441abde4cd84
> > > ("net: vhost: add rx busy polling in tx path"), we lock the rx vq mutex
> > > after the tx vq mutex is held. This may lead to deadlock, so we try to
> > > lock the vqs one by one in commit 78139c94dc8c ("net: vhost: lock the
> > > vqs one by one"). With this commit, we avoid the deadlock under the
> > > assumption that handle_rx() and handle_tx() run in the same process. But
> > > that commit removes the protection for IOTLB updates, which requires the
> > > mutex of each vq to be held.
> > > 
> > > To solve this issue, the first step is to have the exact same lock
> > > ordering for vhost_net. This is done through:
> > > 
> > > - For handle_rx(), if busy polling is enabled, lock the tx vq immediately.
> > > - For handle_tx(), always lock the rx vq before the tx vq, and unlock it
> > >    if busy polling is not enabled.
> > > - Remove the tricky locking code in busy polling.
> > > 
> > > With this, we have the exact same lock ordering for vhost_net, which
> > > allows us to safely revert commit 78139c94dc8c ("net: vhost: lock the
> > > vqs one by one") in the next patch.
> > > 
> > > The patch will add two more atomic operations on the tx path during
> > > each round of handle_tx(). A 1-byte TCP_RR test does not notice such
> > > overhead.
> > > 
> > > Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one")
> > > Cc: Tonghao Zhang<xiangxia.m.yue@gmail.com>
> > > Signed-off-by: Jason Wang<jasowang@redhat.com>
> > > ---
> > >   drivers/vhost/net.c | 18 +++++++++++++++---
> > >   1 file changed, 15 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > index ab11b2bee273..5f272ab4d5b4 100644
> > > --- a/drivers/vhost/net.c
> > > +++ b/drivers/vhost/net.c
> > > @@ -513,7 +513,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
> > >   	struct socket *sock;
> > >   	struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
> > > -	mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
> > >   	vhost_disable_notify(&net->dev, vq);
> > >   	sock = rvq->private_data;
> > > @@ -543,8 +542,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
> > >   		vhost_net_busy_poll_try_queue(net, vq);
> > >   	else if (!poll_rx) /* On tx here, sock has no rx data. */
> > >   		vhost_enable_notify(&net->dev, rvq);
> > > -
> > > -	mutex_unlock(&vq->mutex);
> > >   }
> > >   static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
> > > @@ -913,10 +910,16 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> > >   static void handle_tx(struct vhost_net *net)
> > >   {
> > >   	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> > > +	struct vhost_net_virtqueue *nvq_rx = &net->vqs[VHOST_NET_VQ_RX];
> > >   	struct vhost_virtqueue *vq = &nvq->vq;
> > > +	struct vhost_virtqueue *vq_rx = &nvq_rx->vq;
> > >   	struct socket *sock;
> > > +	mutex_lock_nested(&vq_rx->mutex, VHOST_NET_VQ_RX);
> > >   	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
> > > +	if (!vq->busyloop_timeout)
> > > +		mutex_unlock(&vq_rx->mutex);
> > > +
> > >   	sock = vq->private_data;
> > >   	if (!sock)
> > >   		goto out;
> > > @@ -933,6 +936,8 @@ static void handle_tx(struct vhost_net *net)
> > >   		handle_tx_copy(net, sock);
> > >   out:
> > > +	if (vq->busyloop_timeout)
> > > +		mutex_unlock(&vq_rx->mutex);
> > >   	mutex_unlock(&vq->mutex);
> > >   }
> > So the rx mutex is taken on the tx path now, and the tx mutex is on the
> > rx path ... This is just messed up. Why can't tx polling drop the rx
> > lock before getting the tx lock, and vice versa?
> 
> 
> Because we want to poll both the tx and rx virtqueues at the same time
> (vhost_net_busy_poll()).
> 
>     while (vhost_can_busy_poll(endtime)) {
>         if (vhost_has_work(&net->dev)) {
>             *busyloop_intr = true;
>             break;
>         }
> 
>         if ((sock_has_rx_data(sock) &&
>              !vhost_vq_avail_empty(&net->dev, rvq)) ||
>             !vhost_vq_avail_empty(&net->dev, tvq))
>             break;
> 
>         cpu_relax();
> 
>     }
> 
> 
> And we disable kicks and notification for better performance.

Right, but it's all slow path; it happens when the queue is
otherwise empty. So this is what I am saying: let's drop the locks
we hold around this.


> 
> > 
> > Or if we really wanted to force everything to be locked at
> > all times, let's just use a single mutex.
> > 
> > 
> > 
> 
> We could, but it might require more changes, which could be done for -next I
> believe.
> 
> 
> Thanks

I'd rather we kept the fine-grained locking. E.g. people are
looking at splitting the tx and rx threads. But if not possible
let's fix it cleanly with a coarse-grained one. A mess here will
just create more trouble later.

-- 
MST

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 2/4] vhost_net: rework on the lock ordering for busy polling
  2018-12-11  4:04       ` Michael S. Tsirkin
@ 2018-12-12  3:03         ` Jason Wang
  2018-12-12  3:40           ` Michael S. Tsirkin
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2018-12-12  3:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, virtualization, netdev, linux-kernel, Tonghao Zhang


On 2018/12/11 下午12:04, Michael S. Tsirkin wrote:
> On Tue, Dec 11, 2018 at 11:06:43AM +0800, Jason Wang wrote:
>> On 2018/12/11 上午9:34, Michael S. Tsirkin wrote:
>>> On Mon, Dec 10, 2018 at 05:44:52PM +0800, Jason Wang wrote:
>>>> When we try to do rx busy polling in the tx path in commit 441abde4cd84
>>>> ("net: vhost: add rx busy polling in tx path"), we lock the rx vq mutex
>>>> after the tx vq mutex is held. This may lead to deadlock, so we try to
>>>> lock the vqs one by one in commit 78139c94dc8c ("net: vhost: lock the
>>>> vqs one by one"). With this commit, we avoid the deadlock under the
>>>> assumption that handle_rx() and handle_tx() run in the same process. But
>>>> that commit removes the protection for IOTLB updates, which requires the
>>>> mutex of each vq to be held.
>>>>
>>>> To solve this issue, the first step is to have the exact same lock
>>>> ordering for vhost_net. This is done through:
>>>>
>>>> - For handle_rx(), if busy polling is enabled, lock the tx vq immediately.
>>>> - For handle_tx(), always lock the rx vq before the tx vq, and unlock it
>>>>     if busy polling is not enabled.
>>>> - Remove the tricky locking code in busy polling.
>>>>
>>>> With this, we have the exact same lock ordering for vhost_net, which
>>>> allows us to safely revert commit 78139c94dc8c ("net: vhost: lock the
>>>> vqs one by one") in the next patch.
>>>>
>>>> The patch will add two more atomic operations on the tx path during
>>>> each round of handle_tx(). A 1-byte TCP_RR test does not notice such
>>>> overhead.
>>>>
>>>> Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one")
>>>> Cc: Tonghao Zhang<xiangxia.m.yue@gmail.com>
>>>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>>>> ---
>>>>    drivers/vhost/net.c | 18 +++++++++++++++---
>>>>    1 file changed, 15 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>>> index ab11b2bee273..5f272ab4d5b4 100644
>>>> --- a/drivers/vhost/net.c
>>>> +++ b/drivers/vhost/net.c
>>>> @@ -513,7 +513,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
>>>>    	struct socket *sock;
>>>>    	struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
>>>> -	mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
>>>>    	vhost_disable_notify(&net->dev, vq);
>>>>    	sock = rvq->private_data;
>>>> @@ -543,8 +542,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
>>>>    		vhost_net_busy_poll_try_queue(net, vq);
>>>>    	else if (!poll_rx) /* On tx here, sock has no rx data. */
>>>>    		vhost_enable_notify(&net->dev, rvq);
>>>> -
>>>> -	mutex_unlock(&vq->mutex);
>>>>    }
>>>>    static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
>>>> @@ -913,10 +910,16 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>>>>    static void handle_tx(struct vhost_net *net)
>>>>    {
>>>>    	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
>>>> +	struct vhost_net_virtqueue *nvq_rx = &net->vqs[VHOST_NET_VQ_RX];
>>>>    	struct vhost_virtqueue *vq = &nvq->vq;
>>>> +	struct vhost_virtqueue *vq_rx = &nvq_rx->vq;
>>>>    	struct socket *sock;
>>>> +	mutex_lock_nested(&vq_rx->mutex, VHOST_NET_VQ_RX);
>>>>    	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
>>>> +	if (!vq->busyloop_timeout)
>>>> +		mutex_unlock(&vq_rx->mutex);
>>>> +
>>>>    	sock = vq->private_data;
>>>>    	if (!sock)
>>>>    		goto out;
>>>> @@ -933,6 +936,8 @@ static void handle_tx(struct vhost_net *net)
>>>>    		handle_tx_copy(net, sock);
>>>>    out:
>>>> +	if (vq->busyloop_timeout)
>>>> +		mutex_unlock(&vq_rx->mutex);
>>>>    	mutex_unlock(&vq->mutex);
>>>>    }
>>> So the rx mutex is taken on the tx path now, and the tx mutex is on the
>>> rx path ... This is just messed up. Why can't tx polling drop the rx
>>> lock before getting the tx lock, and vice versa?
>>
>> Because we want to poll both the tx and rx virtqueues at the same time
>> (vhost_net_busy_poll()).
>>
>>      while (vhost_can_busy_poll(endtime)) {
>>          if (vhost_has_work(&net->dev)) {
>>              *busyloop_intr = true;
>>              break;
>>          }
>>
>>          if ((sock_has_rx_data(sock) &&
>>               !vhost_vq_avail_empty(&net->dev, rvq)) ||
>>              !vhost_vq_avail_empty(&net->dev, tvq))
>>              break;
>>
>>          cpu_relax();
>>
>>      }
>>
>>
>> And we disable kicks and notification for better performance.
> Right, but it's all slow path; it happens when the queue is
> otherwise empty. So this is what I am saying: let's drop the locks
> we hold around this.


Is this really safe? It looks to me like it can race with SET_VRING_ADDR. 
And the code does more:

- access sock object

- access device IOTLB

- enable and disable notification

None of the above is safe without the protection of the vq mutex.


>
>
>>> Or if we really wanted to force everything to be locked at
>>> all times, let's just use a single mutex.
>>>
>>>
>>>
>> We could, but it might require more changes, which could be done for -next I
>> believe.
>>
>>
>> Thanks
> I'd rather we kept the fine-grained locking. E.g. people are
> looking at splitting the tx and rx threads. But if not possible
> let's fix it cleanly with a coarse-grained one. A mess here will
> just create more trouble later.
>

I believe we won't go back to a coarse one. It looks like we can solve this 
by using mutex_trylock() for the rxq during TX, and by not polling the rxq 
if an IOTLB update is pending.

Let me post V2.

Thanks


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 2/4] vhost_net: rework on the lock ordering for busy polling
  2018-12-12  3:03         ` Jason Wang
@ 2018-12-12  3:40           ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2018-12-12  3:40 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel, Tonghao Zhang

On Wed, Dec 12, 2018 at 11:03:57AM +0800, Jason Wang wrote:
> 
> On 2018/12/11 下午12:04, Michael S. Tsirkin wrote:
> > On Tue, Dec 11, 2018 at 11:06:43AM +0800, Jason Wang wrote:
> > > On 2018/12/11 上午9:34, Michael S. Tsirkin wrote:
> > > > On Mon, Dec 10, 2018 at 05:44:52PM +0800, Jason Wang wrote:
> > > > > When we try to do rx busy polling in the tx path in commit 441abde4cd84
> > > > > ("net: vhost: add rx busy polling in tx path"), we lock the rx vq mutex
> > > > > after the tx vq mutex is held. This may lead to deadlock, so we try to
> > > > > lock the vqs one by one in commit 78139c94dc8c ("net: vhost: lock the
> > > > > vqs one by one"). With this commit, we avoid the deadlock under the
> > > > > assumption that handle_rx() and handle_tx() run in the same process. But
> > > > > that commit removes the protection for IOTLB updates, which requires the
> > > > > mutex of each vq to be held.
> > > > > 
> > > > > To solve this issue, the first step is to have the exact same lock
> > > > > ordering for vhost_net. This is done through:
> > > > > 
> > > > > - For handle_rx(), if busy polling is enabled, lock the tx vq immediately.
> > > > > - For handle_tx(), always lock the rx vq before the tx vq, and unlock it
> > > > >     if busy polling is not enabled.
> > > > > - Remove the tricky locking code in busy polling.
> > > > > 
> > > > > With this, we have the exact same lock ordering for vhost_net, which
> > > > > allows us to safely revert commit 78139c94dc8c ("net: vhost: lock the
> > > > > vqs one by one") in the next patch.
> > > > > 
> > > > > The patch will add two more atomic operations on the tx path during
> > > > > each round of handle_tx(). A 1-byte TCP_RR test does not notice such
> > > > > overhead.
> > > > > 
> > > > > Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one")
> > > > > Cc: Tonghao Zhang<xiangxia.m.yue@gmail.com>
> > > > > Signed-off-by: Jason Wang<jasowang@redhat.com>
> > > > > ---
> > > > >    drivers/vhost/net.c | 18 +++++++++++++++---
> > > > >    1 file changed, 15 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > > > index ab11b2bee273..5f272ab4d5b4 100644
> > > > > --- a/drivers/vhost/net.c
> > > > > +++ b/drivers/vhost/net.c
> > > > > @@ -513,7 +513,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
> > > > >    	struct socket *sock;
> > > > >    	struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
> > > > > -	mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
> > > > >    	vhost_disable_notify(&net->dev, vq);
> > > > >    	sock = rvq->private_data;
> > > > > @@ -543,8 +542,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
> > > > >    		vhost_net_busy_poll_try_queue(net, vq);
> > > > >    	else if (!poll_rx) /* On tx here, sock has no rx data. */
> > > > >    		vhost_enable_notify(&net->dev, rvq);
> > > > > -
> > > > > -	mutex_unlock(&vq->mutex);
> > > > >    }
> > > > >    static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
> > > > > @@ -913,10 +910,16 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> > > > >    static void handle_tx(struct vhost_net *net)
> > > > >    {
> > > > >    	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> > > > > +	struct vhost_net_virtqueue *nvq_rx = &net->vqs[VHOST_NET_VQ_RX];
> > > > >    	struct vhost_virtqueue *vq = &nvq->vq;
> > > > > +	struct vhost_virtqueue *vq_rx = &nvq_rx->vq;
> > > > >    	struct socket *sock;
> > > > > +	mutex_lock_nested(&vq_rx->mutex, VHOST_NET_VQ_RX);
> > > > >    	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
> > > > > +	if (!vq->busyloop_timeout)
> > > > > +		mutex_unlock(&vq_rx->mutex);
> > > > > +
> > > > >    	sock = vq->private_data;
> > > > >    	if (!sock)
> > > > >    		goto out;
> > > > > @@ -933,6 +936,8 @@ static void handle_tx(struct vhost_net *net)
> > > > >    		handle_tx_copy(net, sock);
> > > > >    out:
> > > > > +	if (vq->busyloop_timeout)
> > > > > +		mutex_unlock(&vq_rx->mutex);
> > > > >    	mutex_unlock(&vq->mutex);
> > > > >    }
> > > > So the rx mutex is taken on the tx path now, and the tx mutex is on the
> > > > rx path ... This is just messed up. Why can't tx polling drop the rx
> > > > lock before getting the tx lock, and vice versa?
> > > 
> > > Because we want to poll both the tx and rx virtqueues at the same time
> > > (vhost_net_busy_poll()).
> > > 
> > >      while (vhost_can_busy_poll(endtime)) {
> > >          if (vhost_has_work(&net->dev)) {
> > >              *busyloop_intr = true;
> > >              break;
> > >          }
> > > 
> > >          if ((sock_has_rx_data(sock) &&
> > >               !vhost_vq_avail_empty(&net->dev, rvq)) ||
> > >              !vhost_vq_avail_empty(&net->dev, tvq))
> > >              break;
> > > 
> > >          cpu_relax();
> > > 
> > >      }
> > > 
> > > 
> > > And we disable kicks and notifications for better performance.
> > Right, but it's all slow path - it happens when the queue is
> > otherwise empty. So this is what I am saying: let's drop the locks
> > we hold around this.
> 
> 
> Is this really safe? It looks to me like it can race with SET_VRING_ADDR.
> And the code does more:
> 
> - access sock object
> 
> - access device IOTLB
> 
> - enable and disable notification
> 
> None of the above is safe without the protection of the vq mutex.


Yes, but take another lock, just not nested.


> 
> > 
> > 
> > > > Or if we really wanted to force everything to be locked at
> > > > all times, let's just use a single mutex.
> > > > 
> > > > 
> > > > 
> > > We could, but it might require more changes, which could be done for -next
> > > I believe.
> > > 
> > > 
> > > Thanks
> > I'd rather we kept the fine-grained locking. E.g. people are
> > looking at splitting the tx and rx threads. But if that's not possible,
> > let's fix it cleanly with a coarse-grained one. A mess here will
> > just create more trouble later.
> > 
> 
> I believe we won't go back to a coarse one. Looks like we can solve this by
> using mutex_trylock() for the rxq during TX, and not doing polling for the
> rxq if an IOTLB update is pending.
> 
> Let me post V2.
> 
> Thanks

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 4/4] vhost: log dirty page correctly
  2018-12-10  9:44 ` [PATCH net 4/4] vhost: log dirty page correctly Jason Wang
  2018-12-10 15:14   ` kbuild test robot
@ 2018-12-19 17:29   ` kbuild test robot
  1 sibling, 0 replies; 15+ messages in thread
From: kbuild test robot @ 2018-12-19 17:29 UTC (permalink / raw)
  To: Jason Wang
  Cc: kbuild-all, mst, jasowang, kvm, virtualization, netdev,
	linux-kernel, Jintack Lim

[-- Attachment #1: Type: text/plain, Size: 10097 bytes --]

Hi Jason,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net/master]

url:    https://github.com/0day-ci/linux/commits/Jason-Wang/Fix-various-issue-of-vhost/20181210-223236
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
   include/linux/slab.h:332:43: warning: dubious: x & !y
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
>> drivers/vhost/vhost.c:1771:35: warning: cast removes address space '<asn:1>' of expression
   drivers/vhost/vhost.c:1776:42: warning: cast removes address space '<asn:1>' of expression
   drivers/vhost/vhost.c:1788:48: warning: cast removes address space '<asn:1>' of expression
   drivers/vhost/vhost.c:1819:13: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:1819:13:    expected void *addr
   drivers/vhost/vhost.c:1819:13:    got restricted __virtio16 [noderef] <asn:1> *<noident>
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
   drivers/vhost/vhost.c:851:42: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:851:42:    expected void [noderef] <asn:1> *addr
   drivers/vhost/vhost.c:851:42:    got void *addr
   drivers/vhost/vhost.c:1837:13: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:1837:13:    expected void *addr
   drivers/vhost/vhost.c:1837:13:    got restricted __virtio16 [noderef] [usertype] <asn:1> *<noident>
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
   drivers/vhost/vhost.c:851:42: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:851:42:    expected void [noderef] <asn:1> *addr
   drivers/vhost/vhost.c:851:42:    got void *addr
   drivers/vhost/vhost.c:1874:13: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:1874:13:    expected void *addr
   drivers/vhost/vhost.c:1874:13:    got restricted __virtio16 [noderef] <asn:1> *<noident>
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
   drivers/vhost/vhost.c:851:42: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:851:42:    expected void [noderef] <asn:1> *addr
   drivers/vhost/vhost.c:851:42:    got void *addr
   drivers/vhost/vhost.c:2073:21: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:2073:21:    expected void *addr
   drivers/vhost/vhost.c:2073:21:    got restricted __virtio16 [noderef] <asn:1> *<noident>
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
   drivers/vhost/vhost.c:851:42: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:851:42:    expected void [noderef] <asn:1> *addr
   drivers/vhost/vhost.c:851:42:    got void *addr
   drivers/vhost/vhost.c:2100:13: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:2100:13:    expected void *addr
   drivers/vhost/vhost.c:2100:13:    got restricted __virtio16 [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
   drivers/vhost/vhost.c:851:42: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:851:42:    expected void [noderef] <asn:1> *addr
   drivers/vhost/vhost.c:851:42:    got void *addr
   drivers/vhost/vhost.c:2231:21: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:2231:21:    expected void *addr
   drivers/vhost/vhost.c:2231:21:    got restricted __virtio32 [noderef] <asn:1> *<noident>
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
   drivers/vhost/vhost.c:851:42: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:851:42:    expected void [noderef] <asn:1> *addr
   drivers/vhost/vhost.c:851:42:    got void *addr
   drivers/vhost/vhost.c:2235:21: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:2235:21:    expected void *addr
   drivers/vhost/vhost.c:2235:21:    got restricted __virtio32 [noderef] <asn:1> *<noident>
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
   drivers/vhost/vhost.c:851:42: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:851:42:    expected void [noderef] <asn:1> *addr
   drivers/vhost/vhost.c:851:42:    got void *addr
   drivers/vhost/vhost.c:2281:13: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:2281:13:    expected void *addr
   drivers/vhost/vhost.c:2281:13:    got restricted __virtio16 [noderef] <asn:1> *<noident>
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
   drivers/vhost/vhost.c:851:42: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:851:42:    expected void [noderef] <asn:1> *addr
   drivers/vhost/vhost.c:851:42:    got void *addr
   drivers/vhost/vhost.c:2315:21: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:2315:21:    expected void *addr
   drivers/vhost/vhost.c:2315:21:    got restricted __virtio16 [noderef] <asn:1> *<noident>
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
   drivers/vhost/vhost.c:851:42: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:851:42:    expected void [noderef] <asn:1> *addr
   drivers/vhost/vhost.c:851:42:    got void *addr
   drivers/vhost/vhost.c:2329:13: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:2329:13:    expected void *addr
   drivers/vhost/vhost.c:2329:13:    got restricted __virtio16 [noderef] [usertype] <asn:1> *<noident>
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
   drivers/vhost/vhost.c:851:42: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:851:42:    expected void [noderef] <asn:1> *addr
   drivers/vhost/vhost.c:851:42:    got void *addr
   drivers/vhost/vhost.c:2374:13: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:2374:13:    expected void *addr
   drivers/vhost/vhost.c:2374:13:    got restricted __virtio16 [noderef] <asn:1> *<noident>
   drivers/vhost/vhost.c:704:17: warning: incorrect type in return expression (different address spaces)
   drivers/vhost/vhost.c:704:17:    expected void [noderef] <asn:1> *
   drivers/vhost/vhost.c:704:17:    got void *<noident>
   drivers/vhost/vhost.c:851:42: warning: incorrect type in argument 2 (different address spaces)
   drivers/vhost/vhost.c:851:42:    expected void [noderef] <asn:1> *addr

vim +1771 drivers/vhost/vhost.c

  1760	
  1761	static void log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
  1762	{
  1763		struct iovec iov[64];
  1764		int i, ret;
  1765	
  1766		if (!vq->iotlb) {
  1767			log_write(vq->log_base, vq->log_addr + used_offset, len);
  1768			return;
  1769		}
  1770	
> 1771		ret = translate_desc(vq, (u64)vq->used + used_offset, len, iov, 64,
  1772				     VHOST_ACCESS_WO);
  1773		WARN_ON(ret < 0);
  1774	
  1775		for (i = 0; i < ret; i++) {
  1776			ret = log_write_hva(vq, (u64)iov[i].iov_base, iov[i].iov_len);
  1777			WARN_ON(ret);
  1778		}
  1779	}
  1780	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 66638 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-12-19 17:31 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-10  9:44 [PATCH net 0/4] Fix various issue of vhost Jason Wang
2018-12-10  9:44 ` [PATCH net 1/4] vhost: make sure used idx is seen before log in vhost_add_used_n() Jason Wang
2018-12-10  9:44 ` [PATCH net 2/4] vhost_net: rework on the lock ordering for busy polling Jason Wang
2018-12-11  1:34   ` Michael S. Tsirkin
2018-12-11  3:06     ` Jason Wang
2018-12-11  4:04       ` Michael S. Tsirkin
2018-12-12  3:03         ` Jason Wang
2018-12-12  3:40           ` Michael S. Tsirkin
2018-12-10  9:44 ` [PATCH net 3/4] Revert "net: vhost: lock the vqs one by one" Jason Wang
2018-12-10  9:44 ` [PATCH net 4/4] vhost: log dirty page correctly Jason Wang
2018-12-10 15:14   ` kbuild test robot
2018-12-11  1:30     ` Michael S. Tsirkin
2018-12-19 17:29   ` kbuild test robot
2018-12-10 19:47 ` [PATCH net 0/4] Fix various issue of vhost David Miller
2018-12-11  3:01   ` Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).