* [PATCH net V2 0/4] Fix various issues of vhost
From: Jason Wang @ 2018-12-12 10:08 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev, linux-kernel

Hi:

This series tries to fix various issues of vhost:

- Patch 1 adds a missing write barrier between the used idx update and
  logging.
- Patches 2-3 bring back the protection of the device IOTLB through the
  vq mutex; this fixes a possible use-after-free of device IOTLB
  entries.
- Patch 4 fixes dirty page logging when the device IOTLB is enabled.
  Logging should be done through GPA instead of GIOVA; this is done by
  introducing an HVA->GPA reverse mapping and converting HVA to GPA
  when logging dirty pages.

Please consider them for -stable.

Thanks

Changes from V1:
- Silence a compiler warning on 32-bit.
- Use mutex_trylock() instead of mutex_lock(), even on the fast path.

Jason Wang (4):
  vhost: make sure used idx is seen before log in vhost_add_used_n()
  vhost_net: switch to use mutex_trylock() in vhost_net_busy_poll()
  Revert "net: vhost: lock the vqs one by one"
  vhost: log dirty page correctly

 drivers/vhost/net.c   |  11 ++++-
 drivers/vhost/vhost.c | 102 ++++++++++++++++++++++++++++++++++--------
 drivers/vhost/vhost.h |   3 +-
 3 files changed, 95 insertions(+), 21 deletions(-)

-- 
2.17.1



* [PATCH net V2 1/4] vhost: make sure used idx is seen before log in vhost_add_used_n()
From: Jason Wang @ 2018-12-12 10:08 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev, linux-kernel

We are missing a write barrier to guarantee that the used idx update
is seen before the log. Without it, userspace may sync and copy the
used ring before the used idx is updated. Fix this by adding a
barrier before log_write().
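
As an illustration, the ordering the new barrier enforces looks
roughly like this (a hypothetical sketch, not kernel code; the real
change is the two-line diff below):

    /* The store to used->idx must be visible before the page
     * holding it is marked dirty in the log, otherwise a log
     * scanner can copy a stale used ring.
     */
    static void update_idx_and_log(u16 *used_idx, u16 new_idx,
                                   unsigned long *log, unsigned long pfn)
    {
            WRITE_ONCE(*used_idx, new_idx); /* publish new used idx */
            smp_wmb();                      /* idx visible before log bit */
            set_bit(pfn, log);              /* mark used-ring page dirty */
    }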

Fixes: 8dd014adfea6f ("vhost-net: mergeable buffers support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 6b98d8e3a5bf..5915f240275a 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2220,6 +2220,8 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 		return -EFAULT;
 	}
 	if (unlikely(vq->log_used)) {
+		/* Make sure used idx is seen before log. */
+		smp_wmb();
 		/* Log used index update. */
 		log_write(vq->log_base,
 			  vq->log_addr + offsetof(struct vring_used, idx),
-- 
2.17.1




* [PATCH net V2 2/4] vhost_net: switch to use mutex_trylock() in vhost_net_busy_poll()
From: Jason Wang @ 2018-12-12 10:08 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev, linux-kernel; +Cc: Tonghao Zhang

We used to hold the mutex of the paired virtqueue in
vhost_net_busy_poll(). But this results in an inconsistent lock
order, which may cause a deadlock once we bring back the protection
of the device IOTLB with the vq mutex, since that requires holding
the mutexes of all virtqueues at the same time.

Fix this simply by switching to mutex_trylock(); on failure, just
skip the busy polling. This can happen while the device IOTLB is
being updated, which should be rare.
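
For illustration, the lock-order inversion this avoids looks roughly
like the following (a sketch with invented names; the IOTLB path of
the next patch locks the vqs in index order):

    static struct vhost_virtqueue vq_rx, vq_tx;     /* hypothetical */

    static void tx_handler_busy_poll(void) /* fast path */
    {
            mutex_lock(&vq_tx.mutex);       /* own vq, held by handler */
            mutex_lock(&vq_rx.mutex);       /* paired vq: waits for B */
    }

    static void iotlb_update(void)          /* slow path, locks all vqs */
    {
            mutex_lock(&vq_rx.mutex);       /* vqs[0], now held */
            mutex_lock(&vq_tx.mutex);       /* vqs[1]: waits for A, ABBA */
    }

With mutex_trylock(), the busy-poll side backs off instead of
blocking, so no cycle can form.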

Fixes: 78139c94dc8c ("net: vhost: lock the vqs one by one")
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index ab11b2bee273..ad7a6f475a44 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -513,7 +513,13 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 	struct socket *sock;
 	struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
 
-	mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
+	/* Try to hold the vq mutex of the paired virtqueue. We can't
+	 * use mutex_lock() here since we could not guarantee a
+	 * consistent lock ordering.
+	 */
+	if (!mutex_trylock(&vq->mutex))
+		return;
+
 	vhost_disable_notify(&net->dev, vq);
 	sock = rvq->private_data;
 
-- 
2.17.1




* [PATCH net V2 3/4] Revert "net: vhost: lock the vqs one by one"
From: Jason Wang @ 2018-12-12 10:08 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev, linux-kernel; +Cc: Tonghao Zhang

This reverts commit 78139c94dc8c96a478e67dab3bee84dc6eccb5fd. With
that commit, we no longer protect the device IOTLB with the vq mutex,
which can lead to e.g. a use-after-free of device IOTLB entries. And
since we've switched to mutex_trylock() in the previous patch, it's
safe to revert it without introducing a deadlock.
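
As an illustration, the use-after-free window being closed looks
roughly like this (helper names are invented):

    vq handler (holds vq->mutex)     IOTLB update path
    ----------------------------     ------------------------------
    node = iotlb_lookup(gpa);
                                     vhost_del_umem_range(...);
                                     (frees the entry behind 'node')
    use(node->userspace_addr);       <- use after free

Re-taking every vq->mutex in vhost_process_iotlb_msg() excludes all
handlers for the duration of the update.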

Fixes: 78139c94dc8c ("net: vhost: lock the vqs one by one")
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5915f240275a..55e5aa662ad5 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -295,11 +295,8 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
 {
 	int i;
 
-	for (i = 0; i < d->nvqs; ++i) {
-		mutex_lock(&d->vqs[i]->mutex);
+	for (i = 0; i < d->nvqs; ++i)
 		__vhost_vq_meta_reset(d->vqs[i]);
-		mutex_unlock(&d->vqs[i]->mutex);
-	}
 }
 
 static void vhost_vq_reset(struct vhost_dev *dev,
@@ -895,6 +892,20 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 #define vhost_get_used(vq, x, ptr) \
 	vhost_get_user(vq, x, ptr, VHOST_ADDR_USED)
 
+static void vhost_dev_lock_vqs(struct vhost_dev *d)
+{
+	int i = 0;
+	for (i = 0; i < d->nvqs; ++i)
+		mutex_lock_nested(&d->vqs[i]->mutex, i);
+}
+
+static void vhost_dev_unlock_vqs(struct vhost_dev *d)
+{
+	int i = 0;
+	for (i = 0; i < d->nvqs; ++i)
+		mutex_unlock(&d->vqs[i]->mutex);
+}
+
 static int vhost_new_umem_range(struct vhost_umem *umem,
 				u64 start, u64 size, u64 end,
 				u64 userspace_addr, int perm)
@@ -976,6 +987,7 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 	int ret = 0;
 
 	mutex_lock(&dev->mutex);
+	vhost_dev_lock_vqs(dev);
 	switch (msg->type) {
 	case VHOST_IOTLB_UPDATE:
 		if (!dev->iotlb) {
@@ -1009,6 +1021,7 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 		break;
 	}
 
+	vhost_dev_unlock_vqs(dev);
 	mutex_unlock(&dev->mutex);
 
 	return ret;
-- 
2.17.1




* [PATCH net V2 4/4] vhost: log dirty page correctly
From: Jason Wang @ 2018-12-12 10:08 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev, linux-kernel; +Cc: Jintack Lim

The vhost dirty page logging API is designed to sync through GPA. But
we try to log GIOVA when the device IOTLB is enabled. This is wrong
and may lead to missing data after migration.

To solve this issue, when logging with the device IOTLB enabled, we
will:

1) reuse the device IOTLB's GIOVA->HVA translation result to get the
   HVA: for writable descriptors, get the HVA through the iovec; for
   used ring updates, translate the GIOVA to an HVA;
2) traverse the GPA->HVA mapping to get the possible GPAs and log
   through GPA. Note that this reverse mapping is not guaranteed to
   be unique, so we should log each possible GPA in this case.

This fixes the failure of scp to a guest during migration. In -next,
we will probably support passing GIOVA->GPA mappings instead of
GIOVA->HVA.
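
As a simplified sketch of the reverse walk that log_write_hva() below
implements (types and names here are invented for illustration):

    struct mem_region { u64 gpa_start, hva_start, size; };

    /* Log every GPA alias of an HVA range: a single HVA range can
     * fall inside several GPA->HVA regions, so each match is logged.
     */
    static int log_dirty_hva(struct mem_region *regions, int n,
                             u64 hva, u64 len, void __user *log_base)
    {
            int i, r;

            for (i = 0; i < n; i++) {
                    struct mem_region *reg = &regions[i];

                    if (hva >= reg->hva_start &&
                        hva + len <= reg->hva_start + reg->size) {
                            u64 gpa = reg->gpa_start +
                                      (hva - reg->hva_start);

                            r = log_write(log_base, gpa, len);
                            if (r < 0)
                                    return r;
                    }
            }
            return 0;
    }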

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Reported-by: Jintack Lim <jintack@cs.columbia.edu>
Cc: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c   |  3 +-
 drivers/vhost/vhost.c | 79 +++++++++++++++++++++++++++++++++++--------
 drivers/vhost/vhost.h |  3 +-
 3 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index ad7a6f475a44..784df2b49628 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1192,7 +1192,8 @@ static void handle_rx(struct vhost_net *net)
 		if (nvq->done_idx > VHOST_NET_BATCH)
 			vhost_net_signal_used(nvq);
 		if (unlikely(vq_log))
-			vhost_log_write(vq, vq_log, log, vhost_len);
+			vhost_log_write(vq, vq_log, log, vhost_len,
+					vq->iov, in);
 		total_len += vhost_len;
 		if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
 			vhost_poll_queue(&vq->poll);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 55e5aa662ad5..3660310604fd 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1733,11 +1733,67 @@ static int log_write(void __user *log_base,
 	return r;
 }
 
+static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
+{
+	struct vhost_umem *umem = vq->umem;
+	struct vhost_umem_node *u;
+	u64 gpa;
+	int r;
+	bool hit = false;
+
+	list_for_each_entry(u, &umem->umem_list, link) {
+		if (u->userspace_addr < hva &&
+		    u->userspace_addr + u->size >=
+		    hva + len) {
+			gpa = u->start + hva - u->userspace_addr;
+			r = log_write(vq->log_base, gpa, len);
+			if (r < 0)
+				return r;
+			hit = true;
+		}
+	}
+
+	/* No reverse mapping, should be a bug */
+	WARN_ON(!hit);
+	return 0;
+}
+
+static void log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
+{
+	struct iovec iov[64];
+	int i, ret;
+
+	if (!vq->iotlb) {
+		log_write(vq->log_base, vq->log_addr + used_offset, len);
+		return;
+	}
+
+	ret = translate_desc(vq, (u64)(uintptr_t)vq->used + used_offset,
+			     len, iov, 64, VHOST_ACCESS_WO);
+	WARN_ON(ret < 0);
+
+	for (i = 0; i < ret; i++) {
+		ret = log_write_hva(vq,	(u64)(uintptr_t)iov[i].iov_base,
+				    iov[i].iov_len);
+		WARN_ON(ret);
+	}
+}
+
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
-		    unsigned int log_num, u64 len)
+		    unsigned int log_num, u64 len, struct iovec *iov, int count)
 {
 	int i, r;
 
+	if (vq->iotlb) {
+		for (i = 0; i < count; i++) {
+			r = log_write_hva(vq, (u64)(uintptr_t)iov[i].iov_base,
+					  iov[i].iov_len);
+			if (r < 0)
+				return r;
+		}
+		return 0;
+	}
+
 	/* Make sure data written is seen before log. */
 	smp_wmb();
 	for (i = 0; i < log_num; ++i) {
@@ -1769,9 +1825,8 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
 		smp_wmb();
 		/* Log used flag write. */
 		used = &vq->used->flags;
-		log_write(vq->log_base, vq->log_addr +
-			  (used - (void __user *)vq->used),
-			  sizeof vq->used->flags);
+		log_used(vq, (used - (void __user *)vq->used),
+			 sizeof vq->used->flags);
 		if (vq->log_ctx)
 			eventfd_signal(vq->log_ctx, 1);
 	}
@@ -1789,9 +1844,8 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
 		smp_wmb();
 		/* Log avail event write */
 		used = vhost_avail_event(vq);
-		log_write(vq->log_base, vq->log_addr +
-			  (used - (void __user *)vq->used),
-			  sizeof *vhost_avail_event(vq));
+		log_used(vq, (used - (void __user *)vq->used),
+			 sizeof *vhost_avail_event(vq));
 		if (vq->log_ctx)
 			eventfd_signal(vq->log_ctx, 1);
 	}
@@ -2191,10 +2245,8 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 		/* Make sure data is seen before log. */
 		smp_wmb();
 		/* Log used ring entry write. */
-		log_write(vq->log_base,
-			  vq->log_addr +
-			   ((void __user *)used - (void __user *)vq->used),
-			  count * sizeof *used);
+		log_used(vq, ((void __user *)used - (void __user *)vq->used),
+			 count * sizeof *used);
 	}
 	old = vq->last_used_idx;
 	new = (vq->last_used_idx += count);
@@ -2236,9 +2288,8 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 		/* Make sure used idx is seen before log. */
 		smp_wmb();
 		/* Log used index update. */
-		log_write(vq->log_base,
-			  vq->log_addr + offsetof(struct vring_used, idx),
-			  sizeof vq->used->idx);
+		log_used(vq, offsetof(struct vring_used, idx),
+			 sizeof vq->used->idx);
 		if (vq->log_ctx)
 			eventfd_signal(vq->log_ctx, 1);
 	}
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 466ef7542291..1b675dad5e05 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -205,7 +205,8 @@ bool vhost_vq_avail_empty(struct vhost_dev *, struct vhost_virtqueue *);
 bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
-		    unsigned int log_num, u64 len);
+		    unsigned int log_num, u64 len,
+		    struct iovec *iov, int count);
 int vq_iotlb_prefetch(struct vhost_virtqueue *vq);
 
 struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type);
-- 
2.17.1




* Re: [PATCH net V2 2/4] vhost_net: switch to use mutex_trylock() in vhost_net_busy_poll()
From: Michael S. Tsirkin @ 2018-12-12 14:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, virtualization, netdev, linux-kernel, Tonghao Zhang, David Miller

On Wed, Dec 12, 2018 at 06:08:17PM +0800, Jason Wang wrote:
> We used to hold the mutex of the paired virtqueue in
> vhost_net_busy_poll(). But this results in an inconsistent lock
> order, which may cause a deadlock once we bring back the protection
> of the device IOTLB with the vq mutex, since that requires holding
> the mutexes of all virtqueues at the same time.
> 
> Fix this simply by switching to mutex_trylock(); on failure, just
> skip the busy polling. This can happen while the device IOTLB is
> being updated, which should be rare.
> 
> Fixes: 78139c94dc8c ("net: vhost: lock the vqs one by one")
> Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

and I think we should try to put this fix in 4.20 too.


> ---
>  drivers/vhost/net.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index ab11b2bee273..ad7a6f475a44 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -513,7 +513,13 @@ static void vhost_net_busy_poll(struct vhost_net *net,
>  	struct socket *sock;
>  	struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
>  
> -	mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
> +	/* Try to hold the vq mutex of the paired virtqueue. We can't
> +	 * use mutex_lock() here since we could not guarantee a
> +	 * consistent lock ordering.
> +	 */
> +	if (!mutex_trylock(&vq->mutex))
> +		return;
> +
>  	vhost_disable_notify(&net->dev, vq);
>  	sock = rvq->private_data;
>  
> -- 
> 2.17.1



* Re: [PATCH net V2 3/4] Revert "net: vhost: lock the vqs one by one"
From: Michael S. Tsirkin @ 2018-12-12 14:24 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel, Tonghao Zhang

On Wed, Dec 12, 2018 at 06:08:18PM +0800, Jason Wang wrote:
> This reverts commit 78139c94dc8c96a478e67dab3bee84dc6eccb5fd. With
> that commit, we no longer protect the device IOTLB with the vq
> mutex, which can lead to e.g. a use-after-free of device IOTLB
> entries. And since we've switched to mutex_trylock() in the previous
> patch, it's safe to revert it without introducing a deadlock.
> 
> Fixes: 78139c94dc8c ("net: vhost: lock the vqs one by one")
> Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>


Acked-by: Michael S. Tsirkin <mst@redhat.com>

I'd try to put this in 4.20 if we can
and it's needed for -stable I think.

Also looks like we should allow iotlb entries per vq
to improve locking. What do you think?
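
For illustration, one possible shape of that (a sketch only; today
vq->iotlb just points at the device-wide dev->iotlb):

    /* Hypothetical per-vq IOTLB: each vq owns its entries, protected
     * by vq->mutex alone, so an update only stalls the vqs it
     * actually touches.
     */
    static void iotlb_update_vq(struct vhost_virtqueue *vq, u64 iova,
                                u64 size, u64 uaddr, int perm)
    {
            mutex_lock(&vq->mutex);
            vhost_new_umem_range(vq->iotlb, iova, size,
                                 iova + size - 1, uaddr, perm);
            mutex_unlock(&vq->mutex);
    }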

> ---
>  drivers/vhost/vhost.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 5915f240275a..55e5aa662ad5 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -295,11 +295,8 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
>  {
>  	int i;
>  
> -	for (i = 0; i < d->nvqs; ++i) {
> -		mutex_lock(&d->vqs[i]->mutex);
> +	for (i = 0; i < d->nvqs; ++i)
>  		__vhost_vq_meta_reset(d->vqs[i]);
> -		mutex_unlock(&d->vqs[i]->mutex);
> -	}
>  }
>  
>  static void vhost_vq_reset(struct vhost_dev *dev,
> @@ -895,6 +892,20 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
>  #define vhost_get_used(vq, x, ptr) \
>  	vhost_get_user(vq, x, ptr, VHOST_ADDR_USED)
>  
> +static void vhost_dev_lock_vqs(struct vhost_dev *d)
> +{
> +	int i = 0;
> +	for (i = 0; i < d->nvqs; ++i)
> +		mutex_lock_nested(&d->vqs[i]->mutex, i);
> +}
> +
> +static void vhost_dev_unlock_vqs(struct vhost_dev *d)
> +{
> +	int i = 0;
> +	for (i = 0; i < d->nvqs; ++i)
> +		mutex_unlock(&d->vqs[i]->mutex);
> +}
> +
>  static int vhost_new_umem_range(struct vhost_umem *umem,
>  				u64 start, u64 size, u64 end,
>  				u64 userspace_addr, int perm)
> @@ -976,6 +987,7 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
>  	int ret = 0;
>  
>  	mutex_lock(&dev->mutex);
> +	vhost_dev_lock_vqs(dev);
>  	switch (msg->type) {
>  	case VHOST_IOTLB_UPDATE:
>  		if (!dev->iotlb) {
> @@ -1009,6 +1021,7 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
>  		break;
>  	}
>  
> +	vhost_dev_unlock_vqs(dev);
>  	mutex_unlock(&dev->mutex);
>  
>  	return ret;
> -- 
> 2.17.1



* Re: [PATCH net V2 4/4] vhost: log dirty page correctly
From: Michael S. Tsirkin @ 2018-12-12 14:32 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel, Jintack Lim

On Wed, Dec 12, 2018 at 06:08:19PM +0800, Jason Wang wrote:
> The vhost dirty page logging API is designed to sync through GPA.
> But we try to log GIOVA when the device IOTLB is enabled. This is
> wrong and may lead to missing data after migration.
> 
> To solve this issue, when logging with the device IOTLB enabled, we
> will:
> 
> 1) reuse the device IOTLB's GIOVA->HVA translation result to get the
>    HVA: for writable descriptors, get the HVA through the iovec; for
>    used ring updates, translate the GIOVA to an HVA;
> 2) traverse the GPA->HVA mapping to get the possible GPAs and log
>    through GPA. Note that this reverse mapping is not guaranteed to
>    be unique, so we should log each possible GPA in this case.
> 
> This fixes the failure of scp to a guest during migration. In -next,
> we will probably support passing GIOVA->GPA mappings instead of
> GIOVA->HVA.
> 
> Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
> Reported-by: Jintack Lim <jintack@cs.columbia.edu>
> Cc: Jintack Lim <jintack@cs.columbia.edu>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

It's a nasty bug for sure but it's been like this for a long
time so I'm inclined to say let's put it in 4.21,
and queue for stable.

So please split this out from this series.

Also, I'd like to see a feature bit that allows GPA in IOTLBs.
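
For illustration, such a negotiation might look like this (the flag
name is invented, not existing UAPI):

    /* Hypothetical backend feature: once acked by userspace, the
     * uaddr field of struct vhost_iotlb_msg carries a GPA instead
     * of an HVA, so no HVA->GPA reverse walk is needed for logging.
     */
    #define VHOST_BACKEND_F_IOTLB_GPA       (1ULL << 0)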

> ---
>  drivers/vhost/net.c   |  3 +-
>  drivers/vhost/vhost.c | 79 +++++++++++++++++++++++++++++++++++--------
>  drivers/vhost/vhost.h |  3 +-
>  3 files changed, 69 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index ad7a6f475a44..784df2b49628 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1192,7 +1192,8 @@ static void handle_rx(struct vhost_net *net)
>  		if (nvq->done_idx > VHOST_NET_BATCH)
>  			vhost_net_signal_used(nvq);
>  		if (unlikely(vq_log))
> -			vhost_log_write(vq, vq_log, log, vhost_len);
> +			vhost_log_write(vq, vq_log, log, vhost_len,
> +					vq->iov, in);
>  		total_len += vhost_len;
>  		if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
>  			vhost_poll_queue(&vq->poll);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 55e5aa662ad5..3660310604fd 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1733,11 +1733,67 @@ static int log_write(void __user *log_base,
>  	return r;
>  }
>  
> +static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
> +{
> +	struct vhost_umem *umem = vq->umem;
> +	struct vhost_umem_node *u;
> +	u64 gpa;
> +	int r;
> +	bool hit = false;
> +
> +	list_for_each_entry(u, &umem->umem_list, link) {
> +		if (u->userspace_addr < hva &&
> +		    u->userspace_addr + u->size >=
> +		    hva + len) {
> +			gpa = u->start + hva - u->userspace_addr;
> +			r = log_write(vq->log_base, gpa, len);
> +			if (r < 0)
> +				return r;
> +			hit = true;
> +		}
> +	}
> +
> +	/* No reverse mapping, should be a bug */
> +	WARN_ON(!hit);

Maybe it should, but userspace can trigger this easily I think.
We need to stop the device, not warn in the kernel log.

Also there's an error fd (VHOST_SET_VRING_ERR); we need to wake it up.
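
For illustration, the error path might look like this (a sketch, not
a patch; vq_err() already signals the error eventfd registered via
VHOST_SET_VRING_ERR):

    if (!hit) {
            vq_err(vq, "no GPA mapping for HVA 0x%llx len %llu\n",
                   (unsigned long long)hva, (unsigned long long)len);
            return -EINVAL;
    }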


> +	return 0;
> +}
> +
> +static void log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
> +{
> +	struct iovec iov[64];
> +	int i, ret;
> +
> +	if (!vq->iotlb) {
> +		log_write(vq->log_base, vq->log_addr + used_offset, len);
> +		return;
> +	}

This change seems questionable. Used ring writes
use their own machinery; they do not go through the IOTLB.
The same should apply to the log, I think.
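
For reference, the machinery in question is the dedicated log address
userspace already passes in (real UAPI; the variable values here are
illustrative):

    struct vhost_vring_addr addr = {
            .index           = 0,
            .desc_user_addr  = desc_hva,
            .avail_user_addr = avail_hva,
            .used_user_addr  = used_hva,
            /* GPA at which used-ring writes are dirty-logged: */
            .log_guest_addr  = used_gpa,
    };
    ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr);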

> +
> +	ret = translate_desc(vq, (u64)(uintptr_t)vq->used + used_offset,
> +			     len, iov, 64, VHOST_ACCESS_WO);
> +	WARN_ON(ret < 0);


Same thing here. Translation failures can be triggered from the guest;
WARN_ON is not a good error handling strategy...

> +
> +	for (i = 0; i < ret; i++) {
> +		ret = log_write_hva(vq,	(u64)(uintptr_t)iov[i].iov_base,
> +				    iov[i].iov_len);
> +		WARN_ON(ret);
> +	}
> +}
> +
>  int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
> -		    unsigned int log_num, u64 len)
> +		    unsigned int log_num, u64 len, struct iovec *iov, int count)
>  {
>  	int i, r;
>  
> +	if (vq->iotlb) {
> +		for (i = 0; i < count; i++) {
> +			r = log_write_hva(vq, (u64)(uintptr_t)iov[i].iov_base,
> +					  iov[i].iov_len);
> +			if (r < 0)
> +				return r;
> +		}
> +		return 0;
> +	}
> +
>  	/* Make sure data written is seen before log. */
>  	smp_wmb();
>  	for (i = 0; i < log_num; ++i) {
> @@ -1769,9 +1825,8 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
>  		smp_wmb();
>  		/* Log used flag write. */
>  		used = &vq->used->flags;
> -		log_write(vq->log_base, vq->log_addr +
> -			  (used - (void __user *)vq->used),
> -			  sizeof vq->used->flags);
> +		log_used(vq, (used - (void __user *)vq->used),
> +			 sizeof vq->used->flags);
>  		if (vq->log_ctx)
>  			eventfd_signal(vq->log_ctx, 1);
>  	}
> @@ -1789,9 +1844,8 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
>  		smp_wmb();
>  		/* Log avail event write */
>  		used = vhost_avail_event(vq);
> -		log_write(vq->log_base, vq->log_addr +
> -			  (used - (void __user *)vq->used),
> -			  sizeof *vhost_avail_event(vq));
> +		log_used(vq, (used - (void __user *)vq->used),
> +			 sizeof *vhost_avail_event(vq));
>  		if (vq->log_ctx)
>  			eventfd_signal(vq->log_ctx, 1);
>  	}
> @@ -2191,10 +2245,8 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
>  		/* Make sure data is seen before log. */
>  		smp_wmb();
>  		/* Log used ring entry write. */
> -		log_write(vq->log_base,
> -			  vq->log_addr +
> -			   ((void __user *)used - (void __user *)vq->used),
> -			  count * sizeof *used);
> +		log_used(vq, ((void __user *)used - (void __user *)vq->used),
> +			 count * sizeof *used);
>  	}
>  	old = vq->last_used_idx;
>  	new = (vq->last_used_idx += count);
> @@ -2236,9 +2288,8 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
>  		/* Make sure used idx is seen before log. */
>  		smp_wmb();
>  		/* Log used index update. */
> -		log_write(vq->log_base,
> -			  vq->log_addr + offsetof(struct vring_used, idx),
> -			  sizeof vq->used->idx);
> +		log_used(vq, offsetof(struct vring_used, idx),
> +			 sizeof vq->used->idx);
>  		if (vq->log_ctx)
>  			eventfd_signal(vq->log_ctx, 1);
>  	}
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 466ef7542291..1b675dad5e05 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -205,7 +205,8 @@ bool vhost_vq_avail_empty(struct vhost_dev *, struct vhost_virtqueue *);
>  bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
>  
>  int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
> -		    unsigned int log_num, u64 len);
> +		    unsigned int log_num, u64 len,
> +		    struct iovec *iov, int count);
>  int vq_iotlb_prefetch(struct vhost_virtqueue *vq);
>  
>  struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type);
> -- 
> 2.17.1



* Re: [PATCH net V2 1/4] vhost: make sure used idx is seen before log in vhost_add_used_n()
From: Michael S. Tsirkin @ 2018-12-12 14:33 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel

On Wed, Dec 12, 2018 at 06:08:16PM +0800, Jason Wang wrote:
> We miss a write barrier that guarantees used idx is updated and seen
> before log. This will let userspace sync and copy used ring before
> used idx is update. Fix this by adding a barrier before log_write().
> 
> Fixes: 8dd014adfea6f ("vhost-net: mergeable buffers support")
> Signed-off-by: Jason Wang <jasowang@redhat.com>


Acked-by: Michael S. Tsirkin <mst@redhat.com>

Also for 4.20, and seems like a stable candidate.

> ---
>  drivers/vhost/vhost.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 6b98d8e3a5bf..5915f240275a 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2220,6 +2220,8 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
>  		return -EFAULT;
>  	}
>  	if (unlikely(vq->log_used)) {
> +		/* Make sure used idx is seen before log. */
> +		smp_wmb();
>  		/* Log used index update. */
>  		log_write(vq->log_base,
>  			  vq->log_addr + offsetof(struct vring_used, idx),
> -- 
> 2.17.1



* Re: [PATCH net V2 0/4] Fix various issues of vhost
From: David Miller @ 2018-12-12 23:31 UTC (permalink / raw)
  To: jasowang; +Cc: mst, kvm, virtualization, netdev, linux-kernel

From: Jason Wang <jasowang@redhat.com>
Date: Wed, 12 Dec 2018 18:08:15 +0800

> This series tries to fix various issues of vhost:
> 
> - Patch 1 adds a missing write barrier between the used idx update
>   and logging.
> - Patches 2-3 bring back the protection of the device IOTLB through
>   the vq mutex; this fixes a possible use-after-free of device IOTLB
>   entries.
> - Patch 4 fixes dirty page logging when the device IOTLB is enabled.
>   Logging should be done through GPA instead of GIOVA; this is done
>   by introducing an HVA->GPA reverse mapping and converting HVA to
>   GPA when logging dirty pages.
> 
> Please consider them for -stable.
> 
> Thanks
> 
> Changes from V1:
> - Silence a compiler warning on 32-bit.
> - Use mutex_trylock() instead of mutex_lock(), even on the fast
>   path.

Hello Jason.

Looks like Michael wants you to split out patch #4 and target
net-next with it.

So please do that and respin the first 3 patches here with Michael's
ACKs.

Thanks.

^ permalink raw reply	[flat|nested] 46+ messages in thread


* Re: [PATCH net V2 3/4] Revert "net: vhost: lock the vqs one by one"
  2018-12-12 14:24   ` Michael S. Tsirkin
@ 2018-12-13  2:27     ` Jason Wang
  2018-12-13  2:27     ` Jason Wang
  1 sibling, 0 replies; 46+ messages in thread
From: Jason Wang @ 2018-12-13  2:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, virtualization, netdev, linux-kernel, Tonghao Zhang


On 2018/12/12 10:24 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 12, 2018 at 06:08:18PM +0800, Jason Wang wrote:
>> This reverts commit 78139c94dc8c96a478e67dab3bee84dc6eccb5fd. We don't
>> protect the device IOTLB with the vq mutex, which will lead to e.g. a
>> use-after-free of device IOTLB entries. And since we've switched to
>> using mutex_trylock() in the previous patch, it's safe to revert it
>> without risking a deadlock.
>>
>> Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one")
>> Cc: Tonghao Zhang<xiangxia.m.yue@gmail.com>
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
> Acked-by: Michael S. Tsirkin<mst@redhat.com>
>
> I'd try to put this in 4.20 if we can
> and it's needed for -stable I think.
>
> Also looks like we should allow iotlb entries per vq
> to improve locking. What do you think?
>

Yes, we can do it for -next.

Thanks


^ permalink raw reply	[flat|nested] 46+ messages in thread
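
For context, the trylock pattern from patch 2 that makes this revert safe,
roughly (a simplified sketch with made-up names, not the exact
vhost_net_busy_poll() code):

        static void busy_poll_sketch(struct vhost_virtqueue *rvq,
                                     struct vhost_virtqueue *tvq)
        {
                /* Busy polling briefly needs the other virtqueue's mutex.
                 * mutex_trylock() backs off instead of blocking, so paths
                 * that hold several vq mutexes at once (as the restored
                 * vhost_dev_lock_vqs() does) cannot deadlock against the
                 * busy-poll path. */
                if (!mutex_trylock(&tvq->mutex))
                        return; /* contended: skip busy polling this round */
                /* ... poll the other ring here ... */
                mutex_unlock(&tvq->mutex);
        }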


* Re: [PATCH net V2 4/4] vhost: log dirty page correctly
  2018-12-12 14:32   ` Michael S. Tsirkin
@ 2018-12-13  2:39     ` Jason Wang
  2018-12-13 14:31       ` Michael S. Tsirkin
  2018-12-13 14:31       ` Michael S. Tsirkin
  2018-12-13  2:39     ` Jason Wang
  1 sibling, 2 replies; 46+ messages in thread
From: Jason Wang @ 2018-12-13  2:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, virtualization, netdev, linux-kernel, Jintack Lim


On 2018/12/12 10:32 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 12, 2018 at 06:08:19PM +0800, Jason Wang wrote:
>> Vhost dirty page logging API is designed to sync through GPA. But we
>> try to log GIOVA when device IOTLB is enabled. This is wrong and may
>> lead to missing data after migration.
>>
>> To solve this issue, when logging with device IOTLB enabled, we will:
>>
>> 1) reuse the device IOTLB translation result of the GIOVA->HVA mapping
>>     to get the HVA; for writable descriptors, get the HVA through the
>>     iovec. For the used ring update, translate its GIOVA to HVA
>> 2) traverse the GPA->HVA mapping to get the possible GPAs and log
>>     through GPA. Note that this reverse mapping is not guaranteed
>>     to be unique, so we should log each possible GPA in this case.
>>
>> This fixes the failure of scp to the guest during migration. In -next,
>> we will probably support passing GIOVA->GPA instead of GIOVA->HVA.
>>
>> Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
>> Reported-by: Jintack Lim <jintack@cs.columbia.edu>
>> Cc: Jintack Lim <jintack@cs.columbia.edu>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> It's a nasty bug for sure but it's been like this for a long
> time so I'm inclined to say let's put it in 4.21,
> and queue for stable.
>
> So please split this out from this series.


Ok.


>
> Also, I'd like to see a feature bit that allows GPA in IOTLBs.


Just to make sure I understand this. It looks to me we should:

- allow passing GIOVA->GPA through UAPI

- cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB 
for performance

Is this what you suggest?

Thanks


>
>> ---
>>   drivers/vhost/net.c   |  3 +-
>>   drivers/vhost/vhost.c | 79 +++++++++++++++++++++++++++++++++++--------
>>   drivers/vhost/vhost.h |  3 +-
>>   3 files changed, 69 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index ad7a6f475a44..784df2b49628 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -1192,7 +1192,8 @@ static void handle_rx(struct vhost_net *net)
>>   		if (nvq->done_idx > VHOST_NET_BATCH)
>>   			vhost_net_signal_used(nvq);
>>   		if (unlikely(vq_log))
>> -			vhost_log_write(vq, vq_log, log, vhost_len);
>> +			vhost_log_write(vq, vq_log, log, vhost_len,
>> +					vq->iov, in);
>>   		total_len += vhost_len;
>>   		if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
>>   			vhost_poll_queue(&vq->poll);
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 55e5aa662ad5..3660310604fd 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -1733,11 +1733,67 @@ static int log_write(void __user *log_base,
>>   	return r;
>>   }
>>   
>> +static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
>> +{
>> +	struct vhost_umem *umem = vq->umem;
>> +	struct vhost_umem_node *u;
>> +	u64 gpa;
>> +	int r;
>> +	bool hit = false;
>> +
>> +	list_for_each_entry(u, &umem->umem_list, link) {
>> +		if (u->userspace_addr < hva &&
>> +		    u->userspace_addr + u->size >=
>> +		    hva + len) {
>> +			gpa = u->start + hva - u->userspace_addr;
>> +			r = log_write(vq->log_base, gpa, len);
>> +			if (r < 0)
>> +				return r;
>> +			hit = true;
>> +		}
>> +	}
>> +
>> +	/* No reverse mapping, should be a bug */
>> +	WARN_ON(!hit);
> Maybe it should, but userspace can trigger this easily I think.
> We need to stop the device, not warn in the kernel log.
>
> Also there's an error fd: VHOST_SET_VRING_ERR, need to wake it up.
>

Ok.


>> +	return 0;
>> +}
>> +
>> +static void log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
>> +{
>> +	struct iovec iov[64];
>> +	int i, ret;
>> +
>> +	if (!vq->iotlb) {
>> +		log_write(vq->log_base, vq->log_addr + used_offset, len);
>> +		return;
>> +	}
> This change seems questionable. Used ring writes
> use their own machinery; they do not go through the IOTLB.
> The same should apply to the log, I think.


The problem is that the used ring may not be physically contiguous when 
Device IOTLB is enabled. So logging should go through it.


>
>> +
>> +	ret = translate_desc(vq, (u64)(uintptr_t)vq->used + used_offset,
>> +			     len, iov, 64, VHOST_ACCESS_WO);
>> +	WARN_ON(ret < 0);
>
> Same thing here. Translation failures can be triggered from the guest;
> a warn-on is not a good error handling strategy ...


Ok. Let me fix it.


Thanks


>> +
>> +	for (i = 0; i < ret; i++) {
>> +		ret = log_write_hva(vq,	(u64)(uintptr_t)iov[i].iov_base,
>> +				    iov[i].iov_len);
>> +		WARN_ON(ret);
>> +	}
>> +}
>> +
>>   int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
>> -		    unsigned int log_num, u64 len)
>> +		    unsigned int log_num, u64 len, struct iovec *iov, int count)
>>   {
>>   	int i, r;
>>   
>> +	if (vq->iotlb) {
>> +		for (i = 0; i < count; i++) {
>> +			r = log_write_hva(vq, (u64)(uintptr_t)iov[i].iov_base,
>> +					  iov[i].iov_len);
>> +			if (r < 0)
>> +				return r;
>> +		}
>> +		return 0;
>> +	}
>> +
>>   	/* Make sure data written is seen before log. */
>>   	smp_wmb();
>>   	for (i = 0; i < log_num; ++i) {
>> @@ -1769,9 +1825,8 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
>>   		smp_wmb();
>>   		/* Log used flag write. */
>>   		used = &vq->used->flags;
>> -		log_write(vq->log_base, vq->log_addr +
>> -			  (used - (void __user *)vq->used),
>> -			  sizeof vq->used->flags);
>> +		log_used(vq, (used - (void __user *)vq->used),
>> +			 sizeof vq->used->flags);
>>   		if (vq->log_ctx)
>>   			eventfd_signal(vq->log_ctx, 1);
>>   	}
>> @@ -1789,9 +1844,8 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
>>   		smp_wmb();
>>   		/* Log avail event write */
>>   		used = vhost_avail_event(vq);
>> -		log_write(vq->log_base, vq->log_addr +
>> -			  (used - (void __user *)vq->used),
>> -			  sizeof *vhost_avail_event(vq));
>> +		log_used(vq, (used - (void __user *)vq->used),
>> +			 sizeof *vhost_avail_event(vq));
>>   		if (vq->log_ctx)
>>   			eventfd_signal(vq->log_ctx, 1);
>>   	}
>> @@ -2191,10 +2245,8 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
>>   		/* Make sure data is seen before log. */
>>   		smp_wmb();
>>   		/* Log used ring entry write. */
>> -		log_write(vq->log_base,
>> -			  vq->log_addr +
>> -			   ((void __user *)used - (void __user *)vq->used),
>> -			  count * sizeof *used);
>> +		log_used(vq, ((void __user *)used - (void __user *)vq->used),
>> +			 count * sizeof *used);
>>   	}
>>   	old = vq->last_used_idx;
>>   	new = (vq->last_used_idx += count);
>> @@ -2236,9 +2288,8 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
>>   		/* Make sure used idx is seen before log. */
>>   		smp_wmb();
>>   		/* Log used index update. */
>> -		log_write(vq->log_base,
>> -			  vq->log_addr + offsetof(struct vring_used, idx),
>> -			  sizeof vq->used->idx);
>> +		log_used(vq, offsetof(struct vring_used, idx),
>> +			 sizeof vq->used->idx);
>>   		if (vq->log_ctx)
>>   			eventfd_signal(vq->log_ctx, 1);
>>   	}
>> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
>> index 466ef7542291..1b675dad5e05 100644
>> --- a/drivers/vhost/vhost.h
>> +++ b/drivers/vhost/vhost.h
>> @@ -205,7 +205,8 @@ bool vhost_vq_avail_empty(struct vhost_dev *, struct vhost_virtqueue *);
>>   bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
>>   
>>   int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
>> -		    unsigned int log_num, u64 len);
>> +		    unsigned int log_num, u64 len,
>> +		    struct iovec *iov, int count);
>>   int vq_iotlb_prefetch(struct vhost_virtqueue *vq);
>>   
>>   struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type);
>> -- 
>> 2.17.1

^ permalink raw reply	[flat|nested] 46+ messages in thread
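
The error path being asked for above would look roughly as below (a sketch
only, not the final patch; vq_err() is the existing vhost helper that logs
and signals the eventfd registered with VHOST_SET_VRING_ERR):

        /* In log_write_hva(), instead of WARN_ON(!hit): */
        if (!hit) {
                vq_err(vq, "no GPA mapping for HVA 0x%llx\n",
                       (unsigned long long)hva);
                return -EINVAL; /* propagate so the caller can stop the device */
        }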


* Re: [PATCH net V2 0/4] Fix various issue of vhost
  2018-12-12 23:31 ` David Miller
  2018-12-13  2:42   ` Jason Wang
@ 2018-12-13  2:42   ` Jason Wang
  1 sibling, 0 replies; 46+ messages in thread
From: Jason Wang @ 2018-12-13  2:42 UTC (permalink / raw)
  To: David Miller; +Cc: mst, kvm, virtualization, netdev, linux-kernel


On 2018/12/13 7:31 AM, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Wed, 12 Dec 2018 18:08:15 +0800
>
>> This series tries to fix various issues of vhost:
>>
>> - Patch 1 adds a missing write barrier between used idx updating and
>>    logging.
>> - Patch 2-3 brings back the protection of device IOTLB through vq
>>    mutex, this fixes possible use after free in device IOTLB entries.
>> - Patch 4-7 fixes the dirty page logging when device IOTLB is
>>    enabled. This should be done through GPA instead of GIOVA, which is
>>    done by introducing an HVA->GPA reverse mapping and converting HVA
>>    to GPA when logging dirty pages.
>>
>> Please consider them for -stable.
>>
>> Thanks
>>
>> Changes from V1:
>> - silence a compiler warning for 32bit.
>> - use mutex_trylock() instead of mutex_lock() in vhost_net_busy_poll(),
>>    even on the fast path.
> Hello Jason.
>
> Looks like Michael wants you to split out patch #4 and target
> net-next with it.
>
> So please do that and respin the first 3 patches here with Michael's
> ACKs.
>
> Thanks.


Yes, will send V3.

Thanks


^ permalink raw reply	[flat|nested] 46+ messages in thread


* Re: [PATCH net V2 4/4] vhost: log dirty page correctly
  2018-12-13  2:39     ` Jason Wang
  2018-12-13 14:31       ` Michael S. Tsirkin
@ 2018-12-13 14:31       ` Michael S. Tsirkin
  2018-12-14  2:43         ` Jason Wang
  2018-12-14  2:43         ` Jason Wang
  1 sibling, 2 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-12-13 14:31 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel, Jintack Lim

On Thu, Dec 13, 2018 at 10:39:41AM +0800, Jason Wang wrote:
> 
> On 2018/12/12 10:32 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 12, 2018 at 06:08:19PM +0800, Jason Wang wrote:
> > > Vhost dirty page logging API is designed to sync through GPA. But we
> > > try to log GIOVA when device IOTLB is enabled. This is wrong and may
> > > lead to missing data after migration.
> > > 
> > > To solve this issue, when logging with device IOTLB enabled, we will:
> > > 
> > > 1) reuse the device IOTLB translation result of the GIOVA->HVA mapping
> > >     to get the HVA; for writable descriptors, get the HVA through the
> > >     iovec. For the used ring update, translate its GIOVA to HVA
> > > 2) traverse the GPA->HVA mapping to get the possible GPAs and log
> > >     through GPA. Note that this reverse mapping is not guaranteed
> > >     to be unique, so we should log each possible GPA in this case.
> > > 
> > > This fixes the failure of scp to the guest during migration. In -next,
> > > we will probably support passing GIOVA->GPA instead of GIOVA->HVA.
> > > 
> > > Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
> > > Reported-by: Jintack Lim <jintack@cs.columbia.edu>
> > > Cc: Jintack Lim <jintack@cs.columbia.edu>
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > It's a nasty bug for sure but it's been like this for a long
> > time so I'm inclined to say let's put it in 4.21,
> > and queue for stable.
> > 
> > So please split this out from this series.
> 
> 
> Ok.
> 
> 
> > 
> > Also, I'd like to see a feature bit that allows GPA in IOTLBs.
> 
> 
> Just to make sure I understand this. It looks to me we should:
> 
> - allow passing GIOVA->GPA through UAPI
> 
> - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
> performance
> 
> Is this what you suggest?
> 
> Thanks

Not really. We already have GPA->HVA, so I suggested a flag to pass
GIOVA->GPA in the IOTLB.

This has advantages for security since a single table then needs
to be validated to ensure the guest does not corrupt
QEMU memory.


> 
> > 
> > > ---
> > >   drivers/vhost/net.c   |  3 +-
> > >   drivers/vhost/vhost.c | 79 +++++++++++++++++++++++++++++++++++--------
> > >   drivers/vhost/vhost.h |  3 +-
> > >   3 files changed, 69 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > index ad7a6f475a44..784df2b49628 100644
> > > --- a/drivers/vhost/net.c
> > > +++ b/drivers/vhost/net.c
> > > @@ -1192,7 +1192,8 @@ static void handle_rx(struct vhost_net *net)
> > >   		if (nvq->done_idx > VHOST_NET_BATCH)
> > >   			vhost_net_signal_used(nvq);
> > >   		if (unlikely(vq_log))
> > > -			vhost_log_write(vq, vq_log, log, vhost_len);
> > > +			vhost_log_write(vq, vq_log, log, vhost_len,
> > > +					vq->iov, in);
> > >   		total_len += vhost_len;
> > >   		if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
> > >   			vhost_poll_queue(&vq->poll);
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index 55e5aa662ad5..3660310604fd 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -1733,11 +1733,67 @@ static int log_write(void __user *log_base,
> > >   	return r;
> > >   }
> > > +static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
> > > +{
> > > +	struct vhost_umem *umem = vq->umem;
> > > +	struct vhost_umem_node *u;
> > > +	u64 gpa;
> > > +	int r;
> > > +	bool hit = false;
> > > +
> > > +	list_for_each_entry(u, &umem->umem_list, link) {
> > > +		if (u->userspace_addr < hva &&
> > > +		    u->userspace_addr + u->size >=
> > > +		    hva + len) {
> > > +			gpa = u->start + hva - u->userspace_addr;
> > > +			r = log_write(vq->log_base, gpa, len);
> > > +			if (r < 0)
> > > +				return r;
> > > +			hit = true;
> > > +		}
> > > +	}
> > > +
> > > +	/* No reverse mapping, should be a bug */
> > > +	WARN_ON(!hit);
> > Maybe it should, but userspace can trigger this easily I think.
> > We need to stop the device, not warn in the kernel log.
> > 
> > Also there's an error fd: VHOST_SET_VRING_ERR, need to wake it up.
> > 
> 
> Ok.
> 
> 
> > > +	return 0;
> > > +}
> > > +
> > > +static void log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
> > > +{
> > > +	struct iovec iov[64];
> > > +	int i, ret;
> > > +
> > > +	if (!vq->iotlb) {
> > > +		log_write(vq->log_base, vq->log_addr + used_offset, len);
> > > +		return;
> > > +	}
> > This change seems questionable. Used ring writes
> > use their own machinery; they do not go through the IOTLB.
> > The same should apply to the log, I think.
> 
> 
> The problem is that the used ring may not be physically contiguous when
> Device IOTLB is enabled. So logging should go through it.
> 
> 
> > 
> > > +
> > > +	ret = translate_desc(vq, (u64)(uintptr_t)vq->used + used_offset,
> > > +			     len, iov, 64, VHOST_ACCESS_WO);
> > > +	WARN_ON(ret < 0);
> > 
> > Same thing here. Translation failures can be triggered from the guest;
> > a warn-on is not a good error handling strategy ...
> 
> 
> Ok. Let me fix it.
> 
> 
> Thanks
> 
> 
> > > +
> > > +	for (i = 0; i < ret; i++) {
> > > +		ret = log_write_hva(vq,	(u64)(uintptr_t)iov[i].iov_base,
> > > +				    iov[i].iov_len);
> > > +		WARN_ON(ret);
> > > +	}
> > > +}
> > > +
> > >   int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
> > > -		    unsigned int log_num, u64 len)
> > > +		    unsigned int log_num, u64 len, struct iovec *iov, int count)
> > >   {
> > >   	int i, r;
> > > +	if (vq->iotlb) {
> > > +		for (i = 0; i < count; i++) {
> > > +			r = log_write_hva(vq, (u64)(uintptr_t)iov[i].iov_base,
> > > +					  iov[i].iov_len);
> > > +			if (r < 0)
> > > +				return r;
> > > +		}
> > > +		return 0;
> > > +	}
> > > +
> > >   	/* Make sure data written is seen before log. */
> > >   	smp_wmb();
> > >   	for (i = 0; i < log_num; ++i) {
> > > @@ -1769,9 +1825,8 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
> > >   		smp_wmb();
> > >   		/* Log used flag write. */
> > >   		used = &vq->used->flags;
> > > -		log_write(vq->log_base, vq->log_addr +
> > > -			  (used - (void __user *)vq->used),
> > > -			  sizeof vq->used->flags);
> > > +		log_used(vq, (used - (void __user *)vq->used),
> > > +			 sizeof vq->used->flags);
> > >   		if (vq->log_ctx)
> > >   			eventfd_signal(vq->log_ctx, 1);
> > >   	}
> > > @@ -1789,9 +1844,8 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
> > >   		smp_wmb();
> > >   		/* Log avail event write */
> > >   		used = vhost_avail_event(vq);
> > > -		log_write(vq->log_base, vq->log_addr +
> > > -			  (used - (void __user *)vq->used),
> > > -			  sizeof *vhost_avail_event(vq));
> > > +		log_used(vq, (used - (void __user *)vq->used),
> > > +			 sizeof *vhost_avail_event(vq));
> > >   		if (vq->log_ctx)
> > >   			eventfd_signal(vq->log_ctx, 1);
> > >   	}
> > > @@ -2191,10 +2245,8 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
> > >   		/* Make sure data is seen before log. */
> > >   		smp_wmb();
> > >   		/* Log used ring entry write. */
> > > -		log_write(vq->log_base,
> > > -			  vq->log_addr +
> > > -			   ((void __user *)used - (void __user *)vq->used),
> > > -			  count * sizeof *used);
> > > +		log_used(vq, ((void __user *)used - (void __user *)vq->used),
> > > +			 count * sizeof *used);
> > >   	}
> > >   	old = vq->last_used_idx;
> > >   	new = (vq->last_used_idx += count);
> > > @@ -2236,9 +2288,8 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
> > >   		/* Make sure used idx is seen before log. */
> > >   		smp_wmb();
> > >   		/* Log used index update. */
> > > -		log_write(vq->log_base,
> > > -			  vq->log_addr + offsetof(struct vring_used, idx),
> > > -			  sizeof vq->used->idx);
> > > +		log_used(vq, offsetof(struct vring_used, idx),
> > > +			 sizeof vq->used->idx);
> > >   		if (vq->log_ctx)
> > >   			eventfd_signal(vq->log_ctx, 1);
> > >   	}
> > > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> > > index 466ef7542291..1b675dad5e05 100644
> > > --- a/drivers/vhost/vhost.h
> > > +++ b/drivers/vhost/vhost.h
> > > @@ -205,7 +205,8 @@ bool vhost_vq_avail_empty(struct vhost_dev *, struct vhost_virtqueue *);
> > >   bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
> > >   int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
> > > -		    unsigned int log_num, u64 len);
> > > +		    unsigned int log_num, u64 len,
> > > +		    struct iovec *iov, int count);
> > >   int vq_iotlb_prefetch(struct vhost_virtqueue *vq);
> > >   struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type);
> > > -- 
> > > 2.17.1

^ permalink raw reply	[flat|nested] 46+ messages in thread
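
Background on why logging must use GPA: the dirty log is a bitmap indexed
by guest-physical page number. A minimal sketch of how a userspace consumer
would test a page (hypothetical helper; the 4K page size matches the
kernel's log granularity):

        #include <stdbool.h>
        #include <stdint.h>

        #define VHOST_LOG_PAGE 0x1000 /* one bit per 4K guest-physical page */

        static bool page_is_dirty(const unsigned long *log_base, uint64_t gpa)
        {
                uint64_t page = gpa / VHOST_LOG_PAGE;
                unsigned long bits = 8 * sizeof(unsigned long);

                return log_base[page / bits] & (1UL << (page % bits));
        }

        /* If the kernel set bits by GIOVA instead of GPA, this lookup would
         * test the wrong pages and migration would miss dirty data: the scp
         * failure described in the commit log. */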


* Re: [PATCH net V2 4/4] vhost: log dirty page correctly
  2018-12-13 14:31       ` Michael S. Tsirkin
@ 2018-12-14  2:43         ` Jason Wang
  2018-12-14 13:20           ` Michael S. Tsirkin
  2018-12-14 13:20           ` Michael S. Tsirkin
  2018-12-14  2:43         ` Jason Wang
  1 sibling, 2 replies; 46+ messages in thread
From: Jason Wang @ 2018-12-14  2:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, virtualization, netdev, linux-kernel, Jintack Lim


On 2018/12/13 10:31 PM, Michael S. Tsirkin wrote:
>> Just to make sure I understand this. It looks to me we should:
>>
>> - allow passing GIOVA->GPA through UAPI
>>
>> - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
>> performance
>>
>> Is this what you suggest?
>>
>> Thanks
> Not really. We already have GPA->HVA, so I suggested a flag to pass
> GIOVA->GPA in the IOTLB.
>
> This has advantages for security since a single table then needs
> to be validated to ensure the guest does not corrupt
> QEMU memory.
>

I wonder how much we can gain through this. Currently, the qemu IOMMU gives 
the GIOVA->GPA mapping, and the qemu vhost code translates GPA to HVA and 
then passes GIOVA->HVA to vhost. It looks like no difference to me.

Thanks


^ permalink raw reply	[flat|nested] 46+ messages in thread
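
To spell out the two flows being compared in this sub-thread (a summary of
the exchange, not either poster's words):

        Current:   GIOVA --(guest IOMMU)--> GPA --(qemu memory table)--> HVA,
                   and qemu sends GIOVA->HVA to vhost in IOTLB updates.

        Proposed:  qemu sends GIOVA->GPA to vhost in IOTLB updates, and the
                   kernel resolves GPA->HVA against the memory table it
                   already validated via VHOST_SET_MEM_TABLE.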


* Re: [PATCH net V2 4/4] vhost: log dirty page correctly
  2018-12-14  2:43         ` Jason Wang
  2018-12-14 13:20           ` Michael S. Tsirkin
@ 2018-12-14 13:20           ` Michael S. Tsirkin
  2018-12-24  3:43               ` Jason Wang
  1 sibling, 1 reply; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-12-14 13:20 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel, Jintack Lim

On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
> 
> On 2018/12/13 10:31 PM, Michael S. Tsirkin wrote:
> > > Just to make sure I understand this. It looks to me we should:
> > > 
> > > - allow passing GIOVA->GPA through UAPI
> > > 
> > > - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
> > > performance
> > > 
> > > Is this what you suggest?
> > > 
> > > Thanks
> > Not really. We already have GPA->HVA, so I suggested a flag to pass
> > GIOVA->GPA in the IOTLB.
> > 
> > This has advantages for security since a single table then needs
> > to be validated to ensure the guest does not corrupt
> > QEMU memory.
> > 
> 
> I wonder how much we can gain through this. Currently, the qemu IOMMU gives
> the GIOVA->GPA mapping, and the qemu vhost code translates GPA to HVA and
> then passes GIOVA->HVA to vhost. It looks like no difference to me.
> 
> Thanks

The difference is in security not in performance.  Getting a bad HVA
corrupts QEMU memory and it might be guest controlled. Very risky.  If
translations to HVA are done in a single place through a single table
it's safer as there's a single risky place.

-- 
MST

^ permalink raw reply	[flat|nested] 46+ messages in thread


* Re: [PATCH net V2 4/4] vhost: log dirty page correctly
  2018-12-14 13:20           ` Michael S. Tsirkin
@ 2018-12-24  3:43               ` Jason Wang
  0 siblings, 0 replies; 46+ messages in thread
From: Jason Wang @ 2018-12-24  3:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, virtualization, netdev, linux-kernel, Jintack Lim


On 2018/12/14 9:20 PM, Michael S. Tsirkin wrote:
> On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
>> On 2018/12/13 10:31 PM, Michael S. Tsirkin wrote:
>>>> Just to make sure I understand this. It looks to me we should:
>>>>
>>>> - allow passing GIOVA->GPA through UAPI
>>>>
>>>> - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
>>>> performance
>>>>
>>>> Is this what you suggest?
>>>>
>>>> Thanks
>>> Not really. We already have GPA->HVA, so I suggested a flag to pass
>>> GIOVA->GPA in the IOTLB.
>>>
>>> This has advantages for security since a single table then needs
>>> to be validated to ensure the guest does not corrupt
>>> QEMU memory.
>>>
>> I wonder how much we can gain through this. Currently, the qemu IOMMU gives
>> the GIOVA->GPA mapping, and the qemu vhost code translates GPA to HVA and
>> then passes GIOVA->HVA to vhost. It looks like no difference to me.
>>
>> Thanks
> The difference is in security not in performance.  Getting a bad HVA
> corrupts QEMU memory and it might be guest controlled. Very risky.


How can this be controlled by the guest? HVAs are generated from qemu ram 
blocks, which are totally under the control of the qemu memory core, not 
the guest.


Thanks


>   If
> translations to HVA are done in a single place through a single table
> it's safer as there's a single risky place.
>

^ permalink raw reply	[flat|nested] 46+ messages in thread


* Re: [PATCH net V2 4/4] vhost: log dirty page correctly
  2018-12-24  3:43               ` Jason Wang
  (?)
@ 2018-12-24 17:41               ` Michael S. Tsirkin
  2018-12-25  9:43                 ` Jason Wang
  2018-12-25  9:43                 ` Jason Wang
  -1 siblings, 2 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-12-24 17:41 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel, Jintack Lim

On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote:
> 
> On 2018/12/14 9:20 PM, Michael S. Tsirkin wrote:
> > On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
> > On 2018/12/13 10:31 PM, Michael S. Tsirkin wrote:
> > > > > Just to make sure I understand this. It looks to me we should:
> > > > > 
> > > > > - allow passing GIOVA->GPA through UAPI
> > > > > 
> > > > > - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
> > > > > performance
> > > > > 
> > > > > Is this what you suggest?
> > > > > 
> > > > > Thanks
> > > > Not really. We already have GPA->HVA, so I suggested a flag to pass
> > > > GIOVA->GPA in the IOTLB.
> > > > 
> > > > This has advantages for security since a single table then needs
> > > > to be validated to ensure the guest does not corrupt
> > > > QEMU memory.
> > > > 
> > > I wonder how much we can gain through this. Currently, the qemu IOMMU gives
> > > the GIOVA->GPA mapping, and the qemu vhost code translates GPA to HVA and
> > > then passes GIOVA->HVA to vhost. It looks like no difference to me.
> > > 
> > > Thanks
> > The difference is in security not in performance.  Getting a bad HVA
> > corrupts QEMU memory and it might be guest controlled. Very risky.
> 
> 
> How can this be controlled by the guest? HVAs are generated from qemu ram
> blocks, which are totally under the control of the qemu memory core, not
> the guest.
> 
> 
> Thanks

It is ultimately under guest influence, as the guest supplies IOVA->GPA
translations.  qemu translates GPA->HVA and gives the translated result
to the kernel.  If qemu isn't buggy and the kernel isn't buggy, it's all
fine.

But that's the approach that was proven not to work in the 20th century.
In the 21st century we are trying a defence-in-depth approach.

My point is that a single code path that is responsible for
the HVA translations is better than two.

> 
> >   If
> > translations to HVA are done in a single place through a single table
> > it's safer as there's a single risky place.
> > 

^ permalink raw reply	[flat|nested] 46+ messages in thread
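
The "single risky place" point concerns where GPA->HVA happens. Today qemu
performs that step per IOTLB update, roughly like the sketch below
(illustrative only; the struct and helper names are made up, and the real
code is qemu's vhost_memory_region_lookup()):

        #include <stdbool.h>
        #include <stdint.h>

        struct mem_region {
                uint64_t gpa, size, hva;
        };

        /* Walk the memory table and translate one guest-physical address. */
        static bool gpa_to_hva(const struct mem_region *r, int n,
                               uint64_t gpa, uint64_t *hva)
        {
                for (int i = 0; i < n; i++, r++) {
                        if (gpa >= r->gpa && gpa - r->gpa < r->size) {
                                *hva = r->hva + (gpa - r->gpa);
                                return true;
                        }
                }
                return false; /* unmapped GPA */
        }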


* Re: [PATCH net V2 4/4] vhost: log dirty page correctly
  2018-12-24 17:41               ` Michael S. Tsirkin
@ 2018-12-25  9:43                 ` Jason Wang
  2018-12-25 16:25                     ` Michael S. Tsirkin
  2018-12-25  9:43                 ` Jason Wang
  1 sibling, 1 reply; 46+ messages in thread
From: Jason Wang @ 2018-12-25  9:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, virtualization, netdev, linux-kernel, Jintack Lim


On 2018/12/25 1:41 AM, Michael S. Tsirkin wrote:
> On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote:
>> On 2018/12/14 9:20 PM, Michael S. Tsirkin wrote:
>>> On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
>>>> On 2018/12/13 10:31 PM, Michael S. Tsirkin wrote:
>>>>>> Just to make sure I understand this. It looks to me we should:
>>>>>>
>>>>>> - allow passing GIOVA->GPA through UAPI
>>>>>>
>>>>>> - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
>>>>>> performance
>>>>>>
>>>>>> Is this what you suggest?
>>>>>>
>>>>>> Thanks
>>>>> Not really. We already have GPA->HVA, so I suggested a flag to pass
>>>>> GIOVA->GPA in the IOTLB.
>>>>>
>>>>> This has advantages for security since a single table needs
>>>>> then to be validated to ensure guest does not corrupt
>>>>> QEMU memory.
>>>>>
>>>> I wonder how much we can gain through this. Currently, qemu IOMMU gives
>>>> GIOVA->GPA mapping, and qemu vhost code will translate GPA to HVA then pass
>>>> GIOVA->HVA to vhost. It looks no difference to me.
>>>>
>>>> Thanks
>>> The difference is in security not in performance.  Getting a bad HVA
>>> corrupts QEMU memory and it might be guest controlled. Very risky.
>> How can this be controlled by guest? HVA was generated from qemu ram blocks
>> which is totally under the control of qemu memory core instead of guest.
>>
>>
>> Thanks
> It is ultimately under guest influence as guest supplies IOVA->GPA
> translations.  qemu translates GPA->HVA and gives the translated result
> to the kernel.  If it's not buggy and kernel isn't buggy it's all
> fine.


If qemu provides a buggy GPA->HVA mapping, we can't work around this. 
And I don't get why we even want to try: buggy qemu code can crash 
itself in many ways.


>
> But that's the approach that was proven not to work in the 20th century.
> In the 21st century we are trying defence in depth approach.
>
> My point is that a single code path that is responsible for
> the HVA translations is better than two.
>

So here is the difference between using and not using the memory table 
information:

Current:

1) SET_MEM_TABLE: GPA->HVA

2) Qemu GIOVA->GPA

3) Qemu GPA->HVA

4) IOTLB_UPDATE: GIOVA->HVA

If I understand correctly, you want to drop step 3, considering it 
might be buggy, even though it is just 19 lines of code in qemu 
(vhost_memory_region_lookup(), sketched below). This will end up with:

1) Doing the GPA->HVA translation in the IOTLB_UPDATE path (I believe 
we won't want to do it during device IOTLB lookup).

2) Extra bits to enable this capability.
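
For reference, that step-3 lookup is roughly the following (a sketch of 
what vhost_memory_region_lookup() does, written against the uapi 
vhost_memory_region layout; not the verbatim qemu code):

	/* Walk the memory table and translate a GPA to an HVA (sketch).
	 * Returns true and fills *uaddr/*len on a hit. */
	static bool gpa_to_hva(struct vhost_memory *mem, uint64_t gpa,
			       uint64_t *uaddr, uint64_t *len)
	{
		uint32_t i;

		for (i = 0; i < mem->nregions; i++) {
			struct vhost_memory_region *reg = &mem->regions[i];

			if (gpa >= reg->guest_phys_addr &&
			    gpa < reg->guest_phys_addr + reg->memory_size) {
				*uaddr = reg->userspace_addr +
					 (gpa - reg->guest_phys_addr);
				*len = reg->guest_phys_addr +
				       reg->memory_size - gpa;
				return true;
			}
		}
		return false;
	}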

So this looks like it needs more code in the kernel than what qemu did 
in userspace.  Is this really worthwhile?

Thanks


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net V2 4/4] vhost: log dirty page correctly
  2018-12-25  9:43                 ` Jason Wang
@ 2018-12-25 16:25                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-12-25 16:25 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel, Jintack Lim

On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote:
> 
> On 2018/12/25 1:41 AM, Michael S. Tsirkin wrote:
> > On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote:
> > > On 2018/12/14 9:20 PM, Michael S. Tsirkin wrote:
> > > > On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
> > > > > On 2018/12/13 10:31 PM, Michael S. Tsirkin wrote:
> > > > > > > Just to make sure I understand this. It looks to me we should:
> > > > > > > 
> > > > > > > - allow passing GIOVA->GPA through UAPI
> > > > > > > 
> > > > > > > - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
> > > > > > > performance
> > > > > > > 
> > > > > > > Is this what you suggest?
> > > > > > > 
> > > > > > > Thanks
> > > > > > Not really. We already have GPA->HVA, so I suggested a flag to pass
> > > > > > GIOVA->GPA in the IOTLB.
> > > > > > 
> > > > > > This has advantages for security since a single table needs
> > > > > > then to be validated to ensure guest does not corrupt
> > > > > > QEMU memory.
> > > > > > 
> > > > > I wonder how much we can gain through this. Currently, qemu IOMMU gives
> > > > > GIOVA->GPA mapping, and qemu vhost code will translate GPA to HVA then pass
> > > > > GIOVA->HVA to vhost. It looks no difference to me.
> > > > > 
> > > > > Thanks
> > > > The difference is in security not in performance.  Getting a bad HVA
> > > > corrupts QEMU memory and it might be guest controlled. Very risky.
> > > How can this be controlled by guest? HVA was generated from qemu ram blocks
> > > which is totally under the control of qemu memory core instead of guest.
> > > 
> > > 
> > > Thanks
> > It is ultimately under guest influence as guest supplies IOVA->GPA
> > translations.  qemu translates GPA->HVA and gives the translated result
> > to the kernel.  If it's not buggy and kernel isn't buggy it's all
> > fine.
> 
> 
> If qemu provides buggy GPA->HVA, we can't workaround this. And I don't get
> the point why we even want to try this. Buggy qemu code can crash itself in
> many ways.
> 
> 
> > 
> > But that's the approach that was proven not to work in the 20th century.
> > In the 21st century we are trying defence in depth approach.
> > 
> > My point is that a single code path that is responsible for
> > the HVA translations is better than two.
> > 
> 
> So the difference whether or not use memory table information:
> 
> Current:
> 
> 1) SET_MEM_TABLE: GPA->HVA
> 
> 2) Qemu GIOVA->GPA
> 
> 3) Qemu GPA->HVA
> 
> 4) IOTLB_UPDATE: GIOVA->HVA
> 
> If I understand correctly you want to drop step 3 consider it might be buggy
> which is just 19 lines of code in qemu (vhost_memory_region_lookup()). This
> will ends up:
> 
> 1) Do GPA->HVA translation in IOTLB_UPDATE path (I believe we won't want to
> do it during device IOTLB lookup).
> 
> 2) Extra bits to enable this capability.
> 
> So this looks need more codes in kernel than what qemu did in userspace.  Is
> this really worthwhile?
> 
> Thanks

So there are several points I would like to make:

1. At the moment, without an iommu, it is possible to
   change GPA->HVA mappings and everything keeps working
   because a change in the memory tables flushes the rings.
   However, I don't see the iotlb cache being invalidated
   on that path - did I miss it? If it is not there, it's
   a related minor bug (a sketch of the missing flush
   follows this list).

2. qemu already has a GPA. Discarding it and re-calculating
   when logging is on just seems wrong.
   However if you would like to *also* keep the HVA in the iotlb
   to avoid doing extra translations, that sounds like a
   reasonable optimization.

3. it also means that the hva->gpa translation only runs
   when logging is enabled. That is a rarely exercised
   path, so any bugs there will not be caught.
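
To illustrate point 1, the missing flush could look roughly like this 
(a sketch only: it reuses the vhost_del_umem_range() helper from the 
VHOST_IOTLB_INVALIDATE path, and the exact hook point in the 
SET_MEM_TABLE handler may differ):

	/* A new memory table can change GPA->HVA mappings, so any
	 * cached device IOTLB entries may point at stale HVAs.
	 * Drop the whole device IOTLB when the table is replaced. */
	if (d->iotlb)
		vhost_del_umem_range(d->iotlb, 0, ULLONG_MAX);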

So in the long term I really would like us to move away from
hva->gpa translations, keeping them for legacy userspace only,
but I don't really mind how we do it.

How about
- a new flag to pass an iotlb with *both* a gpa and hva
- for legacy userspace, calculate the gpa on iotlb update
  so the device then uses a shared code path

what do you think?
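
On the uapi side I imagine something roughly like this (purely a 
sketch to show the shape; the extended struct and type names are made 
up, not a real interface):

	/* Existing message (include/uapi/linux/vhost.h, abridged): */
	struct vhost_iotlb_msg {
		__u64 iova;
		__u64 size;
		__u64 uaddr;
		__u8 perm;
		__u8 type;
	};

	/* Hypothetical update carrying both addresses, gated by a
	 * new feature flag so legacy userspace is unaffected: */
	struct vhost_iotlb_msg_v2 {
		__u64 iova;	/* GIOVA */
		__u64 size;
		__u64 uaddr;	/* HVA, kept to avoid extra translations */
		__u64 gpa;	/* GPA, used for dirty page logging */
		__u8 perm;
		__u8 type;	/* e.g. a new VHOST_IOTLB_UPDATE2 (made up) */
	};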


-- 
MST

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net V2 4/4] vhost: log dirty page correctly
  2018-12-25 16:25                     ` Michael S. Tsirkin
@ 2018-12-26  5:43                       ` Jason Wang
  -1 siblings, 0 replies; 46+ messages in thread
From: Jason Wang @ 2018-12-26  5:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, virtualization, netdev, linux-kernel, Jintack Lim


On 2018/12/26 12:25 AM, Michael S. Tsirkin wrote:
> On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote:
>> On 2018/12/25 1:41 AM, Michael S. Tsirkin wrote:
>>> On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote:
>>>> On 2018/12/14 9:20 PM, Michael S. Tsirkin wrote:
>>>>> On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
>>>>>> On 2018/12/13 10:31 PM, Michael S. Tsirkin wrote:
>>>>>>>> Just to make sure I understand this. It looks to me we should:
>>>>>>>>
>>>>>>>> - allow passing GIOVA->GPA through UAPI
>>>>>>>>
>>>>>>>> - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
>>>>>>>> performance
>>>>>>>>
>>>>>>>> Is this what you suggest?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> Not really. We already have GPA->HVA, so I suggested a flag to pass
>>>>>>> GIOVA->GPA in the IOTLB.
>>>>>>>
>>>>>>> This has advantages for security since a single table needs
>>>>>>> then to be validated to ensure guest does not corrupt
>>>>>>> QEMU memory.
>>>>>>>
>>>>>> I wonder how much we can gain through this. Currently, qemu IOMMU gives
>>>>>> GIOVA->GPA mapping, and qemu vhost code will translate GPA to HVA then pass
>>>>>> GIOVA->HVA to vhost. It looks no difference to me.
>>>>>>
>>>>>> Thanks
>>>>> The difference is in security not in performance.  Getting a bad HVA
>>>>> corrupts QEMU memory and it might be guest controlled. Very risky.
>>>> How can this be controlled by guest? HVA was generated from qemu ram blocks
>>>> which is totally under the control of qemu memory core instead of guest.
>>>>
>>>>
>>>> Thanks
>>> It is ultimately under guest influence as guest supplies IOVA->GPA
>>> translations.  qemu translates GPA->HVA and gives the translated result
>>> to the kernel.  If it's not buggy and kernel isn't buggy it's all
>>> fine.
>>
>> If qemu provides buggy GPA->HVA, we can't workaround this. And I don't get
>> the point why we even want to try this. Buggy qemu code can crash itself in
>> many ways.
>>
>>
>>> But that's the approach that was proven not to work in the 20th century.
>>> In the 21st century we are trying defence in depth approach.
>>>
>>> My point is that a single code path that is responsible for
>>> the HVA translations is better than two.
>>>
>> So the difference whether or not use memory table information:
>>
>> Current:
>>
>> 1) SET_MEM_TABLE: GPA->HVA
>>
>> 2) Qemu GIOVA->GPA
>>
>> 3) Qemu GPA->HVA
>>
>> 4) IOTLB_UPDATE: GIOVA->HVA
>>
>> If I understand correctly you want to drop step 3 consider it might be buggy
>> which is just 19 lines of code in qemu (vhost_memory_region_lookup()). This
>> will ends up:
>>
>> 1) Do GPA->HVA translation in IOTLB_UPDATE path (I believe we won't want to
>> do it during device IOTLB lookup).
>>
>> 2) Extra bits to enable this capability.
>>
>> So this looks need more codes in kernel than what qemu did in userspace.  Is
>> this really worthwhile?
>>
>> Thanks
> So there are several points I would like to make
>
> 1. At the moment without an iommu it is possible to
>     change GPA-HVA mappings and everything keeps working
>     because a change in memory tables flushes the rings.


Interesting, I didn't know about this before. But when can this happen?


>     However I don't see the iotlb cache being invalidated
>     on that path - did I miss it? If it is not there it's
>     a related minor bug.


It might have a bug. But here is a question: consider the case without 
an IOMMU. We only update the mem table (SET_MEM_TABLE), but not the 
vring address. Doesn't this look like a bug as well?


>
> 2. qemu already has a GPA. Discarding it and re-calculating
>     when logging is on just seems wrong.
>     However if you would like to *also* keep the HVA in the iotlb
>     to avoid doing extra translations, that sounds like a
>     reasonable optimization.


Yes, traversing the GPA->HVA mapping seems unnecessary.


>
> 3. it also means that the hva->gpa translation only runs
>     when logging is enabled. That is a rarely excercised
>     path so any bugs there will not be caught.


I wonder whether some kind of unit test may help here.
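
For example, something as small as this could exercise the hva->gpa 
reverse lookup in isolation (a pure userspace sketch; the structures 
and addresses are made up for illustration, not the actual vhost code):

	#include <assert.h>
	#include <stdint.h>

	struct region { uint64_t gpa, hva, size; };

	/* Reverse-translate an HVA to a GPA over a region table. */
	static int hva_to_gpa(const struct region *r, int n,
			      uint64_t hva, uint64_t *gpa)
	{
		int i;

		for (i = 0; i < n; i++) {
			if (hva >= r[i].hva && hva - r[i].hva < r[i].size) {
				*gpa = r[i].gpa + (hva - r[i].hva);
				return 0;
			}
		}
		return -1;
	}

	int main(void)
	{
		const struct region regions[] = {
			{ 0x0,         0x7f0000000000ULL, 0x40000000 },
			{ 0x100000000, 0x7f8000000000ULL, 0x40000000 },
		};
		uint64_t gpa;

		assert(!hva_to_gpa(regions, 2, 0x7f0000001000ULL, &gpa));
		assert(gpa == 0x1000);
		assert(!hva_to_gpa(regions, 2, 0x7f8000000000ULL, &gpa));
		assert(gpa == 0x100000000ULL);
		assert(hva_to_gpa(regions, 2, 0x1234, &gpa) == -1);
		return 0;
	}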


>
> So I really would like us long term to move away from
> hva->gpa translations, keep them for legacy userspace only
> but I don't really mind how we do it.
>
> How about
> - a new flag to pass an iotlb with *both* a gpa and hva
> - for legacy userspace, calculate the gpa on iotlb update
>    so the device then uses a shared code path
>
> what do you think?
>
>

I don't object to this idea, so I can try; I just want to figure out 
why it was a must.

Thanks



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net V2 4/4] vhost: log dirty page correctly
  2018-12-26  5:43                       ` Jason Wang
  (?)
  (?)
@ 2018-12-26 13:46                       ` Michael S. Tsirkin
  2018-12-27  9:32                         ` Jason Wang
  2018-12-27  9:32                         ` Jason Wang
  -1 siblings, 2 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-12-26 13:46 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel, Jintack Lim

On Wed, Dec 26, 2018 at 01:43:26PM +0800, Jason Wang wrote:
> 
> On 2018/12/26 12:25 AM, Michael S. Tsirkin wrote:
> > On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote:
> > > On 2018/12/25 1:41 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote:
> > > > > On 2018/12/14 9:20 PM, Michael S. Tsirkin wrote:
> > > > > > On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
> > > > > > > On 2018/12/13 10:31 PM, Michael S. Tsirkin wrote:
> > > > > > > > > Just to make sure I understand this. It looks to me we should:
> > > > > > > > > 
> > > > > > > > > - allow passing GIOVA->GPA through UAPI
> > > > > > > > > 
> > > > > > > > > - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
> > > > > > > > > performance
> > > > > > > > > 
> > > > > > > > > Is this what you suggest?
> > > > > > > > > 
> > > > > > > > > Thanks
> > > > > > > > Not really. We already have GPA->HVA, so I suggested a flag to pass
> > > > > > > > GIOVA->GPA in the IOTLB.
> > > > > > > > 
> > > > > > > > This has advantages for security since a single table needs
> > > > > > > > then to be validated to ensure guest does not corrupt
> > > > > > > > QEMU memory.
> > > > > > > > 
> > > > > > > I wonder how much we can gain through this. Currently, qemu IOMMU gives
> > > > > > > GIOVA->GPA mapping, and qemu vhost code will translate GPA to HVA then pass
> > > > > > > GIOVA->HVA to vhost. It looks no difference to me.
> > > > > > > 
> > > > > > > Thanks
> > > > > > The difference is in security not in performance.  Getting a bad HVA
> > > > > > corrupts QEMU memory and it might be guest controlled. Very risky.
> > > > > How can this be controlled by guest? HVA was generated from qemu ram blocks
> > > > > which is totally under the control of qemu memory core instead of guest.
> > > > > 
> > > > > 
> > > > > Thanks
> > > > It is ultimately under guest influence as guest supplies IOVA->GPA
> > > > translations.  qemu translates GPA->HVA and gives the translated result
> > > > to the kernel.  If it's not buggy and kernel isn't buggy it's all
> > > > fine.
> > > 
> > > If qemu provides buggy GPA->HVA, we can't workaround this. And I don't get
> > > the point why we even want to try this. Buggy qemu code can crash itself in
> > > many ways.
> > > 
> > > 
> > > > But that's the approach that was proven not to work in the 20th century.
> > > > In the 21st century we are trying defence in depth approach.
> > > > 
> > > > My point is that a single code path that is responsible for
> > > > the HVA translations is better than two.
> > > > 
> > > So the difference whether or not use memory table information:
> > > 
> > > Current:
> > > 
> > > 1) SET_MEM_TABLE: GPA->HVA
> > > 
> > > 2) Qemu GIOVA->GPA
> > > 
> > > 3) Qemu GPA->HVA
> > > 
> > > 4) IOTLB_UPDATE: GIOVA->HVA
> > > 
> > > If I understand correctly you want to drop step 3 consider it might be buggy
> > > which is just 19 lines of code in qemu (vhost_memory_region_lookup()). This
> > > will ends up:
> > > 
> > > 1) Do GPA->HVA translation in IOTLB_UPDATE path (I believe we won't want to
> > > do it during device IOTLB lookup).
> > > 
> > > 2) Extra bits to enable this capability.
> > > 
> > > So this looks need more codes in kernel than what qemu did in userspace.  Is
> > > this really worthwhile?
> > > 
> > > Thanks
> > So there are several points I would like to make
> > 
> > 1. At the moment without an iommu it is possible to
> >     change GPA-HVA mappings and everything keeps working
> >     because a change in memory tables flushes the rings.
> 
> 
> Interesting, I don't know this before. But when can this happen?


It doesn't happen with existing qemu. But it seems like a valid
thing to do to remap memory at a different address.


> 
> >     However I don't see the iotlb cache being invalidated
> >     on that path - did I miss it? If it is not there it's
> >     a related minor bug.
> 
> 
> It might have a bug. But a question is consider the case without IOMMU. We
> only update mem table (SET_MEM_TABLE), but not vring address. This looks
> like a bug as well?

I think that without an iommu it can only work without races if the 
backend is stopped, or if the vring isn't in guest memory (with ring 
aliasing).


> 
> > 
> > 2. qemu already has a GPA. Discarding it and re-calculating
> >     when logging is on just seems wrong.
> >     However if you would like to *also* keep the HVA in the iotlb
> >     to avoid doing extra translations, that sounds like a
> >     reasonable optimization.
> 
> 
> Yes, traverse GPA->HVA mapping seems unnecessary.
> 
> 
> > 
> > 3. it also means that the hva->gpa translation only runs
> >     when logging is enabled. That is a rarely excercised
> >     path so any bugs there will not be caught.
> 
> 
> I wonder maybe some kind of unit-test may help here.
> 
> 
> > 
> > So I really would like us long term to move away from
> > hva->gpa translations, keep them for legacy userspace only
> > but I don't really mind how we do it.
> > 
> > How about
> > - a new flag to pass an iotlb with *both* a gpa and hva
> > - for legacy userspace, calculate the gpa on iotlb update
> >    so the device then uses a shared code path
> > 
> > what do you think?
> > 
> > 
> 
> I don't object this idea so I can try, just want to figure out why it was a
> must.
> 
> Thanks

Not a must but I think it's a good interface extension.

-- 
MST

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net V2 4/4] vhost: log dirty page correctly
  2018-12-26 13:46                       ` Michael S. Tsirkin
@ 2018-12-27  9:32                         ` Jason Wang
  2018-12-27  9:32                         ` Jason Wang
  1 sibling, 0 replies; 46+ messages in thread
From: Jason Wang @ 2018-12-27  9:32 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, virtualization, netdev, linux-kernel, Jintack Lim


On 2018/12/26 9:46 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 26, 2018 at 01:43:26PM +0800, Jason Wang wrote:
>> On 2018/12/26 12:25 AM, Michael S. Tsirkin wrote:
>>> On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote:
>>>> On 2018/12/25 1:41 AM, Michael S. Tsirkin wrote:
>>>>> On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote:
>>>>>> On 2018/12/14 9:20 PM, Michael S. Tsirkin wrote:
>>>>>>> On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
>>>>>>>> On 2018/12/13 10:31 PM, Michael S. Tsirkin wrote:
>>>>>>>>>> Just to make sure I understand this. It looks to me we should:
>>>>>>>>>>
>>>>>>>>>> - allow passing GIOVA->GPA through UAPI
>>>>>>>>>>
>>>>>>>>>> - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
>>>>>>>>>> performance
>>>>>>>>>>
>>>>>>>>>> Is this what you suggest?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>> Not really. We already have GPA->HVA, so I suggested a flag to pass
>>>>>>>>> GIOVA->GPA in the IOTLB.
>>>>>>>>>
>>>>>>>>> This has advantages for security since a single table needs
>>>>>>>>> then to be validated to ensure guest does not corrupt
>>>>>>>>> QEMU memory.
>>>>>>>>>
>>>>>>>> I wonder how much we can gain through this. Currently, qemu IOMMU gives
>>>>>>>> GIOVA->GPA mapping, and qemu vhost code will translate GPA to HVA then pass
>>>>>>>> GIOVA->HVA to vhost. It looks no difference to me.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> The difference is in security not in performance.  Getting a bad HVA
>>>>>>> corrupts QEMU memory and it might be guest controlled. Very risky.
>>>>>> How can this be controlled by guest? HVA was generated from qemu ram blocks
>>>>>> which is totally under the control of qemu memory core instead of guest.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>> It is ultimately under guest influence as guest supplies IOVA->GPA
>>>>> translations.  qemu translates GPA->HVA and gives the translated result
>>>>> to the kernel.  If it's not buggy and kernel isn't buggy it's all
>>>>> fine.
>>>> If qemu provides buggy GPA->HVA, we can't workaround this. And I don't get
>>>> the point why we even want to try this. Buggy qemu code can crash itself in
>>>> many ways.
>>>>
>>>>
>>>>> But that's the approach that was proven not to work in the 20th century.
>>>>> In the 21st century we are trying defence in depth approach.
>>>>>
>>>>> My point is that a single code path that is responsible for
>>>>> the HVA translations is better than two.
>>>>>
>>>> So the difference whether or not use memory table information:
>>>>
>>>> Current:
>>>>
>>>> 1) SET_MEM_TABLE: GPA->HVA
>>>>
>>>> 2) Qemu GIOVA->GPA
>>>>
>>>> 3) Qemu GPA->HVA
>>>>
>>>> 4) IOTLB_UPDATE: GIOVA->HVA
>>>>
>>>> If I understand correctly you want to drop step 3 consider it might be buggy
>>>> which is just 19 lines of code in qemu (vhost_memory_region_lookup()). This
>>>> will ends up:
>>>>
>>>> 1) Do GPA->HVA translation in IOTLB_UPDATE path (I believe we won't want to
>>>> do it during device IOTLB lookup).
>>>>
>>>> 2) Extra bits to enable this capability.
>>>>
>>>> So this looks need more codes in kernel than what qemu did in userspace.  Is
>>>> this really worthwhile?
>>>>
>>>> Thanks
>>> So there are several points I would like to make
>>>
>>> 1. At the moment without an iommu it is possible to
>>>      change GPA-HVA mappings and everything keeps working
>>>      because a change in memory tables flushes the rings.
>>
>> Interesting, I don't know this before. But when can this happen?
>
> It doesn't happen with existing qemu. But it seems like a valid
> thing to do to remap memory at a different address.
>

Ok.


>>>      However I don't see the iotlb cache being invalidated
>>>      on that path - did I miss it? If it is not there it's
>>>      a related minor bug.
>>
>> It might have a bug. But a question is consider the case without IOMMU. We
>> only update mem table (SET_MEM_TABLE), but not vring address. This looks
>> like a bug as well?
> I think that without an iommu it can only work without races if backend is
> stopped or if the vring isn't in guest memory with ring aliasing).


Right.


>
>>> 2. qemu already has a GPA. Discarding it and re-calculating
>>>      when logging is on just seems wrong.
>>>      However if you would like to *also* keep the HVA in the iotlb
>>>      to avoid doing extra translations, that sounds like a
>>>      reasonable optimization.
>>
>> Yes, traverse GPA->HVA mapping seems unnecessary.
>>
>>
>>> 3. it also means that the hva->gpa translation only runs
>>>      when logging is enabled. That is a rarely excercised
>>>      path so any bugs there will not be caught.
>>
>> I wonder maybe some kind of unit-test may help here.
>>
>>
>>> So I really would like us long term to move away from
>>> hva->gpa translations, keep them for legacy userspace only
>>> but I don't really mind how we do it.
>>>
>>> How about
>>> - a new flag to pass an iotlb with *both* a gpa and hva
>>> - for legacy userspace, calculate the gpa on iotlb update
>>>     so the device then uses a shared code path
>>>
>>> what do you think?
>>>
>>>
>> I don't object this idea so I can try, just want to figure out why it was a
>> must.
>>
>> Thanks
> Not a must but I think it's a good interface extension.
>

Ok, let me try to do this.

Thanks


^ permalink raw reply	[flat|nested] 46+ messages in thread
