KVM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH V4 0/9] Fixes for metadata accelreation
@ 2019-08-07  7:06 Jason Wang
  2019-08-07  7:06 ` [PATCH V4 1/9] vhost: don't set uaddr for invalid address Jason Wang
                   ` (9 more replies)
  0 siblings, 10 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-07  7:06 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: linux-kernel, linux-mm, jgg, Jason Wang

Hi all:

This series try to fix several issues introduced by meta data
accelreation series. Please review.

Changes from V3:
- remove the unnecessary patch

Changes from V2:
- use seqlck helper to synchronize MMU notifier with vhost worker

Changes from V1:
- try not use RCU to syncrhonize MMU notifier with vhost worker
- set dirty pages after no readers
- return -EAGAIN only when we find the range is overlapped with
  metadata

Jason Wang (9):
  vhost: don't set uaddr for invalid address
  vhost: validate MMU notifier registration
  vhost: fix vhost map leak
  vhost: reset invalidate_count in vhost_set_vring_num_addr()
  vhost: mark dirty pages during map uninit
  vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
  vhost: do not use RCU to synchronize MMU notifier with worker
  vhost: correctly set dirty pages in MMU notifiers callback
  vhost: do not return -EAGAIN for non blocking invalidation too early

 drivers/vhost/vhost.c | 228 +++++++++++++++++++++++++++---------------
 drivers/vhost/vhost.h |   8 +-
 2 files changed, 150 insertions(+), 86 deletions(-)

-- 
2.18.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V4 1/9] vhost: don't set uaddr for invalid address
  2019-08-07  7:06 [PATCH V4 0/9] Fixes for metadata accelreation Jason Wang
@ 2019-08-07  7:06 ` Jason Wang
  2019-08-07  7:06 ` [PATCH V4 2/9] vhost: validate MMU notifier registration Jason Wang
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-07  7:06 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: linux-kernel, linux-mm, jgg, Jason Wang

We should not setup uaddr for the invalid address, otherwise we may
try to pin or prefetch mapping of wrong pages.

Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 0536f8526359..488380a581dc 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2082,7 +2082,8 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
 	}
 
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
-	vhost_setup_vq_uaddr(vq);
+	if (r == 0)
+		vhost_setup_vq_uaddr(vq);
 
 	if (d->mm)
 		mmu_notifier_register(&d->mmu_notifier, d->mm);
-- 
2.18.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V4 2/9] vhost: validate MMU notifier registration
  2019-08-07  7:06 [PATCH V4 0/9] Fixes for metadata accelreation Jason Wang
  2019-08-07  7:06 ` [PATCH V4 1/9] vhost: don't set uaddr for invalid address Jason Wang
@ 2019-08-07  7:06 ` Jason Wang
  2019-08-07  7:06 ` [PATCH V4 3/9] vhost: fix vhost map leak Jason Wang
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-07  7:06 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: linux-kernel, linux-mm, jgg, Jason Wang

The return value of mmu_notifier_register() is not checked in
vhost_vring_set_num_addr(). This will cause an out of sync between mm
and MMU notifier thus a double free. To solve this, introduce a
boolean flag to track whether MMU notifier is registered and only do
unregistering when it was true.

Reported-and-tested-by:
syzbot+e58112d71f77113ddb7b@syzkaller.appspotmail.com
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 19 +++++++++++++++----
 drivers/vhost/vhost.h |  1 +
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 488380a581dc..17f6abea192e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -629,6 +629,7 @@ void vhost_dev_init(struct vhost_dev *dev,
 	dev->iov_limit = iov_limit;
 	dev->weight = weight;
 	dev->byte_weight = byte_weight;
+	dev->has_notifier = false;
 	init_llist_head(&dev->work_list);
 	init_waitqueue_head(&dev->wait);
 	INIT_LIST_HEAD(&dev->read_list);
@@ -730,6 +731,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 	if (err)
 		goto err_mmu_notifier;
 #endif
+	dev->has_notifier = true;
 
 	return 0;
 
@@ -959,7 +961,11 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 	}
 	if (dev->mm) {
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
-		mmu_notifier_unregister(&dev->mmu_notifier, dev->mm);
+		if (dev->has_notifier) {
+			mmu_notifier_unregister(&dev->mmu_notifier,
+						dev->mm);
+			dev->has_notifier = false;
+		}
 #endif
 		mmput(dev->mm);
 	}
@@ -2064,8 +2070,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
 	/* Unregister MMU notifer to allow invalidation callback
 	 * can access vq->uaddrs[] without holding a lock.
 	 */
-	if (d->mm)
+	if (d->has_notifier) {
 		mmu_notifier_unregister(&d->mmu_notifier, d->mm);
+		d->has_notifier = false;
+	}
 
 	vhost_uninit_vq_maps(vq);
 #endif
@@ -2085,8 +2093,11 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
 	if (r == 0)
 		vhost_setup_vq_uaddr(vq);
 
-	if (d->mm)
-		mmu_notifier_register(&d->mmu_notifier, d->mm);
+	if (d->mm) {
+		r = mmu_notifier_register(&d->mmu_notifier, d->mm);
+		if (!r)
+			d->has_notifier = true;
+	}
 #endif
 
 	mutex_unlock(&vq->mutex);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 42a8c2a13ab1..a9a2a93857d2 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -214,6 +214,7 @@ struct vhost_dev {
 	int iov_limit;
 	int weight;
 	int byte_weight;
+	bool has_notifier;
 };
 
 bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
-- 
2.18.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V4 3/9] vhost: fix vhost map leak
  2019-08-07  7:06 [PATCH V4 0/9] Fixes for metadata accelreation Jason Wang
  2019-08-07  7:06 ` [PATCH V4 1/9] vhost: don't set uaddr for invalid address Jason Wang
  2019-08-07  7:06 ` [PATCH V4 2/9] vhost: validate MMU notifier registration Jason Wang
@ 2019-08-07  7:06 ` Jason Wang
  2019-08-07  7:06 ` [PATCH V4 4/9] vhost: reset invalidate_count in vhost_set_vring_num_addr() Jason Wang
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-07  7:06 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: linux-kernel, linux-mm, jgg, Jason Wang

We don't free map during vhost_map_unprefetch(). This means it could
be leaked. Fixing by free the map.

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 17f6abea192e..2a3154976277 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -302,9 +302,7 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
 static void vhost_map_unprefetch(struct vhost_map *map)
 {
 	kfree(map->pages);
-	map->pages = NULL;
-	map->npages = 0;
-	map->addr = NULL;
+	kfree(map);
 }
 
 static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V4 4/9] vhost: reset invalidate_count in vhost_set_vring_num_addr()
  2019-08-07  7:06 [PATCH V4 0/9] Fixes for metadata accelreation Jason Wang
                   ` (2 preceding siblings ...)
  2019-08-07  7:06 ` [PATCH V4 3/9] vhost: fix vhost map leak Jason Wang
@ 2019-08-07  7:06 ` Jason Wang
  2019-08-07  7:06 ` [PATCH V4 5/9] vhost: mark dirty pages during map uninit Jason Wang
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-07  7:06 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: linux-kernel, linux-mm, jgg, Jason Wang

The vhost_set_vring_num_addr() could be called in the middle of
invalidate_range_start() and invalidate_range_end(). If we don't reset
invalidate_count after the un-registering of MMU notifier, the
invalidate_cont will run out of sync (e.g never reach zero). This will
in fact disable the fast accessor path. Fixing by reset the count to
zero.

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Jason Gunthorpe <jgg@mellanox.com>
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2a3154976277..2a7217c33668 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2073,6 +2073,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
 		d->has_notifier = false;
 	}
 
+	/* reset invalidate_count in case we are in the middle of
+	 * invalidate_start() and invalidate_end().
+	 */
+	vq->invalidate_count = 0;
 	vhost_uninit_vq_maps(vq);
 #endif
 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V4 5/9] vhost: mark dirty pages during map uninit
  2019-08-07  7:06 [PATCH V4 0/9] Fixes for metadata accelreation Jason Wang
                   ` (3 preceding siblings ...)
  2019-08-07  7:06 ` [PATCH V4 4/9] vhost: reset invalidate_count in vhost_set_vring_num_addr() Jason Wang
@ 2019-08-07  7:06 ` Jason Wang
  2019-08-07  7:06 ` [PATCH V4 6/9] vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps() Jason Wang
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-07  7:06 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: linux-kernel, linux-mm, jgg, Jason Wang

We don't mark dirty pages if the map was teared down outside MMU
notifier. This will lead untracked dirty pages. Fixing by marking
dirty pages during map uninit.

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2a7217c33668..c12cdadb0855 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -305,6 +305,18 @@ static void vhost_map_unprefetch(struct vhost_map *map)
 	kfree(map);
 }
 
+static void vhost_set_map_dirty(struct vhost_virtqueue *vq,
+				struct vhost_map *map, int index)
+{
+	struct vhost_uaddr *uaddr = &vq->uaddrs[index];
+	int i;
+
+	if (uaddr->write) {
+		for (i = 0; i < map->npages; i++)
+			set_page_dirty(map->pages[i]);
+	}
+}
+
 static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
 {
 	struct vhost_map *map[VHOST_NUM_ADDRS];
@@ -314,8 +326,10 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
 	for (i = 0; i < VHOST_NUM_ADDRS; i++) {
 		map[i] = rcu_dereference_protected(vq->maps[i],
 				  lockdep_is_held(&vq->mmu_lock));
-		if (map[i])
+		if (map[i]) {
+			vhost_set_map_dirty(vq, map[i], i);
 			rcu_assign_pointer(vq->maps[i], NULL);
+		}
 	}
 	spin_unlock(&vq->mmu_lock);
 
@@ -353,7 +367,6 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
 {
 	struct vhost_uaddr *uaddr = &vq->uaddrs[index];
 	struct vhost_map *map;
-	int i;
 
 	if (!vhost_map_range_overlap(uaddr, start, end))
 		return;
@@ -364,10 +377,7 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
 	map = rcu_dereference_protected(vq->maps[index],
 					lockdep_is_held(&vq->mmu_lock));
 	if (map) {
-		if (uaddr->write) {
-			for (i = 0; i < map->npages; i++)
-				set_page_dirty(map->pages[i]);
-		}
+		vhost_set_map_dirty(vq, map, index);
 		rcu_assign_pointer(vq->maps[index], NULL);
 	}
 	spin_unlock(&vq->mmu_lock);
-- 
2.18.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V4 6/9] vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
  2019-08-07  7:06 [PATCH V4 0/9] Fixes for metadata accelreation Jason Wang
                   ` (4 preceding siblings ...)
  2019-08-07  7:06 ` [PATCH V4 5/9] vhost: mark dirty pages during map uninit Jason Wang
@ 2019-08-07  7:06 ` Jason Wang
  2019-08-07  7:06 ` [PATCH V4 7/9] vhost: do not use RCU to synchronize MMU notifier with worker Jason Wang
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-07  7:06 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: linux-kernel, linux-mm, jgg, Jason Wang

There's no need for RCU synchronization in vhost_uninit_vq_maps()
since we've already serialized with readers (memory accessors). This
also avoid the possible userspace DOS through ioctl() because of the
possible high latency caused by synchronize_rcu().

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c12cdadb0855..cfc11f9ed9c9 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -333,7 +333,9 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
 	}
 	spin_unlock(&vq->mmu_lock);
 
-	synchronize_rcu();
+	/* No need for synchronize_rcu() or kfree_rcu() since we are
+	 * serialized with memory accessors (e.g vq mutex held).
+	 */
 
 	for (i = 0; i < VHOST_NUM_ADDRS; i++)
 		if (map[i])
-- 
2.18.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V4 7/9] vhost: do not use RCU to synchronize MMU notifier with worker
  2019-08-07  7:06 [PATCH V4 0/9] Fixes for metadata accelreation Jason Wang
                   ` (5 preceding siblings ...)
  2019-08-07  7:06 ` [PATCH V4 6/9] vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps() Jason Wang
@ 2019-08-07  7:06 ` Jason Wang
  2019-08-07 12:07   ` Jason Gunthorpe
  2019-08-07  7:06 ` [PATCH V4 8/9] vhost: correctly set dirty pages in MMU notifiers callback Jason Wang
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2019-08-07  7:06 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: linux-kernel, linux-mm, jgg, Jason Wang

We used to use RCU to synchronize MMU notifier with worker. This leads
calling synchronize_rcu() in invalidate_range_start(). But on a busy
system, there would be many factors that may slow down the
synchronize_rcu() which makes it unsuitable to be called in MMU
notifier.

So this patch switches use seqlock counter to track whether or not the
map was used. The counter was increased when vq try to start or finish
uses the map. This means, when it was even, we're sure there's no
readers and MMU notifier is synchronized. When it was odd, it means
there's a reader we need to wait it to be even again then we are
synchronized. Consider the read critical section is pretty small the
synchronization should be done very fast.

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 141 ++++++++++++++++++++++++++----------------
 drivers/vhost/vhost.h |   7 ++-
 2 files changed, 90 insertions(+), 58 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index cfc11f9ed9c9..57bfbb60d960 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -324,17 +324,16 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
 
 	spin_lock(&vq->mmu_lock);
 	for (i = 0; i < VHOST_NUM_ADDRS; i++) {
-		map[i] = rcu_dereference_protected(vq->maps[i],
-				  lockdep_is_held(&vq->mmu_lock));
+		map[i] = vq->maps[i];
 		if (map[i]) {
 			vhost_set_map_dirty(vq, map[i], i);
-			rcu_assign_pointer(vq->maps[i], NULL);
+			vq->maps[i] = NULL;
 		}
 	}
 	spin_unlock(&vq->mmu_lock);
 
-	/* No need for synchronize_rcu() or kfree_rcu() since we are
-	 * serialized with memory accessors (e.g vq mutex held).
+	/* No need for synchronization since we are serialized with
+	 * memory accessors (e.g vq mutex held).
 	 */
 
 	for (i = 0; i < VHOST_NUM_ADDRS; i++)
@@ -362,6 +361,40 @@ static bool vhost_map_range_overlap(struct vhost_uaddr *uaddr,
 	return !(end < uaddr->uaddr || start > uaddr->uaddr - 1 + uaddr->size);
 }
 
+static void inline vhost_vq_access_map_begin(struct vhost_virtqueue *vq)
+{
+	write_seqcount_begin(&vq->seq);
+}
+
+static void inline vhost_vq_access_map_end(struct vhost_virtqueue *vq)
+{
+	write_seqcount_end(&vq->seq);
+}
+
+static void inline vhost_vq_sync_access(struct vhost_virtqueue *vq)
+{
+	unsigned int seq;
+
+	/* Make sure any changes to map was done before checking seq
+	 * counter. Paired with smp_wmb() in write_seqcount_begin().
+	 */
+	smp_mb();
+	seq = raw_read_seqcount(&vq->seq);
+	/* Odd means the map was currently accessed by vhost worker */
+	if (seq & 0x1) {
+		/* When seq changes, we are sure no reader can see
+		 * previous map */
+		while (raw_read_seqcount(&vq->seq) == seq) {
+			if (need_resched())
+				schedule();
+		}
+	}
+	/* Make sure seq counter was checked before map is
+	 * freed. Paired with smp_wmb() in write_seqcount_end().
+	 */
+	smp_mb();
+}
+
 static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
 				      int index,
 				      unsigned long start,
@@ -376,16 +409,15 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
 	spin_lock(&vq->mmu_lock);
 	++vq->invalidate_count;
 
-	map = rcu_dereference_protected(vq->maps[index],
-					lockdep_is_held(&vq->mmu_lock));
+	map = vq->maps[index];
 	if (map) {
 		vhost_set_map_dirty(vq, map, index);
-		rcu_assign_pointer(vq->maps[index], NULL);
+		vq->maps[index] = NULL;
 	}
 	spin_unlock(&vq->mmu_lock);
 
 	if (map) {
-		synchronize_rcu();
+		vhost_vq_sync_access(vq);
 		vhost_map_unprefetch(map);
 	}
 }
@@ -457,7 +489,7 @@ static void vhost_init_maps(struct vhost_dev *dev)
 	for (i = 0; i < dev->nvqs; ++i) {
 		vq = dev->vqs[i];
 		for (j = 0; j < VHOST_NUM_ADDRS; j++)
-			RCU_INIT_POINTER(vq->maps[j], NULL);
+			vq->maps[j] = NULL;
 	}
 }
 #endif
@@ -655,6 +687,7 @@ void vhost_dev_init(struct vhost_dev *dev,
 		vq->indirect = NULL;
 		vq->heads = NULL;
 		vq->dev = dev;
+		seqcount_init(&vq->seq);
 		mutex_init(&vq->mutex);
 		spin_lock_init(&vq->mmu_lock);
 		vhost_vq_reset(dev, vq);
@@ -921,7 +954,7 @@ static int vhost_map_prefetch(struct vhost_virtqueue *vq,
 	map->npages = npages;
 	map->pages = pages;
 
-	rcu_assign_pointer(vq->maps[index], map);
+	vq->maps[index] = map;
 	/* No need for a synchronize_rcu(). This function should be
 	 * called by dev->worker so we are serialized with all
 	 * readers.
@@ -1216,18 +1249,18 @@ static inline int vhost_put_avail_event(struct vhost_virtqueue *vq)
 	struct vring_used *used;
 
 	if (!vq->iotlb) {
-		rcu_read_lock();
+		vhost_vq_access_map_begin(vq);
 
-		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
+		map = vq->maps[VHOST_ADDR_USED];
 		if (likely(map)) {
 			used = map->addr;
 			*((__virtio16 *)&used->ring[vq->num]) =
 				cpu_to_vhost16(vq, vq->avail_idx);
-			rcu_read_unlock();
+			vhost_vq_access_map_end(vq);
 			return 0;
 		}
 
-		rcu_read_unlock();
+		vhost_vq_access_map_end(vq);
 	}
 #endif
 
@@ -1245,18 +1278,18 @@ static inline int vhost_put_used(struct vhost_virtqueue *vq,
 	size_t size;
 
 	if (!vq->iotlb) {
-		rcu_read_lock();
+		vhost_vq_access_map_begin(vq);
 
-		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
+		map = vq->maps[VHOST_ADDR_USED];
 		if (likely(map)) {
 			used = map->addr;
 			size = count * sizeof(*head);
 			memcpy(used->ring + idx, head, size);
-			rcu_read_unlock();
+			vhost_vq_access_map_end(vq);
 			return 0;
 		}
 
-		rcu_read_unlock();
+		vhost_vq_access_map_end(vq);
 	}
 #endif
 
@@ -1272,17 +1305,17 @@ static inline int vhost_put_used_flags(struct vhost_virtqueue *vq)
 	struct vring_used *used;
 
 	if (!vq->iotlb) {
-		rcu_read_lock();
+		vhost_vq_access_map_begin(vq);
 
-		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
+		map = vq->maps[VHOST_ADDR_USED];
 		if (likely(map)) {
 			used = map->addr;
 			used->flags = cpu_to_vhost16(vq, vq->used_flags);
-			rcu_read_unlock();
+			vhost_vq_access_map_end(vq);
 			return 0;
 		}
 
-		rcu_read_unlock();
+		vhost_vq_access_map_end(vq);
 	}
 #endif
 
@@ -1298,17 +1331,17 @@ static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
 	struct vring_used *used;
 
 	if (!vq->iotlb) {
-		rcu_read_lock();
+		vhost_vq_access_map_begin(vq);
 
-		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
+		map = vq->maps[VHOST_ADDR_USED];
 		if (likely(map)) {
 			used = map->addr;
 			used->idx = cpu_to_vhost16(vq, vq->last_used_idx);
-			rcu_read_unlock();
+			vhost_vq_access_map_end(vq);
 			return 0;
 		}
 
-		rcu_read_unlock();
+		vhost_vq_access_map_end(vq);
 	}
 #endif
 
@@ -1362,17 +1395,17 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq,
 	struct vring_avail *avail;
 
 	if (!vq->iotlb) {
-		rcu_read_lock();
+		vhost_vq_access_map_begin(vq);
 
-		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
+		map = vq->maps[VHOST_ADDR_AVAIL];
 		if (likely(map)) {
 			avail = map->addr;
 			*idx = avail->idx;
-			rcu_read_unlock();
+			vhost_vq_access_map_end(vq);
 			return 0;
 		}
 
-		rcu_read_unlock();
+		vhost_vq_access_map_end(vq);
 	}
 #endif
 
@@ -1387,17 +1420,17 @@ static inline int vhost_get_avail_head(struct vhost_virtqueue *vq,
 	struct vring_avail *avail;
 
 	if (!vq->iotlb) {
-		rcu_read_lock();
+		vhost_vq_access_map_begin(vq);
 
-		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
+		map = vq->maps[VHOST_ADDR_AVAIL];
 		if (likely(map)) {
 			avail = map->addr;
 			*head = avail->ring[idx & (vq->num - 1)];
-			rcu_read_unlock();
+			vhost_vq_access_map_end(vq);
 			return 0;
 		}
 
-		rcu_read_unlock();
+		vhost_vq_access_map_end(vq);
 	}
 #endif
 
@@ -1413,17 +1446,17 @@ static inline int vhost_get_avail_flags(struct vhost_virtqueue *vq,
 	struct vring_avail *avail;
 
 	if (!vq->iotlb) {
-		rcu_read_lock();
+		vhost_vq_access_map_begin(vq);
 
-		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
+		map = vq->maps[VHOST_ADDR_AVAIL];
 		if (likely(map)) {
 			avail = map->addr;
 			*flags = avail->flags;
-			rcu_read_unlock();
+			vhost_vq_access_map_end(vq);
 			return 0;
 		}
 
-		rcu_read_unlock();
+		vhost_vq_access_map_end(vq);
 	}
 #endif
 
@@ -1438,15 +1471,15 @@ static inline int vhost_get_used_event(struct vhost_virtqueue *vq,
 	struct vring_avail *avail;
 
 	if (!vq->iotlb) {
-		rcu_read_lock();
-		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
+		vhost_vq_access_map_begin(vq);
+		map = vq->maps[VHOST_ADDR_AVAIL];
 		if (likely(map)) {
 			avail = map->addr;
 			*event = (__virtio16)avail->ring[vq->num];
-			rcu_read_unlock();
+			vhost_vq_access_map_end(vq);
 			return 0;
 		}
-		rcu_read_unlock();
+		vhost_vq_access_map_end(vq);
 	}
 #endif
 
@@ -1461,17 +1494,17 @@ static inline int vhost_get_used_idx(struct vhost_virtqueue *vq,
 	struct vring_used *used;
 
 	if (!vq->iotlb) {
-		rcu_read_lock();
+		vhost_vq_access_map_begin(vq);
 
-		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
+		map = vq->maps[VHOST_ADDR_USED];
 		if (likely(map)) {
 			used = map->addr;
 			*idx = used->idx;
-			rcu_read_unlock();
+			vhost_vq_access_map_end(vq);
 			return 0;
 		}
 
-		rcu_read_unlock();
+		vhost_vq_access_map_end(vq);
 	}
 #endif
 
@@ -1486,17 +1519,17 @@ static inline int vhost_get_desc(struct vhost_virtqueue *vq,
 	struct vring_desc *d;
 
 	if (!vq->iotlb) {
-		rcu_read_lock();
+		vhost_vq_access_map_begin(vq);
 
-		map = rcu_dereference(vq->maps[VHOST_ADDR_DESC]);
+		map = vq->maps[VHOST_ADDR_DESC];
 		if (likely(map)) {
 			d = map->addr;
 			*desc = *(d + idx);
-			rcu_read_unlock();
+			vhost_vq_access_map_end(vq);
 			return 0;
 		}
 
-		rcu_read_unlock();
+		vhost_vq_access_map_end(vq);
 	}
 #endif
 
@@ -1843,13 +1876,11 @@ static bool iotlb_access_ok(struct vhost_virtqueue *vq,
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
 static void vhost_vq_map_prefetch(struct vhost_virtqueue *vq)
 {
-	struct vhost_map __rcu *map;
+	struct vhost_map *map;
 	int i;
 
 	for (i = 0; i < VHOST_NUM_ADDRS; i++) {
-		rcu_read_lock();
-		map = rcu_dereference(vq->maps[i]);
-		rcu_read_unlock();
+		map = vq->maps[i];
 		if (unlikely(!map))
 			vhost_map_prefetch(vq, i);
 	}
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index a9a2a93857d2..12399e7c7a61 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -115,16 +115,17 @@ struct vhost_virtqueue {
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
 	/* Read by memory accessors, modified by meta data
 	 * prefetching, MMU notifier and vring ioctl().
-	 * Synchonrized through mmu_lock (writers) and RCU (writers
-	 * and readers).
+	 * Synchonrized through mmu_lock (writers) and seqlock
+         * counters,  see vhost_vq_access_map_{begin|end}().
 	 */
-	struct vhost_map __rcu *maps[VHOST_NUM_ADDRS];
+	struct vhost_map *maps[VHOST_NUM_ADDRS];
 	/* Read by MMU notifier, modified by vring ioctl(),
 	 * synchronized through MMU notifier
 	 * registering/unregistering.
 	 */
 	struct vhost_uaddr uaddrs[VHOST_NUM_ADDRS];
 #endif
+	seqcount_t seq;
 	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
 
 	struct file *kick;
-- 
2.18.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V4 8/9] vhost: correctly set dirty pages in MMU notifiers callback
  2019-08-07  7:06 [PATCH V4 0/9] Fixes for metadata accelreation Jason Wang
                   ` (6 preceding siblings ...)
  2019-08-07  7:06 ` [PATCH V4 7/9] vhost: do not use RCU to synchronize MMU notifier with worker Jason Wang
@ 2019-08-07  7:06 ` Jason Wang
  2019-08-07  7:06 ` [PATCH V4 9/9] vhost: do not return -EAGAIN for non blocking invalidation too early Jason Wang
  2019-08-09  5:15 ` [PATCH V4 0/9] Fixes for metadata accelreation David Miller
  9 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-07  7:06 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: linux-kernel, linux-mm, jgg, Jason Wang

We need make sure there's no reference on the map before trying to
mark set dirty pages.

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 57bfbb60d960..6650a3ff88c1 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -410,14 +410,13 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
 	++vq->invalidate_count;
 
 	map = vq->maps[index];
-	if (map) {
-		vhost_set_map_dirty(vq, map, index);
+	if (map)
 		vq->maps[index] = NULL;
-	}
 	spin_unlock(&vq->mmu_lock);
 
 	if (map) {
 		vhost_vq_sync_access(vq);
+		vhost_set_map_dirty(vq, map, index);
 		vhost_map_unprefetch(map);
 	}
 }
-- 
2.18.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V4 9/9] vhost: do not return -EAGAIN for non blocking invalidation too early
  2019-08-07  7:06 [PATCH V4 0/9] Fixes for metadata accelreation Jason Wang
                   ` (7 preceding siblings ...)
  2019-08-07  7:06 ` [PATCH V4 8/9] vhost: correctly set dirty pages in MMU notifiers callback Jason Wang
@ 2019-08-07  7:06 ` Jason Wang
  2019-08-09  5:15 ` [PATCH V4 0/9] Fixes for metadata accelreation David Miller
  9 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-07  7:06 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: linux-kernel, linux-mm, jgg, Jason Wang

Instead of returning -EAGAIN unconditionally, we'd better do that only
we're sure the range is overlapped with the metadata area.

Reported-by: Jason Gunthorpe <jgg@ziepe.ca>
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 6650a3ff88c1..0271f853fa9c 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -395,16 +395,19 @@ static void inline vhost_vq_sync_access(struct vhost_virtqueue *vq)
 	smp_mb();
 }
 
-static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
-				      int index,
-				      unsigned long start,
-				      unsigned long end)
+static int vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
+				     int index,
+				     unsigned long start,
+				     unsigned long end,
+				     bool blockable)
 {
 	struct vhost_uaddr *uaddr = &vq->uaddrs[index];
 	struct vhost_map *map;
 
 	if (!vhost_map_range_overlap(uaddr, start, end))
-		return;
+		return 0;
+	else if (!blockable)
+		return -EAGAIN;
 
 	spin_lock(&vq->mmu_lock);
 	++vq->invalidate_count;
@@ -419,6 +422,8 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
 		vhost_set_map_dirty(vq, map, index);
 		vhost_map_unprefetch(map);
 	}
+
+	return 0;
 }
 
 static void vhost_invalidate_vq_end(struct vhost_virtqueue *vq,
@@ -439,18 +444,19 @@ static int vhost_invalidate_range_start(struct mmu_notifier *mn,
 {
 	struct vhost_dev *dev = container_of(mn, struct vhost_dev,
 					     mmu_notifier);
-	int i, j;
-
-	if (!mmu_notifier_range_blockable(range))
-		return -EAGAIN;
+	bool blockable = mmu_notifier_range_blockable(range);
+	int i, j, ret;
 
 	for (i = 0; i < dev->nvqs; i++) {
 		struct vhost_virtqueue *vq = dev->vqs[i];
 
-		for (j = 0; j < VHOST_NUM_ADDRS; j++)
-			vhost_invalidate_vq_start(vq, j,
-						  range->start,
-						  range->end);
+		for (j = 0; j < VHOST_NUM_ADDRS; j++) {
+			ret = vhost_invalidate_vq_start(vq, j,
+							range->start,
+							range->end, blockable);
+			if (ret)
+				return ret;
+		}
 	}
 
 	return 0;
-- 
2.18.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V4 7/9] vhost: do not use RCU to synchronize MMU notifier with worker
  2019-08-07  7:06 ` [PATCH V4 7/9] vhost: do not use RCU to synchronize MMU notifier with worker Jason Wang
@ 2019-08-07 12:07   ` Jason Gunthorpe
  2019-08-07 14:02     ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2019-08-07 12:07 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm

On Wed, Aug 07, 2019 at 03:06:15AM -0400, Jason Wang wrote:
> We used to use RCU to synchronize MMU notifier with worker. This leads
> calling synchronize_rcu() in invalidate_range_start(). But on a busy
> system, there would be many factors that may slow down the
> synchronize_rcu() which makes it unsuitable to be called in MMU
> notifier.
> 
> So this patch switches use seqlock counter to track whether or not the
> map was used. The counter was increased when vq try to start or finish
> uses the map. This means, when it was even, we're sure there's no
> readers and MMU notifier is synchronized. When it was odd, it means
> there's a reader we need to wait it to be even again then we are
> synchronized. Consider the read critical section is pretty small the
> synchronization should be done very fast.
> 
> Reported-by: Michael S. Tsirkin <mst@redhat.com>
> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
>  drivers/vhost/vhost.c | 141 ++++++++++++++++++++++++++----------------
>  drivers/vhost/vhost.h |   7 ++-
>  2 files changed, 90 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index cfc11f9ed9c9..57bfbb60d960 100644
> +++ b/drivers/vhost/vhost.c
> @@ -324,17 +324,16 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
>  
>  	spin_lock(&vq->mmu_lock);
>  	for (i = 0; i < VHOST_NUM_ADDRS; i++) {
> -		map[i] = rcu_dereference_protected(vq->maps[i],
> -				  lockdep_is_held(&vq->mmu_lock));
> +		map[i] = vq->maps[i];
>  		if (map[i]) {
>  			vhost_set_map_dirty(vq, map[i], i);
> -			rcu_assign_pointer(vq->maps[i], NULL);
> +			vq->maps[i] = NULL;
>  		}
>  	}
>  	spin_unlock(&vq->mmu_lock);
>  
> -	/* No need for synchronize_rcu() or kfree_rcu() since we are
> -	 * serialized with memory accessors (e.g vq mutex held).
> +	/* No need for synchronization since we are serialized with
> +	 * memory accessors (e.g vq mutex held).
>  	 */
>  
>  	for (i = 0; i < VHOST_NUM_ADDRS; i++)
> @@ -362,6 +361,40 @@ static bool vhost_map_range_overlap(struct vhost_uaddr *uaddr,
>  	return !(end < uaddr->uaddr || start > uaddr->uaddr - 1 + uaddr->size);
>  }
>  
> +static void inline vhost_vq_access_map_begin(struct vhost_virtqueue *vq)
> +{
> +	write_seqcount_begin(&vq->seq);
> +}
> +
> +static void inline vhost_vq_access_map_end(struct vhost_virtqueue *vq)
> +{
> +	write_seqcount_end(&vq->seq);
> +}

The write side of a seqlock only provides write barriers. Access to

	map = vq->maps[VHOST_ADDR_USED];

Still needs a read side barrier, and then I think this will be no
better than a normal spinlock.

It also doesn't seem like this algorithm even needs a seqlock, as this
is just a one bit flag

atomic_set_bit(using map)
smp_mb__after_atomic()
.. maps [...]
atomic_clear_bit(using map)


map = NULL;
smp_mb__before_atomic();
while (atomic_read_bit(using map))
   relax()

Again, not clear this could be faster than a spinlock when the
barriers are correct...

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V4 7/9] vhost: do not use RCU to synchronize MMU notifier with worker
  2019-08-07 12:07   ` Jason Gunthorpe
@ 2019-08-07 14:02     ` Jason Wang
  2019-08-08 12:54       ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2019-08-07 14:02 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm


On 2019/8/7 下午8:07, Jason Gunthorpe wrote:
> On Wed, Aug 07, 2019 at 03:06:15AM -0400, Jason Wang wrote:
>> We used to use RCU to synchronize MMU notifier with worker. This leads
>> calling synchronize_rcu() in invalidate_range_start(). But on a busy
>> system, there would be many factors that may slow down the
>> synchronize_rcu() which makes it unsuitable to be called in MMU
>> notifier.
>>
>> So this patch switches use seqlock counter to track whether or not the
>> map was used. The counter was increased when vq try to start or finish
>> uses the map. This means, when it was even, we're sure there's no
>> readers and MMU notifier is synchronized. When it was odd, it means
>> there's a reader we need to wait it to be even again then we are
>> synchronized. Consider the read critical section is pretty small the
>> synchronization should be done very fast.
>>
>> Reported-by: Michael S. Tsirkin <mst@redhat.com>
>> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>   drivers/vhost/vhost.c | 141 ++++++++++++++++++++++++++----------------
>>   drivers/vhost/vhost.h |   7 ++-
>>   2 files changed, 90 insertions(+), 58 deletions(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index cfc11f9ed9c9..57bfbb60d960 100644
>> +++ b/drivers/vhost/vhost.c
>> @@ -324,17 +324,16 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
>>   
>>   	spin_lock(&vq->mmu_lock);
>>   	for (i = 0; i < VHOST_NUM_ADDRS; i++) {
>> -		map[i] = rcu_dereference_protected(vq->maps[i],
>> -				  lockdep_is_held(&vq->mmu_lock));
>> +		map[i] = vq->maps[i];
>>   		if (map[i]) {
>>   			vhost_set_map_dirty(vq, map[i], i);
>> -			rcu_assign_pointer(vq->maps[i], NULL);
>> +			vq->maps[i] = NULL;
>>   		}
>>   	}
>>   	spin_unlock(&vq->mmu_lock);
>>   
>> -	/* No need for synchronize_rcu() or kfree_rcu() since we are
>> -	 * serialized with memory accessors (e.g vq mutex held).
>> +	/* No need for synchronization since we are serialized with
>> +	 * memory accessors (e.g vq mutex held).
>>   	 */
>>   
>>   	for (i = 0; i < VHOST_NUM_ADDRS; i++)
>> @@ -362,6 +361,40 @@ static bool vhost_map_range_overlap(struct vhost_uaddr *uaddr,
>>   	return !(end < uaddr->uaddr || start > uaddr->uaddr - 1 + uaddr->size);
>>   }
>>   
>> +static void inline vhost_vq_access_map_begin(struct vhost_virtqueue *vq)
>> +{
>> +	write_seqcount_begin(&vq->seq);
>> +}
>> +
>> +static void inline vhost_vq_access_map_end(struct vhost_virtqueue *vq)
>> +{
>> +	write_seqcount_end(&vq->seq);
>> +}
> The write side of a seqlock only provides write barriers. Access to
>
> 	map = vq->maps[VHOST_ADDR_USED];
>
> Still needs a read side barrier, and then I think this will be no
> better than a normal spinlock.
>
> It also doesn't seem like this algorithm even needs a seqlock, as this
> is just a one bit flag


Right, so then I tend to use spinlock first for correctness.


>
> atomic_set_bit(using map)
> smp_mb__after_atomic()
> .. maps [...]
> atomic_clear_bit(using map)
>
>
> map = NULL;
> smp_mb__before_atomic();
> while (atomic_read_bit(using map))
>     relax()
>
> Again, not clear this could be faster than a spinlock when the
> barriers are correct...


Yes, for next release we may want to use the idea from Michael like to 
mitigate the impact of mb.

https://lwn.net/Articles/775871/

Thanks


>
> Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V4 7/9] vhost: do not use RCU to synchronize MMU notifier with worker
  2019-08-07 14:02     ` Jason Wang
@ 2019-08-08 12:54       ` Jason Wang
  2019-08-08 13:01         ` Jason Wang
                           ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-08 12:54 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 13623 bytes --]


On 2019/8/7 下午10:02, Jason Wang wrote:
>
> On 2019/8/7 下午8:07, Jason Gunthorpe wrote:
>> On Wed, Aug 07, 2019 at 03:06:15AM -0400, Jason Wang wrote:
>>> We used to use RCU to synchronize MMU notifier with worker. This leads
>>> calling synchronize_rcu() in invalidate_range_start(). But on a busy
>>> system, there would be many factors that may slow down the
>>> synchronize_rcu() which makes it unsuitable to be called in MMU
>>> notifier.
>>>
>>> So this patch switches use seqlock counter to track whether or not the
>>> map was used. The counter was increased when vq try to start or finish
>>> uses the map. This means, when it was even, we're sure there's no
>>> readers and MMU notifier is synchronized. When it was odd, it means
>>> there's a reader we need to wait it to be even again then we are
>>> synchronized. Consider the read critical section is pretty small the
>>> synchronization should be done very fast.
>>>
>>> Reported-by: Michael S. Tsirkin <mst@redhat.com>
>>> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel
>>> virtual address")
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>   drivers/vhost/vhost.c | 141
>>> ++++++++++++++++++++++++++----------------
>>>   drivers/vhost/vhost.h |   7 ++-
>>>   2 files changed, 90 insertions(+), 58 deletions(-)
>>>
>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>> index cfc11f9ed9c9..57bfbb60d960 100644
>>> +++ b/drivers/vhost/vhost.c
>>> @@ -324,17 +324,16 @@ static void vhost_uninit_vq_maps(struct
>>> vhost_virtqueue *vq)
>>>         spin_lock(&vq->mmu_lock);
>>>       for (i = 0; i < VHOST_NUM_ADDRS; i++) {
>>> -        map[i] = rcu_dereference_protected(vq->maps[i],
>>> -                  lockdep_is_held(&vq->mmu_lock));
>>> +        map[i] = vq->maps[i];
>>>           if (map[i]) {
>>>               vhost_set_map_dirty(vq, map[i], i);
>>> -            rcu_assign_pointer(vq->maps[i], NULL);
>>> +            vq->maps[i] = NULL;
>>>           }
>>>       }
>>>       spin_unlock(&vq->mmu_lock);
>>>   -    /* No need for synchronize_rcu() or kfree_rcu() since we are
>>> -     * serialized with memory accessors (e.g vq mutex held).
>>> +    /* No need for synchronization since we are serialized with
>>> +     * memory accessors (e.g vq mutex held).
>>>        */
>>>         for (i = 0; i < VHOST_NUM_ADDRS; i++)
>>> @@ -362,6 +361,40 @@ static bool vhost_map_range_overlap(struct
>>> vhost_uaddr *uaddr,
>>>       return !(end < uaddr->uaddr || start > uaddr->uaddr - 1 +
>>> uaddr->size);
>>>   }
>>>   +static void inline vhost_vq_access_map_begin(struct
>>> vhost_virtqueue *vq)
>>> +{
>>> +    write_seqcount_begin(&vq->seq);
>>> +}
>>> +
>>> +static void inline vhost_vq_access_map_end(struct vhost_virtqueue *vq)
>>> +{
>>> +    write_seqcount_end(&vq->seq);
>>> +}
>> The write side of a seqlock only provides write barriers. Access to
>>
>>     map = vq->maps[VHOST_ADDR_USED];
>>
>> Still needs a read side barrier, and then I think this will be no
>> better than a normal spinlock.
>>
>> It also doesn't seem like this algorithm even needs a seqlock, as this
>> is just a one bit flag
>
>
> Right, so then I tend to use spinlock first for correctness.
>
>
>>
>> atomic_set_bit(using map)
>> smp_mb__after_atomic()
>> .. maps [...]
>> atomic_clear_bit(using map)
>>
>>
>> map = NULL;
>> smp_mb__before_atomic();
>> while (atomic_read_bit(using map))
>>     relax()
>>
>> Again, not clear this could be faster than a spinlock when the
>> barriers are correct...
>

I've done some benchmark[1] on x86, and yes it looks even slower. It
looks to me the atomic stuffs is not necessary, so in order to compare
it better with spinlock. I tweak it a little bit through
smp_load_acquire()/store_releaes() + mb() like:

static struct vhost_map *vhost_vq_access_map_begin(struct
vhost_virtqueue
*vq,                                                                          

                                                   unsigned int
type)                                                                                   

{                                                                                                                                                       

       
++vq->counter;                                                                                                                                  

        /* Ensure map was read after incresing the counter.
Paired                                                                                      

         * with smp_mb() in
vhost_vq_sync_access().                                                                                                     

        
*/                                                                                                                                             

       
smp_mb();                                                                                                                                       

        return
vq->maps[type];                                                                                                                          

}                                                                                                                                                       

                                                                                                                                                        

static void inline vhost_vq_access_map_end(struct vhost_virtqueue
*vq)                                                                                  

{                                                                                                                                                       

        /* Ensure all memory access through map was done
before                                                                                         

         * reducing the counter. Paired with smp_load_acquire()
in                                                                                      

         * vhost_vq_sync_access()
*/                                                                                                                    

        smp_store_release(&vq->counter,
--vq->counter);                                                                                                 

}                                                                                                                                                       

                                                                                                                                                        

static void inline vhost_vq_sync_access(struct vhost_virtqueue
*vq)                                                                                     

{                                                                                                                                                       

        /* Ensure new map value is visible before checking counter.
*/                                                                                  

       
smp_mb();                                                                                                                                       

        /* Ensure map was freed after reading counter value,
paired                                                                                     

         * with smp_store_release() in
vhost_vq_access_map_end().                                                                                       

        
*/                                                                                                                                             

        while (smp_load_acquire(&vq->counter))
{                                                                                                        

                if
(need_resched())                                                                                                                     

                       
schedule();                                                                                                                     

       
}                                                                                                                                               

}         


And the result is something like:

         base | direct + atomic bitops | direct + spinlock() | direct +
counter + smp_mb() | direct + RCU     |
SMAP on  | 5.0Mpps | 5.0Mpps     (+0%)      | 5.7Mpps      (+14%)      |
5.9Mpps  (+18%)            | 6.2Mpps  (+24%)  |
SMAP off | 7.0Mpps | 7.0Mpps     (+0%)      | 7.0Mpps   (+0%)     |
7.5Mpps  (+7%)            | 8.2Mpps  (+17%)  |


base: normal copy_to_user()/copy_from_user() path.
direct + atomic bitops: using direct mapping but synchronize through
atomic bitops like you suggested above
direct + spinlock(): using direct mapping but synchronize through spinlocks
direct + counter + smp_mb(): using direct mapping but synchronize
through counter + smp_mb()
direct + RCU: using direct mapping and synchronize through RCU (buggy
and need to be addressed by this series)


So smp_mb() + counter is fastest way. And spinlock can still show some
improvement (+14%) in the case of SMAP, but no the case when SMAP is off.

I don't have any objection to convert  to spinlock() but just want to
know if any case that the above smp_mb() + counter looks good to you?

Thanks


>
> Yes, for next release we may want to use the idea from Michael like to
> mitigate the impact of mb.
>
> https://lwn.net/Articles/775871/
>
> Thanks
>
>
>>
>> Jason

[-- Attachment #2: pEpkey.asc --]
[-- Type: application/pgp-keys, Size: 2493 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V4 7/9] vhost: do not use RCU to synchronize MMU notifier       with worker
  2019-08-08 12:54       ` Jason Wang
@ 2019-08-08 13:01         ` Jason Wang
  2019-08-08 13:05         ` Jason Gunthorpe
  2019-08-10 19:12         ` [PATCH V5 0/9] Fixes for vhost metadata acceleration Michael S. Tsirkin
  2 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-08 13:01 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, mst, netdev, linux-kernel, virtualization, linux-mm



----- Original Message -----
> 
> On 2019/8/7 下午10:02, Jason Wang wrote:
> >
> > On 2019/8/7 下午8:07, Jason Gunthorpe wrote:
> >> On Wed, Aug 07, 2019 at 03:06:15AM -0400, Jason Wang wrote:
> >>> We used to use RCU to synchronize MMU notifier with worker. This leads
> >>> calling synchronize_rcu() in invalidate_range_start(). But on a busy
> >>> system, there would be many factors that may slow down the
> >>> synchronize_rcu() which makes it unsuitable to be called in MMU
> >>> notifier.
> >>>
> >>> So this patch switches use seqlock counter to track whether or not the
> >>> map was used. The counter was increased when vq try to start or finish
> >>> uses the map. This means, when it was even, we're sure there's no
> >>> readers and MMU notifier is synchronized. When it was odd, it means
> >>> there's a reader we need to wait it to be even again then we are
> >>> synchronized. Consider the read critical section is pretty small the
> >>> synchronization should be done very fast.
> >>>
> >>> Reported-by: Michael S. Tsirkin <mst@redhat.com>
> >>> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel
> >>> virtual address")
> >>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>>   drivers/vhost/vhost.c | 141
> >>> ++++++++++++++++++++++++++----------------
> >>>   drivers/vhost/vhost.h |   7 ++-
> >>>   2 files changed, 90 insertions(+), 58 deletions(-)
> >>>
> >>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>> index cfc11f9ed9c9..57bfbb60d960 100644
> >>> +++ b/drivers/vhost/vhost.c
> >>> @@ -324,17 +324,16 @@ static void vhost_uninit_vq_maps(struct
> >>> vhost_virtqueue *vq)
> >>>         spin_lock(&vq->mmu_lock);
> >>>       for (i = 0; i < VHOST_NUM_ADDRS; i++) {
> >>> -        map[i] = rcu_dereference_protected(vq->maps[i],
> >>> -                  lockdep_is_held(&vq->mmu_lock));
> >>> +        map[i] = vq->maps[i];
> >>>           if (map[i]) {
> >>>               vhost_set_map_dirty(vq, map[i], i);
> >>> -            rcu_assign_pointer(vq->maps[i], NULL);
> >>> +            vq->maps[i] = NULL;
> >>>           }
> >>>       }
> >>>       spin_unlock(&vq->mmu_lock);
> >>>   -    /* No need for synchronize_rcu() or kfree_rcu() since we are
> >>> -     * serialized with memory accessors (e.g vq mutex held).
> >>> +    /* No need for synchronization since we are serialized with
> >>> +     * memory accessors (e.g vq mutex held).
> >>>        */
> >>>         for (i = 0; i < VHOST_NUM_ADDRS; i++)
> >>> @@ -362,6 +361,40 @@ static bool vhost_map_range_overlap(struct
> >>> vhost_uaddr *uaddr,
> >>>       return !(end < uaddr->uaddr || start > uaddr->uaddr - 1 +
> >>> uaddr->size);
> >>>   }
> >>>   +static void inline vhost_vq_access_map_begin(struct
> >>> vhost_virtqueue *vq)
> >>> +{
> >>> +    write_seqcount_begin(&vq->seq);
> >>> +}
> >>> +
> >>> +static void inline vhost_vq_access_map_end(struct vhost_virtqueue *vq)
> >>> +{
> >>> +    write_seqcount_end(&vq->seq);
> >>> +}
> >> The write side of a seqlock only provides write barriers. Access to
> >>
> >>     map = vq->maps[VHOST_ADDR_USED];
> >>
> >> Still needs a read side barrier, and then I think this will be no
> >> better than a normal spinlock.
> >>
> >> It also doesn't seem like this algorithm even needs a seqlock, as this
> >> is just a one bit flag
> >
> >
> > Right, so then I tend to use spinlock first for correctness.
> >
> >
> >>
> >> atomic_set_bit(using map)
> >> smp_mb__after_atomic()
> >> .. maps [...]
> >> atomic_clear_bit(using map)
> >>
> >>
> >> map = NULL;
> >> smp_mb__before_atomic();
> >> while (atomic_read_bit(using map))
> >>     relax()
> >>
> >> Again, not clear this could be faster than a spinlock when the
> >> barriers are correct...
> >
> 
> I've done some benchmark[1] on x86, and yes it looks even slower. It
> looks to me the atomic stuffs is not necessary, so in order to compare
> it better with spinlock. I tweak it a little bit through
> smp_load_acquire()/store_releaes() + mb() like:
> 

Sorry the format is messed up:

The code should be something like:

static struct vhost_map *vhost_vq_access_map_begin(struct vhost_virtqueue *vq,
                                                   unsigned int type)
{
        ++vq->counter;
        /* Ensure map was read after incresing the counter. Paired
         * with smp_mb() in vhost_vq_sync_access().
         */
        smp_mb();
        return vq->maps[type];
}

static void inline vhost_vq_access_map_end(struct vhost_virtqueue *vq)
{
 	/* Ensure all memory access through map was done before
         * reducing the counter. Paired with smp_load_acquire() in
         * vhost_vq_sync_access() */
        smp_store_release(&vq->counter, --vq->counter);
}

static void inline vhost_vq_sync_access(struct vhost_virtqueue *vq)
{
        /* Ensure new map value is visible before checking counter. */
        smp_mb();
        /* Ensure map was freed after reading counter value, paired
         * with smp_store_release() in vhost_vq_access_map_end().
         */
        while (smp_load_acquire(&vq->counter)) {
                if (need_resched())
                        schedule();
        }
}

And the benchmark result is:

         | base    | direct + atomic bitops | direct + spinlock() | direct + counter + smp_mb() | direct + RCU     |
SMAP on  | 5.0Mpps | 5.0Mpps     (+0%)      | 5.7Mpps  	(+14%)	  | 5.9Mpps  (+18%)	        | 6.2Mpps  (+24%)  |
SMAP off | 7.0Mpps | 7.0Mpps     (+0%)      | 7.0Mpps   (+0%)     | 7.5Mpps  (+7%)	        | 8.2Mpps  (+17%)  |


> 
> 
> base: normal copy_to_user()/copy_from_user() path.
> direct + atomic bitops: using direct mapping but synchronize through
> atomic bitops like you suggested above
> direct + spinlock(): using direct mapping but synchronize through spinlocks
> direct + counter + smp_mb(): using direct mapping but synchronize
> through counter + smp_mb()
> direct + RCU: using direct mapping and synchronize through RCU (buggy
> and need to be addressed by this series)
> 
> 
> So smp_mb() + counter is fastest way. And spinlock can still show some
> improvement (+14%) in the case of SMAP, but no the case when SMAP is off.
> 
> I don't have any objection to convert  to spinlock() but just want to
> know if any case that the above smp_mb() + counter looks good to you?
> 
> Thanks
> 
> 
> >
> > Yes, for next release we may want to use the idea from Michael like to
> > mitigate the impact of mb.
> >
> > https://lwn.net/Articles/775871/
> >
> > Thanks
> >
> >
> >>
> >> Jason
> 
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V4 7/9] vhost: do not use RCU to synchronize MMU notifier with worker
  2019-08-08 12:54       ` Jason Wang
  2019-08-08 13:01         ` Jason Wang
@ 2019-08-08 13:05         ` Jason Gunthorpe
  2019-08-10 19:12         ` [PATCH V5 0/9] Fixes for vhost metadata acceleration Michael S. Tsirkin
  2 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2019-08-08 13:05 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm

On Thu, Aug 08, 2019 at 08:54:54PM +0800, Jason Wang wrote:

> I don't have any objection to convert  to spinlock() but just want to
> know if any case that the above smp_mb() + counter looks good to you?

This email is horribly mangled, but I don't think mixing smb_mb() and
smp_load_acquire() would be considerd a best-practice, and using
smp_store_release() instead would be the wrong barrier.

spinlock does seem to be the only existing locking primitive that does
what is needed here.

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V4 0/9] Fixes for metadata accelreation
  2019-08-07  7:06 [PATCH V4 0/9] Fixes for metadata accelreation Jason Wang
                   ` (8 preceding siblings ...)
  2019-08-07  7:06 ` [PATCH V4 9/9] vhost: do not return -EAGAIN for non blocking invalidation too early Jason Wang
@ 2019-08-09  5:15 ` David Miller
  2019-08-09  5:35   ` Jason Wang
  9 siblings, 1 reply; 31+ messages in thread
From: David Miller @ 2019-08-09  5:15 UTC (permalink / raw)
  To: jasowang; +Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm, jgg

From: Jason Wang <jasowang@redhat.com>
Date: Wed,  7 Aug 2019 03:06:08 -0400

> This series try to fix several issues introduced by meta data
> accelreation series. Please review.
 ...

My impression is that patch #7 will be changed to use spinlocks so there
will be a v5.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V4 0/9] Fixes for metadata accelreation
  2019-08-09  5:15 ` [PATCH V4 0/9] Fixes for metadata accelreation David Miller
@ 2019-08-09  5:35   ` Jason Wang
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-09  5:35 UTC (permalink / raw)
  To: David Miller
  Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm, jgg


On 2019/8/9 下午1:15, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Wed,  7 Aug 2019 03:06:08 -0400
>
>> This series try to fix several issues introduced by meta data
>> accelreation series. Please review.
>   ...
>
> My impression is that patch #7 will be changed to use spinlocks so there
> will be a v5.
>

Yes. V5 is on the way.

Thanks


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
  2019-08-08 12:54       ` Jason Wang
  2019-08-08 13:01         ` Jason Wang
  2019-08-08 13:05         ` Jason Gunthorpe
@ 2019-08-10 19:12         ` Michael S. Tsirkin
  2 siblings, 0 replies; 31+ messages in thread
From: Michael S. Tsirkin @ 2019-08-10 19:12 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jason Gunthorpe, kvm, virtualization, netdev, linux-kernel, linux-mm

On Thu, Aug 08, 2019 at 08:54:54PM +0800, Jason Wang wrote:
> I don't have any objection to convert  to spinlock() but just want to
> know if any case that the above smp_mb() + counter looks good to you?

So how about we try this:
- revert the original patch for this release
- new safe patch with a spinlock for the next release
- whatever improvements we can come up with on top

Thoughts?

Because I think this needs much more scrutiny than we can
give an incremental patch.

-- 
MST

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
  2019-08-15  3:26             ` Jason Wang
@ 2019-08-15 19:33               ` Jason Gunthorpe
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2019-08-15 19:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, kvm, virtualization, netdev, linux-kernel, linux-mm

On Thu, Aug 15, 2019 at 11:26:46AM +0800, Jason Wang wrote:
> 
> On 2019/8/13 下午7:57, Jason Gunthorpe wrote:
> > On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:
> > 
> > > What kind of issues do you see? Spinlock is to synchronize GUP with MMU
> > > notifier in this series.
> > A GUP that can't sleep can't pagefault which makes it a really weird
> > pattern
> 
> 
> My understanding is __get_user_pages_fast() assumes caller can fail or have
> fallback. And we have graceful fallback to copy_{to|from}_user().

My point is that if you can fall back to copy_user then it is weird to
call the special non-sleeping GUP under a spinlock.

AFAIK the only reason this is done is because of the way the notifier
is being locked...

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
  2019-08-13 16:41             ` Christoph Hellwig
@ 2019-08-15  3:28               ` Jason Wang
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-15  3:28 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Gunthorpe
  Cc: Michael S. Tsirkin, kvm, virtualization, netdev, linux-kernel, linux-mm


On 2019/8/14 上午12:41, Christoph Hellwig wrote:
> On Tue, Aug 13, 2019 at 08:57:07AM -0300, Jason Gunthorpe wrote:
>> On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:
>>
>>> What kind of issues do you see? Spinlock is to synchronize GUP with MMU
>>> notifier in this series.
>> A GUP that can't sleep can't pagefault which makes it a really weird
>> pattern
> get_user_pages/get_user_pages_fast must not be called under a spinlock.
> We have the somewhat misnamed __get_user_page_fast that just does a
> lookup for existing pages and never faults for a few places that need
> to do that lookup from contexts where we can't sleep.


Yes, I do use __get_user_pages_fast() in the code.

Thanks


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
  2019-08-13 11:57           ` Jason Gunthorpe
  2019-08-13 16:41             ` Christoph Hellwig
@ 2019-08-15  3:26             ` Jason Wang
  2019-08-15 19:33               ` Jason Gunthorpe
  1 sibling, 1 reply; 31+ messages in thread
From: Jason Wang @ 2019-08-15  3:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michael S. Tsirkin, kvm, virtualization, netdev, linux-kernel, linux-mm


On 2019/8/13 下午7:57, Jason Gunthorpe wrote:
> On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:
>
>> What kind of issues do you see? Spinlock is to synchronize GUP with MMU
>> notifier in this series.
> A GUP that can't sleep can't pagefault which makes it a really weird
> pattern


My understanding is __get_user_pages_fast() assumes caller can fail or 
have fallback. And we have graceful fallback to copy_{to|from}_user().


>
>> Btw, back to the original question. May I know why synchronize_rcu() is not
>> suitable? Consider:
> We already went over this. You'd need to determine it doesn't somehow
> deadlock the mm on reclaim paths. Maybe it is OK, the rcq_gq_wq is
> marked WQ_MEM_RECLAIM at least..


Yes, will take a look at this.


>
> I also think Michael was concerned about the latency spikes a long RCU
> delay would cause.


I don't think it's a real problem consider MMU notifier could be 
preempted or blocked.

Thanks


>
> Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
  2019-08-13 11:57           ` Jason Gunthorpe
@ 2019-08-13 16:41             ` Christoph Hellwig
  2019-08-15  3:28               ` Jason Wang
  2019-08-15  3:26             ` Jason Wang
  1 sibling, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2019-08-13 16:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jason Wang, Michael S. Tsirkin, kvm, virtualization, netdev,
	linux-kernel, linux-mm

On Tue, Aug 13, 2019 at 08:57:07AM -0300, Jason Gunthorpe wrote:
> On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:
> 
> > What kind of issues do you see? Spinlock is to synchronize GUP with MMU
> > notifier in this series.
> 
> A GUP that can't sleep can't pagefault which makes it a really weird
> pattern

get_user_pages/get_user_pages_fast must not be called under a spinlock.
We have the somewhat misnamed __get_user_page_fast that just does a
lookup for existing pages and never faults for a few places that need
to do that lookup from contexts where we can't sleep.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
  2019-08-13  8:31         ` Jason Wang
@ 2019-08-13 11:57           ` Jason Gunthorpe
  2019-08-13 16:41             ` Christoph Hellwig
  2019-08-15  3:26             ` Jason Wang
  0 siblings, 2 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2019-08-13 11:57 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, kvm, virtualization, netdev, linux-kernel, linux-mm

On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:

> What kind of issues do you see? Spinlock is to synchronize GUP with MMU
> notifier in this series.

A GUP that can't sleep can't pagefault which makes it a really weird
pattern

> Btw, back to the original question. May I know why synchronize_rcu() is not
> suitable? Consider:

We already went over this. You'd need to determine it doesn't somehow
deadlock the mm on reclaim paths. Maybe it is OK, the rcq_gq_wq is
marked WQ_MEM_RECLAIM at least..

I also think Michael was concerned about the latency spikes a long RCU
delay would cause.

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
  2019-08-12 13:02       ` Jason Gunthorpe
@ 2019-08-13  8:31         ` Jason Wang
  2019-08-13 11:57           ` Jason Gunthorpe
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2019-08-13  8:31 UTC (permalink / raw)
  To: Jason Gunthorpe, Michael S. Tsirkin
  Cc: kvm, virtualization, netdev, linux-kernel, linux-mm


On 2019/8/12 下午9:02, Jason Gunthorpe wrote:
> On Mon, Aug 12, 2019 at 05:49:08AM -0400, Michael S. Tsirkin wrote:
>> On Mon, Aug 12, 2019 at 10:44:51AM +0800, Jason Wang wrote:
>>> On 2019/8/11 上午1:52, Michael S. Tsirkin wrote:
>>>> On Fri, Aug 09, 2019 at 01:48:42AM -0400, Jason Wang wrote:
>>>>> Hi all:
>>>>>
>>>>> This series try to fix several issues introduced by meta data
>>>>> accelreation series. Please review.
>>>>>
>>>>> Changes from V4:
>>>>> - switch to use spinlock synchronize MMU notifier with accessors
>>>>>
>>>>> Changes from V3:
>>>>> - remove the unnecessary patch
>>>>>
>>>>> Changes from V2:
>>>>> - use seqlck helper to synchronize MMU notifier with vhost worker
>>>>>
>>>>> Changes from V1:
>>>>> - try not use RCU to syncrhonize MMU notifier with vhost worker
>>>>> - set dirty pages after no readers
>>>>> - return -EAGAIN only when we find the range is overlapped with
>>>>>     metadata
>>>>>
>>>>> Jason Wang (9):
>>>>>     vhost: don't set uaddr for invalid address
>>>>>     vhost: validate MMU notifier registration
>>>>>     vhost: fix vhost map leak
>>>>>     vhost: reset invalidate_count in vhost_set_vring_num_addr()
>>>>>     vhost: mark dirty pages during map uninit
>>>>>     vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
>>>>>     vhost: do not use RCU to synchronize MMU notifier with worker
>>>>>     vhost: correctly set dirty pages in MMU notifiers callback
>>>>>     vhost: do not return -EAGAIN for non blocking invalidation too early
>>>>>
>>>>>    drivers/vhost/vhost.c | 202 +++++++++++++++++++++++++-----------------
>>>>>    drivers/vhost/vhost.h |   6 +-
>>>>>    2 files changed, 122 insertions(+), 86 deletions(-)
>>>> This generally looks more solid.
>>>>
>>>> But this amounts to a significant overhaul of the code.
>>>>
>>>> At this point how about we revert 7f466032dc9e5a61217f22ea34b2df932786bbfc
>>>> for this release, and then re-apply a corrected version
>>>> for the next one?
>>>
>>> If possible, consider we've actually disabled the feature. How about just
>>> queued those patches for next release?
>>>
>>> Thanks
>> Sorry if I was unclear. My idea is that
>> 1. I revert the disabled code
>> 2. You send a patch readding it with all the fixes squashed
>> 3. Maybe optimizations on top right away?
>> 4. We queue *that* for next and see what happens.
>>
>> And the advantage over the patchy approach is that the current patches
>> are hard to review. E.g.  it's not reasonable to ask RCU guys to review
>> the whole of vhost for RCU usage but it's much more reasonable to ask
>> about a specific patch.
> I think there are other problems here too, I don't like that the use
> of mmu notifiers is so different from every other driver, or that GUP
> is called under spinlock.


What kind of issues do you see? Spinlock is to synchronize GUP with MMU 
notifier in this series.

Btw, back to the original question. May I know why synchronize_rcu() is 
not suitable? Consider:

- MMU notifier are allowed to sleep
- MMU notifier could be preempted

If you mean something that prevents RCU grace period from running. I'm 
afraid MMU notifier is not the only victim.  But it should be no more 
worse than some one is holding a lock for very long time. If the only 
concern is the preemption of vhost kthread, I can switch to use 
rcu_read_lock_bh() instead.

Thanks


>
> So I favor the revert and try again approach as well. It is hard to
> get a clear picture with these endless bug fix patches
>
> Jason


Ok.

Thanks


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
  2019-08-12  9:49     ` Michael S. Tsirkin
  2019-08-12 13:02       ` Jason Gunthorpe
@ 2019-08-13  8:12       ` Jason Wang
  1 sibling, 0 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-13  8:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, virtualization, netdev, linux-kernel, linux-mm, jgg


On 2019/8/12 下午5:49, Michael S. Tsirkin wrote:
> On Mon, Aug 12, 2019 at 10:44:51AM +0800, Jason Wang wrote:
>> On 2019/8/11 上午1:52, Michael S. Tsirkin wrote:
>>> On Fri, Aug 09, 2019 at 01:48:42AM -0400, Jason Wang wrote:
>>>> Hi all:
>>>>
>>>> This series try to fix several issues introduced by meta data
>>>> accelreation series. Please review.
>>>>
>>>> Changes from V4:
>>>> - switch to use spinlock synchronize MMU notifier with accessors
>>>>
>>>> Changes from V3:
>>>> - remove the unnecessary patch
>>>>
>>>> Changes from V2:
>>>> - use seqlck helper to synchronize MMU notifier with vhost worker
>>>>
>>>> Changes from V1:
>>>> - try not use RCU to syncrhonize MMU notifier with vhost worker
>>>> - set dirty pages after no readers
>>>> - return -EAGAIN only when we find the range is overlapped with
>>>>     metadata
>>>>
>>>> Jason Wang (9):
>>>>     vhost: don't set uaddr for invalid address
>>>>     vhost: validate MMU notifier registration
>>>>     vhost: fix vhost map leak
>>>>     vhost: reset invalidate_count in vhost_set_vring_num_addr()
>>>>     vhost: mark dirty pages during map uninit
>>>>     vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
>>>>     vhost: do not use RCU to synchronize MMU notifier with worker
>>>>     vhost: correctly set dirty pages in MMU notifiers callback
>>>>     vhost: do not return -EAGAIN for non blocking invalidation too early
>>>>
>>>>    drivers/vhost/vhost.c | 202 +++++++++++++++++++++++++-----------------
>>>>    drivers/vhost/vhost.h |   6 +-
>>>>    2 files changed, 122 insertions(+), 86 deletions(-)
>>> This generally looks more solid.
>>>
>>> But this amounts to a significant overhaul of the code.
>>>
>>> At this point how about we revert 7f466032dc9e5a61217f22ea34b2df932786bbfc
>>> for this release, and then re-apply a corrected version
>>> for the next one?
>>
>> If possible, consider we've actually disabled the feature. How about just
>> queued those patches for next release?
>>
>> Thanks
> Sorry if I was unclear. My idea is that
> 1. I revert the disabled code
> 2. You send a patch readding it with all the fixes squashed
> 3. Maybe optimizations on top right away?
> 4. We queue *that* for next and see what happens.
>
> And the advantage over the patchy approach is that the current patches
> are hard to review. E.g.  it's not reasonable to ask RCU guys to review
> the whole of vhost for RCU usage but it's much more reasonable to ask
> about a specific patch.


Ok. Then I agree to revert.

Thanks


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
  2019-08-12  9:49     ` Michael S. Tsirkin
@ 2019-08-12 13:02       ` Jason Gunthorpe
  2019-08-13  8:31         ` Jason Wang
  2019-08-13  8:12       ` Jason Wang
  1 sibling, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2019-08-12 13:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, kvm, virtualization, netdev, linux-kernel, linux-mm

On Mon, Aug 12, 2019 at 05:49:08AM -0400, Michael S. Tsirkin wrote:
> On Mon, Aug 12, 2019 at 10:44:51AM +0800, Jason Wang wrote:
> > 
> > On 2019/8/11 上午1:52, Michael S. Tsirkin wrote:
> > > On Fri, Aug 09, 2019 at 01:48:42AM -0400, Jason Wang wrote:
> > > > Hi all:
> > > > 
> > > > This series try to fix several issues introduced by meta data
> > > > accelreation series. Please review.
> > > > 
> > > > Changes from V4:
> > > > - switch to use spinlock synchronize MMU notifier with accessors
> > > > 
> > > > Changes from V3:
> > > > - remove the unnecessary patch
> > > > 
> > > > Changes from V2:
> > > > - use seqlck helper to synchronize MMU notifier with vhost worker
> > > > 
> > > > Changes from V1:
> > > > - try not use RCU to syncrhonize MMU notifier with vhost worker
> > > > - set dirty pages after no readers
> > > > - return -EAGAIN only when we find the range is overlapped with
> > > >    metadata
> > > > 
> > > > Jason Wang (9):
> > > >    vhost: don't set uaddr for invalid address
> > > >    vhost: validate MMU notifier registration
> > > >    vhost: fix vhost map leak
> > > >    vhost: reset invalidate_count in vhost_set_vring_num_addr()
> > > >    vhost: mark dirty pages during map uninit
> > > >    vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
> > > >    vhost: do not use RCU to synchronize MMU notifier with worker
> > > >    vhost: correctly set dirty pages in MMU notifiers callback
> > > >    vhost: do not return -EAGAIN for non blocking invalidation too early
> > > > 
> > > >   drivers/vhost/vhost.c | 202 +++++++++++++++++++++++++-----------------
> > > >   drivers/vhost/vhost.h |   6 +-
> > > >   2 files changed, 122 insertions(+), 86 deletions(-)
> > > This generally looks more solid.
> > > 
> > > But this amounts to a significant overhaul of the code.
> > > 
> > > At this point how about we revert 7f466032dc9e5a61217f22ea34b2df932786bbfc
> > > for this release, and then re-apply a corrected version
> > > for the next one?
> > 
> > 
> > If possible, consider we've actually disabled the feature. How about just
> > queued those patches for next release?
> > 
> > Thanks
> 
> Sorry if I was unclear. My idea is that
> 1. I revert the disabled code
> 2. You send a patch readding it with all the fixes squashed
> 3. Maybe optimizations on top right away?
> 4. We queue *that* for next and see what happens.
> 
> And the advantage over the patchy approach is that the current patches
> are hard to review. E.g.  it's not reasonable to ask RCU guys to review
> the whole of vhost for RCU usage but it's much more reasonable to ask
> about a specific patch.

I think there are other problems here too, I don't like that the use
of mmu notifiers is so different from every other driver, or that GUP
is called under spinlock.

So I favor the revert and try again approach as well. It is hard to
get a clear picture with these endless bug fix patches

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
  2019-08-12  2:44   ` Jason Wang
  2019-08-12  4:13     ` David Miller
@ 2019-08-12  9:49     ` Michael S. Tsirkin
  2019-08-12 13:02       ` Jason Gunthorpe
  2019-08-13  8:12       ` Jason Wang
  1 sibling, 2 replies; 31+ messages in thread
From: Michael S. Tsirkin @ 2019-08-12  9:49 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel, linux-mm, jgg

On Mon, Aug 12, 2019 at 10:44:51AM +0800, Jason Wang wrote:
> 
> On 2019/8/11 上午1:52, Michael S. Tsirkin wrote:
> > On Fri, Aug 09, 2019 at 01:48:42AM -0400, Jason Wang wrote:
> > > Hi all:
> > > 
> > > This series try to fix several issues introduced by meta data
> > > accelreation series. Please review.
> > > 
> > > Changes from V4:
> > > - switch to use spinlock synchronize MMU notifier with accessors
> > > 
> > > Changes from V3:
> > > - remove the unnecessary patch
> > > 
> > > Changes from V2:
> > > - use seqlck helper to synchronize MMU notifier with vhost worker
> > > 
> > > Changes from V1:
> > > - try not use RCU to syncrhonize MMU notifier with vhost worker
> > > - set dirty pages after no readers
> > > - return -EAGAIN only when we find the range is overlapped with
> > >    metadata
> > > 
> > > Jason Wang (9):
> > >    vhost: don't set uaddr for invalid address
> > >    vhost: validate MMU notifier registration
> > >    vhost: fix vhost map leak
> > >    vhost: reset invalidate_count in vhost_set_vring_num_addr()
> > >    vhost: mark dirty pages during map uninit
> > >    vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
> > >    vhost: do not use RCU to synchronize MMU notifier with worker
> > >    vhost: correctly set dirty pages in MMU notifiers callback
> > >    vhost: do not return -EAGAIN for non blocking invalidation too early
> > > 
> > >   drivers/vhost/vhost.c | 202 +++++++++++++++++++++++++-----------------
> > >   drivers/vhost/vhost.h |   6 +-
> > >   2 files changed, 122 insertions(+), 86 deletions(-)
> > This generally looks more solid.
> > 
> > But this amounts to a significant overhaul of the code.
> > 
> > At this point how about we revert 7f466032dc9e5a61217f22ea34b2df932786bbfc
> > for this release, and then re-apply a corrected version
> > for the next one?
> 
> 
> If possible, consider we've actually disabled the feature. How about just
> queued those patches for next release?
> 
> Thanks

Sorry if I was unclear. My idea is that
1. I revert the disabled code
2. You send a patch readding it with all the fixes squashed
3. Maybe optimizations on top right away?
4. We queue *that* for next and see what happens.

And the advantage over the patchy approach is that the current patches
are hard to review. E.g.  it's not reasonable to ask RCU guys to review
the whole of vhost for RCU usage but it's much more reasonable to ask
about a specific patch.


-- 
MST

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
  2019-08-12  2:44   ` Jason Wang
@ 2019-08-12  4:13     ` David Miller
  2019-08-12  9:49     ` Michael S. Tsirkin
  1 sibling, 0 replies; 31+ messages in thread
From: David Miller @ 2019-08-12  4:13 UTC (permalink / raw)
  To: jasowang; +Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm, jgg

From: Jason Wang <jasowang@redhat.com>
Date: Mon, 12 Aug 2019 10:44:51 +0800

> On 2019/8/11 ^[$B>e8a^[(B1:52, Michael S. Tsirkin wrote:
>> At this point how about we revert
>> 7f466032dc9e5a61217f22ea34b2df932786bbfc
>> for this release, and then re-apply a corrected version
>> for the next one?
> 
> If possible, consider we've actually disabled the feature. How about
> just queued those patches for next release?

I'm tossing this series while you and Michael decide how to move forward.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
  2019-08-10 17:52 ` Michael S. Tsirkin
@ 2019-08-12  2:44   ` Jason Wang
  2019-08-12  4:13     ` David Miller
  2019-08-12  9:49     ` Michael S. Tsirkin
  0 siblings, 2 replies; 31+ messages in thread
From: Jason Wang @ 2019-08-12  2:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, virtualization, netdev, linux-kernel, linux-mm, jgg


On 2019/8/11 上午1:52, Michael S. Tsirkin wrote:
> On Fri, Aug 09, 2019 at 01:48:42AM -0400, Jason Wang wrote:
>> Hi all:
>>
>> This series try to fix several issues introduced by meta data
>> accelreation series. Please review.
>>
>> Changes from V4:
>> - switch to use spinlock synchronize MMU notifier with accessors
>>
>> Changes from V3:
>> - remove the unnecessary patch
>>
>> Changes from V2:
>> - use seqlck helper to synchronize MMU notifier with vhost worker
>>
>> Changes from V1:
>> - try not use RCU to syncrhonize MMU notifier with vhost worker
>> - set dirty pages after no readers
>> - return -EAGAIN only when we find the range is overlapped with
>>    metadata
>>
>> Jason Wang (9):
>>    vhost: don't set uaddr for invalid address
>>    vhost: validate MMU notifier registration
>>    vhost: fix vhost map leak
>>    vhost: reset invalidate_count in vhost_set_vring_num_addr()
>>    vhost: mark dirty pages during map uninit
>>    vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
>>    vhost: do not use RCU to synchronize MMU notifier with worker
>>    vhost: correctly set dirty pages in MMU notifiers callback
>>    vhost: do not return -EAGAIN for non blocking invalidation too early
>>
>>   drivers/vhost/vhost.c | 202 +++++++++++++++++++++++++-----------------
>>   drivers/vhost/vhost.h |   6 +-
>>   2 files changed, 122 insertions(+), 86 deletions(-)
> This generally looks more solid.
>
> But this amounts to a significant overhaul of the code.
>
> At this point how about we revert 7f466032dc9e5a61217f22ea34b2df932786bbfc
> for this release, and then re-apply a corrected version
> for the next one?


If possible, consider we've actually disabled the feature. How about 
just queued those patches for next release?

Thanks


>
>> -- 
>> 2.18.1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
  2019-08-09  5:48 [PATCH V5 0/9] Fixes for vhost metadata acceleration Jason Wang
@ 2019-08-10 17:52 ` Michael S. Tsirkin
  2019-08-12  2:44   ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2019-08-10 17:52 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel, linux-mm, jgg

On Fri, Aug 09, 2019 at 01:48:42AM -0400, Jason Wang wrote:
> Hi all:
> 
> This series try to fix several issues introduced by meta data
> accelreation series. Please review.
> 
> Changes from V4:
> - switch to use spinlock synchronize MMU notifier with accessors
> 
> Changes from V3:
> - remove the unnecessary patch
> 
> Changes from V2:
> - use seqlck helper to synchronize MMU notifier with vhost worker
> 
> Changes from V1:
> - try not use RCU to syncrhonize MMU notifier with vhost worker
> - set dirty pages after no readers
> - return -EAGAIN only when we find the range is overlapped with
>   metadata
> 
> Jason Wang (9):
>   vhost: don't set uaddr for invalid address
>   vhost: validate MMU notifier registration
>   vhost: fix vhost map leak
>   vhost: reset invalidate_count in vhost_set_vring_num_addr()
>   vhost: mark dirty pages during map uninit
>   vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
>   vhost: do not use RCU to synchronize MMU notifier with worker
>   vhost: correctly set dirty pages in MMU notifiers callback
>   vhost: do not return -EAGAIN for non blocking invalidation too early
> 
>  drivers/vhost/vhost.c | 202 +++++++++++++++++++++++++-----------------
>  drivers/vhost/vhost.h |   6 +-
>  2 files changed, 122 insertions(+), 86 deletions(-)

This generally looks more solid.

But this amounts to a significant overhaul of the code.

At this point how about we revert 7f466032dc9e5a61217f22ea34b2df932786bbfc
for this release, and then re-apply a corrected version
for the next one?


> -- 
> 2.18.1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V5 0/9] Fixes for vhost metadata acceleration
@ 2019-08-09  5:48 Jason Wang
  2019-08-10 17:52 ` Michael S. Tsirkin
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2019-08-09  5:48 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel; +Cc: linux-mm, jgg, Jason Wang

Hi all:

This series try to fix several issues introduced by meta data
accelreation series. Please review.

Changes from V4:
- switch to use spinlock synchronize MMU notifier with accessors

Changes from V3:
- remove the unnecessary patch

Changes from V2:
- use seqlck helper to synchronize MMU notifier with vhost worker

Changes from V1:
- try not use RCU to syncrhonize MMU notifier with vhost worker
- set dirty pages after no readers
- return -EAGAIN only when we find the range is overlapped with
  metadata

Jason Wang (9):
  vhost: don't set uaddr for invalid address
  vhost: validate MMU notifier registration
  vhost: fix vhost map leak
  vhost: reset invalidate_count in vhost_set_vring_num_addr()
  vhost: mark dirty pages during map uninit
  vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
  vhost: do not use RCU to synchronize MMU notifier with worker
  vhost: correctly set dirty pages in MMU notifiers callback
  vhost: do not return -EAGAIN for non blocking invalidation too early

 drivers/vhost/vhost.c | 202 +++++++++++++++++++++++++-----------------
 drivers/vhost/vhost.h |   6 +-
 2 files changed, 122 insertions(+), 86 deletions(-)

-- 
2.18.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, back to index

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-07  7:06 [PATCH V4 0/9] Fixes for metadata accelreation Jason Wang
2019-08-07  7:06 ` [PATCH V4 1/9] vhost: don't set uaddr for invalid address Jason Wang
2019-08-07  7:06 ` [PATCH V4 2/9] vhost: validate MMU notifier registration Jason Wang
2019-08-07  7:06 ` [PATCH V4 3/9] vhost: fix vhost map leak Jason Wang
2019-08-07  7:06 ` [PATCH V4 4/9] vhost: reset invalidate_count in vhost_set_vring_num_addr() Jason Wang
2019-08-07  7:06 ` [PATCH V4 5/9] vhost: mark dirty pages during map uninit Jason Wang
2019-08-07  7:06 ` [PATCH V4 6/9] vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps() Jason Wang
2019-08-07  7:06 ` [PATCH V4 7/9] vhost: do not use RCU to synchronize MMU notifier with worker Jason Wang
2019-08-07 12:07   ` Jason Gunthorpe
2019-08-07 14:02     ` Jason Wang
2019-08-08 12:54       ` Jason Wang
2019-08-08 13:01         ` Jason Wang
2019-08-08 13:05         ` Jason Gunthorpe
2019-08-10 19:12         ` [PATCH V5 0/9] Fixes for vhost metadata acceleration Michael S. Tsirkin
2019-08-07  7:06 ` [PATCH V4 8/9] vhost: correctly set dirty pages in MMU notifiers callback Jason Wang
2019-08-07  7:06 ` [PATCH V4 9/9] vhost: do not return -EAGAIN for non blocking invalidation too early Jason Wang
2019-08-09  5:15 ` [PATCH V4 0/9] Fixes for metadata accelreation David Miller
2019-08-09  5:35   ` Jason Wang
2019-08-09  5:48 [PATCH V5 0/9] Fixes for vhost metadata acceleration Jason Wang
2019-08-10 17:52 ` Michael S. Tsirkin
2019-08-12  2:44   ` Jason Wang
2019-08-12  4:13     ` David Miller
2019-08-12  9:49     ` Michael S. Tsirkin
2019-08-12 13:02       ` Jason Gunthorpe
2019-08-13  8:31         ` Jason Wang
2019-08-13 11:57           ` Jason Gunthorpe
2019-08-13 16:41             ` Christoph Hellwig
2019-08-15  3:28               ` Jason Wang
2019-08-15  3:26             ` Jason Wang
2019-08-15 19:33               ` Jason Gunthorpe
2019-08-13  8:12       ` Jason Wang

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org kvm@archiver.kernel.org
	public-inbox-index kvm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox