* [patch 0/6] mm/vmalloc: Assorted fixes and improvements
@ 2023-05-23 14:02 Thomas Gleixner
  2023-05-23 14:02 ` [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks Thomas Gleixner
                   ` (6 more replies)
  0 siblings, 7 replies; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-23 14:02 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

Following up to the discussion about excessive TLB flushes

   https://lore.kernel.org/all/87a5y5a6kj.ffs@tglx

this series addresses the following issues:


  1) Prevent the stale TLB problem related to fully utilized vmap blocks


  2) Avoid the double per CPU list walk in _vm_unmap_aliases()


  3) Avoid flushing dirty space over and over


  4) Add a lockless quickcheck in vb_alloc() and add missing
     READ/WRITE_ONCE() annotations

  5) Prevent overeager purging of usable vmap_blocks if
     not under memory/address space pressure.

Thanks,

	tglx



* [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-23 14:02 [patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
@ 2023-05-23 14:02 ` Thomas Gleixner
  2023-05-23 15:17   ` Christoph Hellwig
                     ` (3 more replies)
  2023-05-23 14:02 ` [patch 2/6] mm/vmalloc: Avoid iterating over per CPU vmap blocks twice Thomas Gleixner
                   ` (5 subsequent siblings)
  6 siblings, 4 replies; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-23 14:02 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

_vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
page are left in the system. This is required due to the lazy TLB flush
mechanism in vmalloc.

This is attempted by walking the per CPU free lists, but those do not
contain fully utilized vmap blocks, because a block is removed from the
free list once its free space becomes zero.

So the per CPU list iteration does not find the block and if the page was
mapped via such a block and the TLB has not yet been flushed, the guarantee
of _vm_unmap_aliases() that there are no stale TLBs after returning is
broken:

x = vb_alloc() // Removes vmap_block from free list because vb->free became 0
vb_free(x)     // Unmaps page and marks in dirty_min/max range

// Page is reused
vm_unmap_aliases() // Can't find vmap block with the dirty space -> FAIL

So instead of walking the per CPU free lists, walk the per CPU xarrays
which hold pointers to _all_ active blocks in the system including those
removed from the free lists.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 mm/vmalloc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
 	for_each_possible_cpu(cpu) {
 		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
 		struct vmap_block *vb;
+		unsigned long idx;
 
 		rcu_read_lock();
-		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
+		xa_for_each(&vbq->vmap_blocks, idx, vb) {
 			spin_lock(&vb->lock);
 			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
 				unsigned long va_start = vb->va->va_start;




* [patch 2/6] mm/vmalloc: Avoid iterating over per CPU vmap blocks twice
  2023-05-23 14:02 [patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
  2023-05-23 14:02 ` [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks Thomas Gleixner
@ 2023-05-23 14:02 ` Thomas Gleixner
  2023-05-23 15:21   ` Christoph Hellwig
  2023-05-23 14:02 ` [patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over Thomas Gleixner
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-23 14:02 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

_vm_unmap_aliases() walks the per CPU xarrays to find partially unmapped
blocks and then walks the per CPU free lists to purge fragmented blocks.

Arguably that's a waste of CPU cycles and cache lines as the full xarray walk
already touches every block.

Avoid this double iteration:

  - Split out the code to purge one block and the code to free the local
    purge list into helper functions.

  - Try to purge the fragmented blocks in the xarray walk before looking at
    their dirty space.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 mm/vmalloc.c |   66 ++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 43 insertions(+), 23 deletions(-)

--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2086,39 +2086,52 @@ static void free_vmap_block(struct vmap_
 	kfree_rcu(vb, rcu_head);
 }
 
+static bool purge_fragmented_block(struct vmap_block *vb, struct vmap_block_queue *vbq,
+				   struct list_head *purge_list)
+{
+	if (!(vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS))
+		return false;
+
+	 /* prevent further allocs after releasing lock */
+	vb->free = 0;
+	 /* prevent purging it again */
+	vb->dirty = VMAP_BBMAP_BITS;
+	vb->dirty_min = 0;
+	vb->dirty_max = VMAP_BBMAP_BITS;
+	spin_lock(&vbq->lock);
+	list_del_rcu(&vb->free_list);
+	spin_unlock(&vbq->lock);
+	list_add_tail(&vb->purge, purge_list);
+	return true;
+}
+
+static void free_purged_blocks(struct list_head *purge_list)
+{
+	struct vmap_block *vb, *n_vb;
+
+	list_for_each_entry_safe(vb, n_vb, purge_list, purge) {
+		list_del(&vb->purge);
+		free_vmap_block(vb);
+	}
+}
+
 static void purge_fragmented_blocks(int cpu)
 {
 	LIST_HEAD(purge);
 	struct vmap_block *vb;
-	struct vmap_block *n_vb;
 	struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(vb, &vbq->free, free_list) {
-
 		if (!(vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS))
 			continue;
 
 		spin_lock(&vb->lock);
-		if (vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS) {
-			vb->free = 0; /* prevent further allocs after releasing lock */
-			vb->dirty = VMAP_BBMAP_BITS; /* prevent purging it again */
-			vb->dirty_min = 0;
-			vb->dirty_max = VMAP_BBMAP_BITS;
-			spin_lock(&vbq->lock);
-			list_del_rcu(&vb->free_list);
-			spin_unlock(&vbq->lock);
-			spin_unlock(&vb->lock);
-			list_add_tail(&vb->purge, &purge);
-		} else
-			spin_unlock(&vb->lock);
+		purge_fragmented_block(vb, vbq, &purge);
+		spin_unlock(&vb->lock);
 	}
 	rcu_read_unlock();
-
-	list_for_each_entry_safe(vb, n_vb, &purge, purge) {
-		list_del(&vb->purge);
-		free_vmap_block(vb);
-	}
+	free_purged_blocks(&purge);
 }
 
 static void purge_fragmented_blocks_allcpus(void)
@@ -2226,12 +2239,13 @@ static void vb_free(unsigned long addr,
 
 static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
 {
+	LIST_HEAD(purge_list);
 	int cpu;
 
 	if (unlikely(!vmap_initialized))
 		return;
 
-	might_sleep();
+	mutex_lock(&vmap_purge_lock);
 
 	for_each_possible_cpu(cpu) {
 		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
@@ -2241,7 +2255,14 @@ static void _vm_unmap_aliases(unsigned l
 		rcu_read_lock();
 		xa_for_each(&vbq->vmap_blocks, idx, vb) {
 			spin_lock(&vb->lock);
-			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
+
+			/*
+			 * Try to purge a fragmented block first. If it's
+			 * not purgeable, check whether there is dirty
+			 * space to be flushed.
+			 */
+			if (!purge_fragmented_block(vb, vbq, &purge_list) &&
+			    vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
 				unsigned long va_start = vb->va->va_start;
 				unsigned long s, e;
 
@@ -2257,9 +2278,8 @@ static void _vm_unmap_aliases(unsigned l
 		}
 		rcu_read_unlock();
 	}
+	free_purged_blocks(&purge_list);
 
-	mutex_lock(&vmap_purge_lock);
-	purge_fragmented_blocks_allcpus();
 	if (!__purge_vmap_area_lazy(start, end) && flush)
 		flush_tlb_kernel_range(start, end);
 	mutex_unlock(&vmap_purge_lock);
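
Condensed, the resulting single walk looks roughly like this (sketch for
illustration only, the flush range accumulation is elided):

	mutex_lock(&vmap_purge_lock);
	for_each_possible_cpu(cpu) {
		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);

		rcu_read_lock();
		xa_for_each(&vbq->vmap_blocks, idx, vb) {
			spin_lock(&vb->lock);
			/* Try to purge first, otherwise check dirty space */
			if (!purge_fragmented_block(vb, vbq, &purge_list) &&
			    vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
				/* Accumulate [start, end) and set flush */
			}
			spin_unlock(&vb->lock);
		}
		rcu_read_unlock();
	}
	free_purged_blocks(&purge_list);

	if (!__purge_vmap_area_lazy(start, end) && flush)
		flush_tlb_kernel_range(start, end);
	mutex_unlock(&vmap_purge_lock);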




* [patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over
  2023-05-23 14:02 [patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
  2023-05-23 14:02 ` [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks Thomas Gleixner
  2023-05-23 14:02 ` [patch 2/6] mm/vmalloc: Avoid iterating over per CPU vmap blocks twice Thomas Gleixner
@ 2023-05-23 14:02 ` Thomas Gleixner
  2023-05-23 15:27   ` Christoph Hellwig
  2023-05-24  9:43   ` Baoquan He
  2023-05-23 14:02 ` [patch 4/6] mm/vmalloc: Check free space in vmap_block lockless Thomas Gleixner
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-23 14:02 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

vmap blocks which have active mappings cannot be purged. Allocations which
have been freed are accounted for in vmap_block::dirty_min/max, so that
they can be detected in _vm_unmap_aliases() as potentially stale TLBs.

If there are several invocations of _vm_unmap_aliases() then each of them
will flush the dirty range. That's pointless and just increases the
probability of full TLB flushes.

Avoid that by resetting the flush range after accounting for it. That's
safe versus other invocations of _vm_unmap_aliases() because this is all
serialized with vmap_purge_lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 mm/vmalloc.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2224,7 +2224,7 @@ static void vb_free(unsigned long addr,
 
 	spin_lock(&vb->lock);
 
-	/* Expand dirty range */
+	/* Expand the not yet TLB flushed dirty range */
 	vb->dirty_min = min(vb->dirty_min, offset);
 	vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));
 
@@ -2262,7 +2262,7 @@ static void _vm_unmap_aliases(unsigned l
 			 * space to be flushed.
 			 */
 			if (!purge_fragmented_block(vb, vbq, &purge_list) &&
-			    vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
+			    vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
 				unsigned long va_start = vb->va->va_start;
 				unsigned long s, e;
 
@@ -2272,6 +2272,10 @@ static void _vm_unmap_aliases(unsigned l
 				start = min(s, start);
 				end   = max(e, end);
 
+				/* Prevent that this is flushed more than once */
+				vb->dirty_min = VMAP_BBMAP_BITS;
+				vb->dirty_max = 0;
+
 				flush = 1;
 			}
 			spin_unlock(&vb->lock);
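
Put differently, the dirty range acts as a flush accumulator: vb_free()
grows it and _vm_unmap_aliases() drains it, both under vb->lock. A rough
sketch of the two sides:

	/* vb_free(): record the unmapped range as not yet TLB flushed */
	vb->dirty_min = min(vb->dirty_min, offset);
	vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));

	/* _vm_unmap_aliases(): consume the accumulated range */
	if (vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
		/*
		 * Add [dirty_min, dirty_max) to the flush range, then
		 * reset it to empty so the next invocation skips it.
		 */
		vb->dirty_min = VMAP_BBMAP_BITS;
		vb->dirty_max = 0;
	}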




* [patch 4/6] mm/vmalloc: Check free space in vmap_block lockless
  2023-05-23 14:02 [patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
                   ` (2 preceding siblings ...)
  2023-05-23 14:02 ` [patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over Thomas Gleixner
@ 2023-05-23 14:02 ` Thomas Gleixner
  2023-05-23 15:29   ` Christoph Hellwig
  2023-05-23 14:02 ` [patch 5/6] mm/vmalloc: Add missing READ/WRITE_ONCE() annotations Thomas Gleixner
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-23 14:02 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

vb_alloc() unconditionally locks a vmap_block on the free list to check the
free space.

This can be done locklessly because vmap_block::free never increases; it's
only decreased on allocations.

Check the free space locklessly and, only if that succeeds, recheck under
the lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 mm/vmalloc.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2166,6 +2166,9 @@ static void *vb_alloc(unsigned long size
 	list_for_each_entry_rcu(vb, &vbq->free, free_list) {
 		unsigned long pages_off;
 
+		if (READ_ONCE(vb->free) < (1UL << order))
+			continue;
+
 		spin_lock(&vb->lock);
 		if (vb->free < (1UL << order)) {
 			spin_unlock(&vb->lock);
@@ -2174,7 +2177,7 @@ static void *vb_alloc(unsigned long size
 
 		pages_off = VMAP_BBMAP_BITS - vb->free;
 		vaddr = vmap_block_vaddr(vb->va->va_start, pages_off);
-		vb->free -= 1UL << order;
+		WRITE_ONCE(vb->free, vb->free - (1UL << order));
 		bitmap_set(vb->used_map, pages_off, (1UL << order));
 		if (vb->free == 0) {
 			spin_lock(&vbq->lock);
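
The pattern is the usual lockless fast path check followed by a recheck
under the lock. An annotated sketch of the relevant part of the loop:

	/*
	 * Lockless quickcheck: vb->free only ever decreases, so a block
	 * which is already too small cannot become suitable later.
	 */
	if (READ_ONCE(vb->free) < (1UL << order))
		continue;

	spin_lock(&vb->lock);
	/* Recheck under the lock, another CPU might have won the race */
	if (vb->free < (1UL << order)) {
		spin_unlock(&vb->lock);
		continue;
	}
	/* The WRITE_ONCE() pairs with the READ_ONCE() in the quickcheck */
	WRITE_ONCE(vb->free, vb->free - (1UL << order));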




* [patch 5/6] mm/vmalloc: Add missing READ/WRITE_ONCE() annotations
  2023-05-23 14:02 [patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
                   ` (3 preceding siblings ...)
  2023-05-23 14:02 ` [patch 4/6] mm/vmalloc: Check free space in vmap_block lockless Thomas Gleixner
@ 2023-05-23 14:02 ` Thomas Gleixner
  2023-05-24  9:15   ` Uladzislau Rezki
  2023-05-23 14:02 ` [patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily Thomas Gleixner
  2023-05-23 16:24 ` [patch 0/6] mm/vmalloc: Assorted fixes and improvements Uladzislau Rezki
  6 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-23 14:02 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

purge_fragmented_blocks() accesses vmap_block::free and vmap_block::dirty
lockless for a quick check.

Add the missing READ/WRITE_ONCE() annotations.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 mm/vmalloc.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2093,9 +2093,9 @@ static bool purge_fragmented_block(struc
 		return false;
 
 	 /* prevent further allocs after releasing lock */
-	vb->free = 0;
+	WRITE_ONCE(vb->free, 0);
 	 /* prevent purging it again */
-	vb->dirty = VMAP_BBMAP_BITS;
+	WRITE_ONCE(vb->dirty, VMAP_BBMAP_BITS);
 	vb->dirty_min = 0;
 	vb->dirty_max = VMAP_BBMAP_BITS;
 	spin_lock(&vbq->lock);
@@ -2123,7 +2123,10 @@ static void purge_fragmented_blocks(int
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(vb, &vbq->free, free_list) {
-		if (!(vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS))
+		unsigned long free = READ_ONCE(vb->free);
+		unsigned long dirty = READ_ONCE(vb->dirty);
+
+		if (!(free + dirty == VMAP_BBMAP_BITS && dirty != VMAP_BBMAP_BITS))
 			continue;
 
 		spin_lock(&vb->lock);
@@ -2231,7 +2234,7 @@ static void vb_free(unsigned long addr,
 	vb->dirty_min = min(vb->dirty_min, offset);
 	vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));
 
-	vb->dirty += 1UL << order;
+	WRITE_ONCE(vb->dirty, vb->dirty + (1UL << order));
 	if (vb->dirty == VMAP_BBMAP_BITS) {
 		BUG_ON(vb->free);
 		spin_unlock(&vb->lock);
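
The rule applied here: any field which is read outside of vb->lock needs
READ_ONCE() at the lockless reader and WRITE_ONCE() at every writer, even
when the write itself happens under the lock, so the compiler cannot tear
or fuse the accesses. Minimal sketch of the pairing:

	/* Writer, under vb->lock */
	WRITE_ONCE(vb->dirty, vb->dirty + (1UL << order));

	/* Lockless reader, under rcu_read_lock() only */
	unsigned long dirty = READ_ONCE(vb->dirty);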




* [patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily
  2023-05-23 14:02 [patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
                   ` (4 preceding siblings ...)
  2023-05-23 14:02 ` [patch 5/6] mm/vmalloc: Add missing READ/WRITE_ONCE() annotations Thomas Gleixner
@ 2023-05-23 14:02 ` Thomas Gleixner
  2023-05-23 15:30   ` Christoph Hellwig
  2023-05-24 10:34   ` Baoquan He
  2023-05-23 16:24 ` [patch 0/6] mm/vmalloc: Assorted fixes and improvements Uladzislau Rezki
  6 siblings, 2 replies; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-23 14:02 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

Purging fragmented blocks is done unconditionally in several contexts:

  1) From drain_vmap_area_work(), when the number of lazy to be freed
     vmap_areas reached the threshold

  2) Reclaiming vmalloc address space from pcpu_get_vm_areas()

  3) _vm_unmap_aliases()

#1 There is no reason to zap fragmented vmap blocks unconditionally, simply
   because reclaiming all lazy areas drains at least

      32MB * fls(num_online_cpus())

   per invocation which is plenty.

#2 Reclaiming when running out of space or due to memory pressure makes a
   lot of sense

#3 _vm_unmap_aliases() has to touch everything because the caller has no
   clue which vmap_area used a particular page last and the vmap_area lost
   that information too.

   Except for the vfree + VM_FLUSH_RESET_PERMS case, which removes the
   vmap area first and then cares about the flush. That in turn requires
   a full walk of _all_ vmap areas including the one which was just
   added to the purge list.

   But as this has to be flushed anyway this is an opportunity to combine
   outstanding TLB flushes and do the housekeeping of purging freed areas,
   but like #1 there is no real good reason to zap usable vmap blocks
   unconditionally.

Add a @force_purge argument to the relevant functions and if not true only
purge fragmented blocks which have less than 1/4 of their capacity left.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 mm/vmalloc.c |   34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -791,7 +791,7 @@ get_subtree_max_size(struct rb_node *nod
 RB_DECLARE_CALLBACKS_MAX(static, free_vmap_area_rb_augment_cb,
 	struct vmap_area, rb_node, unsigned long, subtree_max_size, va_size)
 
-static void purge_vmap_area_lazy(void);
+static void purge_vmap_area_lazy(bool force_purge);
 static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
 static void drain_vmap_area_work(struct work_struct *work);
 static DECLARE_WORK(drain_vmap_work, drain_vmap_area_work);
@@ -1649,7 +1649,7 @@ static struct vmap_area *alloc_vmap_area
 
 overflow:
 	if (!purged) {
-		purge_vmap_area_lazy();
+		purge_vmap_area_lazy(true);
 		purged = 1;
 		goto retry;
 	}
@@ -1717,7 +1717,7 @@ static atomic_long_t vmap_lazy_nr = ATOM
 static DEFINE_MUTEX(vmap_purge_lock);
 
 /* for per-CPU blocks */
-static void purge_fragmented_blocks_allcpus(void);
+static void purge_fragmented_blocks_allcpus(bool force_purge);
 
 /*
  * Purges all lazily-freed vmap areas.
@@ -1787,10 +1787,10 @@ static bool __purge_vmap_area_lazy(unsig
 /*
  * Kick off a purge of the outstanding lazy areas.
  */
-static void purge_vmap_area_lazy(void)
+static void purge_vmap_area_lazy(bool force_purge)
 {
 	mutex_lock(&vmap_purge_lock);
-	purge_fragmented_blocks_allcpus();
+	purge_fragmented_blocks_allcpus(force_purge);
 	__purge_vmap_area_lazy(ULONG_MAX, 0);
 	mutex_unlock(&vmap_purge_lock);
 }
@@ -1908,6 +1908,12 @@ static struct vmap_area *find_unlink_vma
 
 #define VMAP_BLOCK_SIZE		(VMAP_BBMAP_BITS * PAGE_SIZE)
 
+/*
+ * Purge threshold to prevent overeager purging of fragmented blocks for
+ * regular operations: Purge if vb->free is less than 1/4 of the capacity.
+ */
+#define VMAP_PURGE_THRESHOLD	(VMAP_BBMAP_BITS / 4)
+
 #define VMAP_RAM		0x1 /* indicates vm_map_ram area*/
 #define VMAP_BLOCK		0x2 /* mark out the vmap_block sub-type*/
 #define VMAP_FLAGS_MASK		0x3
@@ -2087,12 +2093,16 @@ static void free_vmap_block(struct vmap_
 }
 
 static bool purge_fragmented_block(struct vmap_block *vb, struct vmap_block_queue *vbq,
-				   struct list_head *purge_list)
+				   struct list_head *purge_list, bool force_purge)
 {
 	if (!(vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS))
 		return false;
 
-	 /* prevent further allocs after releasing lock */
+	/* Don't overeagerly purge usable blocks unless requested */
+	if (!(force_purge || vb->free < VMAP_PURGE_THRESHOLD))
+		return false;
+
+	/* prevent further allocs after releasing lock */
 	WRITE_ONCE(vb->free, 0);
 	 /* prevent purging it again */
 	WRITE_ONCE(vb->dirty, VMAP_BBMAP_BITS);
@@ -2115,7 +2125,7 @@ static void free_purged_blocks(struct li
 	}
 }
 
-static void purge_fragmented_blocks(int cpu)
+static void purge_fragmented_blocks(int cpu, bool force_purge)
 {
 	LIST_HEAD(purge);
 	struct vmap_block *vb;
@@ -2130,19 +2140,19 @@ static void purge_fragmented_blocks(int
 			continue;
 
 		spin_lock(&vb->lock);
-		purge_fragmented_block(vb, vbq, &purge);
+		purge_fragmented_block(vb, vbq, &purge, force_purge);
 		spin_unlock(&vb->lock);
 	}
 	rcu_read_unlock();
 	free_purged_blocks(&purge);
 }
 
-static void purge_fragmented_blocks_allcpus(void)
+static void purge_fragmented_blocks_allcpus(bool force_purge)
 {
 	int cpu;
 
 	for_each_possible_cpu(cpu)
-		purge_fragmented_blocks(cpu);
+		purge_fragmented_blocks(cpu, force_purge);
 }
 
 static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
@@ -4173,7 +4183,7 @@ struct vm_struct **pcpu_get_vm_areas(con
 overflow:
 	spin_unlock(&free_vmap_area_lock);
 	if (!purged) {
-		purge_vmap_area_lazy();
+		purge_vmap_area_lazy(true);
 		purged = true;
 
 		/* Before "retry", check if we recover. */
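
Two illustrative numbers for the thresholds above (both depend on the
configuration and the number of CPUs, so treat them as examples only):

  - With 16 online CPUs, fls(16) == 5, so reclaiming all lazy areas in #1
    drains at least 5 * 32MB = 160MB per invocation.

  - Assuming VMAP_BBMAP_BITS == 1024, VMAP_PURGE_THRESHOLD is 256, so a
    fragmented block is purged during regular operation only once less
    than a quarter of its capacity is still free.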




* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-23 14:02 ` [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks Thomas Gleixner
@ 2023-05-23 15:17   ` Christoph Hellwig
  2023-05-23 16:40     ` Thomas Gleixner
  2023-05-23 19:18   ` Lorenzo Stoakes
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2023-05-23 15:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [patch 2/6] mm/vmalloc: Avoid iterating over per CPU vmap blocks twice
  2023-05-23 14:02 ` [patch 2/6] mm/vmalloc: Avoid iterating over per CPU vmap blocks twice Thomas Gleixner
@ 2023-05-23 15:21   ` Christoph Hellwig
  0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2023-05-23 15:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23, 2023 at 04:02:12PM +0200, Thomas Gleixner wrote:
> _vm_unmap_aliases() walks the per CPU xarrays to find partially unmapped
> blocks and then walks the per CPU free lists to purge fragmented blocks.
> 
> Arguably that's a waste of CPU cycles and cache lines as the full xarray walk
> already touches every block.
> 
> Avoid this double iteration:
> 
>   - Split out the code to purge one block and the code to free the local
>     purge list into helper functions.
> 
>   - Try to purge the fragmented blocks in the xarray walk before looking at
>     their dirty space.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  mm/vmalloc.c |   66 ++++++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 43 insertions(+), 23 deletions(-)
> 
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2086,39 +2086,52 @@ static void free_vmap_block(struct vmap_
>  	kfree_rcu(vb, rcu_head);
>  }
>  
> +static bool purge_fragmented_block(struct vmap_block *vb, struct vmap_block_queue *vbq,
> +				   struct list_head *purge_list)

Please stick to 80 character lines for the vmalloc code.  And while
it's a personal preference, two tab indents for the continuation make
prototypes like this a lot more readable.

> +	if (!(vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS))
> +		return false;

This comes from the old code, but it does look almost intentionally
obfuscated.

	if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
	    vb->dirty == VMAP_BBMAP_BITS)

actually gets the logic across much better.

> +	 /* prevent further allocs after releasing lock */
> +	vb->free = 0;
> +	 /* prevent purging it again */

extra spaces before the comments.

Otherwise the refactoring looks nice, thanks.



* Re: [patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over
  2023-05-23 14:02 ` [patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over Thomas Gleixner
@ 2023-05-23 15:27   ` Christoph Hellwig
  2023-05-23 16:10     ` Thomas Gleixner
  2023-05-24  9:43   ` Baoquan He
  1 sibling, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2023-05-23 15:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23, 2023 at 04:02:13PM +0200, Thomas Gleixner wrote:
> vmap blocks which have active mappings cannot be purged. Allocations which
> have been freed are accounted for in vmap_block::dirty_min/max, so that
> they can be detected in _vm_unmap_aliases() as potentially stale TLBs.
> 
> If there are several invocations of _vm_unmap_aliases() then each of them
> will flush the dirty range. That's pointless and just increases the
> probability of full TLB flushes.
> 
> Avoid that by resetting the flush range after accounting for it. That's
> safe versus other invocations of _vm_unmap_aliases() because this is all
> serialized with vmap_purge_lock.

Just nitpicking, but isn't vb->lock the actually relevant lock here?
vmap_purge_lock is only taken after the loop.




* Re: [patch 4/6] mm/vmalloc: Check free space in vmap_block lockless
  2023-05-23 14:02 ` [patch 4/6] mm/vmalloc: Check free space in vmap_block lockless Thomas Gleixner
@ 2023-05-23 15:29   ` Christoph Hellwig
  2023-05-23 16:17     ` Thomas Gleixner
  0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2023-05-23 15:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23, 2023 at 04:02:14PM +0200, Thomas Gleixner wrote:
> +		if (READ_ONCE(vb->free) < (1UL << order))
> +			continue;
> +
>  		spin_lock(&vb->lock);
>  		if (vb->free < (1UL << order)) {
>  			spin_unlock(&vb->lock);
> @@ -2174,7 +2177,7 @@ static void *vb_alloc(unsigned long size
>  
>  		pages_off = VMAP_BBMAP_BITS - vb->free;
>  		vaddr = vmap_block_vaddr(vb->va->va_start, pages_off);
> -		vb->free -= 1UL << order;
> +		WRITE_ONCE(vb->free, vb->free - (1UL << order));

Maybe just a matter of preference, but wouldn't an atomic_t be
better here?  We'd have another locked instruction in the alloc
path, but I always find the READ_ONCE/WRITE_ONCE usage a bit
fragile, so I'd rather reserve them for well documented hot
path code.
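
For comparison, the atomic_t variant would look roughly like this
(hypothetical sketch, assuming vmap_block::free were converted to an
atomic_t; this is not what the series implements):

	/* Quickcheck without taking vb->lock */
	if (atomic_read(&vb->free) < (1UL << order))
		continue;

	spin_lock(&vb->lock);
	...
	/* Locked RMW instead of a plain store */
	atomic_sub(1UL << order, &vb->free);
	spin_unlock(&vb->lock);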



* Re: [patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily
  2023-05-23 14:02 ` [patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily Thomas Gleixner
@ 2023-05-23 15:30   ` Christoph Hellwig
  2023-05-24 10:34   ` Baoquan He
  1 sibling, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2023-05-23 15:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over
  2023-05-23 15:27   ` Christoph Hellwig
@ 2023-05-23 16:10     ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-23 16:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23 2023 at 17:27, Christoph Hellwig wrote:

> On Tue, May 23, 2023 at 04:02:13PM +0200, Thomas Gleixner wrote:
>> vmap blocks which have active mappings cannot be purged. Allocations which
>> have been freed are accounted for in vmap_block::dirty_min/max, so that
>> they can be detected in _vm_unmap_aliases() as potentially stale TLBs.
>> 
>> If there are several invocations of _vm_unmap_aliases() then each of them
>> will flush the dirty range. That's pointless and just increases the
>> probability of full TLB flushes.
>> 
>> Avoid that by resetting the flush range after accounting for it. That's
>> safe versus other invocations of _vm_unmap_aliases() because this is all
>> serialized with vmap_purge_lock.
>
> Just nitpicking, but isn't vb->lock the actually relevant lock here?
> vmap_purge_lock is only taken after the loop.

No. The avoid double list iteration change moves the purge lock up
before the loop as it needs to protect against a concurrent purge
attempt.

Thanks,

        tglx



* Re: [patch 4/6] mm/vmalloc: Check free space in vmap_block lockless
  2023-05-23 15:29   ` Christoph Hellwig
@ 2023-05-23 16:17     ` Thomas Gleixner
  2023-05-24  9:20       ` Uladzislau Rezki
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-23 16:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23 2023 at 17:29, Christoph Hellwig wrote:
> On Tue, May 23, 2023 at 04:02:14PM +0200, Thomas Gleixner wrote:
>> +		if (READ_ONCE(vb->free) < (1UL << order))
>> +			continue;
>> +
>>  		spin_lock(&vb->lock);
>>  		if (vb->free < (1UL << order)) {
>>  			spin_unlock(&vb->lock);
>> @@ -2174,7 +2177,7 @@ static void *vb_alloc(unsigned long size
>>  
>>  		pages_off = VMAP_BBMAP_BITS - vb->free;
>>  		vaddr = vmap_block_vaddr(vb->va->va_start, pages_off);
>> -		vb->free -= 1UL << order;
>> +		WRITE_ONCE(vb->free, vb->free - (1UL << order));
>
> Maybe just a matter of preference, but wouldn't an atomic_t be
> better here?  We'd have another locked instruction in the alloc
> path, but I always find the READ_ONCE/WRITE_ONCE usage a bit
> fragile that I'd rather reserve them to well documented hot
> path code.

I don't see a problem with these lockless quickchecks, especially not
in this particular case, but no strong opinion either.

Thanks

        tglx










* Re: [patch 0/6] mm/vmalloc: Assorted fixes and improvements
  2023-05-23 14:02 [patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
                   ` (5 preceding siblings ...)
  2023-05-23 14:02 ` [patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily Thomas Gleixner
@ 2023-05-23 16:24 ` Uladzislau Rezki
  2023-05-23 17:33   ` Thomas Gleixner
  6 siblings, 1 reply; 43+ messages in thread
From: Uladzislau Rezki @ 2023-05-23 16:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23, 2023 at 04:02:09PM +0200, Thomas Gleixner wrote:
> Following up to the discussion about excessive TLB flushes
> 
>    https://lore.kernel.org/all/87a5y5a6kj.ffs@tglx
> 
> this series addresses the following issues:
> 
> 
>   1) Prevent the stale TLB problem related to fully utilized vmap blocks
> 
> 
>   2) Avoid the double per CPU list walk in _vm_unmap_aliases()
> 
> 
>   3) Avoid flushing dirty space over and over
> 
> 
>   4) Add a lockless quickcheck in vb_alloc() and add missing
>      READ/WRITE_ONCE() annotations
> 
>   5) Prevent overeager purging of usable vmap_blocks if
>      not under memory/address space pressure.
> 
> Thanks,
> 
> 	tglx
Thank you for fixing it!

<snip>
urezki@pc638:~/data/raid0/coding/linux.git$ make -j64 bzImage
  DESCEND objtool
  INSTALL libsubcmd_headers
  CALL    scripts/checksyscalls.sh
  CC      mm/vmalloc.o
In file included from ./include/linux/list_lru.h:14,
                 from ./include/linux/fs.h:13,
                 from ./include/linux/huge_mm.h:8,
                 from ./include/linux/mm.h:855,
                 from mm/vmalloc.c:12:
mm/vmalloc.c: In function ‘_vm_unmap_aliases’:
mm/vmalloc.c:2220:19: error: ‘struct vmap_block_queue’ has no member named ‘vmap_blocks’
 2220 |   xa_for_each(&vbq->vmap_blocks, idx, vb) {
      |                   ^~
./include/linux/xarray.h:449:23: note: in definition of macro ‘xa_for_each_range’
  449 |       entry = xa_find(xa, &index, last, XA_PRESENT);  \
      |                       ^~
./include/linux/xarray.h:501:2: note: in expansion of macro ‘xa_for_each_start’
  501 |  xa_for_each_start(xa, index, entry, 0)
      |  ^~~~~~~~~~~~~~~~~
mm/vmalloc.c:2220:3: note: in expansion of macro ‘xa_for_each’
 2220 |   xa_for_each(&vbq->vmap_blocks, idx, vb) {
      |   ^~~~~~~~~~~
mm/vmalloc.c:2220:19: error: ‘struct vmap_block_queue’ has no member named ‘vmap_blocks’
 2220 |   xa_for_each(&vbq->vmap_blocks, idx, vb) {
      |                   ^~
./include/linux/xarray.h:451:29: note: in definition of macro ‘xa_for_each_range’
  451 |       entry = xa_find_after(xa, &index, last, XA_PRESENT))
      |                             ^~
./include/linux/xarray.h:501:2: note: in expansion of macro ‘xa_for_each_start’
  501 |  xa_for_each_start(xa, index, entry, 0)
      |  ^~~~~~~~~~~~~~~~~
mm/vmalloc.c:2220:3: note: in expansion of macro ‘xa_for_each’
 2220 |   xa_for_each(&vbq->vmap_blocks, idx, vb) {
      |   ^~~~~~~~~~~
mm/vmalloc.c:2228:9: error: too few arguments to function ‘purge_fragmented_block’
 2228 |    if (!purge_fragmented_block(vb, vbq, &purge_list) &&
      |         ^~~~~~~~~~~~~~~~~~~~~~
mm/vmalloc.c:2047:13: note: declared here
 2047 | static bool purge_fragmented_block(struct vmap_block *vb, struct vmap_block_queue *vbq,
      |             ^~~~~~~~~~~~~~~~~~~~~~
make[2]: *** [scripts/Makefile.build:252: mm/vmalloc.o] Error 1
make[1]: *** [scripts/Makefile.build:494: mm] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:2025: .] Error 2
urezki@pc638:~/data/raid0/coding/linux.git$ ^C
<snip>

Could you please fix it? :)

--
Uladzislau Rezki



* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-23 15:17   ` Christoph Hellwig
@ 2023-05-23 16:40     ` Thomas Gleixner
  2023-05-23 16:47       ` Uladzislau Rezki
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-23 16:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23 2023 at 17:17, Christoph Hellwig wrote:
> Looks good:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

This probably wants a Fixes tag and if I'm not completely off then this
problem exists since 2.6.28: db64fe02258f ("mm: rewrite vmap layer")




* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-23 16:40     ` Thomas Gleixner
@ 2023-05-23 16:47       ` Uladzislau Rezki
  0 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki @ 2023-05-23 16:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Hellwig, linux-mm, Andrew Morton, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23, 2023 at 06:40:40PM +0200, Thomas Gleixner wrote:
> On Tue, May 23 2023 at 17:17, Christoph Hellwig wrote:
> > Looks good:
> >
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Thanks!
> 
> This probably wants a Fixes tag and if I'm not completely off then this
> problem exists since 2.6.28: db64fe02258f ("mm: rewrite vmap layer")
> 
Yes. This code is quite old and the commit you pointed to looks correct.

--
Uladzislau Rezki



* Re: [patch 0/6] mm/vmalloc: Assorted fixes and improvements
  2023-05-23 16:24 ` [patch 0/6] mm/vmalloc: Assorted fixes and improvements Uladzislau Rezki
@ 2023-05-23 17:33   ` Thomas Gleixner
  2023-05-23 17:39     ` Thomas Gleixner
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-23 17:33 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23 2023 at 18:24, Uladzislau Rezki wrote:
> On Tue, May 23, 2023 at 04:02:09PM +0200, Thomas Gleixner wrote:
> mm/vmalloc.c: In function ‘_vm_unmap_aliases’:
> mm/vmalloc.c:2220:19: error: ‘struct vmap_block_queue’ has no member named ‘vmap_blocks’
>  2220 |   xa_for_each(&vbq->vmap_blocks, idx, vb) {
>       |                   ^~

Duh. I surely had that compile fail fixed before I boot tested that
pile. And then I did something stupid obviously.







* Re: [patch 0/6] mm/vmalloc: Assorted fixes and improvements
  2023-05-23 17:33   ` Thomas Gleixner
@ 2023-05-23 17:39     ` Thomas Gleixner
  2023-05-23 17:48       ` Uladzislau Rezki
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-23 17:39 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23 2023 at 19:33, Thomas Gleixner wrote:

> On Tue, May 23 2023 at 18:24, Uladzislau Rezki wrote:
>> On Tue, May 23, 2023 at 04:02:09PM +0200, Thomas Gleixner wrote:
>> mm/vmalloc.c: In function ‘_vm_unmap_aliases’:
>> mm/vmalloc.c:2220:19: error: ‘struct vmap_block_queue’ has no member named ‘vmap_blocks’
>>  2220 |   xa_for_each(&vbq->vmap_blocks, idx, vb) {
>>       |                   ^~
>
> Duh. I surely had that compile fail fixed before I boot tested that
> pile. And then I did something stupid obviously.

No, not this one. I only had the one in the last patch (missing
force_purge argument).

And this one makes me scratch my head:

struct vmap_block_queue {
	spinlock_t lock;
	struct list_head free;

	/*
	 * An xarray requires an extra memory dynamically to
	 * be allocated. If it is an issue, we can use rb-tree
	 * instead.
	 */
	struct xarray vmap_blocks;
};

So how can your compiler complain?

>> mm/vmalloc.c:2220:19: error: ‘struct vmap_block_queue’ has no member named ‘vmap_blocks’
>>  2220 |   xa_for_each(&vbq->vmap_blocks, idx, vb) {

Mine does not, but I might be missing something.

Thanks,

        tglx



* Re: [patch 0/6] mm/vmalloc: Assorted fixes and improvements
  2023-05-23 17:39     ` Thomas Gleixner
@ 2023-05-23 17:48       ` Uladzislau Rezki
  2023-05-23 17:51         ` Uladzislau Rezki
  2023-05-23 17:55         ` Uladzislau Rezki
  0 siblings, 2 replies; 43+ messages in thread
From: Uladzislau Rezki @ 2023-05-23 17:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Uladzislau Rezki, linux-mm, Andrew Morton, Christoph Hellwig,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23, 2023 at 07:39:26PM +0200, Thomas Gleixner wrote:
> On Tue, May 23 2023 at 19:33, Thomas Gleixner wrote:
> 
> > On Tue, May 23 2023 at 18:24, Uladzislau Rezki wrote:
> >> On Tue, May 23, 2023 at 04:02:09PM +0200, Thomas Gleixner wrote:
> >> mm/vmalloc.c: In function ‘_vm_unmap_aliases’:
> >> mm/vmalloc.c:2220:19: error: ‘struct vmap_block_queue’ has no member named ‘vmap_blocks’
> >>  2220 |   xa_for_each(&vbq->vmap_blocks, idx, vb) {
> >>       |                   ^~
> >
> > Duh. I surely had that compile fail fixed before I boot tested that
> > pile. And then I did something stupid obviously.
> 
> No. This one not. I only had the one in the last patch (missing
> force_purge argument)
> 
> And this one makes me scratch my head:
> 
> struct vmap_block_queue {
> 	spinlock_t lock;
> 	struct list_head free;
> 
> 	/*
> 	 * An xarray requires an extra memory dynamically to
> 	 * be allocated. If it is an issue, we can use rb-tree
> 	 * instead.
> 	 */
> 	struct xarray vmap_blocks;
> };
>
> 
> So how can your compiler complain?
> 
> >> mm/vmalloc.c:2220:19: error: ‘struct vmap_block_queue’ has no member named ‘vmap_blocks’
> >>  2220 |   xa_for_each(&vbq->vmap_blocks, idx, vb) {
> 
> Mine does not, but I might be missing something.
> 
Right. I have applied your patches on top of v6.3, which is not
correct. I thought it should be fine, because that part had not been
touched for quite a long time. Apparently Lorenzo, Baoquan and I added
the vmap_blocks xarray to the vmap_block_queue structure more recently.

So v6.3 does not contain that patch. I have to use linux-next instead.

--
Uladzislau Rezki



* Re: [patch 0/6] mm/vmalloc: Assorted fixes and improvements
  2023-05-23 17:48       ` Uladzislau Rezki
@ 2023-05-23 17:51         ` Uladzislau Rezki
  2023-05-23 17:55         ` Uladzislau Rezki
  1 sibling, 0 replies; 43+ messages in thread
From: Uladzislau Rezki @ 2023-05-23 17:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Thomas Gleixner, linux-mm, Andrew Morton, Christoph Hellwig,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23, 2023 at 07:48:09PM +0200, Uladzislau Rezki wrote:
> On Tue, May 23, 2023 at 07:39:26PM +0200, Thomas Gleixner wrote:
> > On Tue, May 23 2023 at 19:33, Thomas Gleixner wrote:
> > 
> > > On Tue, May 23 2023 at 18:24, Uladzislau Rezki wrote:
> > >> On Tue, May 23, 2023 at 04:02:09PM +0200, Thomas Gleixner wrote:
> > >> mm/vmalloc.c: In function ‘_vm_unmap_aliases’:
> > >> mm/vmalloc.c:2220:19: error: ‘struct vmap_block_queue’ has no member named ‘vmap_blocks’
> > >>  2220 |   xa_for_each(&vbq->vmap_blocks, idx, vb) {
> > >>       |                   ^~
> > >
> > > Duh. I surely had that compile fail fixed before I boot tested that
> > > pile. And then I did something stupid obviously.
> > 
> > No. This one not. I only had the one in the last patch (missing
> > force_purge argument)
> > 
> > And this one makes me scratch my head:
> > 
> > struct vmap_block_queue {
> > 	spinlock_t lock;
> > 	struct list_head free;
> > 
> > 	/*
> > 	 * An xarray requires an extra memory dynamically to
> > 	 * be allocated. If it is an issue, we can use rb-tree
> > 	 * instead.
> > 	 */
> > 	struct xarray vmap_blocks;
> > };
> >
> > 
> > So how can your compiler complain?
> > 
> > >> mm/vmalloc.c:2220:19: error: ‘struct vmap_block_queue’ has no member named ‘vmap_blocks’
> > >>  2220 |   xa_for_each(&vbq->vmap_blocks, idx, vb) {
> > 
> > Mine does not, but I might be missing something.
> > 
> Right. I have applied your patches on top of v6.3, which is not
> correct. I thought it should be fine, because that part had not been
> touched for quite a long time. Apparently Lorenzo, Baoquan and I added
> the vmap_blocks xarray to the vmap_block_queue structure more recently.
> 
> So v6.3 does not contain that patch. I have to use linux-next instead.
> 
My fault :)

--
Uladzislau Rezki



* Re: [patch 0/6] mm/vmalloc: Assorted fixes and improvements
  2023-05-23 17:48       ` Uladzislau Rezki
  2023-05-23 17:51         ` Uladzislau Rezki
@ 2023-05-23 17:55         ` Uladzislau Rezki
  2023-05-23 18:40           ` Thomas Gleixner
  1 sibling, 1 reply; 43+ messages in thread
From: Uladzislau Rezki @ 2023-05-23 17:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Thomas Gleixner, linux-mm, Andrew Morton, Christoph Hellwig,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23, 2023 at 07:48:09PM +0200, Uladzislau Rezki wrote:
> On Tue, May 23, 2023 at 07:39:26PM +0200, Thomas Gleixner wrote:
> > On Tue, May 23 2023 at 19:33, Thomas Gleixner wrote:
> > 
> > > On Tue, May 23 2023 at 18:24, Uladzislau Rezki wrote:
> > >> On Tue, May 23, 2023 at 04:02:09PM +0200, Thomas Gleixner wrote:
> > >> mm/vmalloc.c: In function ‘_vm_unmap_aliases’:
> > >> mm/vmalloc.c:2220:19: error: ‘struct vmap_block_queue’ has no member named ‘vmap_blocks’
> > >>  2220 |   xa_for_each(&vbq->vmap_blocks, idx, vb) {
> > >>       |                   ^~
> > >
> > > Duh. I surely had that compile fail fixed before I boot tested that
> > > pile. And then I did something stupid obviously.
> > 
> > No. This one not. I only had the one in the last patch (missing
> > force_purge argument)
> > 
> > And this one makes me scratch my head:
> > 
> > struct vmap_block_queue {
> > 	spinlock_t lock;
> > 	struct list_head free;
> > 
> > 	/*
> > 	 * An xarray requires an extra memory dynamically to
> > 	 * be allocated. If it is an issue, we can use rb-tree
> > 	 * instead.
> > 	 */
> > 	struct xarray vmap_blocks;
> > };
> >
> > 
> > So how can your compiler complain?
> > 
> > >> mm/vmalloc.c:2220:19: error: ‘struct vmap_block_queue’ has no member named ‘vmap_blocks’
> > >>  2220 |   xa_for_each(&vbq->vmap_blocks, idx, vb) {
> > 
> > Mine does not, but I might be missing something.
> > 
> Right. I have applied your patches on top of v6.3, which is not
> correct. I thought it should be fine, because that part had not been
> touched for quite a long time. Apparently Lorenzo, Baoquan and I added
> the vmap_blocks xarray to the vmap_block_queue structure more recently.
> 
> So v6.3 does not contain that patch. I have to use linux-next instead.
> 
next-20230523:

mm/vmalloc.c: In function ‘_vm_unmap_aliases’:
mm/vmalloc.c:2280:9: error: too few arguments to function ‘purge_fragmented_block’
 2280 |    if (!purge_fragmented_block(vb, vbq, &purge_list) &&
      |         ^~~~~~~~~~~~~~~~~~~~~~
mm/vmalloc.c:2095:13: note: declared here
 2095 | static bool purge_fragmented_block(struct vmap_block *vb, struct vmap_block_queue *vbq,
      |             ^~~~~~~~~~~~~~~~~~~~~~
  CC      drivers/acpi/pmic/intel_pmic_bxtwc.o
 

there is only one complaint, in fact.

--
Uladzislau Rezki



* Re: [patch 0/6] mm/vmalloc: Assorted fixes and improvements
  2023-05-23 17:55         ` Uladzislau Rezki
@ 2023-05-23 18:40           ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-23 18:40 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Lorenzo Stoakes,
	Peter Zijlstra, Baoquan He

On Tue, May 23 2023 at 19:55, Uladzislau Rezki wrote:
> On Tue, May 23, 2023 at 07:48:09PM +0200, Uladzislau Rezki wrote:
>> So, v6.3 does not contain that patch. I have to use the next instead.
>> 
> next-20230523:

It works against Linus' tree just fine.

> mm/vmalloc.c: In function ‘_vm_unmap_aliases’:
> mm/vmalloc.c:2280:9: error: too few arguments to function ‘purge_fragmented_block’
>  2280 |    if (!purge_fragmented_block(vb, vbq, &purge_list) &&
>       |         ^~~~~~~~~~~~~~~~~~~~~~
> mm/vmalloc.c:2095:13: note: declared here
>  2095 | static bool purge_fragmented_block(struct vmap_block *vb, struct vmap_block_queue *vbq,
>       |             ^~~~~~~~~~~~~~~~~~~~~~
>   CC      drivers/acpi/pmic/intel_pmic_bxtwc.o
>  
>
> there is only one complain in fact.

That's the one where I failed to pull the refreshed patch back from the
test machine.

---
Subject: mm/vmalloc: Don't purge usable blocks unnecessarily
From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 22 May 2023 19:38:32 +0200

Purging fragmented blocks is done unconditionally in several contexts:

  1) From drain_vmap_area_work(), when the number of lazy to be freed
     vmap_areas reached the threshold

  2) Reclaiming vmalloc address space from pcpu_get_vm_areas()

  3) _vm_unmap_aliases()

#1 There is no reason to zap fragmented vmap blocks unconditionally, simply
   because reclaiming all lazy areas drains at least

      32MB * fls(num_online_cpus())

   per invocation which is plenty.

#2 Reclaiming when running out of space or due to memory pressure makes a
   lot of sense

#3 _vm_unmap_aliases() has to touch everything because the caller has no
   clue which vmap_area used a particular page last and the vmap_area lost
   that information too.

   Except for the vfree + VM_FLUSH_RESET_PERMS case, which removes the
   vmap area first and then cares about the flush. That in turn requires
   a full walk of _all_ vmap areas including the one which was just
   added to the purge list.

   But as this has to be flushed anyway this is an opportunity to combine
   outstanding TLB flushes and do the housekeeping of purging freed areas,
   but like #1 there is no real good reason to zap usable vmap blocks
   unconditionally.

Add a @force_purge argument to the relevant functions and if not true only
purge fragmented blocks which have less than 1/4 of their capacity left.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 mm/vmalloc.c |   36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -791,7 +791,7 @@ get_subtree_max_size(struct rb_node *nod
 RB_DECLARE_CALLBACKS_MAX(static, free_vmap_area_rb_augment_cb,
 	struct vmap_area, rb_node, unsigned long, subtree_max_size, va_size)
 
-static void purge_vmap_area_lazy(void);
+static void purge_vmap_area_lazy(bool force_purge);
 static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
 static void drain_vmap_area_work(struct work_struct *work);
 static DECLARE_WORK(drain_vmap_work, drain_vmap_area_work);
@@ -1649,7 +1649,7 @@ static struct vmap_area *alloc_vmap_area
 
 overflow:
 	if (!purged) {
-		purge_vmap_area_lazy();
+		purge_vmap_area_lazy(true);
 		purged = 1;
 		goto retry;
 	}
@@ -1717,7 +1717,7 @@ static atomic_long_t vmap_lazy_nr = ATOM
 static DEFINE_MUTEX(vmap_purge_lock);
 
 /* for per-CPU blocks */
-static void purge_fragmented_blocks_allcpus(void);
+static void purge_fragmented_blocks_allcpus(bool force_purge);
 
 /*
  * Purges all lazily-freed vmap areas.
@@ -1787,10 +1787,10 @@ static bool __purge_vmap_area_lazy(unsig
 /*
  * Kick off a purge of the outstanding lazy areas.
  */
-static void purge_vmap_area_lazy(void)
+static void purge_vmap_area_lazy(bool force_purge)
 {
 	mutex_lock(&vmap_purge_lock);
-	purge_fragmented_blocks_allcpus();
+	purge_fragmented_blocks_allcpus(force_purge);
 	__purge_vmap_area_lazy(ULONG_MAX, 0);
 	mutex_unlock(&vmap_purge_lock);
 }
@@ -1908,6 +1908,12 @@ static struct vmap_area *find_unlink_vma
 
 #define VMAP_BLOCK_SIZE		(VMAP_BBMAP_BITS * PAGE_SIZE)
 
+/*
+ * Purge threshold to prevent overeager purging of fragmented blocks for
+ * regular operations: Purge if vb->free is less than 1/4 of the capacity.
+ */
+#define VMAP_PURGE_THRESHOLD	(VMAP_BBMAP_BITS / 4)
+
 #define VMAP_RAM		0x1 /* indicates vm_map_ram area*/
 #define VMAP_BLOCK		0x2 /* mark out the vmap_block sub-type*/
 #define VMAP_FLAGS_MASK		0x3
@@ -2087,12 +2093,16 @@ static void free_vmap_block(struct vmap_
 }
 
 static bool purge_fragmented_block(struct vmap_block *vb, struct vmap_block_queue *vbq,
-				   struct list_head *purge_list)
+				   struct list_head *purge_list, bool force_purge)
 {
 	if (!(vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS))
 		return false;
 
-	 /* prevent further allocs after releasing lock */
+	/* Don't overeagerly purge usable blocks unless requested */
+	if (!(force_purge || vb->free < VMAP_PURGE_THRESHOLD))
+		return false;
+
+	/* prevent further allocs after releasing lock */
 	WRITE_ONCE(vb->free, 0);
 	 /* prevent purging it again */
 	WRITE_ONCE(vb->dirty, VMAP_BBMAP_BITS);
@@ -2115,7 +2125,7 @@ static void free_purged_blocks(struct li
 	}
 }
 
-static void purge_fragmented_blocks(int cpu)
+static void purge_fragmented_blocks(int cpu, bool force_purge)
 {
 	LIST_HEAD(purge);
 	struct vmap_block *vb;
@@ -2130,19 +2140,19 @@ static void purge_fragmented_blocks(int
 			continue;
 
 		spin_lock(&vb->lock);
-		purge_fragmented_block(vb, vbq, &purge);
+		purge_fragmented_block(vb, vbq, &purge, force_purge);
 		spin_unlock(&vb->lock);
 	}
 	rcu_read_unlock();
 	free_purged_blocks(&purge);
 }
 
-static void purge_fragmented_blocks_allcpus(void)
+static void purge_fragmented_blocks_allcpus(bool force_purge)
 {
 	int cpu;
 
 	for_each_possible_cpu(cpu)
-		purge_fragmented_blocks(cpu);
+		purge_fragmented_blocks(cpu, force_purge);
 }
 
 static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
@@ -2267,7 +2277,7 @@ static void _vm_unmap_aliases(unsigned l
 			 * not purgeable, check whether there is dirty
 			 * space to be flushed.
 			 */
-			if (!purge_fragmented_block(vb, vbq, &purge_list) &&
+			if (!purge_fragmented_block(vb, vbq, &purge_list, false) &&
 			    vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
 				unsigned long va_start = vb->va->va_start;
 				unsigned long s, e;
@@ -4173,7 +4183,7 @@ struct vm_struct **pcpu_get_vm_areas(con
 overflow:
 	spin_unlock(&free_vmap_area_lock);
 	if (!purged) {
-		purge_vmap_area_lazy();
+		purge_vmap_area_lazy(true);
 		purged = true;
 
 		/* Before "retry", check if we recover. */



* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-23 14:02 ` [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks Thomas Gleixner
  2023-05-23 15:17   ` Christoph Hellwig
@ 2023-05-23 19:18   ` Lorenzo Stoakes
  2023-05-24  9:19     ` Uladzislau Rezki
  2023-05-24  9:25   ` Baoquan He
  2023-05-24  9:32   ` Baoquan He
  3 siblings, 1 reply; 43+ messages in thread
From: Lorenzo Stoakes @ 2023-05-23 19:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Peter Zijlstra, Baoquan He

On Tue, May 23, 2023 at 04:02:11PM +0200, Thomas Gleixner wrote:
> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> page are left in the system. This is required due to the lazy TLB flush
> mechanism in vmalloc.
>
> This is attempted by walking the per CPU free lists, but those do not
> contain fully utilized vmap blocks, because a block is removed from the
> free list once its free space becomes zero.

Oh Lord.

>
> So the per CPU list iteration does not find the block and if the page was
> mapped via such a block and the TLB has not yet been flushed, the guarantee
> of _vm_unmap_aliases() that there are no stale TLBs after returning is
> broken:
>
> x = vb_alloc() // Removes vmap_block from free list because vb->free became 0
> vb_free(x)     // Unmaps page and marks in dirty_min/max range
>
> // Page is reused
> vm_unmap_aliases() // Can't find vmap block with the dirty space -> FAIL
>
> So instead of walking the per CPU free lists, walk the per CPU xarrays
> which hold pointers to _all_ active blocks in the system including those
> removed from the free lists.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  mm/vmalloc.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
>  	for_each_possible_cpu(cpu) {
>  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
>  		struct vmap_block *vb;
> +		unsigned long idx;
>
>  		rcu_read_lock();
> -		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> +		xa_for_each(&vbq->vmap_blocks, idx, vb) {
>  			spin_lock(&vb->lock);
>  			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
>  				unsigned long va_start = vb->va->va_start;
>
>

LGTM.

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>



* Re: [patch 5/6] mm/vmalloc: Add missing READ/WRITE_ONCE() annotations
  2023-05-23 14:02 ` [patch 5/6] mm/vmalloc: Add missing READ/WRITE_ONCE() annotations Thomas Gleixner
@ 2023-05-24  9:15   ` Uladzislau Rezki
  0 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki @ 2023-05-24  9:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23, 2023 at 04:02:15PM +0200, Thomas Gleixner wrote:
> purge_fragmented_blocks() accesses vmap_block::free and vmap_block::dirty
> lockless for a quick check.
> 
> Add the missing READ/WRITE_ONCE() annotations.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  mm/vmalloc.c |   11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2093,9 +2093,9 @@ static bool purge_fragmented_block(struc
>  		return false;
>  
>  	 /* prevent further allocs after releasing lock */
> -	vb->free = 0;
> +	WRITE_ONCE(vb->free, 0);
>  	 /* prevent purging it again */
> -	vb->dirty = VMAP_BBMAP_BITS;
> +	WRITE_ONCE(vb->dirty, VMAP_BBMAP_BITS);
>  	vb->dirty_min = 0;
>  	vb->dirty_max = VMAP_BBMAP_BITS;
>  	spin_lock(&vbq->lock);
> @@ -2123,7 +2123,10 @@ static void purge_fragmented_blocks(int
>  
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> -		if (!(vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS))
> +		unsigned long free = READ_ONCE(vb->free);
> +		unsigned long dirty = READ_ONCE(vb->dirty);
> +
> +		if (!(free + dirty == VMAP_BBMAP_BITS && dirty != VMAP_BBMAP_BITS))
>  			continue;
>  
>  		spin_lock(&vb->lock);
> @@ -2231,7 +2234,7 @@ static void vb_free(unsigned long addr,
>  	vb->dirty_min = min(vb->dirty_min, offset);
>  	vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));
>  
> -	vb->dirty += 1UL << order;
> +	WRITE_ONCE(vb->dirty, vb->dirty + (1UL << order));
>  	if (vb->dirty == VMAP_BBMAP_BITS) {
>  		BUG_ON(vb->free);
>  		spin_unlock(&vb->lock);
> 
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-23 19:18   ` Lorenzo Stoakes
@ 2023-05-24  9:19     ` Uladzislau Rezki
  0 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki @ 2023-05-24  9:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Thomas Gleixner, linux-mm, Andrew Morton, Christoph Hellwig,
	Uladzislau Rezki, Peter Zijlstra, Baoquan He

On Tue, May 23, 2023 at 08:18:10PM +0100, Lorenzo Stoakes wrote:
> On Tue, May 23, 2023 at 04:02:11PM +0200, Thomas Gleixner wrote:
> > _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> > page are left in the system. This is required due to the lazy TLB flush
> > mechanism in vmalloc.
> >
> > This is tried to achieve by walking the per CPU free lists, but those do
> > not contain fully utilized vmap blocks because they are removed from the
> > free list once the blocks free space became zero.
> 
> Oh Lord.
> 
> >
> > So the per CPU list iteration does not find the block and if the page was
> > mapped via such a block and the TLB has not yet been flushed, the guarantee
> > of _vm_unmap_aliases() that there are no stale TLBs after returning is
> > broken:
> >
> > x = vb_alloc() // Removes vmap_block from free list because vb->free became 0
> > vb_free(x)     // Unmaps page and marks in dirty_min/max range
> >
> > // Page is reused
> > vm_unmap_aliases() // Can't find vmap block with the dirty space -> FAIL
> >
> > So instead of walking the per CPU free lists, walk the per CPU xarrays
> > which hold pointers to _all_ active blocks in the system including those
> > removed from the free lists.
> >
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > ---
> >  mm/vmalloc.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
> >  	for_each_possible_cpu(cpu) {
> >  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
> >  		struct vmap_block *vb;
> > +		unsigned long idx;
> >
> >  		rcu_read_lock();
> > -		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> > +		xa_for_each(&vbq->vmap_blocks, idx, vb) {
> >  			spin_lock(&vb->lock);
> >  			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
> >  				unsigned long va_start = vb->va->va_start;
> >
> >
> 
> LGTM.
> 
> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>

Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 4/6] mm/vmalloc: Check free space in vmap_block lockless
  2023-05-23 16:17     ` Thomas Gleixner
@ 2023-05-24  9:20       ` Uladzislau Rezki
  0 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki @ 2023-05-24  9:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Hellwig, linux-mm, Andrew Morton, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra, Baoquan He

On Tue, May 23, 2023 at 06:17:42PM +0200, Thomas Gleixner wrote:
> On Tue, May 23 2023 at 17:29, Christoph Hellwig wrote:
> > On Tue, May 23, 2023 at 04:02:14PM +0200, Thomas Gleixner wrote:
> >> +		if (READ_ONCE(vb->free) < (1UL << order))
> >> +			continue;
> >> +
> >>  		spin_lock(&vb->lock);
> >>  		if (vb->free < (1UL << order)) {
> >>  			spin_unlock(&vb->lock);
> >> @@ -2174,7 +2177,7 @@ static void *vb_alloc(unsigned long size
> >>  
> >>  		pages_off = VMAP_BBMAP_BITS - vb->free;
> >>  		vaddr = vmap_block_vaddr(vb->va->va_start, pages_off);
> >> -		vb->free -= 1UL << order;
> >> +		WRITE_ONCE(vb->free, vb->free - (1UL << order));
> >
> > Maybe just a matter of preference, but wouldn't an atomic_t be
> > better here?  We'd have another locked instruction in the alloc
> > path, but I always find the READ_ONCE/WRITE_ONCE usage a bit
> > fragile that I'd rather reserve them to well documented hot
> > path code.
> 
> I don't see a problem with these lockless quickchecks, especially not
> in this particular case, but no strong opinion either.
> 
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
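
As a side note for readers following along, the lockless quickcheck being
debated above boils down to the pattern below. This is an illustrative
sketch only: "struct blk" and blk_try_alloc() are made-up stand-ins, not
kernel code; only the READ_ONCE()/WRITE_ONCE() pairing and the
check-then-recheck-under-lock shape mirror the quoted hunks.

struct blk {
	spinlock_t	lock;
	unsigned long	free;
};

static bool blk_try_alloc(struct blk *b, unsigned int order)
{
	/* Lockless hint: the value may be stale, it is never authoritative */
	if (READ_ONCE(b->free) < (1UL << order))
		return false;

	spin_lock(&b->lock);
	/* Recheck under the lock before committing the allocation */
	if (b->free < (1UL << order)) {
		spin_unlock(&b->lock);
		return false;
	}
	/* Paired WRITE_ONCE() so lockless readers never see a torn value */
	WRITE_ONCE(b->free, b->free - (1UL << order));
	spin_unlock(&b->lock);
	return true;
}

The stale read is harmless either way: a too-small value merely skips a
candidate, and a too-large value is caught by the recheck under the lock.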


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-23 14:02 ` [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks Thomas Gleixner
  2023-05-23 15:17   ` Christoph Hellwig
  2023-05-23 19:18   ` Lorenzo Stoakes
@ 2023-05-24  9:25   ` Baoquan He
  2023-05-24  9:51     ` Thomas Gleixner
  2023-05-24  9:32   ` Baoquan He
  3 siblings, 1 reply; 43+ messages in thread
From: Baoquan He @ 2023-05-24  9:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> page are left in the system. This is required due to the lazy TLB flush
> mechanism in vmalloc.
> 
> This is tried to achieve by walking the per CPU free lists, but those do
> not contain fully utilized vmap blocks because they are removed from the
> free list once the blocks free space became zero.

The problem description is not accurate. This is tried to achieve for
va associated with vmap_block by walking the per CPU free lists, those
fully utilized vmap blocks can still be flushed in __purge_vmap_area_lazy()
by calculating the [min:max] of purge_vmap_area_list, because va of
vmap_blocks will be added to purge_vmap_area_list too via vb_free().

Only if we eventually exclude the vmap_block va from the purge list in
__purge_vmap_area_lazy() could we say that a stale TLB flush is missed.
No?

IMHO, this is preparation work for excluding the vmap_block va from the
purge list flushing. Please correct me if I am wrong.

> 
> So the per CPU list iteration does not find the block and if the page was
> mapped via such a block and the TLB has not yet been flushed, the guarantee
> of _vm_unmap_aliases() that there are no stale TLBs after returning is
> broken:
> 
> x = vb_alloc() // Removes vmap_block from free list because vb->free became 0
> vb_free(x)     // Unmaps page and marks in dirty_min/max range
> 
> // Page is reused
> vm_unmap_aliases() // Can't find vmap block with the dirty space -> FAIL
> 
> So instead of walking the per CPU free lists, walk the per CPU xarrays
> which hold pointers to _all_ active blocks in the system including those
> removed from the free lists.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  mm/vmalloc.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
>  	for_each_possible_cpu(cpu) {
>  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
>  		struct vmap_block *vb;
> +		unsigned long idx;
>  
>  		rcu_read_lock();
> -		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> +		xa_for_each(&vbq->vmap_blocks, idx, vb) {
>  			spin_lock(&vb->lock);
>  			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
>  				unsigned long va_start = vb->va->va_start;
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-23 14:02 ` [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks Thomas Gleixner
                     ` (2 preceding siblings ...)
  2023-05-24  9:25   ` Baoquan He
@ 2023-05-24  9:32   ` Baoquan He
  2023-05-24  9:52     ` Thomas Gleixner
  3 siblings, 1 reply; 43+ messages in thread
From: Baoquan He @ 2023-05-24  9:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> page are left in the system. This is required due to the lazy TLB flush
> mechanism in vmalloc.
> 
> This is tried to achieve by walking the per CPU free lists, but those do
> not contain fully utilized vmap blocks because they are removed from the
> free list once the blocks free space became zero.
> 
> So the per CPU list iteration does not find the block and if the page was
> mapped via such a block and the TLB has not yet been flushed, the guarantee
> of _vm_unmap_aliases() that there are no stale TLBs after returning is
> broken:
> 
> x = vb_alloc() // Removes vmap_block from free list because vb->free became 0
> vb_free(x)     // Unmaps page and marks in dirty_min/max range
> 
> // Page is reused
> vm_unmap_aliases() // Can't find vmap block with the dirty space -> FAIL
> 
> So instead of walking the per CPU free lists, walk the per CPU xarrays
> which hold pointers to _all_ active blocks in the system including those
> removed from the free lists.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  mm/vmalloc.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
>  	for_each_possible_cpu(cpu) {
>  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
>  		struct vmap_block *vb;
> +		unsigned long idx;
>  
>  		rcu_read_lock();

Do we need to remove this rcu_read_xx() pair since it marks the RCU
read-side critical section on vbq-free list?

> -		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> +		xa_for_each(&vbq->vmap_blocks, idx, vb) {
>  			spin_lock(&vb->lock);
>  			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
>  				unsigned long va_start = vb->va->va_start;
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over
  2023-05-23 14:02 ` [patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over Thomas Gleixner
  2023-05-23 15:27   ` Christoph Hellwig
@ 2023-05-24  9:43   ` Baoquan He
  1 sibling, 0 replies; 43+ messages in thread
From: Baoquan He @ 2023-05-24  9:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> vmap blocks which have active mappings cannot be purged. Allocations which
> have been freed are accounted for in vmap_block::dirty_min/max, so that
> they can be detected in _vm_unmap_aliases() as potentially stale TLBs.
> 
> If there are several invocations of _vm_unmap_aliases() then each of them
> will flush the dirty range. That's pointless and just increases the
> probability of full TLB flushes.
> 
> Avoid that by resetting the flush range after accounting for it. That's
> safe versus other invocations of _vm_unmap_aliases() because this is all
> serialized with vmap_purge_lock.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  mm/vmalloc.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2224,7 +2224,7 @@ static void vb_free(unsigned long addr,
>  
>  	spin_lock(&vb->lock);
>  
> -	/* Expand dirty range */
> +	/* Expand the not yet TLB flushed dirty range */
>  	vb->dirty_min = min(vb->dirty_min, offset);
>  	vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));
>  
> @@ -2262,7 +2262,7 @@ static void _vm_unmap_aliases(unsigned l
>  			 * space to be flushed.
>  			 */
>  			if (!purge_fragmented_block(vb, vbq, &purge_list) &&
> -			    vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
> +			    vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
>  				unsigned long va_start = vb->va->va_start;
>  				unsigned long s, e;
>  
> @@ -2272,6 +2272,10 @@ static void _vm_unmap_aliases(unsigned l
>  				start = min(s, start);
>  				end   = max(e, end);
>  
> +				/* Prevent that this is flushed more than once */
> +				vb->dirty_min = VMAP_BBMAP_BITS;
> +				vb->dirty_max = 0;
> +

This is really a great catch and improvement.

Reviewed-by: Baoquan He <bhe@redhat.com>

>  				flush = 1;
>  			}
>  			spin_unlock(&vb->lock);
> 
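
For readers skimming the thread, the hunk above condenses to the pattern
below. This is a simplified sketch: vb_account_dirty() is a made-up helper,
the real logic sits inline in _vm_unmap_aliases() and is serialized by
vmap_purge_lock; the dirty index to address conversion follows the existing
code around the quoted context lines.

static void vb_account_dirty(struct vmap_block *vb,
			     unsigned long *start, unsigned long *end)
{
	unsigned long va_start = vb->va->va_start;
	unsigned long s = va_start + (vb->dirty_min << PAGE_SHIFT);
	unsigned long e = va_start + (vb->dirty_max << PAGE_SHIFT);

	/* Accumulate into the flush range of this invocation */
	*start = min(s, *start);
	*end   = max(e, *end);

	/* Reset so a later invocation does not flush the same space again */
	vb->dirty_min = VMAP_BBMAP_BITS;
	vb->dirty_max = 0;
}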



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-24  9:25   ` Baoquan He
@ 2023-05-24  9:51     ` Thomas Gleixner
  2023-05-24 11:24       ` Baoquan He
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-24  9:51 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On Wed, May 24 2023 at 17:25, Baoquan He wrote:
> On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
>> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
>> page are left in the system. This is required due to the lazy TLB flush
>> mechanism in vmalloc.
>> 
>> This is tried to achieve by walking the per CPU free lists, but those do
>> not contain fully utilized vmap blocks because they are removed from the
>> free list once the blocks free space became zero.
>
> The problem description is not accurate. This is tried to achieve for
> va associated with vmap_block by walking the per CPU free lists, those
> fully utilized vmap blocks can still be flushed in __purge_vmap_area_lazy()
> by calculating the [min:max] of purge_vmap_area_list, because va of
> vmap_blocks will be added to purge_vmap_area_list too via vb_free().

No. The fully utilized block cannot be purged when there are still
active mappings on it. Again:

  X = vb_alloc()
...  
  Y = vb_alloc()
    vb->free -= order;
    if (!vb->vb_free)
       list_del(vb->free_list);
...
  vb_free(Y)
    vb->dirty += order;
    if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
       free_block(); 

So because $X is not yet unmapped the block is neither on the free list
nor on purge_vmap_area_list.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-24  9:32   ` Baoquan He
@ 2023-05-24  9:52     ` Thomas Gleixner
  2023-05-24 14:10       ` Baoquan He
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-24  9:52 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On Wed, May 24 2023 at 17:32, Baoquan He wrote:
> On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
>> @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
>>  	for_each_possible_cpu(cpu) {
>>  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
>>  		struct vmap_block *vb;
>> +		unsigned long idx;
>>  
>>  		rcu_read_lock();
>
> Do we need to remove this rcu_read_xx() pair since it marks the RCU
> read-side critical section on vbq-free list?

And what protects the xarray lookup?

>> -		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
>> +		xa_for_each(&vbq->vmap_blocks, idx, vb) {
>>  			spin_lock(&vb->lock);
>>  			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
>>  				unsigned long va_start = vb->va->va_start;
>> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily
  2023-05-23 14:02 ` [patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily Thomas Gleixner
  2023-05-23 15:30   ` Christoph Hellwig
@ 2023-05-24 10:34   ` Baoquan He
  2023-05-24 12:55     ` Thomas Gleixner
  1 sibling, 1 reply; 43+ messages in thread
From: Baoquan He @ 2023-05-24 10:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> Purging fragmented blocks is done unconditionally in several contexts:
> 
>   1) From drain_vmap_area_work(), when the number of lazy to be freed
>      vmap_areas reached the threshold
> 
>   2) Reclaiming vmalloc address space from pcpu_get_vm_areas()
> 
>   3) _unmap_aliases()
> 
> #1 There is no reason to zap fragmented vmap blocks unconditionally, simply
>    because reclaiming all lazy areas drains at least
> 
>       32MB * fls(num_online_cpus())
> 
>    per invocation which is plenty.
> 
> #2 Reclaiming when running out of space or due to memory pressure makes a
>    lot of sense
> 
> #3 _unmap_aliases() requires to touch everything because the caller has no
>    clue which vmap_area used a particular page last and the vmap_area lost
>    that information too.
> 
>    Except for the vfree + VM_FLUSH_RESET_PERMS case, which removes the
>    vmap area first and then cares about the flush. That in turn requires
>    a full walk of _all_ vmap areas including the one which was just
>    added to the purge list.
> 
>    But as this has to be flushed anyway this is an opportunity to combine
>    outstanding TLB flushes and do the housekeeping of purging freed areas,
>    but like #1 there is no real good reason to zap usable vmap blocks
>    unconditionally.
> 
> Add a @force_purge argument to the relevant functions and if not true only
> purge fragmented blocks which have less than 1/4 of their capacity left.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  mm/vmalloc.c |   34 ++++++++++++++++++++++------------
>  1 file changed, 22 insertions(+), 12 deletions(-)
> 
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -791,7 +791,7 @@ get_subtree_max_size(struct rb_node *nod
>  RB_DECLARE_CALLBACKS_MAX(static, free_vmap_area_rb_augment_cb,
>  	struct vmap_area, rb_node, unsigned long, subtree_max_size, va_size)
>  
> -static void purge_vmap_area_lazy(void);
> +static void purge_vmap_area_lazy(bool force_purge);
>  static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
>  static void drain_vmap_area_work(struct work_struct *work);
>  static DECLARE_WORK(drain_vmap_work, drain_vmap_area_work);
> @@ -1649,7 +1649,7 @@ static struct vmap_area *alloc_vmap_area
>  
>  overflow:
>  	if (!purged) {
> -		purge_vmap_area_lazy();
> +		purge_vmap_area_lazy(true);
>  		purged = 1;
>  		goto retry;
>  	}
> @@ -1717,7 +1717,7 @@ static atomic_long_t vmap_lazy_nr = ATOM
>  static DEFINE_MUTEX(vmap_purge_lock);
>  
>  /* for per-CPU blocks */
> -static void purge_fragmented_blocks_allcpus(void);
> +static void purge_fragmented_blocks_allcpus(bool force_purge);
>  
>  /*
>   * Purges all lazily-freed vmap areas.
> @@ -1787,10 +1787,10 @@ static bool __purge_vmap_area_lazy(unsig
>  /*
>   * Kick off a purge of the outstanding lazy areas.
>   */
> -static void purge_vmap_area_lazy(void)
> +static void purge_vmap_area_lazy(bool force_purge)
>  {
>  	mutex_lock(&vmap_purge_lock);
> -	purge_fragmented_blocks_allcpus();
> +	purge_fragmented_blocks_allcpus(force_purge);
>  	__purge_vmap_area_lazy(ULONG_MAX, 0);
>  	mutex_unlock(&vmap_purge_lock);
>  }
> @@ -1908,6 +1908,12 @@ static struct vmap_area *find_unlink_vma
>  
>  #define VMAP_BLOCK_SIZE		(VMAP_BBMAP_BITS * PAGE_SIZE)
>  
> +/*
> + * Purge threshold to prevent overeager purging of fragmented blocks for
> + * regular operations: Purge if vb->free is less than 1/4 of the capacity.
> + */
> +#define VMAP_PURGE_THRESHOLD	(VMAP_BBMAP_BITS / 4)
> +
>  #define VMAP_RAM		0x1 /* indicates vm_map_ram area*/
>  #define VMAP_BLOCK		0x2 /* mark out the vmap_block sub-type*/
>  #define VMAP_FLAGS_MASK		0x3
> @@ -2087,12 +2093,16 @@ static void free_vmap_block(struct vmap_
>  }
>  
>  static bool purge_fragmented_block(struct vmap_block *vb, struct vmap_block_queue *vbq,
> -				   struct list_head *purge_list)
> +				   struct list_head *purge_list, bool force_purge)
>  {
>  	if (!(vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS))
>  		return false;
>  
> -	 /* prevent further allocs after releasing lock */
> +	/* Don't overeagerly purge usable blocks unless requested */
> +	if (!force_purge && vb->free < VMAP_PURGE_THRESHOLD)
> +		return false;
> +
> +	/* prevent further allocs after releasing lock */
>  	WRITE_ONCE(vb->free, 0);
>  	 /* prevent purging it again */
>  	WRITE_ONCE(vb->dirty, VMAP_BBMAP_BITS);
> @@ -2115,7 +2125,7 @@ static void free_purged_blocks(struct li
>  	}
>  }
>  
> -static void purge_fragmented_blocks(int cpu)
> +static void purge_fragmented_blocks(int cpu, bool force_purge)
>  {
>  	LIST_HEAD(purge);
>  	struct vmap_block *vb;
> @@ -2130,19 +2140,19 @@ static void purge_fragmented_blocks(int
>  			continue;
>  
>  		spin_lock(&vb->lock);
> -		purge_fragmented_block(vb, vbq, &purge);
> +		purge_fragmented_block(vb, vbq, &purge, force_purge);
>  		spin_unlock(&vb->lock);
>  	}
>  	rcu_read_unlock();
>  	free_purged_blocks(&purge);
>  }
>  
> -static void purge_fragmented_blocks_allcpus(void)
> +static void purge_fragmented_blocks_allcpus(bool force_purge)
>  {
>  	int cpu;
>  
>  	for_each_possible_cpu(cpu)
> -		purge_fragmented_blocks(cpu);
> +		purge_fragmented_blocks(cpu, force_purge);
>  }
>  
>  static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
> @@ -4173,7 +4183,7 @@ struct vm_struct **pcpu_get_vm_areas(con
>  overflow:
>  	spin_unlock(&free_vmap_area_lock);
>  	if (!purged) {
> -		purge_vmap_area_lazy();
> +		purge_vmap_area_lazy(true);
>  		purged = true;
>  
>  		/* Before "retry", check if we recover. */

Wondering why we bother adding 'force_purge' to purge_vmap_area_lazy() and
purge_fragmented_blocks_allcpus() if it is always true there. Can't we
just pass 'force_purge' as true to purge_fragmented_block() in
purge_fragmented_blocks()?

alloc_vmap_area()
pcpu_get_vm_areas()
-->purge_vmap_area_lazy(true)
   -->purge_fragmented_blocks_allcpus(force_purge=true)
      -->purge_fragmented_block(force_purge=true)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 062f4a86b049..c812f8afa985 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2140,7 +2140,7 @@ static void purge_fragmented_blocks(int cpu, bool force_purge)
                        continue;
 
                spin_lock(&vb->lock);
-               purge_fragmented_block(vb, vbq, &purge, force_purge);
+               purge_fragmented_block(vb, vbq, &purge, true);
                spin_unlock(&vb->lock);
        }
        rcu_read_unlock();

And the change is missing in one more place; the build fails without it.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 062f4a86b049..0453bc66812e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2277,7 +2277,7 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
                         * not purgeable, check whether there is dirty
                         * space to be flushed.
                         */
-                       if (!purge_fragmented_block(vb, vbq, &purge_list) &&
+                       if (!purge_fragmented_block(vb, vbq, &purge_list, false) &&
                            vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
                                unsigned long va_start = vb->va->va_start;
                                unsigned long s, e;

> 



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-24  9:51     ` Thomas Gleixner
@ 2023-05-24 11:24       ` Baoquan He
  2023-05-24 11:26         ` Baoquan He
                           ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Baoquan He @ 2023-05-24 11:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On 05/24/23 at 11:51am, Thomas Gleixner wrote:
> On Wed, May 24 2023 at 17:25, Baoquan He wrote:
> > On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> >> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> >> page are left in the system. This is required due to the lazy TLB flush
> >> mechanism in vmalloc.
> >> 
> >> This is tried to achieve by walking the per CPU free lists, but those do
> >> not contain fully utilized vmap blocks because they are removed from the
> >> free list once the blocks free space became zero.
> >
> > The problem description is not accurate. This is tried to achieve for
> > va associated with vmap_block by walking the per CPU free lists, those
> > fully utilized vmap blocks can still be flushed in __purge_vmap_area_lazy()
> > by calculating the [min:max] of purge_vmap_area_list, because va of
> > vmap_blocks will be added to purge_vmap_area_list too via vb_free().
> 
> No. The fully utilized block cannot be purged when there are still
> active mappings on it. Again:
> 
>   X = vb_alloc()
> ...  
>   Y = vb_alloc()
>     vb->free -= order;
>     if (!vb->vb_free)
>        list_del(vb->free_list);
> ...
>   vb_free(Y)
>     vb->dirty += order;
>     if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
>        free_block(); 


   vb_free(Y)
     vb->dirty += order;
     if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
        free_vmap_block(); 
        -->free_vmap_area_noflush()
           -->merge_or_add_vmap_area(va,
                &purge_vmap_area_root, &purge_vmap_area_list);

The last mapped region will be freed and added to purge list via
vb_free(), it will be flushed with other va in purge list. When it's
mapped via vb_alloc(), it's detached from vbq->free list. When it's
freed via vb_alloc(), it's added to purge list, and flushed, the thing
is duplicated flushing, no missing flush seen here?

> 
> So because $X is not yet unmapped the block is neither on the free list
> nor on purge_vmap_area_list.

Yeah, because $X is not yet unmapped, the unmapped part of the block could
be either flushed or unflushed. The unflushed part will get flushed together
with $X later via the purge list.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-24 11:24       ` Baoquan He
@ 2023-05-24 11:26         ` Baoquan He
  2023-05-24 11:36         ` Uladzislau Rezki
  2023-05-24 12:44         ` Thomas Gleixner
  2 siblings, 0 replies; 43+ messages in thread
From: Baoquan He @ 2023-05-24 11:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On 05/24/23 at 07:24pm, Baoquan He wrote:
> On 05/24/23 at 11:51am, Thomas Gleixner wrote:
> > On Wed, May 24 2023 at 17:25, Baoquan He wrote:
> > > On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> > >> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> > >> page are left in the system. This is required due to the lazy TLB flush
> > >> mechanism in vmalloc.
> > >> 
> > >> This is tried to achieve by walking the per CPU free lists, but those do
> > >> not contain fully utilized vmap blocks because they are removed from the
> > >> free list once the blocks free space became zero.
> > >
> > > The problem description is not accurate. This is tried to achieve for
> > > va associated with vmap_block by walking the per CPU free lists, those
> > > fully utilized vmap blocks can still be flushed in __purge_vmap_area_lazy()
> > > by calculating the [min:max] of purge_vmap_area_list, because va of
> > > vmap_blocks will be added to purge_vmap_area_list too via vb_free().
> > 
> > No. The fully utilized block cannot be purged when there are still
> > active mappings on it. Again:
> > 
> >   X = vb_alloc()
> > ...  
> >   Y = vb_alloc()
> >     vb->free -= order;
> >     if (!vb->vb_free)
> >        list_del(vb->free_list);
> > ...
> >   vb_free(Y)
> >     vb->dirty += order;
> >     if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
> >        free_block(); 
> 
> 
>    vb_free(Y)
>      vb->dirty += order;
>      if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
>         free_vmap_block(); 
>         -->free_vmap_area_noflush()
>            -->merge_or_add_vmap_area(va,
>                 &purge_vmap_area_root, &purge_vmap_area_list);
> 
> The last mapped region will be freed and added to purge list via
> vb_free(), it will be flushed with other va in purge list. When it's
> mapped via vb_alloc(), it's detached from vbq->free list. When it's
> freed via vb_alloc(), it's added to purge list, and flushed, the thing
            ~~~~~~~~vb_free(), typo, 
> is duplicated flushing, no missing flush seen here?
> 
> > 
> > So because $X is not yet unmapped the block is neither on the free list
> > nor on purge_vmap_area_list.
> 
> Yeah, because $X is not yet unmapped, the unmapped part of the block could
> be either flushed or unflushed. The unflushed part will get flushed together
> with $X later via the purge list.
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-24 11:24       ` Baoquan He
  2023-05-24 11:26         ` Baoquan He
@ 2023-05-24 11:36         ` Uladzislau Rezki
  2023-05-24 12:49           ` Thomas Gleixner
  2023-05-24 12:44         ` Thomas Gleixner
  2 siblings, 1 reply; 43+ messages in thread
From: Uladzislau Rezki @ 2023-05-24 11:36 UTC (permalink / raw)
  To: Baoquan He
  Cc: Thomas Gleixner, linux-mm, Andrew Morton, Christoph Hellwig,
	Uladzislau Rezki, Lorenzo Stoakes, Peter Zijlstra

On Wed, May 24, 2023 at 07:24:43PM +0800, Baoquan He wrote:
> On 05/24/23 at 11:51am, Thomas Gleixner wrote:
> > On Wed, May 24 2023 at 17:25, Baoquan He wrote:
> > > On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> > >> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> > >> page are left in the system. This is required due to the lazy TLB flush
> > >> mechanism in vmalloc.
> > >> 
> > >> This is tried to achieve by walking the per CPU free lists, but those do
> > >> not contain fully utilized vmap blocks because they are removed from the
> > >> free list once the blocks free space became zero.
> > >
> > > The problem description is not accurate. This is tried to achieve for
> > > va associated with vmap_block by walking the per CPU free lists, those
> > > fully utilized vmap blocks can still be flushed in __purge_vmap_area_lazy()
> > > by calculating the [min:max] of purge_vmap_area_list, because va of
> > > vmap_blocks will be added to purge_vmap_area_list too via vb_free().
> > 
> > No. The fully utilized block cannot be purged when there are still
> > active mappings on it. Again:
> > 
> >   X = vb_alloc()
> > ...  
> >   Y = vb_alloc()
> >     vb->free -= order;
> >     if (!vb->vb_free)
> >        list_del(vb->free_list);
> > ...
> >   vb_free(Y)
> >     vb->dirty += order;
> >     if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
> >        free_block(); 
> 
> 
>    vb_free(Y)
>      vb->dirty += order;
>      if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
>         free_vmap_block(); 
>         -->free_vmap_area_noflush()
>            -->merge_or_add_vmap_area(va,
>                 &purge_vmap_area_root, &purge_vmap_area_list);
> 
> The last mapped region will be freed and added to purge list via
> vb_free(), it will be flushed with other va in purge list. When it's
> mapped via vb_alloc(), it's detached from vbq->free list. When it's
> freed via vb_alloc(), it's added to purge list, and flushed, the thing
> is duplicated flushing, no missing flush seen here?
> 
> > 
> > So because $X is not yet unmapped the block is neither on the free list
> > nor on purge_vmap_area_list.
> 
> Yeah, because $X is not yet unmapped, the unmapped part of the block could
> be either flushed or unflushed. The unflushed part will get flushed together
> with $X later via the purge list.
> 
If we guarantee that vb is fully flushed, then just do not add it to the
purge list:

@@ -2086,12 +2090,7 @@ static void free_vmap_block(struct vmap_block *vb)
        BUG_ON(tmp != vb);

        z = addr_to_cvz(vb->va->va_start);
-
-       spin_lock(&z->busy.lock);
-       unlink_va(vb->va, &z->busy.root);
-       spin_unlock(&z->busy.lock);
-
-       free_vmap_area_noflush(vb->va);
+       free_vmap_area(vb->va);
        kfree_rcu(vb, rcu_head);
 }

and the va goes directly back into the global vmap heap.

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-24 11:24       ` Baoquan He
  2023-05-24 11:26         ` Baoquan He
  2023-05-24 11:36         ` Uladzislau Rezki
@ 2023-05-24 12:44         ` Thomas Gleixner
  2023-05-24 13:41           ` Baoquan He
  2 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-24 12:44 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On Wed, May 24 2023 at 19:24, Baoquan He wrote:
> On 05/24/23 at 11:51am, Thomas Gleixner wrote:
>    vb_free(Y)
>      vb->dirty += order;
>      if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
>         free_vmap_block(); 
>         -->free_vmap_area_noflush()
>            -->merge_or_add_vmap_area(va,
>                 &purge_vmap_area_root, &purge_vmap_area_list);

This is irrelevant. The path is _NOT_ taken. You even copied the
comment:

       if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_

Did you actually read what I wrote?

Again: It _CANNOT_ be on the purge list because it has active mappings:

1  X = vb_alloc()
   ...  
   Y = vb_alloc()
     vb->free -= order;               // Free space goes to 0
     if (!vb->vb_free)
2      list_del(vb->free_list);       // Block is removed from free list
   ...
   vb_free(Y)
     vb->dirty += order;
3    if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
                                       // because #1 $X is still mapped
                                       // so block is _NOT_ freed and
                                       // _NOT_ put on the purge list

4   unmap_aliases()
     walk_free_list()           // Does not find it because of #2
     walk_purge_list()          // Does not find it because of #3

If the resulting flush range is not covering the $Y TLBs then stale TLBs
stay around.

The xarray walk finds it and guarantees that the TLBs are gone when
unmap_aliases() returns, which is the whole purpose of that function.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-24 11:36         ` Uladzislau Rezki
@ 2023-05-24 12:49           ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-24 12:49 UTC (permalink / raw)
  To: Uladzislau Rezki, Baoquan He
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On Wed, May 24 2023 at 13:36, Uladzislau Rezki wrote:
> On Wed, May 24, 2023 at 07:24:43PM +0800, Baoquan He wrote:
>> Yeah, because $X is not yet unmapped, the unmapped part of the block could
>> be either flushed or unflushed. The unflushed part will get flushed together
>> with $X later via the purge list.
>> 
> If we guarantee that vb is fully flushed, then just do not add it to the
> purge list:

There is no such guarantee today and there won't be one even with patch
3/6 applied because:

unmap_aliases()
  flush(dirty_min/max)
  reset(dirty_min/max)

vb_free(lastmapping)
  vunmap_range_noflush();
  vb->dirty += order;
  if (vb->dirty == VMAP_BBMAP_BITS)
     free_block(vb);    <-- the TLBs related to @lastmapping are
                            _NOT_ yet flushed.

Thanks,

        tglx




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily
  2023-05-24 10:34   ` Baoquan He
@ 2023-05-24 12:55     ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-24 12:55 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On Wed, May 24 2023 at 18:34, Baoquan He wrote:
> On 05/23/23 at 04:02pm, Thomas Gleixner wrote:

Can you please trim your replies? It's a pain to find them.

>>  static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
>> @@ -4173,7 +4183,7 @@ struct vm_struct **pcpu_get_vm_areas(con
>>  overflow:
>>  	spin_unlock(&free_vmap_area_lock);
>>  	if (!purged) {
>> -		purge_vmap_area_lazy();
>> +		purge_vmap_area_lazy(true);
>>  		purged = true;
>>  
>>  		/* Before "retry", check if we recover. */
>
> Wondering why we bother adding 'force_purge' to purge_vmap_area_lazy() and
> purge_fragmented_blocks_allcpus() if it is always true there. Can't we
> just pass 'force_purge' as true to purge_fragmented_block() in
> purge_fragmented_blocks()?
>
> alloc_vmap_area()
> pcpu_get_vm_areas()
> -->purge_vmap_area_lazy(true)
>    -->purge_fragmented_blocks_allcpus(force_purge=true)
>       -->purge_fragmented_block(force_purge=true)

Yes. purge_fragmented_blocks_allcpus() has only the
purge_vmap_area_lazy() caller left.

> And the change is missing in one more place; the build fails without it.

Already reported by Uladzislau and I sent a fix in reply.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-24 12:44         ` Thomas Gleixner
@ 2023-05-24 13:41           ` Baoquan He
  2023-05-24 14:31             ` Thomas Gleixner
  0 siblings, 1 reply; 43+ messages in thread
From: Baoquan He @ 2023-05-24 13:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On 05/24/23 at 02:44pm, Thomas Gleixner wrote:
> On Wed, May 24 2023 at 19:24, Baoquan He wrote:
> > On 05/24/23 at 11:51am, Thomas Gleixner wrote:
> >    vb_free(Y)
> >      vb->dirty += order;
> >      if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
> >         free_vmap_block(); 
> >         -->free_vmap_area_noflush()
> >            -->merge_or_add_vmap_area(va,
> >                 &purge_vmap_area_root, &purge_vmap_area_list);
> 
> This is irrelevant. The path is _NOT_ taken. You even copied the
> comment:

Ah, I just copied the whole block and didn't notice that. I wasn't careful
enough there.

> 
>        if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
> 
> Did you actually read what I wrote?
> 
> Again: It _CANNOT_ be on the purge list because it has active mappings:
> 
> 1  X = vb_alloc()
>    ...  
>    Y = vb_alloc()
>      vb->free -= order;               // Free space goes to 0
>      if (!vb->vb_free)
> 2      list_del(vb->free_list);       // Block is removed from free list
>    ...
>    vb_free(Y)
>      vb->dirty += order;
> 3    if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
>                                        // because #1 $X is still mapped
>                                        // so block is _NOT_ freed and
>                                        // _NOT_ put on the purge list

So what if $X is unmapped via vb_free($X)? Is the condition satisfied
then, and can the vb be put on the purge list?

In your above example, $Y's flush is deferred, but not missed?


> 
> 4   unmap_aliases()
>      walk_free_list()           // Does not find it because of #2
>      walk_purge_list()          // Does not find it because of #3
> 
> If the resulting flush range is not covering the $Y TLBs then stale TLBs
> stay around.

OK, you mean the TLB of $Y will stay around after vb_free() until
the whole vb becomes dirty, and this patch fixes that; you are right.
vm_unmap_aliases() needs to flush all unmapped ranges in this case,
but fails to do so for $Y, while the page which is being reused has the
old alias of $Y.

My attention was drawn to the repeated flushing of vmap_block vas on the
purge list.

By the way, you don't fix the issue that in vm_reset_perms() the direct map
range is accumulated with the vb va and purge va ranges, which could produce
a flush range including a huge gap. Do you still plan to fix that? I
remember you said you would use an array to gather the ranges and flush
them one by one.

> 
> The xarray walk finds it and guarantees that the TLBs are gone when
> unmap_aliases() returns, which is the whole purpose of that function.
> 
> Thanks,
> 
>         tglx
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-24  9:52     ` Thomas Gleixner
@ 2023-05-24 14:10       ` Baoquan He
  2023-05-24 14:35         ` Thomas Gleixner
  0 siblings, 1 reply; 43+ messages in thread
From: Baoquan He @ 2023-05-24 14:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On 05/24/23 at 11:52am, Thomas Gleixner wrote:
> On Wed, May 24 2023 at 17:32, Baoquan He wrote:
> > On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> >> @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
> >>  	for_each_possible_cpu(cpu) {
> >>  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
> >>  		struct vmap_block *vb;
> >> +		unsigned long idx;
> >>  
> >>  		rcu_read_lock();
> >
> > Do we need to remove this rcu_read_xx() pair since it marks the RCU
> > read-side critical section on vbq-free list?
> 
> And what protects the xarray lookup?

We have an rcu_read_lock() pair around xas_find(), and it looks up the
xarray for each iterated item. So we won't lose the connection to the
next element, unlike with list adds or deletes? Not very sure, I could
be wrong.

xa_for_each()
-->xa_for_each_start()
   -->xa_for_each_range()
      -->xa_find()
       
void *xa_find(struct xarray *xa, unsigned long *indexp,
                        unsigned long max, xa_mark_t filter)
{
...... 
        rcu_read_lock();
        do {
                if ((__force unsigned int)filter < XA_MAX_MARKS)
                        entry = xas_find_marked(&xas, max, filter);
                else
                        entry = xas_find(&xas, max);
        } while (xas_retry(&xas, entry));
        rcu_read_unlock();
 
        if (entry)
                *indexp = xas.xa_index;
        return entry;
}

> 
> >> -		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> >> +		xa_for_each(&vbq->vmap_blocks, idx, vb) {
> >>  			spin_lock(&vb->lock);
> >>  			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
> >>  				unsigned long va_start = vb->va->va_start;
> >> 
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-24 13:41           ` Baoquan He
@ 2023-05-24 14:31             ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-24 14:31 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On Wed, May 24 2023 at 21:41, Baoquan He wrote:
> On 05/24/23 at 02:44pm, Thomas Gleixner wrote:
>> On Wed, May 24 2023 at 19:24, Baoquan He wrote:
>> Again: It _CANNOT_ be on the purge list because it has active mappings:
>> 
>> 1  X = vb_alloc()
>>    ...  
>>    Y = vb_alloc()
>>      vb->free -= order;               // Free space goes to 0
>>      if (!vb->vb_free)
>> 2      list_del(vb->free_list);       // Block is removed from free list
>>    ...
>>    vb_free(Y)
>>      vb->dirty += order;
>> 3    if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
>>                                        // because #1 $X is still mapped
>>                                        // so block is _NOT_ freed and
>>                                        // _NOT_ put on the purge list
>
> So what if $X is unmapped via vb_free($X)? Is the condition satisfied
> then, and can the vb be put on the purge list?

Yes, but it is _irrelevant_ for the problem at hand.

> In your above example, $Y's flush is deferred, but not missed?

Yes, but that violates the guarantee of vm_unmap_aliases():

 * The vmap/vmalloc layer lazily flushes kernel virtual mappings primarily
 * to amortize TLB flushing overheads. What this means is that any page you
 * have now, may, in a former life, have been mapped into kernel virtual
 * address by the vmap layer and so there might be some CPUs with TLB entries
 * still referencing that page (additional to the regular 1:1 kernel mapping).
 *
 * vm_unmap_aliases flushes all such lazy mappings. After it returns, we can
 * be sure that none of the pages we have control over will have any aliases
 * from the vmap layer.

>> 4   unmap_aliases()
>>      walk_free_list()           // Does not find it because of #2
>>      walk_purge_list()          // Does not find it because of #3
>> 
>> If the resulting flush range is not covering the $Y TLBs then stale TLBs
>> stay around.
>
> OK, you mean the TLB of $Y will stay around after vb_free() until
> the whole vb becomes dirty, and this patch fixes that; you are right.
> vm_unmap_aliases() needs to flush all unmapped ranges in this case,
> but fails to do so for $Y, while the page which is being reused has the
> old alias of $Y.

vm_unmap_aliases() _must_ guarantee that the old TLBs for $Y are gone.

> My attention was drawn to the repeated flushing of vmap_block vas on the
> purge list.
>
> By the way, you don't fix the issue that in vm_reset_perms() the direct map
> range is accumulated with the vb va and purge va ranges, which could produce
> a flush range including a huge gap. Do you still plan to fix that? I
> remember you said you would use an array to gather the ranges and flush
> them one by one.

One thing at a time. This series is a prerequisite.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
  2023-05-24 14:10       ` Baoquan He
@ 2023-05-24 14:35         ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2023-05-24 14:35 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
	Lorenzo Stoakes, Peter Zijlstra

On Wed, May 24 2023 at 22:10, Baoquan He wrote:
> On 05/24/23 at 11:52am, Thomas Gleixner wrote:
>> On Wed, May 24 2023 at 17:32, Baoquan He wrote:
>> > On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
>> >> @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
>> >>  	for_each_possible_cpu(cpu) {
>> >>  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
>> >>  		struct vmap_block *vb;
>> >> +		unsigned long idx;
>> >>  
>> >>  		rcu_read_lock();
>> >
>> > Do we need to remove this rcu_read_xx() pair since it marks the RCU
>> > read-side critical section on vbq-free list?
>> 
>> And what protects the xarray lookup?
>
> We have an rcu_read_lock() pair around xas_find(), and it looks up the
> xarray for each iterated item. So we won't lose the connection to the
> next element, unlike with list adds or deletes? Not very sure, I could
> be wrong.
>
> xa_for_each()
> -->xa_for_each_start()
>    -->xa_for_each_range()
>       -->xa_find()

I know how xarray works. No need to copy the code.

rcu_read_lock() inside xa_find() protects the search, but it does not
protect the returned entry, which might go away right after xa_find()
does rcu_read_unlock().
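
In sketch form the pattern looks like the below; this is illustrative only,
walk_blocks() is a made-up wrapper and not the kernel function, but the
xa_for_each()/rcu_read_lock()/spin_lock() nesting mirrors the quoted code:

static void walk_blocks(struct vmap_block_queue *vbq)
{
	struct vmap_block *vb;
	unsigned long idx;

	rcu_read_lock();
	xa_for_each(&vbq->vmap_blocks, idx, vb) {
		/*
		 * xa_find() drops its internal RCU read lock after
		 * locating the entry. The outer rcu_read_lock() is what
		 * keeps @vb from being freed (kfree_rcu()) while the
		 * loop body still dereferences it.
		 */
		spin_lock(&vb->lock);
		/* ... inspect vb->dirty, vb->dirty_min, vb->dirty_max ... */
		spin_unlock(&vb->lock);
	}
	rcu_read_unlock();
}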

Thanks,

        tglx




^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2023-05-24 14:35 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-23 14:02 [patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
2023-05-23 14:02 ` [patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks Thomas Gleixner
2023-05-23 15:17   ` Christoph Hellwig
2023-05-23 16:40     ` Thomas Gleixner
2023-05-23 16:47       ` Uladzislau Rezki
2023-05-23 19:18   ` Lorenzo Stoakes
2023-05-24  9:19     ` Uladzislau Rezki
2023-05-24  9:25   ` Baoquan He
2023-05-24  9:51     ` Thomas Gleixner
2023-05-24 11:24       ` Baoquan He
2023-05-24 11:26         ` Baoquan He
2023-05-24 11:36         ` Uladzislau Rezki
2023-05-24 12:49           ` Thomas Gleixner
2023-05-24 12:44         ` Thomas Gleixner
2023-05-24 13:41           ` Baoquan He
2023-05-24 14:31             ` Thomas Gleixner
2023-05-24  9:32   ` Baoquan He
2023-05-24  9:52     ` Thomas Gleixner
2023-05-24 14:10       ` Baoquan He
2023-05-24 14:35         ` Thomas Gleixner
2023-05-23 14:02 ` [patch 2/6] mm/vmalloc: Avoid iterating over per CPU vmap blocks twice Thomas Gleixner
2023-05-23 15:21   ` Christoph Hellwig
2023-05-23 14:02 ` [patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over Thomas Gleixner
2023-05-23 15:27   ` Christoph Hellwig
2023-05-23 16:10     ` Thomas Gleixner
2023-05-24  9:43   ` Baoquan He
2023-05-23 14:02 ` [patch 4/6] mm/vmalloc: Check free space in vmap_block lockless Thomas Gleixner
2023-05-23 15:29   ` Christoph Hellwig
2023-05-23 16:17     ` Thomas Gleixner
2023-05-24  9:20       ` Uladzislau Rezki
2023-05-23 14:02 ` [patch 5/6] mm/vmalloc: Add missing READ/WRITE_ONCE() annotations Thomas Gleixner
2023-05-24  9:15   ` Uladzislau Rezki
2023-05-23 14:02 ` [patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily Thomas Gleixner
2023-05-23 15:30   ` Christoph Hellwig
2023-05-24 10:34   ` Baoquan He
2023-05-24 12:55     ` Thomas Gleixner
2023-05-23 16:24 ` [patch 0/6] mm/vmalloc: Assorted fixes and improvements Uladzislau Rezki
2023-05-23 17:33   ` Thomas Gleixner
2023-05-23 17:39     ` Thomas Gleixner
2023-05-23 17:48       ` Uladzislau Rezki
2023-05-23 17:51         ` Uladzislau Rezki
2023-05-23 17:55         ` Uladzislau Rezki
2023-05-23 18:40           ` Thomas Gleixner
