* [PATCH 0/9] Mitigate a vmap lock contention
@ 2023-05-22 11:08 Uladzislau Rezki (Sony)
  2023-05-22 11:08 ` [PATCH 1/9] mm: vmalloc: Add va_alloc() helper Uladzislau Rezki (Sony)
                   ` (9 more replies)
  0 siblings, 10 replies; 40+ messages in thread
From: Uladzislau Rezki (Sony) @ 2023-05-22 11:08 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Uladzislau Rezki,
	Oleksiy Avramchenko

Hello, folks.

1. This is a follow-up of the vmap topic that was highlighted at the LSFMMBPF-2023
conference. This small series attempts to mitigate the lock contention across the
vmap/vmalloc code. The problem is described here:

wget ftp://vps418301.ovh.net/incoming/Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf

The material is tagged as v2. It contains extra slides about throughput
testing, the steps involved and a comparison with the current approach.

2. Motivation.

- The vmap code does not scale with the number of CPUs and this should be fixed;
- XFS folks have complained several times that vmalloc can be contended on
  their workloads:

<snip>
commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Jan 4 17:22:18 2022 -0800

    xfs: reduce kvmalloc overhead for CIL shadow buffers
    
    Oh, let me count the ways that the kvmalloc API sucks dog eggs.
    
    The problem is when we are logging lots of large objects, we hit
    kvmalloc really damn hard with costly order allocations, and
    behaviour utterly sucks:
...
<snip>

- If we can match the per-cpu allocator in terms of performance, we can
  remove it and thereby simplify the vmap code. Currently we have three
  allocators. See:

<snip>
/*** Per cpu kva allocator ***/

/*
 * vmap space is limited especially on 32 bit architectures. Ensure there is
 * room for at least 16 percpu vmap blocks per CPU.
 */
/*
 * If we had a constant VMALLOC_START and VMALLOC_END, we'd like to be able
 * to #define VMALLOC_SPACE		(VMALLOC_END-VMALLOC_START). Guess
 * instead (we just need a rough idea)
 */
<snip>

3. Test

On my AMD Ryzen Threadripper 3970X 32-Core Processor, I get the figures below:

    1-page     1-page-this-patch
1  0.576131   vs   0.555889
2   2.68376   vs    1.07895
3   4.26502   vs    1.01739
4   6.04306   vs    1.28924
5   8.04786   vs    1.57616
6   9.38844   vs    1.78142
7   9.53481   vs    2.00172
8   10.4609   vs    2.15964
9   10.6089   vs      2.484
10  11.7372   vs    2.40443
11  11.5969   vs    2.71635
12   13.053   vs     2.6162
13  12.2973   vs      2.843
14  13.1429   vs    2.85714
15  13.7348   vs    2.90691
16  14.3687   vs     3.0285
17  14.8823   vs    3.05717
18  14.9571   vs    2.98018
19  14.9127   vs     3.0951
20  16.0786   vs    3.19521
21  15.8055   vs    3.24915
22  16.8087   vs     3.2521
23  16.7298   vs    3.25698
24   17.244   vs    3.36586
25  17.8416   vs    3.39384
26  18.8008   vs    3.40907
27  18.5591   vs     3.5254
28   19.761   vs    3.55468
29    20.06   vs    3.59869
30  20.4353   vs     3.6991
31  20.9082   vs    3.73028
32  21.0865   vs    3.82904

1..32 is the number of jobs. The results are in usec and reflect the
throughput of a vmalloc()/vfree() pair.
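
For reference, each job repeatedly allocates and frees a single page. Below
is only a rough sketch of the measured path, not the exact test code (the
helper name is made up; see the PDF above for the real setup):

<snip>
#include <linux/vmalloc.h>
#include <linux/ktime.h>
#include <linux/math64.h>

/* Average cost of one 1-page vmalloc()/vfree() pair, in nanoseconds. */
static u64 single_page_pair_cost(unsigned int nr_loops)
{
	ktime_t start = ktime_get();
	unsigned int i;

	for (i = 0; i < nr_loops; i++) {
		void *p = vmalloc(PAGE_SIZE);

		if (!p)
			break;

		vfree(p);
	}

	return div_u64(ktime_to_ns(ktime_sub(ktime_get(), start)), nr_loops);
}
<snip>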

The series is based on the v6.3 tag and is considered a beta version. Please
note that it does not support the vread() functionality yet, so it is not
fully complete.

Any input/thoughts are welcome.

Uladzislau Rezki (Sony) (9):
  mm: vmalloc: Add va_alloc() helper
  mm: vmalloc: Rename adjust_va_to_fit_type() function
  mm: vmalloc: Move vmap_init_free_space() down in vmalloc.c
  mm: vmalloc: Add a per-CPU-zone infrastructure
  mm: vmalloc: Insert busy-VA per-cpu zone
  mm: vmalloc: Support multiple zones in vmallocinfo
  mm: vmalloc: Insert lazy-VA per-cpu zone
  mm: vmalloc: Offload free_vmap_area_lock global lock
  mm: vmalloc: Scale and activate cvz_size

 mm/vmalloc.c | 641 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 448 insertions(+), 193 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 1/9] mm: vmalloc: Add va_alloc() helper
  2023-05-22 11:08 [PATCH 0/9] Mitigate a vmap lock contention Uladzislau Rezki (Sony)
@ 2023-05-22 11:08 ` Uladzislau Rezki (Sony)
  2023-05-23  6:05   ` Christoph Hellwig
  2023-05-27 19:55   ` Lorenzo Stoakes
  2023-05-22 11:08 ` [PATCH 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function Uladzislau Rezki (Sony)
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 40+ messages in thread
From: Uladzislau Rezki (Sony) @ 2023-05-22 11:08 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Uladzislau Rezki,
	Oleksiy Avramchenko

Currently the __alloc_vmap_area() function contains open-coded
logic that finds and adjusts a VA based on the allocation request.

Introduce a va_alloc() helper that only adjusts a found VA. It
will later be used in at least two places.

There is no functional change as a result of this patch.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 31ff782d368b..409285b68a67 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1482,6 +1482,32 @@ adjust_va_to_fit_type(struct rb_root *root, struct list_head *head,
 	return 0;
 }
 
+static unsigned long
+va_alloc(struct vmap_area *va,
+	struct rb_root *root, struct list_head *head,
+	unsigned long size, unsigned long align,
+	unsigned long vstart, unsigned long vend)
+{
+	unsigned long nva_start_addr;
+	int ret;
+
+	if (va->va_start > vstart)
+		nva_start_addr = ALIGN(va->va_start, align);
+	else
+		nva_start_addr = ALIGN(vstart, align);
+
+	/* Check the "vend" restriction. */
+	if (nva_start_addr + size > vend)
+		return vend;
+
+	/* Update the free vmap_area. */
+	ret = adjust_va_to_fit_type(root, head, va, nva_start_addr, size);
+	if (WARN_ON_ONCE(ret))
+		return vend;
+
+	return nva_start_addr;
+}
+
 /*
  * Returns a start address of the newly allocated area, if success.
  * Otherwise a vend is returned that indicates failure.
@@ -1494,7 +1520,6 @@ __alloc_vmap_area(struct rb_root *root, struct list_head *head,
 	bool adjust_search_size = true;
 	unsigned long nva_start_addr;
 	struct vmap_area *va;
-	int ret;
 
 	/*
 	 * Do not adjust when:
@@ -1512,18 +1537,8 @@ __alloc_vmap_area(struct rb_root *root, struct list_head *head,
 	if (unlikely(!va))
 		return vend;
 
-	if (va->va_start > vstart)
-		nva_start_addr = ALIGN(va->va_start, align);
-	else
-		nva_start_addr = ALIGN(vstart, align);
-
-	/* Check the "vend" restriction. */
-	if (nva_start_addr + size > vend)
-		return vend;
-
-	/* Update the free vmap_area. */
-	ret = adjust_va_to_fit_type(root, head, va, nva_start_addr, size);
-	if (WARN_ON_ONCE(ret))
+	nva_start_addr = va_alloc(va, root, head, size, align, vstart, vend);
+	if (nva_start_addr == vend)
 		return vend;
 
 #if DEBUG_AUGMENT_LOWEST_MATCH_CHECK
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function
  2023-05-22 11:08 [PATCH 0/9] Mitigate a vmap lock contention Uladzislau Rezki (Sony)
  2023-05-22 11:08 ` [PATCH 1/9] mm: vmalloc: Add va_alloc() helper Uladzislau Rezki (Sony)
@ 2023-05-22 11:08 ` Uladzislau Rezki (Sony)
  2023-05-23  6:06   ` Christoph Hellwig
                     ` (2 more replies)
  2023-05-22 11:08 ` [PATCH 3/9] mm: vmalloc: Move vmap_init_free_space() down in vmalloc.c Uladzislau Rezki (Sony)
                   ` (7 subsequent siblings)
  9 siblings, 3 replies; 40+ messages in thread
From: Uladzislau Rezki (Sony) @ 2023-05-22 11:08 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Uladzislau Rezki,
	Oleksiy Avramchenko

This patch renames the adjust_va_to_fit_type() function
to a shorter and more expressive variant. The new name is
va_clip().

There is no functional change as a result of this patch.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 409285b68a67..5f900efec6a9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1383,9 +1383,9 @@ classify_va_fit_type(struct vmap_area *va,
 }
 
 static __always_inline int
-adjust_va_to_fit_type(struct rb_root *root, struct list_head *head,
-		      struct vmap_area *va, unsigned long nva_start_addr,
-		      unsigned long size)
+va_clip(struct rb_root *root, struct list_head *head,
+	struct vmap_area *va, unsigned long nva_start_addr,
+	unsigned long size)
 {
 	struct vmap_area *lva = NULL;
 	enum fit_type type = classify_va_fit_type(va, nva_start_addr, size);
@@ -1501,7 +1501,7 @@ va_alloc(struct vmap_area *va,
 		return vend;
 
 	/* Update the free vmap_area. */
-	ret = adjust_va_to_fit_type(root, head, va, nva_start_addr, size);
+	ret = va_clip(root, head, va, nva_start_addr, size);
 	if (WARN_ON_ONCE(ret))
 		return vend;
 
@@ -3979,9 +3979,8 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 			/* It is a BUG(), but trigger recovery instead. */
 			goto recovery;
 
-		ret = adjust_va_to_fit_type(&free_vmap_area_root,
-					    &free_vmap_area_list,
-					    va, start, size);
+		ret = va_clip(&free_vmap_area_root,
+			&free_vmap_area_list, va, start, size);
 		if (WARN_ON_ONCE(unlikely(ret)))
 			/* It is a BUG(), but trigger recovery instead. */
 			goto recovery;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 3/9] mm: vmalloc: Move vmap_init_free_space() down in vmalloc.c
  2023-05-22 11:08 [PATCH 0/9] Mitigate a vmap lock contention Uladzislau Rezki (Sony)
  2023-05-22 11:08 ` [PATCH 1/9] mm: vmalloc: Add va_alloc() helper Uladzislau Rezki (Sony)
  2023-05-22 11:08 ` [PATCH 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function Uladzislau Rezki (Sony)
@ 2023-05-22 11:08 ` Uladzislau Rezki (Sony)
  2023-05-23  6:06   ` Christoph Hellwig
  2023-05-27 21:52   ` Lorenzo Stoakes
  2023-05-22 11:08 ` [PATCH 4/9] mm: vmalloc: Add a per-CPU-zone infrastructure Uladzislau Rezki (Sony)
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 40+ messages in thread
From: Uladzislau Rezki (Sony) @ 2023-05-22 11:08 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Uladzislau Rezki,
	Oleksiy Avramchenko

vmap_init_free_space() is a function that sets up the vmap space
and is considered part of the initialization phase. Since the main
entry point, vmalloc_init(), has been moved down in vmalloc.c, it
makes sense to follow that pattern.

There is no functional change as a result of this patch.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 82 ++++++++++++++++++++++++++--------------------------
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5f900efec6a9..19dcffb0d563 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2416,47 +2416,6 @@ void __init vm_area_register_early(struct vm_struct *vm, size_t align)
 	kasan_populate_early_vm_area_shadow(vm->addr, vm->size);
 }
 
-static void vmap_init_free_space(void)
-{
-	unsigned long vmap_start = 1;
-	const unsigned long vmap_end = ULONG_MAX;
-	struct vmap_area *busy, *free;
-
-	/*
-	 *     B     F     B     B     B     F
-	 * -|-----|.....|-----|-----|-----|.....|-
-	 *  |           The KVA space           |
-	 *  |<--------------------------------->|
-	 */
-	list_for_each_entry(busy, &vmap_area_list, list) {
-		if (busy->va_start - vmap_start > 0) {
-			free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
-			if (!WARN_ON_ONCE(!free)) {
-				free->va_start = vmap_start;
-				free->va_end = busy->va_start;
-
-				insert_vmap_area_augment(free, NULL,
-					&free_vmap_area_root,
-						&free_vmap_area_list);
-			}
-		}
-
-		vmap_start = busy->va_end;
-	}
-
-	if (vmap_end - vmap_start > 0) {
-		free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
-		if (!WARN_ON_ONCE(!free)) {
-			free->va_start = vmap_start;
-			free->va_end = vmap_end;
-
-			insert_vmap_area_augment(free, NULL,
-				&free_vmap_area_root,
-					&free_vmap_area_list);
-		}
-	}
-}
-
 static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
 	struct vmap_area *va, unsigned long flags, const void *caller)
 {
@@ -4271,6 +4230,47 @@ module_init(proc_vmalloc_init);
 
 #endif
 
+static void vmap_init_free_space(void)
+{
+	unsigned long vmap_start = 1;
+	const unsigned long vmap_end = ULONG_MAX;
+	struct vmap_area *busy, *free;
+
+	/*
+	 *     B     F     B     B     B     F
+	 * -|-----|.....|-----|-----|-----|.....|-
+	 *  |           The KVA space           |
+	 *  |<--------------------------------->|
+	 */
+	list_for_each_entry(busy, &vmap_area_list, list) {
+		if (busy->va_start - vmap_start > 0) {
+			free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
+			if (!WARN_ON_ONCE(!free)) {
+				free->va_start = vmap_start;
+				free->va_end = busy->va_start;
+
+				insert_vmap_area_augment(free, NULL,
+					&free_vmap_area_root,
+						&free_vmap_area_list);
+			}
+		}
+
+		vmap_start = busy->va_end;
+	}
+
+	if (vmap_end - vmap_start > 0) {
+		free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
+		if (!WARN_ON_ONCE(!free)) {
+			free->va_start = vmap_start;
+			free->va_end = vmap_end;
+
+			insert_vmap_area_augment(free, NULL,
+				&free_vmap_area_root,
+					&free_vmap_area_list);
+		}
+	}
+}
+
 void __init vmalloc_init(void)
 {
 	struct vmap_area *va;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 4/9] mm: vmalloc: Add a per-CPU-zone infrastructure
  2023-05-22 11:08 [PATCH 0/9] Mitigate a vmap lock contention Uladzislau Rezki (Sony)
                   ` (2 preceding siblings ...)
  2023-05-22 11:08 ` [PATCH 3/9] mm: vmalloc: Move vmap_init_free_space() down in vmalloc.c Uladzislau Rezki (Sony)
@ 2023-05-22 11:08 ` Uladzislau Rezki (Sony)
  2023-05-23  6:08   ` Christoph Hellwig
  2023-05-22 11:08 ` [PATCH 5/9] mm: vmalloc: Insert busy-VA per-cpu zone Uladzislau Rezki (Sony)
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 40+ messages in thread
From: Uladzislau Rezki (Sony) @ 2023-05-22 11:08 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Uladzislau Rezki,
	Oleksiy Avramchenko

Introduce a basic skeleton that defines a cache entity for
each CPU. It consists of bookkeeping data which belongs to a
CPU zone. A CPU zone defines a virtual address range of a
fixed size.

This approach makes it possible to map any address to the
particular CPU zone it belongs to. Below is a high-level
description of how it is used in the alloc and free paths:

alloc-path:
 an up-front pre-fetch occurs if the cache is empty. A chunk
 is placed into the free-tree of this CPU. Finally the CPU
 cache is clipped in order to accomplish the request. A fresh
 VA resides in the busy-tree of the specific CPU zone it is
 bound to.

free-path:
 va->va_start is converted to a zone, so the lookup occurs in
 the bookkeeping zone it belongs to. Once removed, a VA is
 moved to the lazy-tree. It is freed after a TLB flush.

The "cvz_size", which reflects the zone size, is set to ULONG_MAX
by this patch. The reason is that we would like to keep the old
behavior until all dependent changes are in place.
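
As an illustration only, this is how the bookkeeping is accessed once
the busy tree becomes per-zone in a later patch of this series (the
snippet mirrors the alloc path there):

	struct cpu_vmap_zone *z = addr_to_cvz(va->va_start);

	fbl_lock(z, BUSY);	/* spin_lock(&z->fbl[BUSY].lock) */
	insert_vmap_area(va, &fbl_root(z, BUSY), &fbl_head(z, BUSY));
	fbl_unlock(z, BUSY);	/* spin_unlock(&z->fbl[BUSY].lock) */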

There is no functional change as a result of this patch.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 19dcffb0d563..f6da2590b4de 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -773,6 +773,61 @@ static struct rb_root free_vmap_area_root = RB_ROOT;
  */
 static DEFINE_PER_CPU(struct vmap_area *, ne_fit_preload_node);
 
+/*
+ * A per-cpu effective vmap cache logic. It allows to pre-fetch
+ * a memory vmap chunk with a specified size. A bookkeeping data
+ * is stored and accessed per-CPU.
+ */
+enum { FREE = 0, BUSY, LAZY, NFBL };
+
+#define fbl(z, i, m) z->fbl[i].m
+#define fbl_root(z, i) fbl(z, i, root)
+#define fbl_head(z, i) fbl(z, i, head)
+
+#define fbl_lock(z, i) spin_lock(&fbl(z, i, lock))
+#define fbl_unlock(z, i) spin_unlock(&fbl(z, i, lock))
+
+struct cpu_vmap_zone {
+	/*
+	 * FREE, BUSY, LAZY bookkeeping data of this CPU zone.
+	 */
+	struct {
+		struct rb_root root;
+		struct list_head head;
+		spinlock_t lock;
+	} fbl[NFBL];
+
+	/*
+	 * A list of outstanding lazily-freed areas, ready to
+	 * be drained, which belong(address space) to this CPU
+	 * zone.
+	 */
+	struct list_head purge_list;
+
+	/*
+	 * It controls that only one CPU can pre-fetch this
+	 * CPU zone.
+	 */
+	atomic_t fill_in_progress;
+};
+
+static DEFINE_PER_CPU(struct cpu_vmap_zone, cpu_vmap_zone);
+
+/* Disable a per-cpu caching. */
+static __read_mostly unsigned long cvz_size = ULONG_MAX;
+
+static inline unsigned int
+addr_to_cpu(unsigned long addr)
+{
+	return (addr / cvz_size) % num_possible_cpus();
+}
+
+static inline struct cpu_vmap_zone *
+__maybe_unused addr_to_cvz(unsigned long addr)
+{
+	return &per_cpu(cpu_vmap_zone, addr_to_cpu(addr));
+}
+
 static __always_inline unsigned long
 va_size(struct vmap_area *va)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 5/9] mm: vmalloc: Insert busy-VA per-cpu zone
  2023-05-22 11:08 [PATCH 0/9] Mitigate a vmap lock contention Uladzislau Rezki (Sony)
                   ` (3 preceding siblings ...)
  2023-05-22 11:08 ` [PATCH 4/9] mm: vmalloc: Add a per-CPU-zone infrastructure Uladzislau Rezki (Sony)
@ 2023-05-22 11:08 ` Uladzislau Rezki (Sony)
  2023-05-23  6:12   ` Christoph Hellwig
  2023-05-22 11:08 ` [PATCH 6/9] mm: vmalloc: Support multiple zones in vmallocinfo Uladzislau Rezki (Sony)
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 40+ messages in thread
From: Uladzislau Rezki (Sony) @ 2023-05-22 11:08 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Uladzislau Rezki,
	Oleksiy Avramchenko

Store busy-VA objects per CPU zone. A va->va_start address is
converted into the correct zone where it is placed and resides.
The addr_to_cvz() function is used to do the address conversion.

Such an approach balances VAs across CPUs, which is why access
becomes scalable with the number of CPUs in a system.

Please note:

Since the zone size is set to ULONG_MAX, so far every address maps
to CPU_0 (addr / ULONG_MAX is 0 for any valid address, thus
addr_to_cpu() always returns 0), therefore this patch does not make
any difference compared with the current behavior.

The global vmap_area_lock and vmap_area_root are removed as there
is no need for them anymore. The vmap_area_list is still kept and
is _empty_. It is exported for kexec only.

The vmallocinfo and vread() interfaces have to be reworked to be
able to handle multiple zones. As a result of this patch they can
handle only one zone, i.e. the case when the cache is disabled.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 127 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 84 insertions(+), 43 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f6da2590b4de..a9170fe19909 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -729,11 +729,9 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 #define DEBUG_AUGMENT_LOWEST_MATCH_CHECK 0
 
 
-static DEFINE_SPINLOCK(vmap_area_lock);
 static DEFINE_SPINLOCK(free_vmap_area_lock);
 /* Export for kexec only */
 LIST_HEAD(vmap_area_list);
-static struct rb_root vmap_area_root = RB_ROOT;
 static bool vmap_initialized __read_mostly;
 
 static struct rb_root purge_vmap_area_root = RB_ROOT;
@@ -823,7 +821,7 @@ addr_to_cpu(unsigned long addr)
 }
 
 static inline struct cpu_vmap_zone *
-__maybe_unused addr_to_cvz(unsigned long addr)
+addr_to_cvz(unsigned long addr)
 {
 	return &per_cpu(cpu_vmap_zone, addr_to_cpu(addr));
 }
@@ -859,10 +857,10 @@ unsigned long vmalloc_nr_pages(void)
 }
 
 /* Look up the first VA which satisfies addr < va_end, NULL if none. */
-static struct vmap_area *find_vmap_area_exceed_addr(unsigned long addr)
+static struct vmap_area *find_vmap_area_exceed_addr(unsigned long addr, struct rb_root *root)
 {
 	struct vmap_area *va = NULL;
-	struct rb_node *n = vmap_area_root.rb_node;
+	struct rb_node *n = root->rb_node;
 
 	addr = (unsigned long)kasan_reset_tag((void *)addr);
 
@@ -1608,12 +1606,14 @@ __alloc_vmap_area(struct rb_root *root, struct list_head *head,
  */
 static void free_vmap_area(struct vmap_area *va)
 {
+	struct cpu_vmap_zone *z = addr_to_cvz(va->va_start);
+
 	/*
 	 * Remove from the busy tree/list.
 	 */
-	spin_lock(&vmap_area_lock);
-	unlink_va(va, &vmap_area_root);
-	spin_unlock(&vmap_area_lock);
+	fbl_lock(z, BUSY);
+	unlink_va(va, &fbl_root(z, BUSY));
+	fbl_unlock(z, BUSY);
 
 	/*
 	 * Insert/Merge it back to the free tree/list.
@@ -1656,6 +1656,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 				int node, gfp_t gfp_mask,
 				unsigned long va_flags)
 {
+	struct cpu_vmap_zone *z;
 	struct vmap_area *va;
 	unsigned long freed;
 	unsigned long addr;
@@ -1701,9 +1702,11 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	va->vm = NULL;
 	va->flags = va_flags;
 
-	spin_lock(&vmap_area_lock);
-	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
-	spin_unlock(&vmap_area_lock);
+	z = addr_to_cvz(va->va_start);
+
+	fbl_lock(z, BUSY);
+	insert_vmap_area(va, &fbl_root(z, BUSY), &fbl_head(z, BUSY));
+	fbl_unlock(z, BUSY);
 
 	BUG_ON(!IS_ALIGNED(va->va_start, align));
 	BUG_ON(va->va_start < vstart);
@@ -1926,24 +1929,26 @@ static void free_unmap_vmap_area(struct vmap_area *va)
 
 struct vmap_area *find_vmap_area(unsigned long addr)
 {
+	struct cpu_vmap_zone *z = addr_to_cvz(addr);
 	struct vmap_area *va;
 
-	spin_lock(&vmap_area_lock);
-	va = __find_vmap_area(addr, &vmap_area_root);
-	spin_unlock(&vmap_area_lock);
+	fbl_lock(z, BUSY);
+	va = __find_vmap_area(addr, &fbl_root(z, BUSY));
+	fbl_unlock(z, BUSY);
 
 	return va;
 }
 
 static struct vmap_area *find_unlink_vmap_area(unsigned long addr)
 {
+	struct cpu_vmap_zone *z = addr_to_cvz(addr);
 	struct vmap_area *va;
 
-	spin_lock(&vmap_area_lock);
-	va = __find_vmap_area(addr, &vmap_area_root);
+	fbl_lock(z, BUSY);
+	va = __find_vmap_area(addr, &fbl_root(z, BUSY));
 	if (va)
-		unlink_va(va, &vmap_area_root);
-	spin_unlock(&vmap_area_lock);
+		unlink_va(va, &fbl_root(z, BUSY));
+	fbl_unlock(z, BUSY);
 
 	return va;
 }
@@ -2095,14 +2100,17 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 
 static void free_vmap_block(struct vmap_block *vb)
 {
+	struct cpu_vmap_zone *z;
 	struct vmap_block *tmp;
 
 	tmp = xa_erase(&vmap_blocks, addr_to_vb_idx(vb->va->va_start));
 	BUG_ON(tmp != vb);
 
-	spin_lock(&vmap_area_lock);
-	unlink_va(vb->va, &vmap_area_root);
-	spin_unlock(&vmap_area_lock);
+	z = addr_to_cvz(vb->va->va_start);
+
+	fbl_lock(z, BUSY);
+	unlink_va(vb->va, &fbl_root(z, BUSY));
+	fbl_unlock(z, BUSY);
 
 	free_vmap_area_noflush(vb->va);
 	kfree_rcu(vb, rcu_head);
@@ -2484,9 +2492,11 @@ static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
 static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
 			      unsigned long flags, const void *caller)
 {
-	spin_lock(&vmap_area_lock);
+	struct cpu_vmap_zone *z = addr_to_cvz(va->va_start);
+
+	fbl_lock(z, BUSY);
 	setup_vmalloc_vm_locked(vm, va, flags, caller);
-	spin_unlock(&vmap_area_lock);
+	fbl_unlock(z, BUSY);
 }
 
 static void clear_vm_uninitialized_flag(struct vm_struct *vm)
@@ -3605,6 +3615,7 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
  */
 long vread(char *buf, char *addr, unsigned long count)
 {
+	struct cpu_vmap_zone *z;
 	struct vmap_area *va;
 	struct vm_struct *vm;
 	char *vaddr, *buf_start = buf;
@@ -3617,8 +3628,11 @@ long vread(char *buf, char *addr, unsigned long count)
 	if ((unsigned long) addr + count < count)
 		count = -(unsigned long) addr;
 
-	spin_lock(&vmap_area_lock);
-	va = find_vmap_area_exceed_addr((unsigned long)addr);
+	/* Hooked to CPU0 because a cache is not activated. */
+	z = &per_cpu(cpu_vmap_zone, 0);
+	fbl_lock(z, BUSY);
+
+	va = find_vmap_area_exceed_addr((unsigned long)addr, &fbl_root(z, BUSY));
 	if (!va)
 		goto finished;
 
@@ -3626,7 +3640,7 @@ long vread(char *buf, char *addr, unsigned long count)
 	if ((unsigned long)addr + count <= va->va_start)
 		goto finished;
 
-	list_for_each_entry_from(va, &vmap_area_list, list) {
+	list_for_each_entry_from(va, &fbl_head(z, BUSY), list) {
 		if (!count)
 			break;
 
@@ -3674,7 +3688,7 @@ long vread(char *buf, char *addr, unsigned long count)
 		count -= n;
 	}
 finished:
-	spin_unlock(&vmap_area_lock);
+	fbl_unlock(z, BUSY);
 
 	if (buf == buf_start)
 		return 0;
@@ -4014,14 +4028,15 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 	}
 
 	/* insert all vm's */
-	spin_lock(&vmap_area_lock);
 	for (area = 0; area < nr_vms; area++) {
-		insert_vmap_area(vas[area], &vmap_area_root, &vmap_area_list);
+		struct cpu_vmap_zone *z = addr_to_cvz(vas[area]->va_start);
 
+		fbl_lock(z, BUSY);
+		insert_vmap_area(vas[area], &fbl_root(z, BUSY), &fbl_head(z, BUSY));
 		setup_vmalloc_vm_locked(vms[area], vas[area], VM_ALLOC,
 				 pcpu_get_vm_areas);
+		fbl_unlock(z, BUSY);
 	}
-	spin_unlock(&vmap_area_lock);
 
 	/*
 	 * Mark allocated areas as accessible. Do it now as a best-effort
@@ -4145,24 +4160,24 @@ bool vmalloc_dump_obj(void *object)
 #ifdef CONFIG_PROC_FS
 static void *s_start(struct seq_file *m, loff_t *pos)
 	__acquires(&vmap_purge_lock)
-	__acquires(&vmap_area_lock)
+	__acquires(&fbl(&per_cpu(cpu_vmap_zone, 0), BUSY, lock))
 {
 	mutex_lock(&vmap_purge_lock);
-	spin_lock(&vmap_area_lock);
+	fbl_lock((&per_cpu(cpu_vmap_zone, 0)), BUSY);
 
-	return seq_list_start(&vmap_area_list, *pos);
+	return seq_list_start(&fbl_head((&per_cpu(cpu_vmap_zone, 0)), BUSY), *pos);
 }
 
 static void *s_next(struct seq_file *m, void *p, loff_t *pos)
 {
-	return seq_list_next(p, &vmap_area_list, pos);
+	return seq_list_next(p, &fbl_head((&per_cpu(cpu_vmap_zone, 0)), BUSY), pos);
 }
 
 static void s_stop(struct seq_file *m, void *p)
-	__releases(&vmap_area_lock)
+	__releases(&fbl(&per_cpu(cpu_vmap_zone, 0), BUSY, lock))
 	__releases(&vmap_purge_lock)
 {
-	spin_unlock(&vmap_area_lock);
+	fbl_unlock((&per_cpu(cpu_vmap_zone, 0)), BUSY);
 	mutex_unlock(&vmap_purge_lock);
 }
 
@@ -4258,7 +4273,7 @@ static int s_show(struct seq_file *m, void *p)
 	 * As a final step, dump "unpurged" areas.
 	 */
 final:
-	if (list_is_last(&va->list, &vmap_area_list))
+	if (list_is_last(&va->list, &fbl_head((&per_cpu(cpu_vmap_zone, 0)), BUSY)))
 		show_purge_info(m);
 
 	return 0;
@@ -4289,7 +4304,8 @@ static void vmap_init_free_space(void)
 {
 	unsigned long vmap_start = 1;
 	const unsigned long vmap_end = ULONG_MAX;
-	struct vmap_area *busy, *free;
+	struct vmap_area *free;
+	struct vm_struct *busy;
 
 	/*
 	 *     B     F     B     B     B     F
@@ -4297,12 +4313,12 @@ static void vmap_init_free_space(void)
 	 *  |           The KVA space           |
 	 *  |<--------------------------------->|
 	 */
-	list_for_each_entry(busy, &vmap_area_list, list) {
-		if (busy->va_start - vmap_start > 0) {
+	for (busy = vmlist; busy; busy = busy->next) {
+		if (busy->addr - vmap_start > 0) {
 			free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
 			if (!WARN_ON_ONCE(!free)) {
 				free->va_start = vmap_start;
-				free->va_end = busy->va_start;
+				free->va_end = (unsigned long) busy->addr;
 
 				insert_vmap_area_augment(free, NULL,
 					&free_vmap_area_root,
@@ -4310,7 +4326,7 @@ static void vmap_init_free_space(void)
 			}
 		}
 
-		vmap_start = busy->va_end;
+		vmap_start = (unsigned long) busy->addr + busy->size;
 	}
 
 	if (vmap_end - vmap_start > 0) {
@@ -4326,6 +4342,22 @@ static void vmap_init_free_space(void)
 	}
 }
 
+static void vmap_init_pcpu_zones(void)
+{
+	struct cpu_vmap_zone *z;
+	int i, j;
+
+	for_each_possible_cpu(i) {
+		z = per_cpu_ptr(&cpu_vmap_zone, i);
+
+		for (j = 0; j < ARRAY_SIZE(z->fbl); j++) {
+			INIT_LIST_HEAD(&z->fbl[j].head);
+			z->fbl[j].root = RB_ROOT;
+			spin_lock_init(&z->fbl[j].lock);
+		}
+	}
+}
+
 void __init vmalloc_init(void)
 {
 	struct vmap_area *va;
@@ -4349,8 +4381,15 @@ void __init vmalloc_init(void)
 		INIT_WORK(&p->wq, delayed_vfree_work);
 	}
 
+	/*
+	 * Setup per-cpu data before importing vmlist.
+	 */
+	vmap_init_pcpu_zones();
+
 	/* Import existing vmlist entries. */
 	for (tmp = vmlist; tmp; tmp = tmp->next) {
+		struct cpu_vmap_zone *z;
+
 		va = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
 		if (WARN_ON_ONCE(!va))
 			continue;
@@ -4358,7 +4397,9 @@ void __init vmalloc_init(void)
 		va->va_start = (unsigned long)tmp->addr;
 		va->va_end = va->va_start + tmp->size;
 		va->vm = tmp;
-		insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
+
+		z = addr_to_cvz(va->va_start);
+		insert_vmap_area(va, &fbl_root(z, BUSY), &fbl_head(z, BUSY));
 	}
 
 	/*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 6/9] mm: vmalloc: Support multiple zones in vmallocinfo
  2023-05-22 11:08 [PATCH 0/9] Mitigate a vmap lock contention Uladzislau Rezki (Sony)
                   ` (4 preceding siblings ...)
  2023-05-22 11:08 ` [PATCH 5/9] mm: vmalloc: Insert busy-VA per-cpu zone Uladzislau Rezki (Sony)
@ 2023-05-22 11:08 ` Uladzislau Rezki (Sony)
  2023-05-22 11:08 ` [PATCH 7/9] mm: vmalloc: Insert lazy-VA per-cpu zone Uladzislau Rezki (Sony)
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 40+ messages in thread
From: Uladzislau Rezki (Sony) @ 2023-05-22 11:08 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Uladzislau Rezki,
	Oleksiy Avramchenko

The global vmap area busy tree has been removed and replaced by
multiple per-cpu trees/lists, therefore we need to traverse all
per-cpu busy-lists in order to dump all allocated objects.

Please note, after this patch, the dumped addresses of allocated
areas are not sequential. See the example below:

  0   1   2   0   1   2
|---|---|---|---|---|---|-> vmap space

There are 3 CPUs and dumping is done per CPU zone, so as you can
see an address from zone_0 can be ahead of addresses residing in
zone_1 or zone_2.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 90 ++++++++++++++++++++++++++--------------------------
 1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a9170fe19909..dd83deb5ef4f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4159,26 +4159,18 @@ bool vmalloc_dump_obj(void *object)
 
 #ifdef CONFIG_PROC_FS
 static void *s_start(struct seq_file *m, loff_t *pos)
-	__acquires(&vmap_purge_lock)
-	__acquires(&fbl(&per_cpu(cpu_vmap_zone, 0), BUSY, lock))
 {
-	mutex_lock(&vmap_purge_lock);
-	fbl_lock((&per_cpu(cpu_vmap_zone, 0)), BUSY);
-
-	return seq_list_start(&fbl_head((&per_cpu(cpu_vmap_zone, 0)), BUSY), *pos);
+	return *pos < 1 ? (void *)1 : NULL;
 }
 
 static void *s_next(struct seq_file *m, void *p, loff_t *pos)
 {
-	return seq_list_next(p, &fbl_head((&per_cpu(cpu_vmap_zone, 0)), BUSY), pos);
+	++*pos;
+	return NULL;
 }
 
 static void s_stop(struct seq_file *m, void *p)
-	__releases(&fbl(&per_cpu(cpu_vmap_zone, 0), BUSY, lock))
-	__releases(&vmap_purge_lock)
 {
-	fbl_unlock((&per_cpu(cpu_vmap_zone, 0)), BUSY);
-	mutex_unlock(&vmap_purge_lock);
 }
 
 static void show_numa_info(struct seq_file *m, struct vm_struct *v)
@@ -4209,6 +4201,7 @@ static void show_purge_info(struct seq_file *m)
 {
 	struct vmap_area *va;
 
+	mutex_lock(&vmap_purge_lock);
 	spin_lock(&purge_vmap_area_lock);
 	list_for_each_entry(va, &purge_vmap_area_list, list) {
 		seq_printf(m, "0x%pK-0x%pK %7ld unpurged vm_area\n",
@@ -4216,65 +4209,72 @@ static void show_purge_info(struct seq_file *m)
 			va->va_end - va->va_start);
 	}
 	spin_unlock(&purge_vmap_area_lock);
+	mutex_unlock(&vmap_purge_lock);
 }
 
 static int s_show(struct seq_file *m, void *p)
 {
+	struct cpu_vmap_zone *z;
 	struct vmap_area *va;
 	struct vm_struct *v;
+	int i;
 
-	va = list_entry(p, struct vmap_area, list);
+	for_each_possible_cpu(i) {
+		z = per_cpu_ptr(&cpu_vmap_zone, i);
 
-	if (!va->vm) {
-		if (va->flags & VMAP_RAM)
-			seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
-				(void *)va->va_start, (void *)va->va_end,
-				va->va_end - va->va_start);
+		fbl_lock(z, BUSY);
+		list_for_each_entry(va, &fbl_head(z, BUSY), list) {
+			if (!va->vm) {
+				if (va->flags & VMAP_RAM)
+					seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
+						(void *)va->va_start, (void *)va->va_end,
+						va->va_end - va->va_start);
 
-		goto final;
-	}
+				continue;
+			}
 
-	v = va->vm;
+			v = va->vm;
 
-	seq_printf(m, "0x%pK-0x%pK %7ld",
-		v->addr, v->addr + v->size, v->size);
+			seq_printf(m, "0x%pK-0x%pK %7ld",
+			   v->addr, v->addr + v->size, v->size);
 
-	if (v->caller)
-		seq_printf(m, " %pS", v->caller);
+			if (v->caller)
+				seq_printf(m, " %pS", v->caller);
 
-	if (v->nr_pages)
-		seq_printf(m, " pages=%d", v->nr_pages);
+			if (v->nr_pages)
+				seq_printf(m, " pages=%d", v->nr_pages);
 
-	if (v->phys_addr)
-		seq_printf(m, " phys=%pa", &v->phys_addr);
+			if (v->phys_addr)
+				seq_printf(m, " phys=%pa", &v->phys_addr);
 
-	if (v->flags & VM_IOREMAP)
-		seq_puts(m, " ioremap");
+			if (v->flags & VM_IOREMAP)
+				seq_puts(m, " ioremap");
 
-	if (v->flags & VM_ALLOC)
-		seq_puts(m, " vmalloc");
+			if (v->flags & VM_ALLOC)
+				seq_puts(m, " vmalloc");
 
-	if (v->flags & VM_MAP)
-		seq_puts(m, " vmap");
+			if (v->flags & VM_MAP)
+				seq_puts(m, " vmap");
 
-	if (v->flags & VM_USERMAP)
-		seq_puts(m, " user");
+			if (v->flags & VM_USERMAP)
+				seq_puts(m, " user");
 
-	if (v->flags & VM_DMA_COHERENT)
-		seq_puts(m, " dma-coherent");
+			if (v->flags & VM_DMA_COHERENT)
+				seq_puts(m, " dma-coherent");
 
-	if (is_vmalloc_addr(v->pages))
-		seq_puts(m, " vpages");
+			if (is_vmalloc_addr(v->pages))
+				seq_puts(m, " vpages");
 
-	show_numa_info(m, v);
-	seq_putc(m, '\n');
+			show_numa_info(m, v);
+			seq_putc(m, '\n');
+		}
+		fbl_unlock(z, BUSY);
+	}
 
 	/*
 	 * As a final step, dump "unpurged" areas.
 	 */
-final:
-	if (list_is_last(&va->list, &fbl_head((&per_cpu(cpu_vmap_zone, 0)), BUSY)))
-		show_purge_info(m);
+	show_purge_info(m);
 
 	return 0;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 7/9] mm: vmalloc: Insert lazy-VA per-cpu zone
  2023-05-22 11:08 [PATCH 0/9] Mitigate a vmap lock contention Uladzislau Rezki (Sony)
                   ` (5 preceding siblings ...)
  2023-05-22 11:08 ` [PATCH 6/9] mm: vmalloc: Support multiple zones in vmallocinfo Uladzislau Rezki (Sony)
@ 2023-05-22 11:08 ` Uladzislau Rezki (Sony)
  2023-05-22 11:08 ` [PATCH 8/9] mm: vmalloc: Offload free_vmap_area_lock global lock Uladzislau Rezki (Sony)
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 40+ messages in thread
From: Uladzislau Rezki (Sony) @ 2023-05-22 11:08 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Uladzislau Rezki,
	Oleksiy Avramchenko

Similar to busy VAs, lazy ones are also stored per CPU zone.
A freed address is converted into the correct zone it belongs
to and resides there for further handling.

Such an approach does not require any global locking primitive;
instead, access becomes scalable with the number of CPUs.

This patch removes the global purge-lock, purge-tree and
purge-list.
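
Condensed from the free_vmap_area_noflush() hunk below, the lazy-free
path then boils down to:

	struct cpu_vmap_zone *z = addr_to_cvz(va->va_start);

	/* Merge or place it to the lazy tree/list of this CPU zone. */
	fbl_lock(z, LAZY);
	merge_or_add_vmap_area(va, &fbl_root(z, LAZY), &fbl_head(z, LAZY));
	fbl_unlock(z, LAZY);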

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 127 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 71 insertions(+), 56 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dd83deb5ef4f..fe993c0561dd 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -734,10 +734,6 @@ static DEFINE_SPINLOCK(free_vmap_area_lock);
 LIST_HEAD(vmap_area_list);
 static bool vmap_initialized __read_mostly;
 
-static struct rb_root purge_vmap_area_root = RB_ROOT;
-static LIST_HEAD(purge_vmap_area_list);
-static DEFINE_SPINLOCK(purge_vmap_area_lock);
-
 /*
  * This kmem_cache is used for vmap_area objects. Instead of
  * allocating from slab we reuse an object from this cache to
@@ -1792,39 +1788,17 @@ static DEFINE_MUTEX(vmap_purge_lock);
 /* for per-CPU blocks */
 static void purge_fragmented_blocks_allcpus(void);
 
-/*
- * Purges all lazily-freed vmap areas.
- */
-static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
+static unsigned long
+purge_cpu_vmap_zone(struct cpu_vmap_zone *z)
 {
-	unsigned long resched_threshold;
-	unsigned int num_purged_areas = 0;
-	struct list_head local_purge_list;
+	unsigned long num_purged_areas = 0;
 	struct vmap_area *va, *n_va;
 
-	lockdep_assert_held(&vmap_purge_lock);
-
-	spin_lock(&purge_vmap_area_lock);
-	purge_vmap_area_root = RB_ROOT;
-	list_replace_init(&purge_vmap_area_list, &local_purge_list);
-	spin_unlock(&purge_vmap_area_lock);
-
-	if (unlikely(list_empty(&local_purge_list)))
+	if (list_empty(&z->purge_list))
 		goto out;
 
-	start = min(start,
-		list_first_entry(&local_purge_list,
-			struct vmap_area, list)->va_start);
-
-	end = max(end,
-		list_last_entry(&local_purge_list,
-			struct vmap_area, list)->va_end);
-
-	flush_tlb_kernel_range(start, end);
-	resched_threshold = lazy_max_pages() << 1;
-
 	spin_lock(&free_vmap_area_lock);
-	list_for_each_entry_safe(va, n_va, &local_purge_list, list) {
+	list_for_each_entry_safe(va, n_va, &z->purge_list, list) {
 		unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
 		unsigned long orig_start = va->va_start;
 		unsigned long orig_end = va->va_end;
@@ -1846,13 +1820,57 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 
 		atomic_long_sub(nr, &vmap_lazy_nr);
 		num_purged_areas++;
-
-		if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
-			cond_resched_lock(&free_vmap_area_lock);
 	}
 	spin_unlock(&free_vmap_area_lock);
 
 out:
+	return num_purged_areas;
+}
+
+/*
+ * Purges all lazily-freed vmap areas.
+ */
+static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
+{
+	unsigned long num_purged_areas = 0;
+	struct cpu_vmap_zone *z;
+	int need_purge = 0;
+	int i;
+
+	lockdep_assert_held(&vmap_purge_lock);
+
+	for_each_possible_cpu(i) {
+		z = per_cpu_ptr(&cpu_vmap_zone, i);
+		INIT_LIST_HEAD(&z->purge_list);
+
+		if (RB_EMPTY_ROOT(&fbl_root(z, LAZY)))
+			continue;
+
+		fbl_lock(z, LAZY);
+		WRITE_ONCE(fbl(z, LAZY, root.rb_node), NULL);
+		list_replace_init(&fbl_head(z, LAZY), &z->purge_list);
+		fbl_unlock(z, LAZY);
+
+		start = min(start,
+			list_first_entry(&z->purge_list,
+				struct vmap_area, list)->va_start);
+
+		end = max(end,
+			list_last_entry(&z->purge_list,
+				struct vmap_area, list)->va_end);
+
+		need_purge++;
+	}
+
+	if (need_purge) {
+		flush_tlb_kernel_range(start, end);
+
+		for_each_possible_cpu(i) {
+			z = per_cpu_ptr(&cpu_vmap_zone, i);
+			num_purged_areas += purge_cpu_vmap_zone(z);
+		}
+	}
+
 	trace_purge_vmap_area_lazy(start, end, num_purged_areas);
 	return num_purged_areas > 0;
 }
@@ -1870,16 +1888,9 @@ static void purge_vmap_area_lazy(void)
 
 static void drain_vmap_area_work(struct work_struct *work)
 {
-	unsigned long nr_lazy;
-
-	do {
-		mutex_lock(&vmap_purge_lock);
-		__purge_vmap_area_lazy(ULONG_MAX, 0);
-		mutex_unlock(&vmap_purge_lock);
-
-		/* Recheck if further work is required. */
-		nr_lazy = atomic_long_read(&vmap_lazy_nr);
-	} while (nr_lazy > lazy_max_pages());
+	mutex_lock(&vmap_purge_lock);
+	__purge_vmap_area_lazy(ULONG_MAX, 0);
+	mutex_unlock(&vmap_purge_lock);
 }
 
 /*
@@ -1889,6 +1900,7 @@ static void drain_vmap_area_work(struct work_struct *work)
  */
 static void free_vmap_area_noflush(struct vmap_area *va)
 {
+	struct cpu_vmap_zone *z = addr_to_cvz(va->va_start);
 	unsigned long nr_lazy_max = lazy_max_pages();
 	unsigned long va_start = va->va_start;
 	unsigned long nr_lazy;
@@ -1902,10 +1914,9 @@ static void free_vmap_area_noflush(struct vmap_area *va)
 	/*
 	 * Merge or place it to the purge tree/list.
 	 */
-	spin_lock(&purge_vmap_area_lock);
-	merge_or_add_vmap_area(va,
-		&purge_vmap_area_root, &purge_vmap_area_list);
-	spin_unlock(&purge_vmap_area_lock);
+	fbl_lock(z, LAZY);
+	merge_or_add_vmap_area(va, &fbl_root(z, LAZY), &fbl_head(z, LAZY));
+	fbl_unlock(z, LAZY);
 
 	trace_free_vmap_area_noflush(va_start, nr_lazy, nr_lazy_max);
 
@@ -4199,17 +4210,21 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v)
 
 static void show_purge_info(struct seq_file *m)
 {
+	struct cpu_vmap_zone *z;
 	struct vmap_area *va;
+	int i;
 
-	mutex_lock(&vmap_purge_lock);
-	spin_lock(&purge_vmap_area_lock);
-	list_for_each_entry(va, &purge_vmap_area_list, list) {
-		seq_printf(m, "0x%pK-0x%pK %7ld unpurged vm_area\n",
-			(void *)va->va_start, (void *)va->va_end,
-			va->va_end - va->va_start);
+	for_each_possible_cpu(i) {
+		z = per_cpu_ptr(&cpu_vmap_zone, i);
+
+		fbl_lock(z, LAZY);
+		list_for_each_entry(va, &fbl_head(z, LAZY), list) {
+			seq_printf(m, "0x%pK-0x%pK %7ld unpurged vm_area\n",
+				(void *)va->va_start, (void *)va->va_end,
+				va->va_end - va->va_start);
+		}
+		fbl_unlock(z, LAZY);
 	}
-	spin_unlock(&purge_vmap_area_lock);
-	mutex_unlock(&vmap_purge_lock);
 }
 
 static int s_show(struct seq_file *m, void *p)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 8/9] mm: vmalloc: Offload free_vmap_area_lock global lock
  2023-05-22 11:08 [PATCH 0/9] Mitigate a vmap lock contention Uladzislau Rezki (Sony)
                   ` (6 preceding siblings ...)
  2023-05-22 11:08 ` [PATCH 7/9] mm: vmalloc: Insert lazy-VA per-cpu zone Uladzislau Rezki (Sony)
@ 2023-05-22 11:08 ` Uladzislau Rezki (Sony)
  2023-06-05  0:43   ` Baoquan He
  2023-05-22 11:08 ` [PATCH 9/9] mm: vmalloc: Scale and activate cvz_size Uladzislau Rezki (Sony)
  2023-05-23 11:59 ` [PATCH 0/9] Mitigate a vmap lock contention Hyeonggon Yoo
  9 siblings, 1 reply; 40+ messages in thread
From: Uladzislau Rezki (Sony) @ 2023-05-22 11:08 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Uladzislau Rezki,
	Oleksiy Avramchenko

Introduce a fast path in the allocation sequence. It consists
of a per-cpu path and a fallback mechanism which is used when a
request cannot be accomplished by the fast track.

The fast track pre-loads a chunk from the global vmap heap
directly into its per-cpu zone, followed by clipping the chunk
based on the allocation request.

This technique offloads the global free_vmap_area_lock, making
the allocation path serialized to the number of CPUs in a
system.
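
Condensed from the alloc_vmap_area() hunk below, the sequence becomes
(preload and locking details omitted):

	/* Fast path: clip from this CPU zone, pre-filling it if needed. */
	if (vstart == VMALLOC_START && vend == VMALLOC_END)
		addr = this_cpu_zone_alloc(size, align, gfp_mask, node);
	else
		addr = vend;

	/* Fallback: the global free_vmap_area_root/list under its lock. */
	if (addr == vend)
		addr = __alloc_vmap_area(&free_vmap_area_root,
			&free_vmap_area_list, size, align, vstart, vend);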

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 123 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index fe993c0561dd..8054b8bf6c18 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1642,6 +1642,93 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
 		kmem_cache_free(vmap_area_cachep, va);
 }
 
+static unsigned long
+this_cpu_zone_alloc_fill(struct cpu_vmap_zone *z,
+	unsigned long size, unsigned long align,
+	gfp_t gfp_mask, int node)
+{
+	unsigned long addr = VMALLOC_END;
+	struct vmap_area *va;
+
+	/*
+	 * It still can race. One task sets a progress to
+	 * 1 a second one gets preempted on entry, the first
+	 * zeroed the progress flag and second proceed with
+	 * an extra prefetch.
+	 */
+	if (atomic_xchg(&z->fill_in_progress, 1))
+		return addr;
+
+	va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
+	if (unlikely(!va))
+		goto out;
+
+	spin_lock(&free_vmap_area_lock);
+	addr = __alloc_vmap_area(&free_vmap_area_root, &free_vmap_area_list,
+		cvz_size, 1, VMALLOC_START, VMALLOC_END);
+	spin_unlock(&free_vmap_area_lock);
+
+	if (addr == VMALLOC_END) {
+		kmem_cache_free(vmap_area_cachep, va);
+		goto out;
+	}
+
+	va->va_start = addr;
+	va->va_end = addr + cvz_size;
+
+	fbl_lock(z, FREE);
+	va = merge_or_add_vmap_area_augment(va,
+		&fbl_root(z, FREE), &fbl_head(z, FREE));
+	addr = va_alloc(va, &fbl_root(z, FREE), &fbl_head(z, FREE),
+		size, align, VMALLOC_START, VMALLOC_END);
+	fbl_unlock(z, FREE);
+
+out:
+	atomic_set(&z->fill_in_progress, 0);
+	return addr;
+}
+
+static unsigned long
+this_cpu_zone_alloc(unsigned long size, unsigned long align, gfp_t gfp_mask, int node)
+{
+	struct cpu_vmap_zone *z = raw_cpu_ptr(&cpu_vmap_zone);
+	unsigned long extra = align > PAGE_SIZE ? align : 0;
+	unsigned long addr = VMALLOC_END, left = 0;
+
+	/*
+	 * It is disabled, fallback to a global heap.
+	 */
+	if (cvz_size == ULONG_MAX)
+		return addr;
+
+	/*
+	 * Any allocation bigger/equal than one half of
+	 * a zone-size will fallback to a global heap.
+	 */
+	if (cvz_size / (size + extra) < 3)
+		return addr;
+
+	if (RB_EMPTY_ROOT(&fbl_root(z, FREE)))
+		goto fill;
+
+	fbl_lock(z, FREE);
+	addr = __alloc_vmap_area(&fbl_root(z, FREE), &fbl_head(z, FREE),
+		size, align, VMALLOC_START, VMALLOC_END);
+
+	if (addr == VMALLOC_END)
+		left = get_subtree_max_size(fbl_root(z, FREE).rb_node);
+	fbl_unlock(z, FREE);
+
+fill:
+	/*
+	 * A low watermark is 3 pages.
+	 */
+	if (addr == VMALLOC_END && left < 4 * PAGE_SIZE)
+		addr = this_cpu_zone_alloc_fill(z, size, align, gfp_mask, node);
+
+	return addr;
+}
+
 /*
  * Allocate a region of KVA of the specified size and alignment, within the
  * vstart and vend.
@@ -1678,11 +1765,21 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	 */
 	kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask);
 
+	/*
+	 * Fast path allocation, start with it.
+	 */
+	if (vstart == VMALLOC_START && vend == VMALLOC_END)
+		addr = this_cpu_zone_alloc(size, align, gfp_mask, node);
+	else
+		addr = vend;
+
 retry:
-	preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, node);
-	addr = __alloc_vmap_area(&free_vmap_area_root, &free_vmap_area_list,
-		size, align, vstart, vend);
-	spin_unlock(&free_vmap_area_lock);
+	if (addr == vend) {
+		preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, node);
+		addr = __alloc_vmap_area(&free_vmap_area_root, &free_vmap_area_list,
+			size, align, vstart, vend);
+		spin_unlock(&free_vmap_area_lock);
+	}
 
 	trace_alloc_vmap_area(addr, size, align, vstart, vend, addr == vend);
 
@@ -1827,6 +1924,27 @@ purge_cpu_vmap_zone(struct cpu_vmap_zone *z)
 	return num_purged_areas;
 }
 
+static void
+drop_cpu_vmap_cache(struct cpu_vmap_zone *z)
+{
+	struct vmap_area *va, *n_va;
+	LIST_HEAD(free_head);
+
+	if (RB_EMPTY_ROOT(&fbl_root(z, FREE)))
+		return;
+
+	fbl_lock(z, FREE);
+	WRITE_ONCE(fbl(z, FREE, root.rb_node), NULL);
+	list_replace_init(&fbl_head(z, FREE), &free_head);
+	fbl_unlock(z, FREE);
+
+	spin_lock(&free_vmap_area_lock);
+	list_for_each_entry_safe(va, n_va, &free_head, list)
+		merge_or_add_vmap_area_augment(va,
+			&free_vmap_area_root, &free_vmap_area_list);
+	spin_unlock(&free_vmap_area_lock);
+}
+
 /*
  * Purges all lazily-freed vmap areas.
  */
@@ -1868,6 +1986,7 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 		for_each_possible_cpu(i) {
 			z = per_cpu_ptr(&cpu_vmap_zone, i);
 			num_purged_areas += purge_cpu_vmap_zone(z);
+			drop_cpu_vmap_cache(z);
 		}
 	}
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 9/9] mm: vmalloc: Scale and activate cvz_size
  2023-05-22 11:08 [PATCH 0/9] Mitigate a vmap lock contention Uladzislau Rezki (Sony)
                   ` (7 preceding siblings ...)
  2023-05-22 11:08 ` [PATCH 8/9] mm: vmalloc: Offload free_vmap_area_lock global lock Uladzislau Rezki (Sony)
@ 2023-05-22 11:08 ` Uladzislau Rezki (Sony)
  2023-05-23 11:59 ` [PATCH 0/9] Mitigate a vmap lock contention Hyeonggon Yoo
  9 siblings, 0 replies; 40+ messages in thread
From: Uladzislau Rezki (Sony) @ 2023-05-22 11:08 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Uladzislau Rezki,
	Oleksiy Avramchenko

Scale the zone size with the number of CPUs. Each power-of-two
step adds an extra 100 * PAGE_SIZE bytes to the zone size.

The maximum zone size limit is set to 4M; it is reached when the
number of CPUs is 1024 and PAGE_SIZE is 4096 bytes.

For 32-bit or single-core systems cvz_size stays ULONG_MAX, which
means that the pre-loading technique is deactivated.
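
For example, assuming PAGE_SIZE is 4096 bytes:

	CPUs=2:    fls(2)    = 2  -> cvz_size = 2  * 100 * 4096 ~= 0.78M
	CPUs=64:   fls(64)   = 7  -> cvz_size = 7  * 100 * 4096 ~= 2.73M
	CPUs=1024: fls(1024) = 11 -> cvz_size = 11 * 100 * 4096 ~= 4.3M, clamped to 4M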

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 8054b8bf6c18..a5e6956a8da4 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -806,8 +806,6 @@ struct cpu_vmap_zone {
 };
 
 static DEFINE_PER_CPU(struct cpu_vmap_zone, cpu_vmap_zone);
-
-/* Disable a per-cpu caching. */
 static __read_mostly unsigned long cvz_size = ULONG_MAX;
 
 static inline unsigned int
@@ -4490,6 +4488,19 @@ static void vmap_init_pcpu_zones(void)
 			spin_lock_init(&z->fbl[j].lock);
 		}
 	}
+
+	/*
+	 * Scale a zone size to number of CPUs. Each power of two
+	 * value adds extra 100 pages to a zone size. A high-threshold
+	 * limit is set to 4M. It can be reached if number of CPUs=1024
+	 * whereas a PAGE_SIZE is 4096 bytes.
+	 */
+#if BITS_PER_LONG == 64
+	if (i > 1) {
+		cvz_size = fls(i) * (100 * PAGE_SIZE);
+		cvz_size = min(cvz_size, (unsigned long) SZ_4M);
+	}
+#endif
 }
 
 void __init vmalloc_init(void)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/9] mm: vmalloc: Add va_alloc() helper
  2023-05-22 11:08 ` [PATCH 1/9] mm: vmalloc: Add va_alloc() helper Uladzislau Rezki (Sony)
@ 2023-05-23  6:05   ` Christoph Hellwig
  2023-05-23  9:57     ` Uladzislau Rezki
  2023-05-27 19:55   ` Lorenzo Stoakes
  1 sibling, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2023-05-23  6:05 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Liam R . Howlett,
	Dave Chinner, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko

On Mon, May 22, 2023 at 01:08:41PM +0200, Uladzislau Rezki (Sony) wrote:
> Currently the __alloc_vmap_area() function contains open-coded
> logic that finds and adjusts a VA based on the allocation request.
> 
> Introduce a va_alloc() helper that only adjusts a found VA. It
> will later be used in at least two places.
> 
> There is no functional change as a result of this patch.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 41 ++++++++++++++++++++++++++++-------------
>  1 file changed, 28 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 31ff782d368b..409285b68a67 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1482,6 +1482,32 @@ adjust_va_to_fit_type(struct rb_root *root, struct list_head *head,
>  	return 0;
>  }
>  
> +static unsigned long
> +va_alloc(struct vmap_area *va,
> +	struct rb_root *root, struct list_head *head,
> +	unsigned long size, unsigned long align,
> +	unsigned long vstart, unsigned long vend)

Prototype continuations don't use a single tab indent.  Either two
tabs (my preference) or after the opening brace.  I.e.:

static unsigned long
va_alloc(struct vmap_area *va, struct rb_root *root, struct list_head *head,
		unsigned long size, unsigned long align, unsigned long vstart,
		unsigned long vend)


Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function
  2023-05-22 11:08 ` [PATCH 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function Uladzislau Rezki (Sony)
@ 2023-05-23  6:06   ` Christoph Hellwig
  2023-05-23 10:01     ` Uladzislau Rezki
  2023-05-23 17:24   ` Liam R. Howlett
  2023-05-27 21:50   ` Lorenzo Stoakes
  2 siblings, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2023-05-23  6:06 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Liam R . Howlett,
	Dave Chinner, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko

On Mon, May 22, 2023 at 01:08:42PM +0200, Uladzislau Rezki (Sony) wrote:
>  static __always_inline int
> -adjust_va_to_fit_type(struct rb_root *root, struct list_head *head,
> -		      struct vmap_area *va, unsigned long nva_start_addr,
> -		      unsigned long size)
> +va_clip(struct rb_root *root, struct list_head *head,
> +	struct vmap_area *va, unsigned long nva_start_addr,
> +	unsigned long size)

Same comment as before, not going to repeat it for further reviews.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/9] mm: vmalloc: Move vmap_init_free_space() down in vmalloc.c
  2023-05-22 11:08 ` [PATCH 3/9] mm: vmalloc: Move vmap_init_free_space() down in vmalloc.c Uladzislau Rezki (Sony)
@ 2023-05-23  6:06   ` Christoph Hellwig
  2023-05-27 21:52   ` Lorenzo Stoakes
  1 sibling, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2023-05-23  6:06 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Liam R . Howlett,
	Dave Chinner, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] mm: vmalloc: Add a per-CPU-zone infrastructure
  2023-05-22 11:08 ` [PATCH 4/9] mm: vmalloc: Add a per-CPU-zone infrastructure Uladzislau Rezki (Sony)
@ 2023-05-23  6:08   ` Christoph Hellwig
  2023-05-23 14:53     ` Uladzislau Rezki
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2023-05-23  6:08 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Liam R . Howlett,
	Dave Chinner, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko

On Mon, May 22, 2023 at 01:08:44PM +0200, Uladzislau Rezki (Sony) wrote:
> +#define fbl(z, i, m) z->fbl[i].m
> +#define fbl_root(z, i) fbl(z, i, root)
> +#define fbl_head(z, i) fbl(z, i, head)
> +
> +#define fbl_lock(z, i) spin_lock(&fbl(z, i, lock))
> +#define fbl_unlock(z, i) spin_unlock(&fbl(z, i, lock))

Even if it is just temporary, I don't think adding these wrappers
makes much sense.

> +struct cpu_vmap_zone {
> +	/*
> +	 * FREE, BUSY, LAZY bookkeeping data of this CPU zone.
> +	 */
> +	struct {
> +		struct rb_root root;
> +		struct list_head head;
> +		spinlock_t lock;
> +	} fbl[NFBL];

Maybe replace NFBL with something longer and more descriptive?

But also in general it feels like this should be folded into a patch
doing real work.  As-is it doesn't look very useful.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/9] mm: vmalloc: Insert busy-VA per-cpu zone
  2023-05-22 11:08 ` [PATCH 5/9] mm: vmalloc: Insert busy-VA per-cpu zone Uladzislau Rezki (Sony)
@ 2023-05-23  6:12   ` Christoph Hellwig
  2023-05-23 15:00     ` Uladzislau Rezki
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2023-05-23  6:12 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Liam R . Howlett,
	Dave Chinner, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko

>  /* Look up the first VA which satisfies addr < va_end, NULL if none. */
> -static struct vmap_area *find_vmap_area_exceed_addr(unsigned long addr)
> +static struct vmap_area *find_vmap_area_exceed_addr(unsigned long addr, struct rb_root *root)

Please avoid the overly long line.

> +	struct cpu_vmap_zone *z = addr_to_cvz(va->va_start);
> +
>  	/*
>  	 * Remove from the busy tree/list.
>  	 */
> -	spin_lock(&vmap_area_lock);
> -	unlink_va(va, &vmap_area_root);
> -	spin_unlock(&vmap_area_lock);
> +	fbl_lock(z, BUSY);
> +	unlink_va(va, &fbl_root(z, BUSY));
> +	fbl_unlock(z, BUSY);

I find the BUSY magic here very confusing, and would prefer to
just spell the actual lock reference out.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/9] mm: vmalloc: Add va_alloc() helper
  2023-05-23  6:05   ` Christoph Hellwig
@ 2023-05-23  9:57     ` Uladzislau Rezki
  0 siblings, 0 replies; 40+ messages in thread
From: Uladzislau Rezki @ 2023-05-23  9:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Uladzislau Rezki (Sony),
	linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Oleksiy Avramchenko

On Mon, May 22, 2023 at 11:05:57PM -0700, Christoph Hellwig wrote:
> On Mon, May 22, 2023 at 01:08:41PM +0200, Uladzislau Rezki (Sony) wrote:
> > Currently __alloc_vmap_area() function contains an open codded
> > logic that finds and adjusts a VA based on allocation request.
> > 
> > Introduce a va_alloc() helper that adjusts found VA only. It
> > will be used later at least in two places.
> > 
> > There is no a functional change as a result of this patch.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> >  mm/vmalloc.c | 41 ++++++++++++++++++++++++++++-------------
> >  1 file changed, 28 insertions(+), 13 deletions(-)
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 31ff782d368b..409285b68a67 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -1482,6 +1482,32 @@ adjust_va_to_fit_type(struct rb_root *root, struct list_head *head,
> >  	return 0;
> >  }
> >  
> > +static unsigned long
> > +va_alloc(struct vmap_area *va,
> > +	struct rb_root *root, struct list_head *head,
> > +	unsigned long size, unsigned long align,
> > +	unsigned long vstart, unsigned long vend)
> 
> Prototype continuations don't use a single tab indent.  Either two
> tabs (my preference) or after the opening brace.  I.e.:
> 
Will go with two tabs.

> static unsigned long
> va_alloc(struct vmap_area *va, struct rb_root *root, struct list_head *head,
> 		unsigned long size, unsigned long align, unsigned long vstart,
> 		unsigned long vend)
> 
> 
> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
Added.

Appreciate you looking at it!

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function
  2023-05-23  6:06   ` Christoph Hellwig
@ 2023-05-23 10:01     ` Uladzislau Rezki
  0 siblings, 0 replies; 40+ messages in thread
From: Uladzislau Rezki @ 2023-05-23 10:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Uladzislau Rezki (Sony),
	linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Oleksiy Avramchenko

On Mon, May 22, 2023 at 11:06:31PM -0700, Christoph Hellwig wrote:
> On Mon, May 22, 2023 at 01:08:42PM +0200, Uladzislau Rezki (Sony) wrote:
> >  static __always_inline int
> > -adjust_va_to_fit_type(struct rb_root *root, struct list_head *head,
> > -		      struct vmap_area *va, unsigned long nva_start_addr,
> > -		      unsigned long size)
> > +va_clip(struct rb_root *root, struct list_head *head,
> > +	struct vmap_area *va, unsigned long nva_start_addr,
> > +	unsigned long size)
> 
> Same comment as before, not going to repeat it for futher reviews.
> 
Fixed!


> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
Thanks!

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Mitigate a vmap lock contention
  2023-05-22 11:08 [PATCH 0/9] Mitigate a vmap lock contention Uladzislau Rezki (Sony)
                   ` (8 preceding siblings ...)
  2023-05-22 11:08 ` [PATCH 9/9] mm: vmalloc: Scale and activate cvz_size Uladzislau Rezki (Sony)
@ 2023-05-23 11:59 ` Hyeonggon Yoo
  2023-05-23 15:12   ` Uladzislau Rezki
  9 siblings, 1 reply; 40+ messages in thread
From: Hyeonggon Yoo @ 2023-05-23 11:59 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Liam R . Howlett,
	Dave Chinner, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko

On Mon, May 22, 2023 at 01:08:40PM +0200, Uladzislau Rezki (Sony) wrote:
> Hello, folk.
> 
> 1. This is a followup of the vmap topic that was highlighted at the LSFMMBPF-2023
> conference. This small serial attempts to mitigate the contention across the
> vmap/vmalloc code. The problem is described here:
> 

Hello Uladzislau, thank you for the work!

> wget ftp://vps418301.ovh.net/incoming/Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf

I ran exactly the same command but couldn't download the file; did I
miss something?

$ wget ftp://vps418301.ovh.net/incoming/Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf
[...]
==> PASV ... done.    ==> RETR Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf ... 
No such file `Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf'.

> The material is tagged as a v2 version. It contains extra slides about testing
> the throughput, steps and comparison with a current approach.
> 
> 2. Motivation.
> 
> - The vmap code is not scalled to number of CPUs and this should be fixed;
> - XFS folk has complained several times that vmalloc might be contented on
>   their workloads:
> 
> <snip>
> commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
> Author: Dave Chinner <dchinner@redhat.com>
> Date:   Tue Jan 4 17:22:18 2022 -0800
> 
>     xfs: reduce kvmalloc overhead for CIL shadow buffers
>     
>     Oh, let me count the ways that the kvmalloc API sucks dog eggs.
>     
>     The problem is when we are logging lots of large objects, we hit
>     kvmalloc really damn hard with costly order allocations, and
>     behaviour utterly sucks:

Based on the commit, I guess the reason xfs uses vmalloc/kvmalloc is that
it allocates large buffers; how large can they be?

> 3. Test
> 
> On my: AMD Ryzen Threadripper 3970X 32-Core Processor, i have below figures:
> 
>     1-page     1-page-this-patch
> 1  0.576131   vs   0.555889
> 2   2.68376   vs    1.07895
> 3   4.26502   vs    1.01739
> 4   6.04306   vs    1.28924
> 5   8.04786   vs    1.57616
> 6   9.38844   vs    1.78142

<snip>

> 29    20.06   vs    3.59869
> 30  20.4353   vs     3.6991
> 31  20.9082   vs    3.73028
> 32  21.0865   vs    3.82904
> 
> 1..32 - is a number of jobs. The results are in usec and is a vmallco()/vfree()
> pair throughput.

I would be more interested in real numbers than synthetic benchmarks.
Maybe XFS folks could help by performing profiling similar to commit 8dc9384b7d750
with and without this patchset?

By the way looking at the commit, teaching __p?d_alloc() about gfp
context (that I'm _slowly_ working on...) could be nice for allowing
non-GFP_KERNEL kvmalloc allocations, as Matthew mentioned. [1]

Thanks!

[1] https://lore.kernel.org/linux-mm/Y%2FOHC33YLedMXTlD@casper.infradead.org

-- 
Hyeonggon Yoo

Doing kernel stuff as a hobby
Undergraduate | Chungnam National University
Dept. Computer Science & Engineering

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] mm: vmalloc: Add a per-CPU-zone infrastructure
  2023-05-23  6:08   ` Christoph Hellwig
@ 2023-05-23 14:53     ` Uladzislau Rezki
  2023-05-23 15:13       ` Christoph Hellwig
  0 siblings, 1 reply; 40+ messages in thread
From: Uladzislau Rezki @ 2023-05-23 14:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Uladzislau Rezki (Sony),
	linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Oleksiy Avramchenko

On Mon, May 22, 2023 at 11:08:38PM -0700, Christoph Hellwig wrote:
> On Mon, May 22, 2023 at 01:08:44PM +0200, Uladzislau Rezki (Sony) wrote:
> > +#define fbl(z, i, m) z->fbl[i].m
> > +#define fbl_root(z, i) fbl(z, i, root)
> > +#define fbl_head(z, i) fbl(z, i, head)
> > +
> > +#define fbl_lock(z, i) spin_lock(&fbl(z, i, lock))
> > +#define fbl_unlock(z, i) spin_unlock(&fbl(z, i, lock))
> 
> Even if it is just temporary, I don't think adding these wrappers
> make much sense.
> 
If open-coded, it looks like:

spin_lock(&z->fbl[BUSY].lock);
fbl_lock(z, BUSY);

The reason for adding such helpers is to keep the names shorter.

> > +struct cpu_vmap_zone {
> > +	/*
> > +	 * FREE, BUSY, LAZY bookkeeping data of this CPU zone.
> > +	 */
> > +	struct {
> > +		struct rb_root root;
> > +		struct list_head head;
> > +		spinlock_t lock;
> > +	} fbl[NFBL];
> 
> Maybe replace NFBL with something longer and more descriptive?
> 
> But also in general it feels like this should be folded into a patch
> doing real work.  As-is it doesn't look very useful.
>
I split it out to make the review easier, but I can fold it in.

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/9] mm: vmalloc: Insert busy-VA per-cpu zone
  2023-05-23  6:12   ` Christoph Hellwig
@ 2023-05-23 15:00     ` Uladzislau Rezki
  0 siblings, 0 replies; 40+ messages in thread
From: Uladzislau Rezki @ 2023-05-23 15:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Uladzislau Rezki (Sony),
	linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Oleksiy Avramchenko

On Mon, May 22, 2023 at 11:12:08PM -0700, Christoph Hellwig wrote:
> >  /* Look up the first VA which satisfies addr < va_end, NULL if none. */
> > -static struct vmap_area *find_vmap_area_exceed_addr(unsigned long addr)
> > +static struct vmap_area *find_vmap_area_exceed_addr(unsigned long addr, struct rb_root *root)
> 
> Please avoid the overly long line.
> 
Will fix it.

> > +	struct cpu_vmap_zone *z = addr_to_cvz(va->va_start);
> > +
> >  	/*
> >  	 * Remove from the busy tree/list.
> >  	 */
> > -	spin_lock(&vmap_area_lock);
> > -	unlink_va(va, &vmap_area_root);
> > -	spin_unlock(&vmap_area_lock);
> > +	fbl_lock(z, BUSY);
> > +	unlink_va(va, &fbl_root(z, BUSY));
> > +	fbl_unlock(z, BUSY);
> 
> I find the BUSY magic here very confusing, and would prefer to
> just spell the actual lock reference out.
>
No problem. I can make it open-coded. What about the access to the tree/list?

>> unlink_va(va, &fbl_root(z, BUSY));
I mean this one.

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Mitigate a vmap lock contention
  2023-05-23 11:59 ` [PATCH 0/9] Mitigate a vmap lock contention Hyeonggon Yoo
@ 2023-05-23 15:12   ` Uladzislau Rezki
  2023-05-23 18:04     ` Hyeonggon Yoo
  0 siblings, 1 reply; 40+ messages in thread
From: Uladzislau Rezki @ 2023-05-23 15:12 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Uladzislau Rezki (Sony),
	linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Liam R . Howlett,
	Dave Chinner, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko

On Tue, May 23, 2023 at 08:59:05PM +0900, Hyeonggon Yoo wrote:
> On Mon, May 22, 2023 at 01:08:40PM +0200, Uladzislau Rezki (Sony) wrote:
> > Hello, folk.
> > 
> > 1. This is a followup of the vmap topic that was highlighted at the LSFMMBPF-2023
> > conference. This small serial attempts to mitigate the contention across the
> > vmap/vmalloc code. The problem is described here:
> > 
> 
> Hello Uladzislau, thank you for the work!
> 
> > wget ftp://vps418301.ovh.net/incoming/Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf
> 
> I ran the exactly same command but couldn't download the file, did I
> miss something?
> 
wget ftp://vps418301.ovh.net/incoming/Mitigate_a_vmalloc_lock_contention_in_SMP_env_v2.pdf

Sorry, I renamed the file :)

> $ wget ftp://vps418301.ovh.net/incoming/Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf
> [...]
> ==> PASV ... done.    ==> RETR Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf ... 
> No such file `Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf'.
> 
> > The material is tagged as a v2 version. It contains extra slides about testing
> > the throughput, steps and comparison with a current approach.
> > 
> > 2. Motivation.
> > 
> > - The vmap code is not scalled to number of CPUs and this should be fixed;
> > - XFS folk has complained several times that vmalloc might be contented on
> >   their workloads:
> > 
> > <snip>
> > commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
> > Author: Dave Chinner <dchinner@redhat.com>
> > Date:   Tue Jan 4 17:22:18 2022 -0800
> > 
> >     xfs: reduce kvmalloc overhead for CIL shadow buffers
> >     
> >     Oh, let me count the ways that the kvmalloc API sucks dog eggs.
> >     
> >     The problem is when we are logging lots of large objects, we hit
> >     kvmalloc really damn hard with costly order allocations, and
> >     behaviour utterly sucks:
> 
> based on the commit I guess xfs should use vmalloc/kvmalloc is because
> it allocates large buffers, how large could it be?
> 
They use kvmalloc(). When the page allocator is not able to serve a
request, it falls back to vmalloc. From what I see, the sizes range

from 73728 up to 1048576 bytes, i.e. 18 pages up to 256 pages.
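
For reference, the rough shape of that fallback is shown below; this is a
simplified sketch for illustration only, not the exact mm/util.c
implementation:

<snip>
/*
 * Simplified sketch of the kvmalloc() fallback, illustration only.
 * The real mm/util.c code has more nuance (flag filtering, size
 * checks, etc.).
 */
static void *kvmalloc_sketch(size_t size, gfp_t flags)
{
	/* Try the page allocator first, but do not retry too hard. */
	void *p = kmalloc(size, flags | __GFP_NOWARN | __GFP_NORETRY);

	if (p)
		return p;

	/* Large or fragmented case: fall back to vmalloc space. */
	return __vmalloc(size, flags);
}
<snip>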

> > 3. Test
> > 
> > On my: AMD Ryzen Threadripper 3970X 32-Core Processor, i have below figures:
> > 
> >     1-page     1-page-this-patch
> > 1  0.576131   vs   0.555889
> > 2   2.68376   vs    1.07895
> > 3   4.26502   vs    1.01739
> > 4   6.04306   vs    1.28924
> > 5   8.04786   vs    1.57616
> > 6   9.38844   vs    1.78142
> 
> <snip>
> 
> > 29    20.06   vs    3.59869
> > 30  20.4353   vs     3.6991
> > 31  20.9082   vs    3.73028
> > 32  21.0865   vs    3.82904
> > 
> > 1..32 - is a number of jobs. The results are in usec and is a vmallco()/vfree()
> > pair throughput.
> 
> I would be more interested in real numbers than synthetic benchmarks,
> Maybe XFS folks could help performing profiling similar to commit 8dc9384b7d750
> with and without this patchset?
> 
I added Dave Chinner <david@fromorbit.com> to this thread. But the
contention exists. Apart from that, the per-cpu KVA allocator can go away
if we make it generic instead.

> By the way looking at the commit, teaching __p?d_alloc() about gfp
> context (that I'm _slowly_ working on...) could be nice for allowing
> non-GFP_KERNEL kvmalloc allocations, as Matthew mentioned. [1]
> 
> Thanks!
> 
> [1] https://lore.kernel.org/linux-mm/Y%2FOHC33YLedMXTlD@casper.infradead.org
> 

Thanks!

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] mm: vmalloc: Add a per-CPU-zone infrastructure
  2023-05-23 14:53     ` Uladzislau Rezki
@ 2023-05-23 15:13       ` Christoph Hellwig
  2023-05-23 15:32         ` Uladzislau Rezki
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2023-05-23 15:13 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Christoph Hellwig, linux-mm, Andrew Morton, LKML, Baoquan He,
	Lorenzo Stoakes, Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Oleksiy Avramchenko

On Tue, May 23, 2023 at 04:53:25PM +0200, Uladzislau Rezki wrote:
> > > +#define fbl_lock(z, i) spin_lock(&fbl(z, i, lock))
> > > +#define fbl_unlock(z, i) spin_unlock(&fbl(z, i, lock))
> > 
> > Even if it is just temporary, I don't think adding these wrappers
> > make much sense.
> > 
> If open-coded, it looks like:
> 
> spin_lock(&z->fbl[BUSY].lock);

Give the fbl structure a name and you can have a local variable for it,
which will make all this a lot more readable.  And then unless there is
a really good reason to iterate over this as an array just have three
of these structs embedded named free, busy and lazy.
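
For illustration, that could look roughly like the sketch below; the struct
and helper names are placeholders, not taken from the series:

<snip>
/* Illustrative sketch only; names are placeholders. */
struct vmap_pool {
	struct rb_root root;
	struct list_head head;
	spinlock_t lock;
};

struct cpu_vmap_zone {
	struct vmap_pool free;
	struct vmap_pool busy;
	struct vmap_pool lazy;
};

/* Call sites can then take a local pointer instead of index macros: */
static void unlink_busy_va(struct cpu_vmap_zone *z, struct vmap_area *va)
{
	struct vmap_pool *p = &z->busy;

	spin_lock(&p->lock);
	unlink_va(va, &p->root);
	spin_unlock(&p->lock);
}
<snip>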

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] mm: vmalloc: Add a per-CPU-zone infrastructure
  2023-05-23 15:13       ` Christoph Hellwig
@ 2023-05-23 15:32         ` Uladzislau Rezki
  0 siblings, 0 replies; 40+ messages in thread
From: Uladzislau Rezki @ 2023-05-23 15:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Uladzislau Rezki, linux-mm, Andrew Morton, LKML, Baoquan He,
	Lorenzo Stoakes, Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Oleksiy Avramchenko

On Tue, May 23, 2023 at 08:13:47AM -0700, Christoph Hellwig wrote:
> On Tue, May 23, 2023 at 04:53:25PM +0200, Uladzislau Rezki wrote:
> > > > +#define fbl_lock(z, i) spin_lock(&fbl(z, i, lock))
> > > > +#define fbl_unlock(z, i) spin_unlock(&fbl(z, i, lock))
> > > 
> > > Even if it is just temporary, I don't think adding these wrappers
> > > make much sense.
> > > 
> > If open-coded, it looks like:
> > 
> > spin_lock(&z->fbl[BUSY].lock);
> 
> Give the fbl structure a name and you can have a local variable for it,
> which will make all this a lot more readable.  And then unless there is
> a really good reason to iterate over this as an array just have three
> of these structs embedded named free, busy and lazy.
>
OK. I can go that way.

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function
  2023-05-22 11:08 ` [PATCH 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function Uladzislau Rezki (Sony)
  2023-05-23  6:06   ` Christoph Hellwig
@ 2023-05-23 17:24   ` Liam R. Howlett
  2023-05-24 11:51     ` Uladzislau Rezki
  2023-05-27 21:50   ` Lorenzo Stoakes
  2 siblings, 1 reply; 40+ messages in thread
From: Liam R. Howlett @ 2023-05-23 17:24 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Oleksiy Avramchenko

* Uladzislau Rezki (Sony) <urezki@gmail.com> [230522 07:09]:
> This patch renames the adjust_va_to_fit_type() function
> to a shorter variant and more expressive. A new name is
> va_clip().
> 
> There is no a functional change as a result of this patch.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 409285b68a67..5f900efec6a9 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1383,9 +1383,9 @@ classify_va_fit_type(struct vmap_area *va,
>  }
>  
>  static __always_inline int
> -adjust_va_to_fit_type(struct rb_root *root, struct list_head *head,
> -		      struct vmap_area *va, unsigned long nva_start_addr,
> -		      unsigned long size)
> +va_clip(struct rb_root *root, struct list_head *head,
> +	struct vmap_area *va, unsigned long nva_start_addr,
> +	unsigned long size)
>  {
>  	struct vmap_area *lva = NULL;
>  	enum fit_type type = classify_va_fit_type(va, nva_start_addr, size);
> @@ -1501,7 +1501,7 @@ va_alloc(struct vmap_area *va,
>  		return vend;
>  
>  	/* Update the free vmap_area. */
> -	ret = adjust_va_to_fit_type(root, head, va, nva_start_addr, size);
> +	ret = va_clip(root, head, va, nva_start_addr, size);
>  	if (WARN_ON_ONCE(ret))

ret is only used in this WARN_ON_ONCE() check (from patch 1); since you
have shortened adjust_va_to_fit_type(), you can probably drop the ret
variable altogether here?

>  		return vend;
>  
> @@ -3979,9 +3979,8 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  			/* It is a BUG(), but trigger recovery instead. */
>  			goto recovery;
>  
> -		ret = adjust_va_to_fit_type(&free_vmap_area_root,
> -					    &free_vmap_area_list,
> -					    va, start, size);
> +		ret = va_clip(&free_vmap_area_root,
> +			&free_vmap_area_list, va, start, size);
>  		if (WARN_ON_ONCE(unlikely(ret)))
>  			/* It is a BUG(), but trigger recovery instead. */
>  			goto recovery;
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Mitigate a vmap lock contention
  2023-05-23 15:12   ` Uladzislau Rezki
@ 2023-05-23 18:04     ` Hyeonggon Yoo
  2023-05-23 21:43       ` Dave Chinner
  2023-05-24  9:50       ` Uladzislau Rezki
  0 siblings, 2 replies; 40+ messages in thread
From: Hyeonggon Yoo @ 2023-05-23 18:04 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Liam R . Howlett,
	Dave Chinner, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko, linux-xfs

On Tue, May 23, 2023 at 05:12:30PM +0200, Uladzislau Rezki wrote:
> > > 2. Motivation.
> > > 
> > > - The vmap code is not scalled to number of CPUs and this should be fixed;
> > > - XFS folk has complained several times that vmalloc might be contented on
> > >   their workloads:
> > > 
> > > <snip>
> > > commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
> > > Author: Dave Chinner <dchinner@redhat.com>
> > > Date:   Tue Jan 4 17:22:18 2022 -0800
> > > 
> > >     xfs: reduce kvmalloc overhead for CIL shadow buffers
> > >     
> > >     Oh, let me count the ways that the kvmalloc API sucks dog eggs.
> > >     
> > >     The problem is when we are logging lots of large objects, we hit
> > >     kvmalloc really damn hard with costly order allocations, and
> > >     behaviour utterly sucks:
> > 
> > based on the commit I guess xfs should use vmalloc/kvmalloc is because
> > it allocates large buffers, how large could it be?
> > 
> They use kvmalloc(). When the page allocator is not able to serve a
> request they fallback to vmalloc. At least what i see, the sizes are:
> 
> from 73728 up to 1048576, i.e. 18 pages up to 256 pages.
> 
> > > 3. Test
> > > 
> > > On my: AMD Ryzen Threadripper 3970X 32-Core Processor, i have below figures:
> > > 
> > >     1-page     1-page-this-patch
> > > 1  0.576131   vs   0.555889
> > > 2   2.68376   vs    1.07895
> > > 3   4.26502   vs    1.01739
> > > 4   6.04306   vs    1.28924
> > > 5   8.04786   vs    1.57616
> > > 6   9.38844   vs    1.78142
> > 
> > <snip>
> > 
> > > 29    20.06   vs    3.59869
> > > 30  20.4353   vs     3.6991
> > > 31  20.9082   vs    3.73028
> > > 32  21.0865   vs    3.82904
> > > 
> > > 1..32 - is a number of jobs. The results are in usec and is a vmallco()/vfree()
> > > pair throughput.
> > 
> > I would be more interested in real numbers than synthetic benchmarks,
> > Maybe XFS folks could help performing profiling similar to commit 8dc9384b7d750
> > with and without this patchset?
> > 
> I added Dave Chinner <david@fromorbit.com> to this thread.

Oh, I missed that, and it would be better to [+Cc linux-xfs]

> But. The contention exists.

I think "theoretically can be contended" doesn't necessarily mean it's actually
contended in the real world.

Also I find it difficult to imagine vmalloc being highly contended because it was
historically considered slow and thus discouraged when performance is important.

IOW vmalloc would not be contended when allocation size is small because we have
kmalloc/buddy API, and therefore I wonder which workloads are allocating very large
buffers and at the same time allocating very frequently, thus performance-sensitive.

I am not against this series, but wondering which workloads would benefit ;)

> Apart of that per-cpu-KVA allocator can go away if we make it generic instead.

Not sure I understand your point, can you elaborate please?

And I would like to ask some side questions:

1. Is vm_[un]map_ram() API still worth with this patchset?

2. How does this patchset deal with 32-bit machines where
   vmalloc address space is limited?

Thanks!

> > By the way looking at the commit, teaching __p?d_alloc() about gfp
> > context (that I'm _slowly_ working on...) could be nice for allowing
> > non-GFP_KERNEL kvmalloc allocations, as Matthew mentioned. [1]
> > 
> > Thanks!
> > 
> > [1] https://lore.kernel.org/linux-mm/Y%2FOHC33YLedMXTlD@casper.infradead.org
> > 

-- 
Hyeonggon Yoo

Doing kernel stuff as a hobby
Undergraduate | Chungnam National University
Dept. Computer Science & Engineering

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Mitigate a vmap lock contention
  2023-05-23 18:04     ` Hyeonggon Yoo
@ 2023-05-23 21:43       ` Dave Chinner
  2023-05-24  1:30         ` Hyeonggon Yoo
  2023-05-24  9:50       ` Uladzislau Rezki
  1 sibling, 1 reply; 40+ messages in thread
From: Dave Chinner @ 2023-05-23 21:43 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Uladzislau Rezki, linux-mm, Andrew Morton, LKML, Baoquan He,
	Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
	Liam R . Howlett, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko, linux-xfs

On Wed, May 24, 2023 at 03:04:28AM +0900, Hyeonggon Yoo wrote:
> On Tue, May 23, 2023 at 05:12:30PM +0200, Uladzislau Rezki wrote:
> > > > 2. Motivation.
> > > > 
> > > > - The vmap code is not scalled to number of CPUs and this should be fixed;
> > > > - XFS folk has complained several times that vmalloc might be contented on
> > > >   their workloads:
> > > > 
> > > > <snip>
> > > > commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
> > > > Author: Dave Chinner <dchinner@redhat.com>
> > > > Date:   Tue Jan 4 17:22:18 2022 -0800
> > > > 
> > > >     xfs: reduce kvmalloc overhead for CIL shadow buffers
> > > >     
> > > >     Oh, let me count the ways that the kvmalloc API sucks dog eggs.
> > > >     
> > > >     The problem is when we are logging lots of large objects, we hit
> > > >     kvmalloc really damn hard with costly order allocations, and
> > > >     behaviour utterly sucks:
> > > 
> > > based on the commit I guess xfs should use vmalloc/kvmalloc is because
> > > it allocates large buffers, how large could it be?
> > > 
> > They use kvmalloc(). When the page allocator is not able to serve a
> > request they fallback to vmalloc. At least what i see, the sizes are:
> > 
> > from 73728 up to 1048576, i.e. 18 pages up to 256 pages.
> > 
> > > > 3. Test
> > > > 
> > > > On my: AMD Ryzen Threadripper 3970X 32-Core Processor, i have below figures:
> > > > 
> > > >     1-page     1-page-this-patch
> > > > 1  0.576131   vs   0.555889
> > > > 2   2.68376   vs    1.07895
> > > > 3   4.26502   vs    1.01739
> > > > 4   6.04306   vs    1.28924
> > > > 5   8.04786   vs    1.57616
> > > > 6   9.38844   vs    1.78142
> > > 
> > > <snip>
> > > 
> > > > 29    20.06   vs    3.59869
> > > > 30  20.4353   vs     3.6991
> > > > 31  20.9082   vs    3.73028
> > > > 32  21.0865   vs    3.82904
> > > > 
> > > > 1..32 - is a number of jobs. The results are in usec and is a vmallco()/vfree()
> > > > pair throughput.
> > > 
> > > I would be more interested in real numbers than synthetic benchmarks,
> > > Maybe XFS folks could help performing profiling similar to commit 8dc9384b7d750
> > > with and without this patchset?
> > > 
> > I added Dave Chinner <david@fromorbit.com> to this thread.
> 
> Oh, I missed that, and it would be better to [+Cc linux-xfs]
> 
> > But. The contention exists.
> 
> I think "theoretically can be contended" doesn't necessarily mean it's actually
> contended in the real world.

Did you not read the commit message for the XFS commit documented
above? vmalloc lock contention most certainly does exist in the
real world and the profiles in commit 8dc9384b7d75  ("xfs: reduce
kvmalloc overhead for CIL shadow buffers") document it clearly.

> Also I find it difficult to imagine vmalloc being highly contended because it was
> historically considered slow and thus discouraged when performance is important.

Read the above XFS commit.

We use vmalloc in critical high performance fast paths that cannot
tolerate high order memory allocation failure. XFS runs this
fast path millions of times a second, and will call into
vmalloc() several hundred thousand times a second with machine-wide
concurrency under certain types of workloads.

> IOW vmalloc would not be contended when allocation size is small because we have
> kmalloc/buddy API, and therefore I wonder which workloads are allocating very large
> buffers and at the same time allocating very frequently, thus performance-sensitive.
>
> I am not against this series, but wondering which workloads would benefit ;)

Yup, you need to read the XFS commit message. If you understand what
is in that commit message, then you wouldn't be doubting that
vmalloc contention is real and that it is used in high performance
fast paths that are traversed millions of times a second....

> > Apart of that per-cpu-KVA allocator can go away if we make it generic instead.
> 
> Not sure I understand your point, can you elaborate please?
> 
> And I would like to ask some side questions:
> 
> 1. Is vm_[un]map_ram() API still worth with this patchset?

XFS also uses this interface for mapping multi-page buffers in the
XFS buffer cache. These are the items that also require the high
order costly kvmalloc allocations in the transaction commit path
when they are modified.

So, yes, we need these mapping interfaces to scale just as well as
vmalloc itself....

> 2. How does this patchset deals with 32-bit machines where
>    vmalloc address space is limited?

From the XFS side, we just don't care about 32 bit machines at all.
XFS is aimed at server and HPC environments which have been entirely
64 bit for a long, long time now...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Mitigate a vmap lock contention
  2023-05-23 21:43       ` Dave Chinner
@ 2023-05-24  1:30         ` Hyeonggon Yoo
  0 siblings, 0 replies; 40+ messages in thread
From: Hyeonggon Yoo @ 2023-05-24  1:30 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Uladzislau Rezki, linux-mm, Andrew Morton, LKML, Baoquan He,
	Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
	Liam R . Howlett, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko, linux-xfs

On Wed, May 24, 2023 at 07:43:38AM +1000, Dave Chinner wrote:
> On Wed, May 24, 2023 at 03:04:28AM +0900, Hyeonggon Yoo wrote:
> > On Tue, May 23, 2023 at 05:12:30PM +0200, Uladzislau Rezki wrote:
> > > > > 2. Motivation.
> > > > > 
> > > > > - The vmap code is not scalled to number of CPUs and this should be fixed;
> > > > > - XFS folk has complained several times that vmalloc might be contented on
> > > > >   their workloads:
> > > > > 
> > > > > <snip>
> > > > > commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
> > > > > Author: Dave Chinner <dchinner@redhat.com>
> > > > > Date:   Tue Jan 4 17:22:18 2022 -0800
> > > > > 
> > > > >     xfs: reduce kvmalloc overhead for CIL shadow buffers
> > > > >     
> > > > >     Oh, let me count the ways that the kvmalloc API sucks dog eggs.
> > > > >     
> > > > >     The problem is when we are logging lots of large objects, we hit
> > > > >     kvmalloc really damn hard with costly order allocations, and
> > > > >     behaviour utterly sucks:
> > > > 
> > > > based on the commit I guess xfs should use vmalloc/kvmalloc is because
> > > > it allocates large buffers, how large could it be?
> > > > 
> > > They use kvmalloc(). When the page allocator is not able to serve a
> > > request they fallback to vmalloc. At least what i see, the sizes are:
> > > 
> > > from 73728 up to 1048576, i.e. 18 pages up to 256 pages.
> > > 
> > > > > 3. Test
> > > > > 
> > > > > On my: AMD Ryzen Threadripper 3970X 32-Core Processor, i have below figures:
> > > > > 
> > > > >     1-page     1-page-this-patch
> > > > > 1  0.576131   vs   0.555889
> > > > > 2   2.68376   vs    1.07895
> > > > > 3   4.26502   vs    1.01739
> > > > > 4   6.04306   vs    1.28924
> > > > > 5   8.04786   vs    1.57616
> > > > > 6   9.38844   vs    1.78142
> > > > 
> > > > <snip>
> > > > 
> > > > > 29    20.06   vs    3.59869
> > > > > 30  20.4353   vs     3.6991
> > > > > 31  20.9082   vs    3.73028
> > > > > 32  21.0865   vs    3.82904
> > > > > 
> > > > > 1..32 - is a number of jobs. The results are in usec and is a vmallco()/vfree()
> > > > > pair throughput.
> > > > 
> > > > I would be more interested in real numbers than synthetic benchmarks,
> > > > Maybe XFS folks could help performing profiling similar to commit 8dc9384b7d750
> > > > with and without this patchset?
> > > > 
> > > I added Dave Chinner <david@fromorbit.com> to this thread.
> > 
> > Oh, I missed that, and it would be better to [+Cc linux-xfs]
> > 
> > > But. The contention exists.
> > 
> > I think "theoretically can be contended" doesn't necessarily mean it's actually
> > contended in the real world.
> 
> Did you not read the commit message for the XFS commit documented
> above? vmalloc lock contention most c0ertainly does exist in the
> real world and the profiles in commit 8dc9384b7d75  ("xfs: reduce
> kvmalloc overhead for CIL shadow buffers") document it clearly.
>
> > Also I find it difficult to imagine vmalloc being highly contended because it was
> > historically considered slow and thus discouraged when performance is important.
> 
> Read the above XFS commit.
> 
> We use vmalloc in critical high performance fast paths that cannot
> tolerate high order memory allocation failure. XFS runs this
> fast path millions of times a second, and will call into
> vmalloc() several hundred thousands times a second with machine wide
> concurrency under certain types of workloads.
>
> > IOW vmalloc would not be contended when allocation size is small because we have
> > kmalloc/buddy API, and therefore I wonder which workloads are allocating very large
> > buffers and at the same time allocating very frequently, thus performance-sensitive.
> >
> > I am not against this series, but wondering which workloads would benefit ;)
> 
> Yup, you need to read the XFS commit message. If you understand what
> is in that commit message, then you wouldn't be doubting that
> vmalloc contention is real and that it is used in high performance
> fast paths that are traversed millions of times a second....

Oh, I read the commit but it seems to have slipped my mind - sorry for such a dumb
question. Now I get it, thank you so much. In any case I didn't mean to offend;
I should've read and thought more before asking.

>
> > > Apart of that per-cpu-KVA allocator can go away if we make it generic instead.
> > 
> > Not sure I understand your point, can you elaborate please?
> > 
> > And I would like to ask some side questions:
> > 
> > 1. Is vm_[un]map_ram() API still worth with this patchset?
> 
> XFS also uses this interface for mapping multi-page buffers in the
> XFS buffer cache. These are the items that also require the high
> order costly kvmalloc allocations in the transaction commit path
> when they are modified.
> 
> So, yes, we need these mapping interfaces to scale just as well as
> vmalloc itself....

I mean, even before this series, vm_[un]map_ram() caches vmap_blocks
per CPU, but it has a limit on the size that can be cached per CPU.

But now that vmap() itself becomes scalable with this series, I wonder
whether they are still worthwhile; why not replace them with v[un]map()?
> 
> > 2. How does this patchset deals with 32-bit machines where
> >    vmalloc address space is limited?
> 
> From the XFS side, we just don't care about 32 bit machines at all.
> XFS is aimed at server and HPC environments which have been entirely
> 64 bit for a long, long time now...

But Linux still supports 32 bit machines and is not going to drop
support for them anytime soon, so I think there should at least be a way to
disable this feature.

Thanks!

-- 
Hyeonggon Yoo

Doing kernel stuff as a hobby
Undergraduate | Chungnam National University
Dept. Computer Science & Engineering

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Mitigate a vmap lock contention
  2023-05-23 18:04     ` Hyeonggon Yoo
  2023-05-23 21:43       ` Dave Chinner
@ 2023-05-24  9:50       ` Uladzislau Rezki
  2023-05-24 21:56         ` Dave Chinner
  1 sibling, 1 reply; 40+ messages in thread
From: Uladzislau Rezki @ 2023-05-24  9:50 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Uladzislau Rezki, linux-mm, Andrew Morton, LKML, Baoquan He,
	Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
	Liam R . Howlett, Dave Chinner, Paul E . McKenney,
	Joel Fernandes, Oleksiy Avramchenko, linux-xfs

On Wed, May 24, 2023 at 03:04:28AM +0900, Hyeonggon Yoo wrote:
> On Tue, May 23, 2023 at 05:12:30PM +0200, Uladzislau Rezki wrote:
> > > > 2. Motivation.
> > > > 
> > > > - The vmap code is not scalled to number of CPUs and this should be fixed;
> > > > - XFS folk has complained several times that vmalloc might be contented on
> > > >   their workloads:
> > > > 
> > > > <snip>
> > > > commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
> > > > Author: Dave Chinner <dchinner@redhat.com>
> > > > Date:   Tue Jan 4 17:22:18 2022 -0800
> > > > 
> > > >     xfs: reduce kvmalloc overhead for CIL shadow buffers
> > > >     
> > > >     Oh, let me count the ways that the kvmalloc API sucks dog eggs.
> > > >     
> > > >     The problem is when we are logging lots of large objects, we hit
> > > >     kvmalloc really damn hard with costly order allocations, and
> > > >     behaviour utterly sucks:
> > > 
> > > based on the commit I guess xfs should use vmalloc/kvmalloc is because
> > > it allocates large buffers, how large could it be?
> > > 
> > They use kvmalloc(). When the page allocator is not able to serve a
> > request they fallback to vmalloc. At least what i see, the sizes are:
> > 
> > from 73728 up to 1048576, i.e. 18 pages up to 256 pages.
> > 
> > > > 3. Test
> > > > 
> > > > On my: AMD Ryzen Threadripper 3970X 32-Core Processor, i have below figures:
> > > > 
> > > >     1-page     1-page-this-patch
> > > > 1  0.576131   vs   0.555889
> > > > 2   2.68376   vs    1.07895
> > > > 3   4.26502   vs    1.01739
> > > > 4   6.04306   vs    1.28924
> > > > 5   8.04786   vs    1.57616
> > > > 6   9.38844   vs    1.78142
> > > 
> > > <snip>
> > > 
> > > > 29    20.06   vs    3.59869
> > > > 30  20.4353   vs     3.6991
> > > > 31  20.9082   vs    3.73028
> > > > 32  21.0865   vs    3.82904
> > > > 
> > > > 1..32 - is a number of jobs. The results are in usec and is a vmallco()/vfree()
> > > > pair throughput.
> > > 
> > > I would be more interested in real numbers than synthetic benchmarks,
> > > Maybe XFS folks could help performing profiling similar to commit 8dc9384b7d750
> > > with and without this patchset?
> > > 
> > I added Dave Chinner <david@fromorbit.com> to this thread.
> 
> Oh, I missed that, and it would be better to [+Cc linux-xfs]
> 
> > But. The contention exists.
> 
> I think "theoretically can be contended" doesn't necessarily mean it's actually
> contended in the real world.
> 
> Also I find it difficult to imagine vmalloc being highly contended because it was
> historically considered slow and thus discouraged when performance is important.
> 
> IOW vmalloc would not be contended when allocation size is small because we have
> kmalloc/buddy API, and therefore I wonder which workloads are allocating very large
> buffers and at the same time allocating very frequently, thus performance-sensitive.
> 
> I am not against this series, but wondering which workloads would benefit ;)
> 
> > Apart of that per-cpu-KVA allocator can go away if we make it generic instead.
> 
> Not sure I understand your point, can you elaborate please?
> 
> And I would like to ask some side questions:
> 
> 1. Is vm_[un]map_ram() API still worth with this patchset?
> 
It is up to the community to decide. As I see it, XFS needs it also. Maybe in
the future it can be removed (who knows), if the vmalloc code itself can
deliver the same performance as the vm_map* APIs.

>
> 2. How does this patchset deals with 32-bit machines where
>    vmalloc address space is limited?
> 
It can deal with them without any problems, though I am not sure it is needed
for 32-bit systems. The reason is that the vmalloc code used to be a bit slow
when it comes to lookup time: it used to be O(n), and later it was improved to
O(log n).

The vm_map_ram() and friends interface was added because of those vmalloc
drawbacks. I am not sure that there are 32-bit systems with 10/20/30... CPUs
on board; only in that case would it be worth caring about contention.

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function
  2023-05-23 17:24   ` Liam R. Howlett
@ 2023-05-24 11:51     ` Uladzislau Rezki
  0 siblings, 0 replies; 40+ messages in thread
From: Uladzislau Rezki @ 2023-05-24 11:51 UTC (permalink / raw)
  To: Liam R. Howlett, Uladzislau Rezki (Sony),
	linux-mm, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Oleksiy Avramchenko

On Tue, May 23, 2023 at 01:24:09PM -0400, Liam R. Howlett wrote:
> * Uladzislau Rezki (Sony) <urezki@gmail.com> [230522 07:09]:
> > This patch renames the adjust_va_to_fit_type() function
> > to a shorter variant and more expressive. A new name is
> > va_clip().
> > 
> > There is no a functional change as a result of this patch.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> >  mm/vmalloc.c | 13 ++++++-------
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 409285b68a67..5f900efec6a9 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -1383,9 +1383,9 @@ classify_va_fit_type(struct vmap_area *va,
> >  }
> >  
> >  static __always_inline int
> > -adjust_va_to_fit_type(struct rb_root *root, struct list_head *head,
> > -		      struct vmap_area *va, unsigned long nva_start_addr,
> > -		      unsigned long size)
> > +va_clip(struct rb_root *root, struct list_head *head,
> > +	struct vmap_area *va, unsigned long nva_start_addr,
> > +	unsigned long size)
> >  {
> >  	struct vmap_area *lva = NULL;
> >  	enum fit_type type = classify_va_fit_type(va, nva_start_addr, size);
> > @@ -1501,7 +1501,7 @@ va_alloc(struct vmap_area *va,
> >  		return vend;
> >  
> >  	/* Update the free vmap_area. */
> > -	ret = adjust_va_to_fit_type(root, head, va, nva_start_addr, size);
> > +	ret = va_clip(root, head, va, nva_start_addr, size);
> >  	if (WARN_ON_ONCE(ret))
> 
> ret is only used in this WARN_ON_ONCE() check (from patch 1), since you
> have shortened adjust_va_to_fit_type(), you can probably drop the ret
> variable all together here?
> 
I can move the WARN_ON_ONCE() check into va_clip(), so that makes sense.

As for the return value, we should still check it, at least here:

		ret = va_clip(&free_vmap_area_root,
			&free_vmap_area_list, va, start, size);
		if (WARN_ON_ONCE(unlikely(ret)))
			/* It is a BUG(), but trigger recovery instead. */
			goto recovery;



It can fail, though the probability of that is close to zero.
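
Illustrative sketch only (not the final patch): if the warning moves into
va_clip() itself, this call site could simply branch on the return value:

<snip>
		if (va_clip(&free_vmap_area_root,
				&free_vmap_area_list, va, start, size))
			/* It is a BUG(), but trigger recovery instead. */
			goto recovery;
<snip>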

Thank you for looking at it, Liam!

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Mitigate a vmap lock contention
  2023-05-24  9:50       ` Uladzislau Rezki
@ 2023-05-24 21:56         ` Dave Chinner
  2023-05-25  7:59           ` Christoph Hellwig
  2023-05-25 10:20           ` Uladzislau Rezki
  0 siblings, 2 replies; 40+ messages in thread
From: Dave Chinner @ 2023-05-24 21:56 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Hyeonggon Yoo, linux-mm, Andrew Morton, LKML, Baoquan He,
	Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
	Liam R . Howlett, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko, linux-xfs

On Wed, May 24, 2023 at 11:50:12AM +0200, Uladzislau Rezki wrote:
> On Wed, May 24, 2023 at 03:04:28AM +0900, Hyeonggon Yoo wrote:
> > On Tue, May 23, 2023 at 05:12:30PM +0200, Uladzislau Rezki wrote:
> > And I would like to ask some side questions:
> > 
> > 1. Is vm_[un]map_ram() API still worth with this patchset?
> > 
> It is up to community to decide. As i see XFS needs it also. Maybe in
> the future it can be removed(who knows). If the vmalloc code itself can
> deliver such performance as vm_map* APIs.

vm_map* APIs cannot be replaced with vmalloc, they cover a very
different use case.  i.e.  vmalloc allocates mapped memory,
vm_map_ram() maps allocated memory....
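
A minimal illustration of that split (assuming a caller that already holds
its own struct page array; the snippet is not from the series):

<snip>
/* vmalloc(): allocates the backing pages *and* maps them. */
void *buf = vmalloc(nr_pages * PAGE_SIZE);
/* ... use buf ... */
vfree(buf);

/* vm_map_ram(): only maps pages the caller allocated itself. */
void *addr = vm_map_ram(pages, nr_pages, NUMA_NO_NODE);
/* ... use addr ... */
vm_unmap_ram(addr, nr_pages);
<snip>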

> vm_map_ram() and friends interface was added because of vmalloc drawbacks.

No. vm_map*() were scalability improvements added in 2009 to replace
vmap() and vunmap() to avoid global lock contention in the vmap
allocator that XFS had been working around for years with its own
internal vmap cache....

commit 95f8e302c04c0b0c6de35ab399a5551605eeb006
Author: Nicholas Piggin <npiggin@gmail.com>
Date:   Tue Jan 6 14:43:09 2009 +1100

    [XFS] use scalable vmap API
    
    Implement XFS's large buffer support with the new vmap APIs. See the vmap
    rewrite (db64fe02) for some numbers. The biggest improvement that comes from
    using the new APIs is avoiding the global KVA allocation lock on every call.
    
    Signed-off-by: Nick Piggin <npiggin@suse.de>
    Reviewed-by: Christoph Hellwig <hch@infradead.org>
    Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>

vmap/vunmap() themselves were introduced in 2.5.32 (2002) and before
that XFS was using remap_page_array() and vfree() in exactly the
same way it uses vm_map_ram() and vm_unmap_ram() today....

XFS has a long, long history of causing virtual memory allocator
scalability and contention problems. As you can see, this isn't our
first rodeo...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Mitigate a vmap lock contention
  2023-05-24 21:56         ` Dave Chinner
@ 2023-05-25  7:59           ` Christoph Hellwig
  2023-05-25 10:20           ` Uladzislau Rezki
  1 sibling, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2023-05-25  7:59 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Uladzislau Rezki, Hyeonggon Yoo, linux-mm, Andrew Morton, LKML,
	Baoquan He, Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
	Liam R . Howlett, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko, linux-xfs

On Thu, May 25, 2023 at 07:56:56AM +1000, Dave Chinner wrote:
> > It is up to community to decide. As i see XFS needs it also. Maybe in
> > the future it can be removed(who knows). If the vmalloc code itself can
> > deliver such performance as vm_map* APIs.
> 
> vm_map* APIs cannot be replaced with vmalloc, they cover a very
> different use case.  i.e.  vmalloc allocates mapped memory,
> vm_map_ram() maps allocated memory....
> 
> > vm_map_ram() and friends interface was added because of vmalloc drawbacks.
> 
> No. vm_map*() were scalability improvements added in 2009 to replace
> on vmap() and vunmap() to avoid global lock contention in the vmap
> allocator that XFS had been working around for years with it's own
> internal vmap cache....

All of that is true.  At the same time XFS could very much switch to
vmalloc for !XBF_UNMAPPED && size > PAGE_SIZE buffers IFF that provided
an advantage.  The need for vmap and then vm_map_* initially came from
the fact that we were using the page cache to back the xfs_buf (or
page_buf back then).  With your work on getting rid of the pagecache
usage we could just use vmalloc now, if we wanted to and it improved
performance.  Or at some point we could even look into just using large
folios for that, with the infrastructure for it improving a lot (but
I suspect we're not quite there yet).

But there are other uses of vm_map_* that can't do that, and they will
benefit from the additional scalability as well even IFF just using
vmalloc was fine for XFS now (I don't know, I haven't actually looked
into it).

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/9] Mitigate a vmap lock contention
  2023-05-24 21:56         ` Dave Chinner
  2023-05-25  7:59           ` Christoph Hellwig
@ 2023-05-25 10:20           ` Uladzislau Rezki
  1 sibling, 0 replies; 40+ messages in thread
From: Uladzislau Rezki @ 2023-05-25 10:20 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Uladzislau Rezki, Hyeonggon Yoo, linux-mm, Andrew Morton, LKML,
	Baoquan He, Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox,
	Liam R . Howlett, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko, linux-xfs

On Thu, May 25, 2023 at 07:56:56AM +1000, Dave Chinner wrote:
> On Wed, May 24, 2023 at 11:50:12AM +0200, Uladzislau Rezki wrote:
> > On Wed, May 24, 2023 at 03:04:28AM +0900, Hyeonggon Yoo wrote:
> > > On Tue, May 23, 2023 at 05:12:30PM +0200, Uladzislau Rezki wrote:
> > > And I would like to ask some side questions:
> > > 
> > > 1. Is vm_[un]map_ram() API still worth with this patchset?
> > > 
> > It is up to community to decide. As i see XFS needs it also. Maybe in
> > the future it can be removed(who knows). If the vmalloc code itself can
> > deliver such performance as vm_map* APIs.
> 
> vm_map* APIs cannot be replaced with vmalloc, they cover a very
> different use case.  i.e.  vmalloc allocates mapped memory,
> vm_map_ram() maps allocated memory....
> 
> > vm_map_ram() and friends interface was added because of vmalloc drawbacks.
> 
> No. vm_map*() were scalability improvements added in 2009 to replace
> on vmap() and vunmap() to avoid global lock contention in the vmap
> allocator that XFS had been working around for years with it's own
> internal vmap cache....
> 
> commit 95f8e302c04c0b0c6de35ab399a5551605eeb006
> Author: Nicholas Piggin <npiggin@gmail.com>
> Date:   Tue Jan 6 14:43:09 2009 +1100
> 
>     [XFS] use scalable vmap API
>     
>     Implement XFS's large buffer support with the new vmap APIs. See the vmap
>     rewrite (db64fe02) for some numbers. The biggest improvement that comes from
>     using the new APIs is avoiding the global KVA allocation lock on every call.
>     
>     Signed-off-by: Nick Piggin <npiggin@suse.de>
>     Reviewed-by: Christoph Hellwig <hch@infradead.org>
>     Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
> 
> vmap/vunmap() themselves were introduce in 2.5.32 (2002) and before
> that XFS was using remap_page_array() and vfree() in exactly the
> same way it uses vm_map_ram() and vm_unmap_ram() today....
> 
> XFS has a long, long history of causing virtual memory allocator
> scalability and contention problems. As you can see, this isn't our
> first rodeo...
> 
Let me be more specific; sorry, it looks like there is a misunderstanding.
I am talking about removing the vb_alloc()/vb_free() per-cpu stuff, if
alloc_vmap_area() gives the same performance:

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d50c551592fc..a1687bbdad30 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2503,12 +2503,6 @@ void vm_unmap_ram(const void *mem, unsigned int count)

        kasan_poison_vmalloc(mem, size);

-       if (likely(count <= VMAP_MAX_ALLOC)) {
-               debug_check_no_locks_freed(mem, size);
-               vb_free(addr, size);
-               return;
-       }
-
        va = find_unlink_vmap_area(addr);
        if (WARN_ON_ONCE(!va))
                return;
@@ -2539,12 +2533,6 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node)
        unsigned long addr;
        void *mem;

-       if (likely(count <= VMAP_MAX_ALLOC)) {
-               mem = vb_alloc(size, GFP_KERNEL);
-               if (IS_ERR(mem))
-                       return NULL;
-               addr = (unsigned long)mem;
-       } else {
                struct vmap_area *va;
                va = alloc_vmap_area(size, PAGE_SIZE,
                                VMALLOC_START, VMALLOC_END,
@@ -2554,7 +2542,6 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node)

                addr = va->va_start;
                mem = (void *)addr;
-       }

        if (vmap_pages_range(addr, addr + size, PAGE_KERNEL,
                                pages, PAGE_SHIFT) < 0) {


+ other related parts.

--
Uladzislau Rezki

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/9] mm: vmalloc: Add va_alloc() helper
  2023-05-22 11:08 ` [PATCH 1/9] mm: vmalloc: Add va_alloc() helper Uladzislau Rezki (Sony)
  2023-05-23  6:05   ` Christoph Hellwig
@ 2023-05-27 19:55   ` Lorenzo Stoakes
  1 sibling, 0 replies; 40+ messages in thread
From: Lorenzo Stoakes @ 2023-05-27 19:55 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Andrew Morton, LKML, Baoquan He, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Oleksiy Avramchenko

On Mon, May 22, 2023 at 01:08:41PM +0200, Uladzislau Rezki (Sony) wrote:
> Currently __alloc_vmap_area() function contains an open codded
> logic that finds and adjusts a VA based on allocation request.
>
> Introduce a va_alloc() helper that adjusts found VA only. It
> will be used later at least in two places.
>
> There is no a functional change as a result of this patch.
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 41 ++++++++++++++++++++++++++++-------------
>  1 file changed, 28 insertions(+), 13 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 31ff782d368b..409285b68a67 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1482,6 +1482,32 @@ adjust_va_to_fit_type(struct rb_root *root, struct list_head *head,
>  	return 0;
>  }
>
> +static unsigned long
> +va_alloc(struct vmap_area *va,
> +	struct rb_root *root, struct list_head *head,
> +	unsigned long size, unsigned long align,
> +	unsigned long vstart, unsigned long vend)
> +{
> +	unsigned long nva_start_addr;
> +	int ret;
> +
> +	if (va->va_start > vstart)
> +		nva_start_addr = ALIGN(va->va_start, align);
> +	else
> +		nva_start_addr = ALIGN(vstart, align);
> +
> +	/* Check the "vend" restriction. */
> +	if (nva_start_addr + size > vend)
> +		return vend;
> +
> +	/* Update the free vmap_area. */
> +	ret = adjust_va_to_fit_type(root, head, va, nva_start_addr, size);
> +	if (WARN_ON_ONCE(ret))
> +		return vend;
> +
> +	return nva_start_addr;
> +}
> +
>  /*
>   * Returns a start address of the newly allocated area, if success.
>   * Otherwise a vend is returned that indicates failure.
> @@ -1494,7 +1520,6 @@ __alloc_vmap_area(struct rb_root *root, struct list_head *head,
>  	bool adjust_search_size = true;
>  	unsigned long nva_start_addr;
>  	struct vmap_area *va;
> -	int ret;
>
>  	/*
>  	 * Do not adjust when:
> @@ -1512,18 +1537,8 @@ __alloc_vmap_area(struct rb_root *root, struct list_head *head,
>  	if (unlikely(!va))
>  		return vend;
>
> -	if (va->va_start > vstart)
> -		nva_start_addr = ALIGN(va->va_start, align);
> -	else
> -		nva_start_addr = ALIGN(vstart, align);
> -
> -	/* Check the "vend" restriction. */
> -	if (nva_start_addr + size > vend)
> -		return vend;
> -
> -	/* Update the free vmap_area. */
> -	ret = adjust_va_to_fit_type(root, head, va, nva_start_addr, size);
> -	if (WARN_ON_ONCE(ret))
> +	nva_start_addr = va_alloc(va, root, head, size, align, vstart, vend);
> +	if (nva_start_addr == vend)
>  		return vend;
>
>  #if DEBUG_AUGMENT_LOWEST_MATCH_CHECK
> --
> 2.30.2
>

LGTM,

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function
  2023-05-22 11:08 ` [PATCH 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function Uladzislau Rezki (Sony)
  2023-05-23  6:06   ` Christoph Hellwig
  2023-05-23 17:24   ` Liam R. Howlett
@ 2023-05-27 21:50   ` Lorenzo Stoakes
  2023-05-29 20:37     ` Uladzislau Rezki
  2 siblings, 1 reply; 40+ messages in thread
From: Lorenzo Stoakes @ 2023-05-27 21:50 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Andrew Morton, LKML, Baoquan He, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Oleksiy Avramchenko

On Mon, May 22, 2023 at 01:08:42PM +0200, Uladzislau Rezki (Sony) wrote:
> This patch renames the adjust_va_to_fit_type() function
> to a shorter variant and more expressive. A new name is
> va_clip().
>
> There is no a functional change as a result of this patch.

Small nit - I think:-

'This patch renames the adjust_va_to_fit_type() function to va_clip() which
is shorter and more expressive.'

Reads better here.

>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 409285b68a67..5f900efec6a9 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1383,9 +1383,9 @@ classify_va_fit_type(struct vmap_area *va,
>  }
>
>  static __always_inline int
> -adjust_va_to_fit_type(struct rb_root *root, struct list_head *head,
> -		      struct vmap_area *va, unsigned long nva_start_addr,
> -		      unsigned long size)
> +va_clip(struct rb_root *root, struct list_head *head,
> +	struct vmap_area *va, unsigned long nva_start_addr,
> +	unsigned long size)
>  {
>  	struct vmap_area *lva = NULL;
>  	enum fit_type type = classify_va_fit_type(va, nva_start_addr, size);
> @@ -1501,7 +1501,7 @@ va_alloc(struct vmap_area *va,
>  		return vend;
>
>  	/* Update the free vmap_area. */
> -	ret = adjust_va_to_fit_type(root, head, va, nva_start_addr, size);
> +	ret = va_clip(root, head, va, nva_start_addr, size);
>  	if (WARN_ON_ONCE(ret))
>  		return vend;
>
> @@ -3979,9 +3979,8 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  			/* It is a BUG(), but trigger recovery instead. */
>  			goto recovery;
>
> -		ret = adjust_va_to_fit_type(&free_vmap_area_root,
> -					    &free_vmap_area_list,
> -					    va, start, size);
> +		ret = va_clip(&free_vmap_area_root,
> +			&free_vmap_area_list, va, start, size);
>  		if (WARN_ON_ONCE(unlikely(ret)))
>  			/* It is a BUG(), but trigger recovery instead. */
>  			goto recovery;
> --
> 2.30.2
>

Otherwise,

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/9] mm: vmalloc: Move vmap_init_free_space() down in vmalloc.c
  2023-05-22 11:08 ` [PATCH 3/9] mm: vmalloc: Move vmap_init_free_space() down in vmalloc.c Uladzislau Rezki (Sony)
  2023-05-23  6:06   ` Christoph Hellwig
@ 2023-05-27 21:52   ` Lorenzo Stoakes
  1 sibling, 0 replies; 40+ messages in thread
From: Lorenzo Stoakes @ 2023-05-27 21:52 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Andrew Morton, LKML, Baoquan He, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Oleksiy Avramchenko

On Mon, May 22, 2023 at 01:08:43PM +0200, Uladzislau Rezki (Sony) wrote:
> A vmap_init_free_space() is a function that setups a vmap space
> and is considered as part of initialization phase. Since a main
> entry which is vmalloc_init(), has been moved down in vmalloc.c
> it makes sense to follow the pattern.
>
> There is no a functional change as a result of this patch.
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 82 ++++++++++++++++++++++++++--------------------------
>  1 file changed, 41 insertions(+), 41 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 5f900efec6a9..19dcffb0d563 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2416,47 +2416,6 @@ void __init vm_area_register_early(struct vm_struct *vm, size_t align)
>  	kasan_populate_early_vm_area_shadow(vm->addr, vm->size);
>  }
>
> -static void vmap_init_free_space(void)
> -{
> -	unsigned long vmap_start = 1;
> -	const unsigned long vmap_end = ULONG_MAX;
> -	struct vmap_area *busy, *free;
> -
> -	/*
> -	 *     B     F     B     B     B     F
> -	 * -|-----|.....|-----|-----|-----|.....|-
> -	 *  |           The KVA space           |
> -	 *  |<--------------------------------->|
> -	 */
> -	list_for_each_entry(busy, &vmap_area_list, list) {
> -		if (busy->va_start - vmap_start > 0) {
> -			free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
> -			if (!WARN_ON_ONCE(!free)) {
> -				free->va_start = vmap_start;
> -				free->va_end = busy->va_start;
> -
> -				insert_vmap_area_augment(free, NULL,
> -					&free_vmap_area_root,
> -						&free_vmap_area_list);
> -			}
> -		}
> -
> -		vmap_start = busy->va_end;
> -	}
> -
> -	if (vmap_end - vmap_start > 0) {
> -		free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
> -		if (!WARN_ON_ONCE(!free)) {
> -			free->va_start = vmap_start;
> -			free->va_end = vmap_end;
> -
> -			insert_vmap_area_augment(free, NULL,
> -				&free_vmap_area_root,
> -					&free_vmap_area_list);
> -		}
> -	}
> -}
> -
>  static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
>  	struct vmap_area *va, unsigned long flags, const void *caller)
>  {
> @@ -4271,6 +4230,47 @@ module_init(proc_vmalloc_init);
>
>  #endif
>
> +static void vmap_init_free_space(void)
> +{
> +	unsigned long vmap_start = 1;
> +	const unsigned long vmap_end = ULONG_MAX;
> +	struct vmap_area *busy, *free;
> +
> +	/*
> +	 *     B     F     B     B     B     F
> +	 * -|-----|.....|-----|-----|-----|.....|-
> +	 *  |           The KVA space           |
> +	 *  |<--------------------------------->|
> +	 */
> +	list_for_each_entry(busy, &vmap_area_list, list) {
> +		if (busy->va_start - vmap_start > 0) {
> +			free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
> +			if (!WARN_ON_ONCE(!free)) {
> +				free->va_start = vmap_start;
> +				free->va_end = busy->va_start;
> +
> +				insert_vmap_area_augment(free, NULL,
> +					&free_vmap_area_root,
> +						&free_vmap_area_list);
> +			}
> +		}
> +
> +		vmap_start = busy->va_end;
> +	}
> +
> +	if (vmap_end - vmap_start > 0) {
> +		free = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
> +		if (!WARN_ON_ONCE(!free)) {
> +			free->va_start = vmap_start;
> +			free->va_end = vmap_end;
> +
> +			insert_vmap_area_augment(free, NULL,
> +				&free_vmap_area_root,
> +					&free_vmap_area_list);
> +		}
> +	}
> +}
> +
>  void __init vmalloc_init(void)
>  {
>  	struct vmap_area *va;
> --
> 2.30.2
>

LGTM,

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function
  2023-05-27 21:50   ` Lorenzo Stoakes
@ 2023-05-29 20:37     ` Uladzislau Rezki
  0 siblings, 0 replies; 40+ messages in thread
From: Uladzislau Rezki @ 2023-05-29 20:37 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Uladzislau Rezki (Sony),
	linux-mm, Andrew Morton, LKML, Baoquan He, Christoph Hellwig,
	Matthew Wilcox, Liam R . Howlett, Dave Chinner,
	Paul E . McKenney, Joel Fernandes, Oleksiy Avramchenko

On Sat, May 27, 2023 at 10:50:22PM +0100, Lorenzo Stoakes wrote:
> On Mon, May 22, 2023 at 01:08:42PM +0200, Uladzislau Rezki (Sony) wrote:
> > This patch renames the adjust_va_to_fit_type() function
> > to a shorter variant and more expressive. A new name is
> > va_clip().
> >
> > There is no a functional change as a result of this patch.
> 
> Small nit - I think:-
> 
> 'This patch renames the adjust_va_to_fit_type() function to va_clip() which
> is shorter and more expressive.'
> 
> Reads better here.
> 
I will update.

Thanks!

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 8/9] mm: vmalloc: Offload free_vmap_area_lock global lock
  2023-05-22 11:08 ` [PATCH 8/9] mm: vmalloc: Offload free_vmap_area_lock global lock Uladzislau Rezki (Sony)
@ 2023-06-05  0:43   ` Baoquan He
  2023-06-06  9:01     ` Uladzislau Rezki
  0 siblings, 1 reply; 40+ messages in thread
From: Baoquan He @ 2023-06-05  0:43 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Andrew Morton, LKML, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Liam R . Howlett,
	Dave Chinner, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko

On 05/22/23 at 01:08pm, Uladzislau Rezki (Sony) wrote:
......  
> +static unsigned long
> +this_cpu_zone_alloc_fill(struct cpu_vmap_zone *z,
> +	unsigned long size, unsigned long align,
> +	gfp_t gfp_mask, int node)
> +{
> +	unsigned long addr = VMALLOC_END;
> +	struct vmap_area *va;
> +
> +	/*
> +	 * It still can race. One task sets a progress to
> +	 * 1 a second one gets preempted on entry, the first
> +	 * zeroed the progress flag and second proceed with
> +	 * an extra prefetch.
> +	 */
> +	if (atomic_xchg(&z->fill_in_progress, 1))
> +		return addr;
> +
> +	va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
> +	if (unlikely(!va))
> +		goto out;
> +
> +	spin_lock(&free_vmap_area_lock);
> +	addr = __alloc_vmap_area(&free_vmap_area_root, &free_vmap_area_list,
> +		cvz_size, 1, VMALLOC_START, VMALLOC_END);
> +	spin_unlock(&free_vmap_area_lock);

The 'z' is passed in from this_cpu_zone_alloc(), and it is obtained with
raw_cpu_ptr(&cpu_vmap_zone). Here, when we try to get a chunk of cvz_size
from free_vmap_area_root/free_vmap_area_list, how can we guarantee that it
belongs to the 'z' zone? To my understanding, __alloc_vmap_area() will
find a suitable address range sequentially, bottom up, from
free_vmap_area_root. Please correct me if I am wrong.

static unsigned long
this_cpu_zone_alloc(unsigned long size, unsigned long align, gfp_t gfp_mask, int node)
{
        struct cpu_vmap_zone *z = raw_cpu_ptr(&cpu_vmap_zone);
	......
	if (addr == VMALLOC_END && left < 4 * PAGE_SIZE)
                addr = this_cpu_zone_alloc_fill(z, size, align, gfp_mask, node);
}

> +
> +	if (addr == VMALLOC_END) {
> +		kmem_cache_free(vmap_area_cachep, va);
> +		goto out;
> +	}
> +
> +	va->va_start = addr;
> +	va->va_end = addr + cvz_size;
> +
> +	fbl_lock(z, FREE);
> +	va = merge_or_add_vmap_area_augment(va,
> +		&fbl_root(z, FREE), &fbl_head(z, FREE));
> +	addr = va_alloc(va, &fbl_root(z, FREE), &fbl_head(z, FREE),
> +		size, align, VMALLOC_START, VMALLOC_END);
> +	fbl_unlock(z, FREE);
> +
> +out:
> +	atomic_set(&z->fill_in_progress, 0);
> +	return addr;
> +}
> +
> +static unsigned long
> +this_cpu_zone_alloc(unsigned long size, unsigned long align, gfp_t gfp_mask, int node)
> +{
> +	struct cpu_vmap_zone *z = raw_cpu_ptr(&cpu_vmap_zone);
> +	unsigned long extra = align > PAGE_SIZE ? align : 0;
> +	unsigned long addr = VMALLOC_END, left = 0;
> +
> +	/*
> +	 * It is disabled, fallback to a global heap.
> +	 */
> +	if (cvz_size == ULONG_MAX)
> +		return addr;
> +
> +	/*
> +	 * Any allocation bigger/equal than one half of
                          ~~~~~~typo~~~~~~  bigger than/equal to
> +	 * a zone-size will fallback to a global heap.
> +	 */
> +	if (cvz_size / (size + extra) < 3)
> +		return addr;
> +
> +	if (RB_EMPTY_ROOT(&fbl_root(z, FREE)))
> +		goto fill;
> +
> +	fbl_lock(z, FREE);
> +	addr = __alloc_vmap_area(&fbl_root(z, FREE), &fbl_head(z, FREE),
> +		size, align, VMALLOC_START, VMALLOC_END);
> +
> +	if (addr == VMALLOC_END)
> +		left = get_subtree_max_size(fbl_root(z, FREE).rb_node);
> +	fbl_unlock(z, FREE);
> +
> +fill:
> +	/*
> +	 * A low watermark is 3 pages.
> +	 */
> +	if (addr == VMALLOC_END && left < 4 * PAGE_SIZE)
> +		addr = this_cpu_zone_alloc_fill(z, size, align, gfp_mask, node);
> +
> +	return addr;
> +}
> +
>  /*
>   * Allocate a region of KVA of the specified size and alignment, within the
>   * vstart and vend.
> @@ -1678,11 +1765,21 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	 */
>  	kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask);
>  
> +	/*
> +	 * Fast path allocation, start with it.
> +	 */
> +	if (vstart == VMALLOC_START && vend == VMALLOC_END)
> +		addr = this_cpu_zone_alloc(size, align, gfp_mask, node);
> +	else
> +		addr = vend;
> +
>  retry:
> -	preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, node);
> -	addr = __alloc_vmap_area(&free_vmap_area_root, &free_vmap_area_list,
> -		size, align, vstart, vend);
> -	spin_unlock(&free_vmap_area_lock);
> +	if (addr == vend) {
> +		preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, node);
> +		addr = __alloc_vmap_area(&free_vmap_area_root, &free_vmap_area_list,
> +			size, align, vstart, vend);
> +		spin_unlock(&free_vmap_area_lock);
> +	}
>  
>  	trace_alloc_vmap_area(addr, size, align, vstart, vend, addr == vend);
>  
> @@ -1827,6 +1924,27 @@ purge_cpu_vmap_zone(struct cpu_vmap_zone *z)
>  	return num_purged_areas;
>  }
>  
> +static void
> +drop_cpu_vmap_cache(struct cpu_vmap_zone *z)
> +{
> +	struct vmap_area *va, *n_va;
> +	LIST_HEAD(free_head);
> +
> +	if (RB_EMPTY_ROOT(&fbl_root(z, FREE)))
> +		return;
> +
> +	fbl_lock(z, FREE);
> +	WRITE_ONCE(fbl(z, FREE, root.rb_node), NULL);
> +	list_replace_init(&fbl_head(z, FREE), &free_head);
> +	fbl_unlock(z, FREE);
> +
> +	spin_lock(&free_vmap_area_lock);
> +	list_for_each_entry_safe(va, n_va, &free_head, list)
> +		merge_or_add_vmap_area_augment(va,
> +			&free_vmap_area_root, &free_vmap_area_list);
> +	spin_unlock(&free_vmap_area_lock);
> +}
> +
>  /*
>   * Purges all lazily-freed vmap areas.
>   */
> @@ -1868,6 +1986,7 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>  		for_each_possible_cpu(i) {
>  			z = per_cpu_ptr(&cpu_vmap_zone, i);
>  			num_purged_areas += purge_cpu_vmap_zone(z);
> +			drop_cpu_vmap_cache(z);
>  		}
>  	}
>  
> -- 
> 2.30.2
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 8/9] mm: vmalloc: Offload free_vmap_area_lock global lock
  2023-06-05  0:43   ` Baoquan He
@ 2023-06-06  9:01     ` Uladzislau Rezki
  2023-06-06 12:11       ` Baoquan He
  0 siblings, 1 reply; 40+ messages in thread
From: Uladzislau Rezki @ 2023-06-06  9:01 UTC (permalink / raw)
  To: Baoquan He
  Cc: Uladzislau Rezki (Sony),
	linux-mm, Andrew Morton, LKML, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Liam R . Howlett,
	Dave Chinner, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko

On Mon, Jun 05, 2023 at 08:43:39AM +0800, Baoquan He wrote:
> On 05/22/23 at 01:08pm, Uladzislau Rezki (Sony) wrote:
> ......  
> > +static unsigned long
> > +this_cpu_zone_alloc_fill(struct cpu_vmap_zone *z,
> > +	unsigned long size, unsigned long align,
> > +	gfp_t gfp_mask, int node)
> > +{
> > +	unsigned long addr = VMALLOC_END;
> > +	struct vmap_area *va;
> > +
> > +	/*
> > +	 * It still can race. One task sets a progress to
> > +	 * 1 a second one gets preempted on entry, the first
> > +	 * zeroed the progress flag and second proceed with
> > +	 * an extra prefetch.
> > +	 */
> > +	if (atomic_xchg(&z->fill_in_progress, 1))
> > +		return addr;
> > +
> > +	va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
> > +	if (unlikely(!va))
> > +		goto out;
> > +
> > +	spin_lock(&free_vmap_area_lock);
> > +	addr = __alloc_vmap_area(&free_vmap_area_root, &free_vmap_area_list,
> > +		cvz_size, 1, VMALLOC_START, VMALLOC_END);
> > +	spin_unlock(&free_vmap_area_lock);
> 
> The 'z' is passed in from this_cpu_zone_alloc(), and it is obtained with
> raw_cpu_ptr(&cpu_vmap_zone). Here, when we try to get a chunk of cvz_size
> from free_vmap_area_root/free_vmap_area_list, how can we guarantee that it
> belongs to the 'z' zone? To my understanding, __alloc_vmap_area() will
> find a suitable address range sequentially, bottom up, from
> free_vmap_area_root. Please correct me if I am wrong.
> 
We do not guarantee that, and it is not worth it. The most important point is:

If we search for a zone that exactly matches a CPU id, the usage of the
global vmap space becomes wider, i.e. it grows toward the high address
space. This is not good because we can affect other users which allocate
within a specific range. On a big system it might be a problem. Therefore
a pre-fetch is done sequentially, on demand.

Secondly, I do not see much difference in performance if we follow the
CPU-zone-id exactly.

> static unsigned long
> this_cpu_zone_alloc(unsigned long size, unsigned long align, gfp_t gfp_mask, int node)
> {
>         struct cpu_vmap_zone *z = raw_cpu_ptr(&cpu_vmap_zone);
> 	......
> 	if (addr == VMALLOC_END && left < 4 * PAGE_SIZE)
>                 addr = this_cpu_zone_alloc_fill(z, size, align, gfp_mask, node);
> }
> 
> > +
> > +	if (addr == VMALLOC_END) {
> > +		kmem_cache_free(vmap_area_cachep, va);
> > +		goto out;
> > +	}
> > +
> > +	va->va_start = addr;
> > +	va->va_end = addr + cvz_size;
> > +
> > +	fbl_lock(z, FREE);
> > +	va = merge_or_add_vmap_area_augment(va,
> > +		&fbl_root(z, FREE), &fbl_head(z, FREE));
> > +	addr = va_alloc(va, &fbl_root(z, FREE), &fbl_head(z, FREE),
> > +		size, align, VMALLOC_START, VMALLOC_END);
> > +	fbl_unlock(z, FREE);
> > +
> > +out:
> > +	atomic_set(&z->fill_in_progress, 0);
> > +	return addr;
> > +}
> > +
> > +static unsigned long
> > +this_cpu_zone_alloc(unsigned long size, unsigned long align, gfp_t gfp_mask, int node)
> > +{
> > +	struct cpu_vmap_zone *z = raw_cpu_ptr(&cpu_vmap_zone);
> > +	unsigned long extra = align > PAGE_SIZE ? align : 0;
> > +	unsigned long addr = VMALLOC_END, left = 0;
> > +
> > +	/*
> > +	 * It is disabled, fallback to a global heap.
> > +	 */
> > +	if (cvz_size == ULONG_MAX)
> > +		return addr;
> > +
> > +	/*
> > +	 * Any allocation bigger/equal than one half of
>                           ~~~~~~typo~~~~~~  bigger than/equal to
I will rework it!

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 8/9] mm: vmalloc: Offload free_vmap_area_lock global lock
  2023-06-06  9:01     ` Uladzislau Rezki
@ 2023-06-06 12:11       ` Baoquan He
  2023-06-07  6:58         ` Uladzislau Rezki
  0 siblings, 1 reply; 40+ messages in thread
From: Baoquan He @ 2023-06-06 12:11 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, Andrew Morton, LKML, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Liam R . Howlett,
	Dave Chinner, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko

On 06/06/23 at 11:01am, Uladzislau Rezki wrote:
> On Mon, Jun 05, 2023 at 08:43:39AM +0800, Baoquan He wrote:
> > On 05/22/23 at 01:08pm, Uladzislau Rezki (Sony) wrote:
> > ......  
> > > +static unsigned long
> > > +this_cpu_zone_alloc_fill(struct cpu_vmap_zone *z,
> > > +	unsigned long size, unsigned long align,
> > > +	gfp_t gfp_mask, int node)
> > > +{
> > > +	unsigned long addr = VMALLOC_END;
> > > +	struct vmap_area *va;
> > > +
> > > +	/*
> > > +	 * It still can race. One task sets a progress to
> > > +	 * 1 a second one gets preempted on entry, the first
> > > +	 * zeroed the progress flag and second proceed with
> > > +	 * an extra prefetch.
> > > +	 */
> > > +	if (atomic_xchg(&z->fill_in_progress, 1))
> > > +		return addr;
> > > +
> > > +	va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
> > > +	if (unlikely(!va))
> > > +		goto out;
> > > +
> > > +	spin_lock(&free_vmap_area_lock);
> > > +	addr = __alloc_vmap_area(&free_vmap_area_root, &free_vmap_area_list,
> > > +		cvz_size, 1, VMALLOC_START, VMALLOC_END);
> > > +	spin_unlock(&free_vmap_area_lock);
> > 
> > The 'z' is passed in from this_cpu_zone_alloc(), and it is obtained with
> > raw_cpu_ptr(&cpu_vmap_zone). Here, when we try to get a chunk of cvz_size
> > from free_vmap_area_root/free_vmap_area_list, how can we guarantee that it
> > belongs to the 'z' zone? To my understanding, __alloc_vmap_area() will
> > find a suitable address range sequentially, bottom up, from
> > free_vmap_area_root. Please correct me if I am wrong.
> > 
> We do not guarantee that, and it is not worth it. The most important point is:
> 
> If we search for a zone that exactly matches a CPU id, the usage of the
> global vmap space becomes wider, i.e. it grows toward the high address
> space. This is not good because we can affect other users which allocate
> within a specific range. On a big system it might be a problem. Therefore
> a pre-fetch is done sequentially, on demand.
> 
> Secondly, I do not see much difference in performance if we follow the
> CPU-zone-id exactly.

Ah, I see, the allocated range will be put into the appropriate zone's
busy tree by calculating its zone via addr_to_cvz(va->va_start). The
cvz->free tree is only a per-cpu pre-fetch cache. This is smart, thanks
a lot for the explanation.

> 
> > static unsigned long
> > this_cpu_zone_alloc(unsigned long size, unsigned long align, gfp_t gfp_mask, int node)
> > {
> >         struct cpu_vmap_zone *z = raw_cpu_ptr(&cpu_vmap_zone);
> > 	......
> > 	if (addr == VMALLOC_END && left < 4 * PAGE_SIZE)
> >                 addr = this_cpu_zone_alloc_fill(z, size, align, gfp_mask, node);
> > }
> > 
> > > +
> > > +	if (addr == VMALLOC_END) {
> > > +		kmem_cache_free(vmap_area_cachep, va);
> > > +		goto out;
> > > +	}
> > > +
> > > +	va->va_start = addr;
> > > +	va->va_end = addr + cvz_size;
> > > +
> > > +	fbl_lock(z, FREE);
> > > +	va = merge_or_add_vmap_area_augment(va,
> > > +		&fbl_root(z, FREE), &fbl_head(z, FREE));
> > > +	addr = va_alloc(va, &fbl_root(z, FREE), &fbl_head(z, FREE),
> > > +		size, align, VMALLOC_START, VMALLOC_END);
> > > +	fbl_unlock(z, FREE);
> > > +
> > > +out:
> > > +	atomic_set(&z->fill_in_progress, 0);
> > > +	return addr;
> > > +}
> > > +
> > > +static unsigned long
> > > +this_cpu_zone_alloc(unsigned long size, unsigned long align, gfp_t gfp_mask, int node)
> > > +{
> > > +	struct cpu_vmap_zone *z = raw_cpu_ptr(&cpu_vmap_zone);
> > > +	unsigned long extra = align > PAGE_SIZE ? align : 0;
> > > +	unsigned long addr = VMALLOC_END, left = 0;
> > > +
> > > +	/*
> > > +	 * It is disabled, fallback to a global heap.
> > > +	 */
> > > +	if (cvz_size == ULONG_MAX)
> > > +		return addr;
> > > +
> > > +	/*
> > > +	 * Any allocation bigger/equal than one half of
> >                           ~~~~~~typo~~~~~~  bigger than/equal to
> I will rework it!
> 
> --
> Uladzislau Rezki
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 8/9] mm: vmalloc: Offload free_vmap_area_lock global lock
  2023-06-06 12:11       ` Baoquan He
@ 2023-06-07  6:58         ` Uladzislau Rezki
  0 siblings, 0 replies; 40+ messages in thread
From: Uladzislau Rezki @ 2023-06-07  6:58 UTC (permalink / raw)
  To: Baoquan He
  Cc: Uladzislau Rezki, linux-mm, Andrew Morton, LKML, Lorenzo Stoakes,
	Christoph Hellwig, Matthew Wilcox, Liam R . Howlett,
	Dave Chinner, Paul E . McKenney, Joel Fernandes,
	Oleksiy Avramchenko

On Tue, Jun 06, 2023 at 08:11:04PM +0800, Baoquan He wrote:
> On 06/06/23 at 11:01am, Uladzislau Rezki wrote:
> > On Mon, Jun 05, 2023 at 08:43:39AM +0800, Baoquan He wrote:
> > > On 05/22/23 at 01:08pm, Uladzislau Rezki (Sony) wrote:
> > > ......  
> > > > +static unsigned long
> > > > +this_cpu_zone_alloc_fill(struct cpu_vmap_zone *z,
> > > > +	unsigned long size, unsigned long align,
> > > > +	gfp_t gfp_mask, int node)
> > > > +{
> > > > +	unsigned long addr = VMALLOC_END;
> > > > +	struct vmap_area *va;
> > > > +
> > > > +	/*
> > > > +	 * It still can race. One task sets a progress to
> > > > +	 * 1 a second one gets preempted on entry, the first
> > > > +	 * zeroed the progress flag and second proceed with
> > > > +	 * an extra prefetch.
> > > > +	 */
> > > > +	if (atomic_xchg(&z->fill_in_progress, 1))
> > > > +		return addr;
> > > > +
> > > > +	va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
> > > > +	if (unlikely(!va))
> > > > +		goto out;
> > > > +
> > > > +	spin_lock(&free_vmap_area_lock);
> > > > +	addr = __alloc_vmap_area(&free_vmap_area_root, &free_vmap_area_list,
> > > > +		cvz_size, 1, VMALLOC_START, VMALLOC_END);
> > > > +	spin_unlock(&free_vmap_area_lock);
> > > 
> > > The 'z' is passed in from this_cpu_zone_alloc(), and it is obtained with
> > > raw_cpu_ptr(&cpu_vmap_zone). Here, when we try to get a chunk of cvz_size
> > > from free_vmap_area_root/free_vmap_area_list, how can we guarantee that it
> > > belongs to the 'z' zone? To my understanding, __alloc_vmap_area() will
> > > find a suitable address range sequentially, bottom up, from
> > > free_vmap_area_root. Please correct me if I am wrong.
> > > 
> > We do not guarantee that, and it is not worth it. The most important point is:
> > 
> > If we search for a zone that exactly matches a CPU id, the usage of the
> > global vmap space becomes wider, i.e. it grows toward the high address
> > space. This is not good because we can affect other users which allocate
> > within a specific range. On a big system it might be a problem. Therefore
> > a pre-fetch is done sequentially, on demand.
> > 
> > Secondly, I do not see much difference in performance if we follow the
> > CPU-zone-id exactly.
> 
> Ah, I see, the allocated range will be put into the appropriate zone's
> busy tree by calculating its zone via addr_to_cvz(va->va_start). The
> cvz->free tree is only a per-cpu pre-fetch cache. This is smart, thanks
> a lot for the explanation.
> 
Yes. The busy/lazy VAs are placed into a per-cpu zone (using addr_to_cvz(addr)),
whereas the chunk itself is allocated on the current CPU.
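
[Editorial note] A rough sketch of the routing described above: the zone is
derived from the VA address, so busy/lazy bookkeeping always lands in the
zone that covers that address range, no matter which CPU did the allocation,
while the per-cpu free tree acts only as a pre-fetch cache. addr_to_cvz() is
not shown in the quoted hunks, so its body, the zone geometry and the BUSY
list names below are assumptions made for illustration only.

/* Sketch: map an address to its owning per-cpu zone (assumed geometry). */
static struct cpu_vmap_zone *
addr_to_cvz_sketch(unsigned long addr)
{
	unsigned int idx;

	/* Assume each zone covers one cvz_size slice of the vmalloc space. */
	idx = ((addr - VMALLOC_START) / cvz_size) % num_possible_cpus();
	return per_cpu_ptr(&cpu_vmap_zone, idx);
}

/* Sketch: a newly allocated VA goes into its address-derived zone's tree. */
static void
insert_busy_va_sketch(struct vmap_area *va)
{
	struct cpu_vmap_zone *z = addr_to_cvz_sketch(va->va_start);

	fbl_lock(z, BUSY);
	insert_vmap_area(va, &fbl_root(z, BUSY), &fbl_head(z, BUSY));
	fbl_unlock(z, BUSY);
}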

Thanks!

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2023-06-07  6:58 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-22 11:08 [PATCH 0/9] Mitigate a vmap lock contention Uladzislau Rezki (Sony)
2023-05-22 11:08 ` [PATCH 1/9] mm: vmalloc: Add va_alloc() helper Uladzislau Rezki (Sony)
2023-05-23  6:05   ` Christoph Hellwig
2023-05-23  9:57     ` Uladzislau Rezki
2023-05-27 19:55   ` Lorenzo Stoakes
2023-05-22 11:08 ` [PATCH 2/9] mm: vmalloc: Rename adjust_va_to_fit_type() function Uladzislau Rezki (Sony)
2023-05-23  6:06   ` Christoph Hellwig
2023-05-23 10:01     ` Uladzislau Rezki
2023-05-23 17:24   ` Liam R. Howlett
2023-05-24 11:51     ` Uladzislau Rezki
2023-05-27 21:50   ` Lorenzo Stoakes
2023-05-29 20:37     ` Uladzislau Rezki
2023-05-22 11:08 ` [PATCH 3/9] mm: vmalloc: Move vmap_init_free_space() down in vmalloc.c Uladzislau Rezki (Sony)
2023-05-23  6:06   ` Christoph Hellwig
2023-05-27 21:52   ` Lorenzo Stoakes
2023-05-22 11:08 ` [PATCH 4/9] mm: vmalloc: Add a per-CPU-zone infrastructure Uladzislau Rezki (Sony)
2023-05-23  6:08   ` Christoph Hellwig
2023-05-23 14:53     ` Uladzislau Rezki
2023-05-23 15:13       ` Christoph Hellwig
2023-05-23 15:32         ` Uladzislau Rezki
2023-05-22 11:08 ` [PATCH 5/9] mm: vmalloc: Insert busy-VA per-cpu zone Uladzislau Rezki (Sony)
2023-05-23  6:12   ` Christoph Hellwig
2023-05-23 15:00     ` Uladzislau Rezki
2023-05-22 11:08 ` [PATCH 6/9] mm: vmalloc: Support multiple zones in vmallocinfo Uladzislau Rezki (Sony)
2023-05-22 11:08 ` [PATCH 7/9] mm: vmalloc: Insert lazy-VA per-cpu zone Uladzislau Rezki (Sony)
2023-05-22 11:08 ` [PATCH 8/9] mm: vmalloc: Offload free_vmap_area_lock global lock Uladzislau Rezki (Sony)
2023-06-05  0:43   ` Baoquan He
2023-06-06  9:01     ` Uladzislau Rezki
2023-06-06 12:11       ` Baoquan He
2023-06-07  6:58         ` Uladzislau Rezki
2023-05-22 11:08 ` [PATCH 9/9] mm: vmalloc: Scale and activate cvz_size Uladzislau Rezki (Sony)
2023-05-23 11:59 ` [PATCH 0/9] Mitigate a vmap lock contention Hyeonggon Yoo
2023-05-23 15:12   ` Uladzislau Rezki
2023-05-23 18:04     ` Hyeonggon Yoo
2023-05-23 21:43       ` Dave Chinner
2023-05-24  1:30         ` Hyeonggon Yoo
2023-05-24  9:50       ` Uladzislau Rezki
2023-05-24 21:56         ` Dave Chinner
2023-05-25  7:59           ` Christoph Hellwig
2023-05-25 10:20           ` Uladzislau Rezki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).