* [PATCH v5 0/4] Some cleanups for the KVA/vmalloc
From: Uladzislau Rezki (Sony) @ 2019-06-06 12:04 UTC
  To: Andrew Morton
  Cc: linux-mm, LKML, Roman Gushchin, Uladzislau Rezki, Hillf Danton,
	Michal Hocko, Matthew Wilcox, Oleksiy Avramchenko,
	Steven Rostedt

v4->v5:
    - base on next-20190606
    - embed preloading directly into alloc_vmap_area(). [2] patch
    - update the commit message of [2].
    - if RB_EMPTY_NODE(), generate warning and return; [4] patch

v3->v4:
    - Replace BUG_ON by WARN_ON() in [4];
    - Update the commit message of the [4].

v2->v3:
    - remove the odd comment from the [3];

v1->v2:
    - update the commit message. [2] patch;
    - fix typos in comments. [2] patch;
    - do the "preload" for NUMA awareness. [2] patch;

Uladzislau Rezki (Sony) (4):
  mm/vmalloc.c: remove "node" argument
  mm/vmalloc.c: preload a CPU with one object for split purpose
  mm/vmalloc.c: get rid of one single unlink_va() when merge
  mm/vmalloc.c: switch to WARN_ON() and move it under unlink_va()

 mm/vmalloc.c | 92 ++++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 65 insertions(+), 27 deletions(-)

-- 
2.11.0


* [PATCH v5 1/4] mm/vmalloc.c: remove "node" argument
From: Uladzislau Rezki (Sony) @ 2019-06-06 12:04 UTC
  To: Andrew Morton
  Cc: linux-mm, LKML, Roman Gushchin, Uladzislau Rezki, Hillf Danton,
	Michal Hocko, Matthew Wilcox, Oleksiy Avramchenko,
	Steven Rostedt

Remove the unused "node" argument from the __alloc_vmap_area() function.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
---
 mm/vmalloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index bed6a065f73a..6e5e3e39c05e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -986,7 +986,7 @@ adjust_va_to_fit_type(struct vmap_area *va,
  */
 static __always_inline unsigned long
 __alloc_vmap_area(unsigned long size, unsigned long align,
-	unsigned long vstart, unsigned long vend, int node)
+	unsigned long vstart, unsigned long vend)
 {
 	unsigned long nva_start_addr;
 	struct vmap_area *va;
@@ -1063,7 +1063,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	 * If an allocation fails, the "vend" address is
 	 * returned. Therefore trigger the overflow path.
 	 */
-	addr = __alloc_vmap_area(size, align, vstart, vend, node);
+	addr = __alloc_vmap_area(size, align, vstart, vend);
 	if (unlikely(addr == vend))
 		goto overflow;
 
-- 
2.11.0


* [PATCH v5 2/4] mm/vmalloc.c: preload a CPU with one object for split purpose
From: Uladzislau Rezki (Sony) @ 2019-06-06 12:04 UTC
  To: Andrew Morton
  Cc: linux-mm, LKML, Roman Gushchin, Uladzislau Rezki, Hillf Danton,
	Michal Hocko, Matthew Wilcox, Oleksiy Avramchenko,
	Steven Rostedt

Refactor the NE_FIT_TYPE split case, which has to allocate one extra
object in order to build the remaining free space. The preload is done
per CPU, in non-atomic context, using GFP_KERNEL flags.
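
In condensed form the flow looks as follows (a sketch trimmed from the
patch below; declarations and error handling are omitted):

<snip>
	/* Preload side, executed before taking vmap_area_lock: */
	preempt_disable();
	if (!__this_cpu_read(ne_fit_preload_node)) {
		preempt_enable();
		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
		preempt_disable();

		/* Another task preloaded this CPU meanwhile; drop the spare. */
		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva) && pva)
			kmem_cache_free(vmap_area_cachep, pva);
	}
	spin_lock(&vmap_area_lock);
	preempt_enable();

	/* Consume side, under the lock in the NE_FIT_TYPE split case: */
	lva = __this_cpu_xchg(ne_fit_preload_node, NULL);
	if (unlikely(!lva))
		/* Preload failed or was already used: atomic fallback. */
		lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
<snip>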

More permissive allocation parameters are beneficial for systems that
suffer from high memory pressure or low-memory conditions. For example,
on my KVM system (4 CPUs, no swap, 256MB RAM) I can simulate a page
allocation failure with GFP_NOWAIT flags: using the "stress-ng" tool
and starting N workers spinning on fork() and exit(), I can trigger
the trace below:

<snip>
[  179.815161] stress-ng-fork: page allocation failure: order:0, mode:0x40800(GFP_NOWAIT|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[  179.815168] CPU: 0 PID: 12612 Comm: stress-ng-fork Not tainted 5.2.0-rc3+ #1003
[  179.815170] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[  179.815171] Call Trace:
[  179.815178]  dump_stack+0x5c/0x7b
[  179.815182]  warn_alloc+0x108/0x190
[  179.815187]  __alloc_pages_slowpath+0xdc7/0xdf0
[  179.815191]  __alloc_pages_nodemask+0x2de/0x330
[  179.815194]  cache_grow_begin+0x77/0x420
[  179.815197]  fallback_alloc+0x161/0x200
[  179.815200]  kmem_cache_alloc+0x1c9/0x570
[  179.815202]  alloc_vmap_area+0x32c/0x990
[  179.815206]  __get_vm_area_node+0xb0/0x170
[  179.815208]  __vmalloc_node_range+0x6d/0x230
[  179.815211]  ? _do_fork+0xce/0x3d0
[  179.815213]  copy_process.part.46+0x850/0x1b90
[  179.815215]  ? _do_fork+0xce/0x3d0
[  179.815219]  _do_fork+0xce/0x3d0
[  179.815226]  ? __do_page_fault+0x2bf/0x4e0
[  179.815229]  do_syscall_64+0x55/0x130
[  179.815231]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  179.815234] RIP: 0033:0x7fedec4c738b
...
[  179.815237] RSP: 002b:00007ffda469d730 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  179.815239] RAX: ffffffffffffffda RBX: 00007ffda469d730 RCX: 00007fedec4c738b
[  179.815240] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[  179.815241] RBP: 00007ffda469d780 R08: 00007fededd6e300 R09: 00007ffda47f50a0
[  179.815242] R10: 00007fededd6e5d0 R11: 0000000000000246 R12: 0000000000000000
[  179.815243] R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000
[  179.815245] Mem-Info:
[  179.815249] active_anon:12686 inactive_anon:14760 isolated_anon:0
                active_file:502 inactive_file:61 isolated_file:70
                unevictable:2 dirty:0 writeback:0 unstable:0
                slab_reclaimable:2380 slab_unreclaimable:7520
                mapped:15069 shmem:14813 pagetables:10833 bounce:0
                free:1922 free_pcp:229 free_cma:0
<snip>

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6e5e3e39c05e..fcda966589a6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -365,6 +365,13 @@ static LIST_HEAD(free_vmap_area_list);
  */
 static struct rb_root free_vmap_area_root = RB_ROOT;
 
+/*
+ * Preload a CPU with one object for "no edge" split case. The
+ * aim is to get rid of allocations from the atomic context, thus
+ * to use more permissive allocation masks.
+ */
+static DEFINE_PER_CPU(struct vmap_area *, ne_fit_preload_node);
+
 static __always_inline unsigned long
 va_size(struct vmap_area *va)
 {
@@ -951,9 +958,24 @@ adjust_va_to_fit_type(struct vmap_area *va,
 		 *   L V  NVA  V R
 		 * |---|-------|---|
 		 */
-		lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
-		if (unlikely(!lva))
-			return -1;
+		lva = __this_cpu_xchg(ne_fit_preload_node, NULL);
+		if (unlikely(!lva)) {
+			/*
+			 * For percpu allocator we do not do any pre-allocation
+			 * and leave it as it is. The reason is it most likely
+			 * never ends up with NE_FIT_TYPE splitting. In case of
+			 * percpu allocations offsets and sizes are aligned to
+			 * fixed align request, i.e. RE_FIT_TYPE and FL_FIT_TYPE
+			 * are its main fitting cases.
+			 *
+			 * There are a few exceptions though, as an example it is
+			 * a first allocation (early boot up) when we have "one"
+			 * big free space that has to be split.
+			 */
+			lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
+			if (!lva)
+				return -1;
+		}
 
 		/*
 		 * Build the remainder.
@@ -1032,7 +1054,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 				unsigned long vstart, unsigned long vend,
 				int node, gfp_t gfp_mask)
 {
-	struct vmap_area *va;
+	struct vmap_area *va, *pva;
 	unsigned long addr;
 	int purged = 0;
 
@@ -1057,7 +1079,32 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask & GFP_RECLAIM_MASK);
 
 retry:
+	/*
+	 * Preload this CPU with one extra vmap_area object to ensure
+	 * that we have it available when fit type of free area is
+	 * NE_FIT_TYPE.
+	 *
+	 * The preload is done in non-atomic context, thus it allows us
+	 * to use more permissive allocation masks to be more stable under
+	 * low memory condition and high memory pressure.
+	 *
+	 * Even if it fails we do not really care about that. Just proceed
+	 * as it is. "overflow" path will refill the cache we allocate from.
+	 */
+	preempt_disable();
+	if (!__this_cpu_read(ne_fit_preload_node)) {
+		preempt_enable();
+		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
+		preempt_disable();
+
+		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
+			if (pva)
+				kmem_cache_free(vmap_area_cachep, pva);
+		}
+	}
+
 	spin_lock(&vmap_area_lock);
+	preempt_enable();
 
 	/*
 	 * If an allocation fails, the "vend" address is
-- 
2.11.0


* [PATCH v5 3/4] mm/vmalloc.c: get rid of one single unlink_va() when merge
From: Uladzislau Rezki (Sony) @ 2019-06-06 12:04 UTC
  To: Andrew Morton
  Cc: linux-mm, LKML, Roman Gushchin, Uladzislau Rezki, Hillf Danton,
	Michal Hocko, Matthew Wilcox, Oleksiy Avramchenko,
	Steven Rostedt

It does not make sense to try to "unlink" a node that is definitely
not linked into either the list or the tree. On the first merge step
the VA just points to the previously disconnected busy area.

On the second step, check whether the node has been merged and do the
"unlink" only in that case, because only then does it point to an
object that is actually linked.
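
Condensed, the second merge step looks as follows after the change
(trimmed from the patch below; the "merged" flag is set by the first
step):

<snip>
	if (sibling->va_end == va->va_start) {
		sibling->va_end = va->va_end;
		augment_tree_propagate_from(sibling);

		/*
		 * After the first step "va" points to the next,
		 * already linked area; a freshly disconnected node
		 * is in neither the tree nor the list, so there is
		 * nothing to unlink.
		 */
		if (merged)
			unlink_va(va, root);

		kmem_cache_free(vmap_area_cachep, va);
		return;
	}
<snip>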

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Hillf Danton <hdanton@sina.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
---
 mm/vmalloc.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index fcda966589a6..a4bdf5fc3512 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -719,9 +719,6 @@ merge_or_add_vmap_area(struct vmap_area *va,
 			/* Check and update the tree if needed. */
 			augment_tree_propagate_from(sibling);
 
-			/* Remove this VA, it has been merged. */
-			unlink_va(va, root);
-
 			/* Free vmap_area object. */
 			kmem_cache_free(vmap_area_cachep, va);
 
@@ -746,12 +743,11 @@ merge_or_add_vmap_area(struct vmap_area *va,
 			/* Check and update the tree if needed. */
 			augment_tree_propagate_from(sibling);
 
-			/* Remove this VA, it has been merged. */
-			unlink_va(va, root);
+			if (merged)
+				unlink_va(va, root);
 
 			/* Free vmap_area object. */
 			kmem_cache_free(vmap_area_cachep, va);
-
 			return;
 		}
 	}
-- 
2.11.0


* [PATCH v5 4/4] mm/vmalloc.c: switch to WARN_ON() and move it under unlink_va()
From: Uladzislau Rezki (Sony) @ 2019-06-06 12:04 UTC
  To: Andrew Morton
  Cc: linux-mm, LKML, Roman Gushchin, Uladzislau Rezki, Hillf Danton,
	Michal Hocko, Matthew Wilcox, Oleksiy Avramchenko,
	Steven Rostedt

Trigger a warning if an object that is about to be freed is already
detached. We used to have a BUG_ON() here, but even though such a
state is considered faulty behaviour, it is not a good enough reason
to bring the whole system down.
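
The pattern relies on WARN_ON() returning the value of its condition,
so the same expression both reports the problem with a stack trace and
gates the early return:

<snip>
	if (WARN_ON(RB_EMPTY_NODE(&va->rb_node)))
		return;
<snip>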

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a4bdf5fc3512..899a250e4eb6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -534,20 +534,17 @@ link_va(struct vmap_area *va, struct rb_root *root,
 static __always_inline void
 unlink_va(struct vmap_area *va, struct rb_root *root)
 {
-	/*
-	 * During merging a VA node can be empty, therefore
-	 * not linked with the tree nor list. Just check it.
-	 */
-	if (!RB_EMPTY_NODE(&va->rb_node)) {
-		if (root == &free_vmap_area_root)
-			rb_erase_augmented(&va->rb_node,
-				root, &free_vmap_area_rb_augment_cb);
-		else
-			rb_erase(&va->rb_node, root);
+	if (WARN_ON(RB_EMPTY_NODE(&va->rb_node)))
+		return;
 
-		list_del(&va->list);
-		RB_CLEAR_NODE(&va->rb_node);
-	}
+	if (root == &free_vmap_area_root)
+		rb_erase_augmented(&va->rb_node,
+			root, &free_vmap_area_rb_augment_cb);
+	else
+		rb_erase(&va->rb_node, root);
+
+	list_del(&va->list);
+	RB_CLEAR_NODE(&va->rb_node);
 }
 
 #if DEBUG_AUGMENT_PROPAGATE_CHECK
@@ -1162,8 +1159,6 @@ EXPORT_SYMBOL_GPL(unregister_vmap_purge_notifier);
 
 static void __free_vmap_area(struct vmap_area *va)
 {
-	BUG_ON(RB_EMPTY_NODE(&va->rb_node));
-
 	/*
 	 * Remove from the busy tree/list.
 	 */
-- 
2.11.0

