From: Chen Jun <chenjun102@huawei.com>
To: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<cl@linux.com>, <penberg@kernel.org>, <rientjes@google.com>,
	<iamjoonsoo.kim@lge.com>, <akpm@linux-foundation.org>,
	<vbabka@suse.cz>, <roman.gushchin@linux.dev>,
	<42.hyeyoo@gmail.com>
Cc: <xuqiang36@huawei.com>, <chenjun102@huawei.com>,
	<wangkefeng.wang@huawei.com>
Subject: [PATCH v2] mm/slub: Reduce memory consumption in extreme scenarios
Date: Sat, 30 Mar 2024 16:23:35 +0800	[thread overview]
Message-ID: <20240330082335.29710-1-chenjun102@huawei.com> (raw)

When kmalloc_node() is called without __GFP_THISNODE and the target node
lacks sufficient memory, SLUB allocates a folio from a node other than
the requested one, instead of taking a partial slab from that other node.

However, since the allocated folio does not belong to the requested
node, it is deactivated and added to the partial slab list of the node
it belongs to.

This behavior can result in excessive memory usage when the requested
node has insufficient memory, as SLUB will repeatedly allocate folios
from other nodes without reusing the previously allocated ones.

To prevent this memory wastage, when (node != NUMA_NO_NODE) &&
!(gfpflags & __GFP_THISNODE):
1) try to get a partial slab from the target node with GFP_NOWAIT |
   __GFP_THISNODE opportunistically.
2) if 1) failed, try to allocate a new slab from the target node with
   GFP_NOWAIT | __GFP_THISNODE opportunistically too.
3) if 2) failed, retry 1) and 2) with the original gfpflags.

When node == NUMA_NO_NODE or (gfpflags & __GFP_THISNODE) is set, the
behavior remains unchanged.

On a qemu guest with 4 NUMA nodes, each with 1G of memory, a test kernel
module was used to call kmalloc_node(196, GFP_KERNEL, 3)
(4 * 1024 + 4) * 1024 times (a sketch of such a module follows).
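
The test module is not part of this patch; below is a minimal sketch of
what it could look like (file, function and message names are made up
for illustration). It simply performs the allocations and never frees
them, so the resulting slab growth can be read from /proc/slabinfo:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static int __init kmalloc_node_stress_init(void)
{
	unsigned long i, nr = (4UL * 1024 + 4) * 1024;

	for (i = 0; i < nr; i++) {
		/* No __GFP_THISNODE, so the allocation may fall back to other nodes. */
		if (!kmalloc_node(196, GFP_KERNEL, 3))
			break;
	}

	pr_info("kmalloc_node_stress: %lu allocations done\n", i);
	return 0;
}
module_init(kmalloc_node_stress_init);
/* No module_exit(): the leaked objects are intentionally kept for the test. */

MODULE_LICENSE("GPL");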

cat /proc/slabinfo shows:
kmalloc-256       4200530 13519712    256   32    2 : tunables..

After this patch,
cat /proc/slabinfo shows:
kmalloc-256       4200558 4200768    256   32    2 : tunables..

That is, the total number of objects (the second count, num_objs) now
closely tracks the number of active objects, instead of exceeding it
by more than 3x as before.

Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
v2:
- try to get a partial slab or allocate a new slab with GFP_NOWAIT (which
  includes __GFP_NOWARN) opportunistically, then fall back to the original
  gfpflags, as suggested by Vlastimil Babka
- update changelog

v1: https://lore.kernel.org/linux-mm/20230314123403.100158-1-chenjun102@huawei.com/

 mm/slub.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 1bb2a93cf7b6..c1c51595a59f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2699,7 +2699,7 @@ static struct slab *get_partial(struct kmem_cache *s, int node,
 		searchnode = numa_mem_id();
 
 	slab = get_partial_node(s, get_node(s, searchnode), pc);
-	if (slab || node != NUMA_NO_NODE)
+	if (slab || (node != NUMA_NO_NODE && (pc->flags & __GFP_THISNODE)))
 		return slab;
 
 	return get_any_partial(s, pc);
@@ -3375,6 +3375,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	struct slab *slab;
 	unsigned long flags;
 	struct partial_context pc;
+	bool try_thisnode = true;
 
 	stat(s, ALLOC_SLOWPATH);
 
@@ -3501,6 +3502,17 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 new_objects:
 
 	pc.flags = gfpflags;
+	/*
+	 * when (node != NUMA_NO_NODE) && !(gfpflags & __GFP_THISNODE)
+	 * 1) try to get a partial slab from target node with GFP_NOWAIT |
+	 *    __GFP_THISNODE opportunistically.
+	 * 2) if 1) failed, try to allocate a new slab from target node with
+	 *    GFP_NOWAIT | __GFP_THISNODE opportunistically too.
+	 * 3) if 2) failed, retry 1) and 2) with original gfpflags.
+	 */
+	if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) && try_thisnode)
+		pc.flags = GFP_NOWAIT | __GFP_THISNODE;
+
 	pc.orig_size = orig_size;
 	slab = get_partial(s, node, &pc);
 	if (slab) {
@@ -3522,10 +3534,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	}
 
 	slub_put_cpu_ptr(s->cpu_slab);
-	slab = new_slab(s, gfpflags, node);
+	slab = new_slab(s, pc.flags, node);
 	c = slub_get_cpu_ptr(s->cpu_slab);
 
 	if (unlikely(!slab)) {
+		if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) &&
+		    try_thisnode) {
+			try_thisnode = false;
+			goto new_objects;
+		}
 		slab_out_of_memory(s, gfpflags, node);
 		return NULL;
 	}
-- 
2.27.0


