From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Anton Blanchard <anton@samba.org>
Cc: benh@kernel.crashing.org, paulus@samba.org, cl@linux-foundation.org,
	penberg@kernel.org, mpm@selenic.com, nacc@linux.vnet.ibm.com,
	linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
Date: Tue, 7 Jan 2014 16:41:36 +0900
Message-ID: <20140107074136.GA4011@lge.com>
In-Reply-To: <20140107132100.5b5ad198@kryten>

On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote:
> 
> We noticed a huge amount of slab memory consumed on a large ppc64 box:
> 
> Slab:            2094336 kB
> 
> Almost 2GB. This box is not balanced and some nodes do not have local
> memory, causing slub to be very inefficient in its slab usage.
> 
> Each time we call kmem_cache_alloc_node slub checks the per cpu slab,
> sees it isn't node local, deactivates it and tries to allocate a new
> slab. On empty nodes we will allocate a new remote slab and use the
> first slot, but as explained above when we get called a second time
> we will just deactivate that slab and retry.
> 
> As such we end up only using 1 entry in each slab:
> 
> slab                    mem  objects
>                        used   active
> ------------------------------------
> kmalloc-16384       1404 MB    4.90%
> task_struct          668 MB    2.90%
> kmalloc-128          193 MB    3.61%
> kmalloc-192          152 MB    5.23%
> kmalloc-8192          72 MB   23.40%
> kmalloc-16            64 MB    7.43%
> kmalloc-512           33 MB   22.41%
> 
> The patch below checks that a node is not empty before deactivating a
> slab and trying to allocate it again. With this patch applied we now
> use about 352MB:
> 
> Slab:             360192 kB
> 
> And our efficiency is much better:
> 
> slab                    mem  objects
>                        used   active
> ------------------------------------
> kmalloc-16384         92 MB   74.27%
> task_struct           23 MB   83.46%
> idr_layer_cache       18 MB  100.00%
> pgtable-2^12          17 MB  100.00%
> kmalloc-65536         15 MB  100.00%
> inode_cache           14 MB  100.00%
> kmalloc-256           14 MB   97.81%
> kmalloc-8192          14 MB   85.71%
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
> 
> Thoughts? It seems like we could hit a similar situation if a machine
> is balanced but we run out of memory on a single node.
> 
> Index: b/mm/slub.c
> ===================================================================
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2278,10 +2278,17 @@ redo:
>  
>  	if (unlikely(!node_match(page, node))) {
>  		stat(s, ALLOC_NODE_MISMATCH);
> -		deactivate_slab(s, page, c->freelist);
> -		c->page = NULL;
> -		c->freelist = NULL;
> -		goto new_slab;
> +
> +		/*
> +		 * If the node contains no memory there is no point in trying
> +		 * to allocate a new node local slab
> +		 */
> +		if (node_spanned_pages(node)) {
> +			deactivate_slab(s, page, c->freelist);
> +			c->page = NULL;
> +			c->freelist = NULL;
> +			goto new_slab;
> +		}
>  	}
>  
>  	/*

Hello,

I think that we need more effort to solve the unbalanced-node problem.
With this patch, even if the node of the current cpu slab is not
favorable to the unbalanced node, allocation would proceed and we would
get unintended memory.

And there is one more problem. Even if we have some partial slabs on a
compatible node, we would allocate a new slab, because get_partial()
cannot handle this unbalanced-node case.

To fix this correctly, how about the following patch?

Thanks.

------------->8--------------------
diff --git a/mm/slub.c b/mm/slub.c
index c3eb3d3..a1f6dfa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1672,7 +1672,19 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 {
 	void *object;
 	int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
+	struct zonelist *zonelist;
+	struct zoneref *z;
+	struct zone *zone;
+	enum zone_type high_zoneidx = gfp_zone(flags);
 
+	if (!node_present_pages(searchnode)) {
+		zonelist = node_zonelist(searchnode, flags);
+		for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
+			searchnode = zone_to_nid(zone);
+			if (node_present_pages(searchnode))
+				break;
+		}
+	}
 	object = get_partial_node(s, get_node(s, searchnode), c, flags);
 	if (object || node != NUMA_NO_NODE)
 		return object;