From: Nick Piggin <npiggin@suse.de>
To: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>,
	Andi Kleen <andi@firstfloor.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	haicheng.li@intel.com,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [patch] slab: add memory hotplug support
Date: Wed, 10 Mar 2010 00:46:33 +1100
Message-ID: <20100309134633.GM8653@laptop>
In-Reply-To: <alpine.DEB.2.00.1003081502400.30456@chino.kir.corp.google.com>

On Mon, Mar 08, 2010 at 03:19:48PM -0800, David Rientjes wrote:
> On Fri, 5 Mar 2010, Nick Piggin wrote:
>
> > > +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> > > +/*
> > > + * Drains and frees nodelists for a node on each slab cache, used for memory
> > > + * hotplug. Returns -EBUSY if all objects cannot be drained on memory
> > > + * hot-remove so that the node is not removed. When used because memory
> > > + * hot-add is canceled, the only result is the freed kmem_list3.
> > > + *
> > > + * Must hold cache_chain_mutex.
> > > + */
> > > +static int __meminit free_cache_nodelists_node(int node)
> > > +{
> > > +	struct kmem_cache *cachep;
> > > +	int ret = 0;
> > > +
> > > +	list_for_each_entry(cachep, &cache_chain, next) {
> > > +		struct array_cache *shared;
> > > +		struct array_cache **alien;
> > > +		struct kmem_list3 *l3;
> > > +
> > > +		l3 = cachep->nodelists[node];
> > > +		if (!l3)
> > > +			continue;
> > > +
> > > +		spin_lock_irq(&l3->list_lock);
> > > +		shared = l3->shared;
> > > +		if (shared) {
> > > +			free_block(cachep, shared->entry, shared->avail, node);
> > > +			l3->shared = NULL;
> > > +		}
> > > +		alien = l3->alien;
> > > +		l3->alien = NULL;
> > > +		spin_unlock_irq(&l3->list_lock);
> > > +
> > > +		if (alien) {
> > > +			drain_alien_cache(cachep, alien);
> > > +			free_alien_cache(alien);
> > > +		}
> > > +		kfree(shared);
> > > +
> > > +		drain_freelist(cachep, l3, l3->free_objects);
> > > +		if (!list_empty(&l3->slabs_full) ||
> > > +		    !list_empty(&l3->slabs_partial)) {
> > > +			/*
> > > +			 * Continue to iterate through each slab cache to free
> > > +			 * as many nodelists as possible even though the
> > > +			 * offline will be canceled.
> > > +			 */
> > > +			ret = -EBUSY;
> > > +			continue;
> > > +		}
> > > +		kfree(l3);
> > > +		cachep->nodelists[node] = NULL;
> >
> > What's stopping races of other CPUs trying to access l3 and array
> > caches while they're being freed?
> >
>
> numa_node_id() will not return an offlined nodeid and cache_alloc_node()
> already does a fallback to other onlined nodes in case a nodeid is passed
> to kmalloc_node() that does not have a nodelist. l3->shared and l3->alien
> cannot be accessed without l3->list_lock (drain, cache_alloc_refill,
> cache_flusharray) or cache_chain_mutex (kmem_cache_destroy, cache_reap).

Yeah, but can't it _have_ a nodelist (ie. before it is set to NULL here)
while it is being accessed by another CPU and concurrently being
freed on this one?

> > > +	}
> > > +	return ret;
> > > +}
> > > +
> > > +/*
> > > + * Onlines nid either as the result of memory hot-add or canceled hot-remove.
> > > + */
> > > +static int __meminit slab_node_online(int nid)
> > > +{
> > > +	int ret;
> > > +	mutex_lock(&cache_chain_mutex);
> > > +	ret = init_cache_nodelists_node(nid);
> > > +	mutex_unlock(&cache_chain_mutex);
> > > +	return ret;
> > > +}
> > > +
> > > +/*
> > > + * Offlines nid either as the result of memory hot-remove or canceled hot-add.
> > > + */
> > > +static int __meminit slab_node_offline(int nid)
> > > +{
> > > +	int ret;
> > > +	mutex_lock(&cache_chain_mutex);
> > > +	ret = free_cache_nodelists_node(nid);
> > > +	mutex_unlock(&cache_chain_mutex);
> > > +	return ret;
> > > +}
> > > +
> > > +static int __meminit slab_memory_callback(struct notifier_block *self,
> > > +					unsigned long action, void *arg)
> > > +{
> > > +	struct memory_notify *mnb = arg;
> > > +	int ret = 0;
> > > +	int nid;
> > > +
> > > +	nid = mnb->status_change_nid;
> > > +	if (nid < 0)
> > > +		goto out;
> > > +
> > > +	switch (action) {
> > > +	case MEM_GOING_ONLINE:
> > > +	case MEM_CANCEL_OFFLINE:
> > > +		ret = slab_node_online(nid);
> > > +		break;
> >
> > This would explode if CANCEL_OFFLINE fails. Call it theoretical and
> > put a panic() in here and I don't mind. Otherwise you get corruption
> > somewhere in the slab code.
> >
>
> MEM_CANCEL_ONLINE would only fail here if a struct kmem_list3 couldn't be
> allocated anywhere on the system and if that happens then the node simply
> couldn't be allocated from (numa_node_id() would never return it as the
> cpu's node, so it's possible to fallback in this scenario).

Why would it never return the CPU's node? It's CANCEL_OFFLINE that is
the problem.

> Instead of doing this all at MEM_GOING_OFFLINE, we could delay freeing of
> the array caches and the nodelist until MEM_OFFLINE. We're guaranteed
> that all pages are freed at that point so there are no existing objects
> that we need to track and then if the offline fails from a different
> callback it would be possible to reset the l3->nodelists[node] pointers
> since they haven't been freed yet.
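
[Editorial illustration, not part of the original message.] The deferred-free idea in the last quoted paragraph can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct node_lists`, `struct model_cache`, `model_notify()` and the `MODEL_*` actions are hypothetical stand-ins invented here for `struct kmem_list3`, a slab cache, the notifier callback and the `MEM_*` actions. It uses no kernel API, takes no locks, and so says nothing about the concurrent-access race raised earlier in the message. It only shows why a cancel path that never has to allocate cannot fail.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the MEM_* notifier actions. */
enum model_action {
	MODEL_GOING_OFFLINE,
	MODEL_CANCEL_OFFLINE,
	MODEL_OFFLINE,
};

/* Hypothetical, heavily reduced analogue of struct kmem_list3. */
struct node_lists {
	int free_objects;	/* objects sitting on free slabs */
	int live_objects;	/* objects still allocated to users */
};

/* One "cache" with a single per-node structure, for brevity. */
struct model_cache {
	struct node_lists *nodelists;
};

/*
 * Model of the deferred-free ordering: drain at GOING_OFFLINE, free
 * only at OFFLINE.  Returns 0 on success or -EBUSY if the node still
 * holds live objects.
 */
static int model_notify(struct model_cache *c, enum model_action action)
{
	struct node_lists *l3 = c->nodelists;

	if (!l3)
		return 0;

	switch (action) {
	case MODEL_GOING_OFFLINE:
		/* Drain what can be drained, but keep l3 itself allocated. */
		l3->free_objects = 0;
		return l3->live_objects ? -EBUSY : 0;
	case MODEL_CANCEL_OFFLINE:
		/* Nothing was freed earlier, so nothing must be reallocated. */
		return 0;
	case MODEL_OFFLINE:
		/* Point of no return: only now release the structure. */
		free(l3);
		c->nodelists = NULL;
		return 0;
	}
	return 0;
}
```

In this model a caller would send `MODEL_GOING_OFFLINE` first. On `-EBUSY` it sends `MODEL_CANCEL_OFFLINE` and the per-node structure is still in place, so the cancel path has no allocation that could fail. Only a later `MODEL_OFFLINE` frees the structure and clears the pointer.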