From: Nick Piggin <npiggin@suse.de>
To: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>,
	Andi Kleen <andi@firstfloor.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	haicheng.li@intel.com,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [patch] slab: add memory hotplug support
Date: Wed, 10 Mar 2010 00:46:33 +1100
Message-ID: <20100309134633.GM8653@laptop>
In-Reply-To: <alpine.DEB.2.00.1003081502400.30456@chino.kir.corp.google.com>

On Mon, Mar 08, 2010 at 03:19:48PM -0800, David Rientjes wrote:
> On Fri, 5 Mar 2010, Nick Piggin wrote:
> 
> > > +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> > > +/*
> > > + * Drains and frees nodelists for a node on each slab cache, used for memory
> > > + * hotplug.  Returns -EBUSY if all objects cannot be drained on memory
> > > + * hot-remove so that the node is not removed.  When used because memory
> > > + * hot-add is canceled, the only result is the freed kmem_list3.
> > > + *
> > > + * Must hold cache_chain_mutex.
> > > + */
> > > +static int __meminit free_cache_nodelists_node(int node)
> > > +{
> > > +	struct kmem_cache *cachep;
> > > +	int ret = 0;
> > > +
> > > +	list_for_each_entry(cachep, &cache_chain, next) {
> > > +		struct array_cache *shared;
> > > +		struct array_cache **alien;
> > > +		struct kmem_list3 *l3;
> > > +
> > > +		l3 = cachep->nodelists[node];
> > > +		if (!l3)
> > > +			continue;
> > > +
> > > +		spin_lock_irq(&l3->list_lock);
> > > +		shared = l3->shared;
> > > +		if (shared) {
> > > +			free_block(cachep, shared->entry, shared->avail, node);
> > > +			l3->shared = NULL;
> > > +		}
> > > +		alien = l3->alien;
> > > +		l3->alien = NULL;
> > > +		spin_unlock_irq(&l3->list_lock);
> > > +
> > > +		if (alien) {
> > > +			drain_alien_cache(cachep, alien);
> > > +			free_alien_cache(alien);
> > > +		}
> > > +		kfree(shared);
> > > +
> > > +		drain_freelist(cachep, l3, l3->free_objects);
> > > +		if (!list_empty(&l3->slabs_full) ||
> > > +					!list_empty(&l3->slabs_partial)) {
> > > +			/*
> > > +			 * Continue to iterate through each slab cache to free
> > > +			 * as many nodelists as possible even though the
> > > +			 * offline will be canceled.
> > > +			 */
> > > +			ret = -EBUSY;
> > > +			continue;
> > > +		}
> > > +		kfree(l3);
> > > +		cachep->nodelists[node] = NULL;
> > 
> > What's stopping races of other CPUs trying to access l3 and array
> > caches while they're being freed?
> > 
> 
> numa_node_id() will not return an offlined nodeid and cache_alloc_node() 
> already does a fallback to other onlined nodes in case a nodeid is passed 
> to kmalloc_node() that does not have a nodelist.  l3->shared and l3->alien 
> cannot be accessed without l3->list_lock (drain, cache_alloc_refill, 
> cache_flusharray) or cache_chain_mutex (kmem_cache_destroy, cache_reap).

Yeah, but can't it still _have_ a nodelist (i.e. before it is set to NULL
here) while another CPU is accessing it and this one is concurrently
freeing it?


> > > +	}
> > > +	return ret;
> > > +}
> > > +
> > > +/*
> > > + * Onlines nid either as the result of memory hot-add or canceled hot-remove.
> > > + */
> > > +static int __meminit slab_node_online(int nid)
> > > +{
> > > +	int ret;
> > > +	mutex_lock(&cache_chain_mutex);
> > > +	ret = init_cache_nodelists_node(nid);
> > > +	mutex_unlock(&cache_chain_mutex);
> > > +	return ret;
> > > +}
> > > +
> > > +/*
> > > + * Offlines nid either as the result of memory hot-remove or canceled hot-add.
> > > + */
> > > +static int __meminit slab_node_offline(int nid)
> > > +{
> > > +	int ret;
> > > +	mutex_lock(&cache_chain_mutex);
> > > +	ret = free_cache_nodelists_node(nid);
> > > +	mutex_unlock(&cache_chain_mutex);
> > > +	return ret;
> > > +}
> > > +
> > > +static int __meminit slab_memory_callback(struct notifier_block *self,
> > > +					unsigned long action, void *arg)
> > > +{
> > > +	struct memory_notify *mnb = arg;
> > > +	int ret = 0;
> > > +	int nid;
> > > +
> > > +	nid = mnb->status_change_nid;
> > > +	if (nid < 0)
> > > +		goto out;
> > > +
> > > +	switch (action) {
> > > +	case MEM_GOING_ONLINE:
> > > +	case MEM_CANCEL_OFFLINE:
> > > +		ret = slab_node_online(nid);
> > > +		break;
> > 
> > This would explode if CANCEL_OFFLINE fails. Call it theoretical and
> > put a panic() in here and I don't mind. Otherwise you get corruption
> > somewhere in the slab code.
> > 
> 
> MEM_CANCEL_ONLINE would only fail here if a struct kmem_list3 couldn't be 
> allocated anywhere on the system and if that happens then the node simply 
> couldn't be allocated from (numa_node_id() would never return it as the 
> cpu's node, so it's possible to fallback in this scenario).

Why would it never return the CPU's node? It's CANCEL_OFFLINE that is
the problem.
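The concern is that `MEM_CANCEL_OFFLINE` shares `slab_node_online()` with `MEM_GOING_ONLINE`, but the two differ in what a failure means: failing `MEM_GOING_ONLINE` can stop the node from coming online, whereas by the time an offline is being canceled the node is staying online regardless, so a failed re-initialization leaves a live node without nodelists. A minimal user-space sketch of the "call it theoretical and panic" option follows; `enum hp_action`, `init_nodelists()` and `simulated_fail_nid` are hypothetical, and `abort()` stands in for the kernel's `panic()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for MEM_GOING_ONLINE / MEM_CANCEL_OFFLINE. */
enum hp_action { HP_GOING_ONLINE, HP_CANCEL_OFFLINE };

/* Hypothetical knob that makes per-node setup fail for one node. */
static int simulated_fail_nid = -1;

/* Hypothetical stand-in for init_cache_nodelists_node(): 0 or -ENOMEM. */
static int init_nodelists(int nid)
{
	return nid == simulated_fail_nid ? -ENOMEM : 0;
}

static int handle_online_event(enum hp_action action, int nid)
{
	int ret;

	assert(nid >= 0);
	ret = init_nodelists(nid);
	if (ret == 0)
		return 0;
	if (action == HP_GOING_ONLINE)
		return ret;	/* node not online yet: report the error to veto it */

	/*
	 * HP_CANCEL_OFFLINE: the node stays online whatever we return, so
	 * carrying on would leave it without nodelists and invite later
	 * corruption.  Treat the failure as fatal instead.
	 */
	fprintf(stderr, "slab sketch: cannot restore nodelists for node %d\n", nid);
	abort();
}
```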


> Instead of doing this all at MEM_GOING_OFFLINE, we could delay freeing of 
> the array caches and the nodelist until MEM_OFFLINE.  We're guaranteed 
> that all pages are freed at that point so there are no existing objects 
> that we need to track and then if the offline fails from a different 
> callback it would be possible to reset the l3->nodelists[node] pointers 
> since they haven't been freed yet.
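The deferred-free proposal can be read as a small per-node state machine: `MEM_GOING_OFFLINE` only drains, checks for remaining objects and detaches (but does not free) the nodelist; `MEM_CANCEL_OFFLINE` puts the untouched pointer back, so it has no allocation that could fail; only `MEM_OFFLINE` frees. The sketch below models that under stated assumptions: `struct node_slot` and `node_event()` are hypothetical, `objects_in_use` stands in for the non-empty `slabs_full`/`slabs_partial` check in the patch, and the `HP_*` values are stand-ins for the kernel's `MEM_*` actions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for MEM_GOING_OFFLINE, MEM_CANCEL_OFFLINE, MEM_OFFLINE. */
enum hp_action { HP_GOING_OFFLINE, HP_CANCEL_OFFLINE, HP_OFFLINE };

/* Hypothetical per-node state. */
struct nodelist {
	int objects_in_use;	/* models non-empty slabs_full/slabs_partial */
};

struct node_slot {
	struct nodelist *live;		/* what allocation paths would see */
	struct nodelist *parked;	/* detached at GOING_OFFLINE, freed at OFFLINE */
};

static int node_event(struct node_slot *s, enum hp_action action)
{
	assert(s != NULL);
	switch (action) {
	case HP_GOING_OFFLINE:
		if (s->live && s->live->objects_in_use > 0)
			return -EBUSY;	/* objects still allocated: veto the offline */
		s->parked = s->live;	/* detach, but keep the memory */
		s->live = NULL;
		return 0;
	case HP_CANCEL_OFFLINE:
		/* Nothing was freed, so restoring cannot fail. */
		if (s->parked) {
			s->live = s->parked;
			s->parked = NULL;
		}
		return 0;
	case HP_OFFLINE:
		free(s->parked);	/* offline committed: now it is safe to free */
		s->parked = NULL;
		return 0;
	}
	return 0;
}
```

If a cancel arrives after this callback itself vetoed with `-EBUSY`, it finds nothing parked and leaves the live pointer alone. The final `free()` at `HP_OFFLINE` still has to respect the lookup/free race raised earlier in this message.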

