* slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
@ 2007-01-29  1:13 Oleg Nesterov
  2007-01-29 16:54 ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2007-01-29  1:13 UTC (permalink / raw)
  To: Christoph Lameter, Ingo Molnar
  Cc: Srivatsa Vaddagiri, Pallipadi, Venkatesh, Gautham shenoy,
	Andrew Morton, linux-kernel

To begin with, another (but related) minor problem,

	debug_smp_processor_id:

		/*
		 * Kernel threads bound to a single CPU can safely use
		 * smp_processor_id():
		 */

This is only true without CONFIG_HOTPLUG_CPU. Otherwise the CPU can go away while
the task is preempted or sleeping. I think we need an #ifndef here.


Now,
	static void __devinit start_cpu_timer(int cpu)
	{
		struct delayed_work *reap_work = &per_cpu(reap_work, cpu);

		if (keventd_up() && reap_work->work.func == NULL) {
			init_reap_node(cpu);
			INIT_DELAYED_WORK(reap_work, cache_reap);
			schedule_delayed_work_on(cpu, reap_work,
						__round_jiffies_relative(HZ, cpu));
		}
	}

This is wrong. Suppose we have a CPU_UP, CPU_DOWN, CPU_UP sequence. The last
CPU_UP will not restart the per-cpu "cache_reap timer".


With or without the recent changes, it is possible that work->func() will run on
another CPU (not the one it was submitted to) if the CPU goes down. In fact,
this can happen while work->func() is running, so even smp_processor_id()
is not safe to use in work->func().

However, cache_reap() seems to wrongly assume that smp_processor_id() is stable;
this is the second problem.

Is my understanding correct?

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29  1:13 slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Oleg Nesterov
@ 2007-01-29 16:54 ` Christoph Lameter
  2007-01-29 17:19   ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-01-29 16:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
	Gautham shenoy, Andrew Morton, linux-kernel

On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> Now,
> 	static void __devinit start_cpu_timer(int cpu)
> 	{
> 		struct delayed_work *reap_work = &per_cpu(reap_work, cpu);
> 
> 		if (keventd_up() && reap_work->work.func == NULL) {
> 			init_reap_node(cpu);
> 			INIT_DELAYED_WORK(reap_work, cache_reap);
> 			schedule_delayed_work_on(cpu, reap_work,
> 						__round_jiffies_relative(HZ, cpu));
> 		}
> 	}
> 
> This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last
> CPU_UP will not restart a per-cpu "cache_reap timer".

Why?

> With or without recent changes, it is possible that work->func() will run on
> another CPU (not that to which it was submitted) if CPU goes down. In fact,
> this can happen while work->func() is running, so even smp_processor_id()
> is not safe to use in work->func().

But the work func was scheduled by schedule_delayed_work_on(). Isn't that a 
general problem with schedule_delayed_work_on() and keventd?

> However, cache_reap() seems to wrongly assume that smp_processor_id() is stable,
> this is the second problem.
> 
> Is my understanding correct?

cache_reap assumes that the processor id is stable based on the kevent 
thread being pinned to a processor.


* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 16:54 ` Christoph Lameter
@ 2007-01-29 17:19   ` Oleg Nesterov
  2007-01-29 17:27     ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2007-01-29 17:19 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
	Gautham shenoy, Andrew Morton, linux-kernel

On 01/29, Christoph Lameter wrote:
>
> On Mon, 29 Jan 2007, Oleg Nesterov wrote:
> 
> > Now,
> > 	static void __devinit start_cpu_timer(int cpu)
> > 	{
> > 		struct delayed_work *reap_work = &per_cpu(reap_work, cpu);
> > 
> > 		if (keventd_up() && reap_work->work.func == NULL) {
> > 			init_reap_node(cpu);
> > 			INIT_DELAYED_WORK(reap_work, cache_reap);
> > 			schedule_delayed_work_on(cpu, reap_work,
> > 						__round_jiffies_relative(HZ, cpu));
> > 		}
> > 	}
> > 
> > This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last
> > CPU_UP will not restart a per-cpu "cache_reap timer".
> 
> Why?

Because the last CPU_UP calls start_cpu_timer(), but since ->work.func != NULL
we don't do schedule_delayed_work_on(). I think (if I am right) this is a slab
bug.

> > With or without recent changes, it is possible that work->func() will run on
> > another CPU (not that to which it was submitted) if CPU goes down. In fact,
> > this can happen while work->func() is running, so even smp_processor_id()
> > is not safe to use in work->func().
> 
> But the work func was scheduled by schedule_delayed_work_on(). Isnt that a 
> general problem with schedule_delayed_work_on() and keventd?

I think this is yet another problem with workqueues/cpu-hotplug interaction.
Probably the problem is more general: with CONFIG_CPU_HOTPLUG, we can't
guarantee that smp_processor_id() is stable even if the thread is pinned to
a specific processor.

> > However, cache_reap() seems to wrongly assume that smp_processor_id() is stable,
> > this is the second problem.
> >
> > Is my understanding correct?
>
> cache_reap assumes that the processor id is stable based on the kevent 
> thread being pinned to a processor.

I think cache_reap() is not alone, and this is not its fault.

But please note another minor problem,

	void cache_reap(struct work_struct *unused)
	{
		...

		schedule_delayed_work(&__get_cpu_var(reap_work), ...);
	}

Even if smp_processor_id() was stable during the execution of cache_reap(),
this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
avoid this, and this is correct.

This means that __get_cpu_var(reap_work) returns a "wrong" struct delayed_work.
This is absolutely harmless right now, but maybe it's better to use
container_of(unused, struct delayed_work, work).

Oleg.



* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 17:19   ` Oleg Nesterov
@ 2007-01-29 17:27     ` Christoph Lameter
  2007-01-29 18:27       ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-01-29 17:27 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
	Gautham shenoy, Andrew Morton, linux-kernel

On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> > > This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last
> > > CPU_UP will not restart a per-cpu "cache_reap timer".
> > 
> > Why?
> 
> Because the last CPU_UP calls start_cpu_timer(), but since ->work.func != NULL
> we don't do schedule_delayed_work_on(). I think (if I am right) this is a slab's
> bug.

The CPU_DOWN would need to set work.func == NULL for this to work. But 
then the slab does not shut down the work queues for the processor. Isn't 
this another issue with workqueues? The slab would need a notification 
that the workqueue for a processor was shut down in order to set work.func 
= NULL.

> I think cache_reap() is not alone, and this is not its fault.
> 
> But please note another minor problem,
> 
> 	void cache_reap(struct work_struct *unused)
> 	{
> 		...
> 
> 		schedule_delayed_work(&__get_cpu_var(reap_work), ...);
> 	}
> 
> Even if smp_processor_id() was stable during the execution of cache_reap(),
> this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
> avoid this, and this is correct.

Uhh.... This may not be correct in terms of how the slab operates.
 
> This means that __get_cpu_var(reap_work) returns a "wrong" struct delayed_work.
> This is absolutely harmless right now, but may be it's better to use
> container_of(unused, struct delayed_work, work).

Well seems that we have a set of unresolved issues with workqueues and cpu 
hotplug.



* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 17:27     ` Christoph Lameter
@ 2007-01-29 18:27       ` Oleg Nesterov
  2007-01-29 19:09         ` Christoph Lameter
                           ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Oleg Nesterov @ 2007-01-29 18:27 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
	Gautham shenoy, Andrew Morton, linux-kernel

On 01/29, Christoph Lameter wrote:
>
> On Mon, 29 Jan 2007, Oleg Nesterov wrote:
> 
> > > > This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last
> > > > CPU_UP will not restart a per-cpu "cache_reap timer".
> > > 
> > > Why?
> > 
> > Because the last CPU_UP calls start_cpu_timer(), but since ->work.func != NULL
> > we don't do schedule_delayed_work_on(). I think (if I am right) this is a slab's
> > bug.
> 
> The CPU_DOWN would need to set work.func == NULL for this to work. But 
> then the slab does not shut down the work queues for the processor. Isnt 
> this another issue with workqueues?

I think not; please see below. Actually, for this particular case it is more
related to timers.

>                                      The slab would need a notification 
> that the workqueue for a processor was shutdown in order to set work.func 
> = NULL.

The slab already has a notification: the CPU_XXX events. It should cancel the
pending per-cpu "reap_work timer" there.

> > I think cache_reap() is not alone, and this is not its fault.
> > 
> > But please note another minor problem,
> > 
> > 	void cache_reap(struct work_struct *unused)
> > 	{
> > 		...
> > 
> > 		schedule_delayed_work(&__get_cpu_var(reap_work), ...);
> > 	}
> > 
> > Even if smp_processor_id() was stable during the execution of cache_reap(),
> > this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
> > avoid this, and this is correct.
> 
> Uhh.... This may not be correct in terms of how the slab operates.

But this is practically impossible to avoid. We can't delay CPU_DOWN until all
workqueues flush their cwq->worklist: that can livelock, since a work item can
re-queue itself, and new work can be added while the dying CPU is still in
cpu_online_map. This means that some pending works will be processed on another CPU.

delayed_work is even worse, the timer can migrate as well.

The first problem (smp_processor_id() is not stable) could be solved if we
use the freezer, or with the help of a not-yet-implemented scalable lock_cpu_hotplug.

> > This means that __get_cpu_var(reap_work) returns a "wrong" struct delayed_work.
> > This is absolutely harmless right now, but may be it's better to use
> > container_of(unused, struct delayed_work, work).
> 
> Well seems that we have a set of unresolved issues with workqueues and cpu 
> hotplug.

Yes.

Oleg.




* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 18:27       ` Oleg Nesterov
@ 2007-01-29 19:09         ` Christoph Lameter
  2007-01-29 19:29           ` Oleg Nesterov
  2007-01-29 19:25         ` Christoph Lameter
  2007-02-20 18:39         ` Max Krasnyansky
  2 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-01-29 19:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
	Gautham shenoy, Andrew Morton, linux-kernel

On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> >                                      The slab would need a notification 
> > that the workqueue for a processor was shutdown in order to set work.func 
> > = NULL.
> 
> The slab has a notification: CPU_XXX events. It should cancel a pending per
> cpu "reap_work timer".

Ahh. Okay then the following patch would fix it?

Shutdown cache_reaper when cpu goes down

Shutdown the cache_reaper in slab.c if the cpu is brought down
and set the cache_reap.func to NULL. Otherwise hotplug shuts
down the reaper for good.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6/mm/slab.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/slab.c	2007-01-24 20:19:28.000000000 -0600
+++ linux-2.6.20-rc6/mm/slab.c	2007-01-29 13:08:05.773928988 -0600
@@ -1269,6 +1269,10 @@ static int __cpuinit cpuup_callback(stru
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_DOWN_PREPARE:
+		/* Shutdown cache reaper */
+		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+		per_cpu(reap_work, cpu).work.func = NULL;
+
 		mutex_lock(&cache_chain_mutex);
 		break;
 	case CPU_DOWN_FAILED:


* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 18:27       ` Oleg Nesterov
  2007-01-29 19:09         ` Christoph Lameter
@ 2007-01-29 19:25         ` Christoph Lameter
  2007-01-29 19:49           ` Oleg Nesterov
  2007-02-20 18:39         ` Max Krasnyansky
  2 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-01-29 19:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
	Gautham shenoy, Andrew Morton, linux-kernel

On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> > > Even if smp_processor_id() was stable during the execution of cache_reap(),
> > > this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
> > > avoid this, and this is correct.
> > 
> > Uhh.... This may not be correct in terms of how the slab operates.
> 
> But this is practically impossible to avoid. We can't delay CPU_DOWN until all
> workqueues flush their cwq->worklist. This is livelockable, the work can re-queue
> itself, and new works can be added since the dying CPU is still on cpu_online_map.
> This means that some pending works will be processed on another CPU.

But we could delay CPU_DOWN in the slab's handler until we know that 
the cache_reaper is no longer running?

> > > This means that __get_cpu_var(reap_work) returns a "wrong" struct delayed_work.
> > > This is absolutely harmless right now, but may be it's better to use
> > > container_of(unused, struct delayed_work, work).

There is more where that came from. cache_reap() determines the current 
cpu in order to find the correct per-cpu cache, and also determines the 
current node. If you move cache_reap to another processor / node then it 
will just clean that one and not do anything for the processor you 
wanted it to run for. If we change processors in the middle of the run
then it may do some actions on one cpu and some on another...

Having said that, here is a patch that makes cache_reap no longer use
per_cpu to access the work structure. This may be a cleanup on its own,
but it will not address your issues.



Use parameter passed to cache_reap to determine pointer to work structure

Use the pointer passed to cache_reap to determine the work
pointer and consolidate exit paths.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6/mm/slab.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/slab.c	2007-01-29 13:08:05.000000000 -0600
+++ linux-2.6.20-rc6/mm/slab.c	2007-01-29 13:17:01.695680898 -0600
@@ -4021,18 +4021,17 @@ void drain_array(struct kmem_cache *cach
  * If we cannot acquire the cache chain mutex then just give up - we'll try
  * again on the next iteration.
  */
-static void cache_reap(struct work_struct *unused)
+static void cache_reap(struct work_struct *w)
 {
 	struct kmem_cache *searchp;
 	struct kmem_list3 *l3;
 	int node = numa_node_id();
+	struct delayed_work *work =
+		container_of(w, struct delayed_work, work);
 
-	if (!mutex_trylock(&cache_chain_mutex)) {
+	if (!mutex_trylock(&cache_chain_mutex))
 		/* Give up. Setup the next iteration. */
-		schedule_delayed_work(&__get_cpu_var(reap_work),
-				      round_jiffies_relative(REAPTIMEOUT_CPUC));
-		return;
-	}
+		goto out;
 
 	list_for_each_entry(searchp, &cache_chain, next) {
 		check_irq_on();
@@ -4075,9 +4074,9 @@ next:
 	mutex_unlock(&cache_chain_mutex);
 	next_reap_node();
 	refresh_cpu_vm_stats(smp_processor_id());
+out:
 	/* Set up the next iteration */
-	schedule_delayed_work(&__get_cpu_var(reap_work),
-		round_jiffies_relative(REAPTIMEOUT_CPUC));
+	schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_CPUC));
 }
 
 #ifdef CONFIG_PROC_FS


* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 19:09         ` Christoph Lameter
@ 2007-01-29 19:29           ` Oleg Nesterov
  0 siblings, 0 replies; 25+ messages in thread
From: Oleg Nesterov @ 2007-01-29 19:29 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
	Gautham shenoy, Andrew Morton, linux-kernel

On 01/29, Christoph Lameter wrote:
>
> On Mon, 29 Jan 2007, Oleg Nesterov wrote:
> 
> > >                                      The slab would need a notification 
> > > that the workqueue for a processor was shutdown in order to set work.func 
> > > = NULL.
> > 
> > The slab has a notification: CPU_XXX events. It should cancel a pending per
> > cpu "reap_work timer".
> 
> Ahh. Okay then the following patch would fix it?
> 
> Shutdown cache_reaper when cpu goes down
> 
> Shutdown the cache_reaper in slab.c if the cpu is brought down
> and set the cache_reap.func to NULL. Otherwise hotplug shuts
> down the reaper for good.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> Index: linux-2.6.20-rc6/mm/slab.c
> ===================================================================
> --- linux-2.6.20-rc6.orig/mm/slab.c	2007-01-24 20:19:28.000000000 -0600
> +++ linux-2.6.20-rc6/mm/slab.c	2007-01-29 13:08:05.773928988 -0600
> @@ -1269,6 +1269,10 @@ static int __cpuinit cpuup_callback(stru
>  		break;
>  #ifdef CONFIG_HOTPLUG_CPU
>  	case CPU_DOWN_PREPARE:
> +		/* Shutdown cache reaper */
> +		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
> +		per_cpu(reap_work, cpu).work.func = NULL;
> +
>  		mutex_lock(&cache_chain_mutex);
>  		break;
>  	case CPU_DOWN_FAILED:

Then CPU_DOWN_FAILED should do start_cpu_timer(cpu).

Oleg.



* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 19:25         ` Christoph Lameter
@ 2007-01-29 19:49           ` Oleg Nesterov
  2007-01-29 20:29             ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2007-01-29 19:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
	Gautham shenoy, Andrew Morton, linux-kernel

On 01/29, Christoph Lameter wrote:
>
> On Mon, 29 Jan 2007, Oleg Nesterov wrote:
> 
> > > > Even if smp_processor_id() was stable during the execution of cache_reap(),
> > > > this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
> > > > avoid this, and this is correct.
> > > 
> > > Uhh.... This may not be correct in terms of how the slab operates.
> > 
> > But this is practically impossible to avoid. We can't delay CPU_DOWN until all
> > workqueues flush their cwq->worklist. This is livelockable, the work can re-queue
> > itself, and new works can be added since the dying CPU is still on cpu_online_map.
> > This means that some pending works will be processed on another CPU.
> 
> But we could delay CPU_DOWN in the handler for the slab until we know that 
> the cache_reaper is no longer running?

Hmm... I don't understand this. We can delay CPU_DOWN if we cancel the
cache_reaper as you did in the previous patch. Did you mean that? If yes - then yes :)

> > > > This means that __get_cpu_var(reap_work) returns a "wrong" struct delayed_work.
> > > > This is absolutely harmless right now, but may be it's better to use
> > > > container_of(unused, struct delayed_work, work).
> 
> There is more where that is coming from. cache_reap determines the current 
> cpu in order to find the correct per cpu cache and also determines the 
> current node. If you move cache_reap to another processor / node then it 
> will just clean that one and not do anything for the processor that you 
> wanted it to run for.

Worse, we could have two handlers running in parallel on the same CPU. But that
is fixed by your previous patch, I believe.

>                        If we change processors in the middle of the run
> then it may do some actions on one cpu and some on another....

Yep. For example, next_reap_node() will not be happy if we change CPUs in
the middle. But this is _extremely_ unlikely, can only happen on CPU_DOWN,
and cache_reap() should not have to care about it.

> -static void cache_reap(struct work_struct *unused)
> +static void cache_reap(struct work_struct *w)
>  {
>  	struct kmem_cache *searchp;
>  	struct kmem_list3 *l3;
>  	int node = numa_node_id();
> +	struct delayed_work *work =
> +		container_of(w, struct delayed_work, work);
>
> ...
>
> +	schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_CPUC));

Actually, I was wrong. Yes, this should work, but only together with your previous
patch. Otherwise, if the handler ran on the "wrong" CPU (not possible now that
you added the cancel), we would in fact be starting a second reaper on the same
CPU, which is not good.

Oleg.



* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 19:49           ` Oleg Nesterov
@ 2007-01-29 20:29             ` Christoph Lameter
  2007-01-29 21:05               ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-01-29 20:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
	Gautham shenoy, Andrew Morton, linux-kernel

On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> > But we could delay CPU_DOWN in the handler for the slab until we know that 
> > the cache_reaper is no longer running?
> 
> Hmm... I don't undestand this. We can delay CPU_DOWN if we cancel cache_reaper
> like you did in the previous patch. Did you mean this? If yes - then yes :)

Yes sure.

> Worse, we can have 2 handlers running in parallel on the same CPU. But this
> is fixed by your previous patch, I believe.

Good.

Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and
CPU_DOWN_FAILED somehow vanished in mm?





Shutdown cache_reaper when cpu goes down

Shutdown the cache_reaper in slab.c if the cpu is brought down
and set the cache_reap.func to NULL. Otherwise hotplug shuts
down the reaper for good.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6-mm1/mm/slab.c
===================================================================
--- linux-2.6.20-rc6-mm1.orig/mm/slab.c	2007-01-29 14:18:37.000000000 -0600
+++ linux-2.6.20-rc6-mm1/mm/slab.c	2007-01-29 14:21:18.119155877 -0600
@@ -1271,6 +1271,17 @@ static int __cpuinit cpuup_callback(stru
 		start_cpu_timer(cpu);
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
+  	case CPU_DOWN_PREPARE:
+		/* Shutdown cache reaper */
+		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+		per_cpu(reap_work, cpu).work.func = NULL;
+
+  		mutex_lock(&cache_chain_mutex);
+  		break;
+  	case CPU_DOWN_FAILED:
+  		mutex_unlock(&cache_chain_mutex);
+		start_cpu_timer(cpu);
+  		break;
 	case CPU_DEAD:
 		/*
 		 * Even if all the cpus of a node are down, we don't free the


* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 20:29             ` Christoph Lameter
@ 2007-01-29 21:05               ` Oleg Nesterov
  2007-01-29 21:48                 ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2007-01-29 21:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
	Gautham shenoy, Andrew Morton, linux-kernel

On 01/29, Christoph Lameter wrote:
>
> Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and
> CPU_DOWN_FAILED somehow vanished in mm?

No, no, they are still in place, so I believe your patch is good.

Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE,
so cpuup_callback() can use them to lock/unlock cache_chain_mutex,
but this is not related.

> Shutdown cache_reaper when cpu goes down
> 
> Shutdown the cache_reaper in slab.c if the cpu is brought down
> and set the cache_reap.func to NULL. Otherwise hotplug shuts
> down the reaper for good.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> Index: linux-2.6.20-rc6-mm1/mm/slab.c
> ===================================================================
> --- linux-2.6.20-rc6-mm1.orig/mm/slab.c	2007-01-29 14:18:37.000000000 -0600
> +++ linux-2.6.20-rc6-mm1/mm/slab.c	2007-01-29 14:21:18.119155877 -0600
> @@ -1271,6 +1271,17 @@ static int __cpuinit cpuup_callback(stru
>  		start_cpu_timer(cpu);
>  		break;
>  #ifdef CONFIG_HOTPLUG_CPU
> +  	case CPU_DOWN_PREPARE:
> +		/* Shutdown cache reaper */
> +		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
> +		per_cpu(reap_work, cpu).work.func = NULL;
> +
> +  		mutex_lock(&cache_chain_mutex);
> +  		break;
> +  	case CPU_DOWN_FAILED:
> +  		mutex_unlock(&cache_chain_mutex);
> +		start_cpu_timer(cpu);
> +  		break;
>  	case CPU_DEAD:
>  		/*
>  		 * Even if all the cpus of a node are down, we don't free the



* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 21:05               ` Oleg Nesterov
@ 2007-01-29 21:48                 ` Christoph Lameter
  2007-01-29 22:14                   ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-01-29 21:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
	Gautham shenoy, Andrew Morton, linux-kernel

On Tue, 30 Jan 2007, Oleg Nesterov wrote:

> Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE,
> so cpuup_callback() can use them to lock/unlock cache_chain_mutex,
> but this is not related.

Then we won't need to do the mutex_lock/unlock in CPU_DOWN_XX 
anymore, right? Which brings us to this form of the patch:


Shutdown cache_reaper when cpu goes down

Shutdown the cache_reaper in slab.c if the cpu is brought down
and set the cache_reap.func to NULL. Otherwise hotplug shuts
down the reaper for good.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6-mm2/mm/slab.c
===================================================================
--- linux-2.6.20-rc6-mm2.orig/mm/slab.c	2007-01-29 14:27:34.199229828 -0600
+++ linux-2.6.20-rc6-mm2/mm/slab.c	2007-01-29 15:47:18.293962726 -0600
@@ -1271,6 +1271,14 @@ static int __cpuinit cpuup_callback(stru
 		start_cpu_timer(cpu);
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
+  	case CPU_DOWN_PREPARE:
+		/* Shutdown cache reaper */
+		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+		per_cpu(reap_work, cpu).work.func = NULL;
+  		break;
+  	case CPU_DOWN_FAILED:
+		start_cpu_timer(cpu);
+  		break;
 	case CPU_DEAD:
 		/*
 		 * Even if all the cpus of a node are down, we don't free the


* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 21:48                 ` Christoph Lameter
@ 2007-01-29 22:14                   ` Oleg Nesterov
  0 siblings, 0 replies; 25+ messages in thread
From: Oleg Nesterov @ 2007-01-29 22:14 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
	Gautham shenoy, Andrew Morton, linux-kernel

On 01/29, Christoph Lameter wrote:
>
> On Tue, 30 Jan 2007, Oleg Nesterov wrote:
> 
> > Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE,
> > so cpuup_callback() can use them to lock/unlock cache_chain_mutex,
> > but this is not related.
> 
> Then we wont need to do the mutex_lock/unlock in CPU_DOWN_XX 
> anymore, right? Which brings us to this form of the patch:

Yes, if CPU_LOCK_ACQUIRE takes cache_chain_mutex, we don't need to do so
on CPU_DOWN_PREPARE. I didn't know cpuup_callback() had already been converted,
sorry for the confusion!

Note: with the patch below we do cancel_rearming_delayed_work()
under cache_chain_mutex. This is ok since cache_reap() does mutex_trylock(),
so deadlock is not possible. However, this means that mutex_trylock() becomes
mandatory for cache_reap(); a short comment would probably be good.

> Shutdown cache_reaper when cpu goes down
> 
> Shutdown the cache_reaper in slab.c if the cpu is brought down
> and set the cache_reap.func to NULL. Otherwise hotplug shuts
> down the reaper for good.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> Index: linux-2.6.20-rc6-mm2/mm/slab.c
> ===================================================================
> --- linux-2.6.20-rc6-mm2.orig/mm/slab.c	2007-01-29 14:27:34.199229828 -0600
> +++ linux-2.6.20-rc6-mm2/mm/slab.c	2007-01-29 15:47:18.293962726 -0600
> @@ -1271,6 +1271,14 @@ static int __cpuinit cpuup_callback(stru
>  		start_cpu_timer(cpu);
>  		break;
>  #ifdef CONFIG_HOTPLUG_CPU
> +  	case CPU_DOWN_PREPARE:
> +		/* Shutdown cache reaper */
> +		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
> +		per_cpu(reap_work, cpu).work.func = NULL;
> +  		break;
> +  	case CPU_DOWN_FAILED:
> +		start_cpu_timer(cpu);
> +  		break;
>  	case CPU_DEAD:
>  		/*
>  		 * Even if all the cpus of a node are down, we don't free the



* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 18:27       ` Oleg Nesterov
  2007-01-29 19:09         ` Christoph Lameter
  2007-01-29 19:25         ` Christoph Lameter
@ 2007-02-20 18:39         ` Max Krasnyansky
  2007-02-20 18:45           ` Christoph Lameter
  2 siblings, 1 reply; 25+ messages in thread
From: Max Krasnyansky @ 2007-02-20 18:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Christoph Lameter, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

Folks,

Oleg Nesterov wrote:
>>> Even if smp_processor_id() was stable during the execution of cache_reap(),
>>> this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
>>> avoid this, and this is correct.
>> Uhh.... This may not be correct in terms of how the slab operates.
> 
> But this is practically impossible to avoid. We can't delay CPU_DOWN until all
> workqueues flush their cwq->worklist. This is livelockable, the work can re-queue
> itself, and new works can be added since the dying CPU is still on cpu_online_map.
> This means that some pending works will be processed on another CPU.
> 
> delayed_work is even worse, the timer can migrate as well.
> 
> The first problem (smp_processor_id() is not stable) could be solved if we
> use freezer or with the help of not-yet-implemented scalable lock_cpu_hotplug.
> 
>>> This means that __get_cpu_var(reap_work) returns a "wrong" struct delayed_work.
>>> This is absolutely harmless right now, but may be it's better to use
>>> container_of(unused, struct delayed_work, work).
>> Well seems that we have a set of unresolved issues with workqueues and cpu 
>> hotplug.

How about storing 'cpu' explicitly in the work item instead of relying on
smp_processor_id() and friends? That way there is no ambiguity when threads/timers
get moved around.
I'm cooking a set of patches to extend the cpu isolation concept a bit, in which
case I'd like one CPU to run the cache_reap timer on behalf of another cpu. See
the patch below.

diff --git a/mm/slab.c b/mm/slab.c
index c610062..0f46d11 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -766,7 +766,17 @@ int slab_is_available(void)
  	return g_cpucache_up == FULL;
  }

-static DEFINE_PER_CPU(struct delayed_work, reap_work);
+struct slab_work {
+	struct delayed_work dw;
+	unsigned int       cpu;
+};
+
+static DEFINE_PER_CPU(struct slab_work, reap_work);
+
+static inline struct array_cache *cpu_cache_get_on(struct kmem_cache *cachep, unsigned int cpu)
+{
+	return cachep->array[cpu];
+}

  static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
  {
@@ -915,9 +925,9 @@ static void init_reap_node(int cpu)
  	per_cpu(reap_node, cpu) = node;
  }

-static void next_reap_node(void)
+static void next_reap_node(unsigned int cpu)
  {
-	int node = __get_cpu_var(reap_node);
+	int node = per_cpu(reap_node, cpu);

  	/*
  	 * Also drain per cpu pages on remote zones
@@ -928,12 +938,12 @@ static void next_reap_node(void)
  	node = next_node(node, node_online_map);
  	if (unlikely(node >= MAX_NUMNODES))
  		node = first_node(node_online_map);
-	__get_cpu_var(reap_node) = node;
+	per_cpu(reap_node, cpu) = node;
  }

  #else
  #define init_reap_node(cpu) do { } while (0)
-#define next_reap_node(void) do { } while (0)
+#define next_reap_node(cpu) do { } while (0)
  #endif

  /*
@@ -945,17 +955,18 @@ static void next_reap_node(void)
   */
  static void __devinit start_cpu_timer(int cpu)
  {
-	struct delayed_work *reap_work = &per_cpu(reap_work, cpu);
+	struct slab_work *reap_work = &per_cpu(reap_work, cpu);

  	/*
  	 * When this gets called from do_initcalls via cpucache_init(),
  	 * init_workqueues() has already run, so keventd will be setup
  	 * at that time.
  	 */
-	if (keventd_up() && reap_work->work.func == NULL) {
+	if (keventd_up() && reap_work->dw.work.func == NULL) {
  		init_reap_node(cpu);
-		INIT_DELAYED_WORK(reap_work, cache_reap);
-		schedule_delayed_work_on(cpu, reap_work,
+		INIT_DELAYED_WORK(&reap_work->dw, cache_reap);
+		reap_work->cpu = cpu;
+		schedule_delayed_work_on(cpu, &reap_work->dw,
  					__round_jiffies_relative(HZ, cpu));
  	}
  }
@@ -1004,7 +1015,7 @@ static int transfer_objects(struct array_cache *to,
  #ifndef CONFIG_NUMA

  #define drain_alien_cache(cachep, alien) do { } while (0)
-#define reap_alien(cachep, l3) do { } while (0)
+#define reap_alien(cachep, l3, cpu) do { } while (0)

  static inline struct array_cache **alloc_alien_cache(int node, int limit)
  {
@@ -1099,9 +1110,9 @@ static void __drain_alien_cache(struct kmem_cache *cachep,
  /*
   * Called from cache_reap() to regularly drain alien caches round robin.
   */
-static void reap_alien(struct kmem_cache *cachep, struct kmem_list3 *l3)
+static void reap_alien(struct kmem_cache *cachep, struct kmem_list3 *l3, unsigned int cpu)
  {
-	int node = __get_cpu_var(reap_node);
+	int node = per_cpu(reap_node, cpu);

  	if (l3->alien) {
  		struct array_cache *ac = l3->alien[node];
@@ -4017,16 +4028,17 @@ void drain_array(struct kmem_cache *cachep, struct kmem_list3 *l3,
   * If we cannot acquire the cache chain mutex then just give up - we'll try
   * again on the next iteration.
   */
-static void cache_reap(struct work_struct *unused)
+static void cache_reap(struct work_struct *_work)
  {
  	struct kmem_cache *searchp;
  	struct kmem_list3 *l3;
  	int node = numa_node_id();

+	struct slab_work *work = (struct slab_work *) _work;
+
  	if (!mutex_trylock(&cache_chain_mutex)) {
  		/* Give up. Setup the next iteration. */
-		schedule_delayed_work(&__get_cpu_var(reap_work),
-				      round_jiffies_relative(REAPTIMEOUT_CPUC));
+		schedule_delayed_work(&work->dw, round_jiffies_relative(REAPTIMEOUT_CPUC));
  		return;
  	}

@@ -4040,9 +4052,9 @@ static void cache_reap(struct work_struct *unused)
  		 */
  		l3 = searchp->nodelists[node];

-		reap_alien(searchp, l3);
+		reap_alien(searchp, l3, work->cpu);

-		drain_array(searchp, l3, cpu_cache_get(searchp), 0, node);
+		drain_array(searchp, l3, cpu_cache_get_on(searchp, work->cpu), 0, node);

  		/*
  		 * These are racy checks but it does not matter
@@ -4069,11 +4081,11 @@ next:
  	}
  	check_irq_on();
  	mutex_unlock(&cache_chain_mutex);
-	next_reap_node();
-	refresh_cpu_vm_stats(smp_processor_id());
+	next_reap_node(work->cpu);
+	refresh_cpu_vm_stats(work->cpu);
+
  	/* Set up the next iteration */
-	schedule_delayed_work(&__get_cpu_var(reap_work),
-		round_jiffies_relative(REAPTIMEOUT_CPUC));
+	schedule_delayed_work(&work->dw, round_jiffies_relative(REAPTIMEOUT_CPUC));
  }

  #ifdef CONFIG_PROC_FS


Max




^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 18:39         ` Max Krasnyansky
@ 2007-02-20 18:45           ` Christoph Lameter
  2007-02-20 20:05             ` Oleg Nesterov
  2007-02-20 21:05             ` slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Max Krasnyansky
  0 siblings, 2 replies; 25+ messages in thread
From: Christoph Lameter @ 2007-02-20 18:45 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: Oleg Nesterov, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> > > Well seems that we have a set of unresolved issues with workqueues and cpu
> > > hotplug.
> 
> How about storing 'cpu' explicitly in the work queue instead of relying on the
> smp_processor_id() and friends ? That way there is no ambiguity when
> threads/timers get
> moved around.

The slab functionality is designed to work on the processor with the 
queue. These tricks will only cause more trouble in the future. The 
cache_reaper needs to be either disabled or run on the right processor. It 
should never run on the wrong processor.

The cache_reaper() is of no importance to hotplug. You just need to make 
sure that it is not in the way (disable it and, if it's running, wait 
until the cache_reaper has finished).



* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 18:45           ` Christoph Lameter
@ 2007-02-20 20:05             ` Oleg Nesterov
  2007-02-20 21:22               ` Max Krasnyansky
  2007-02-20 21:05             ` slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Max Krasnyansky
  1 sibling, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2007-02-20 20:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Max Krasnyansky, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

On 02/20, Christoph Lameter wrote:
>
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
> 
> > > > Well seems that we have a set of unresolved issues with workqueues and cpu
> > > > hotplug.
> > 
> > How about storing 'cpu' explicitly in the work queue instead of relying on the
> > smp_processor_id() and friends ? That way there is no ambiguity when
> > threads/timers get
> > moved around.
> 
> The slab functionality is designed to work on the processor with the 
> queue. These tricks will only cause more trouble in the future. The 
> cache_reaper needs to be either disabled or run on the right processor. It 
> should never run on the wrong processor.

I personally agree. Besides, cache_reaper is not alone. Note the comment
in debug_smp_processor_id() about cpu-bound processes. The slab does the correct
thing right now: it stops the timer on CPU_DEAD. Other problems imho should be
solved by fixing cpu-hotplug. Gautham and Srivatsa are working on that.

Oleg.



* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 18:45           ` Christoph Lameter
  2007-02-20 20:05             ` Oleg Nesterov
@ 2007-02-20 21:05             ` Max Krasnyansky
  2007-02-20 21:34               ` Christoph Lameter
  1 sibling, 1 reply; 25+ messages in thread
From: Max Krasnyansky @ 2007-02-20 21:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Oleg Nesterov, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

Christoph Lameter wrote:
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
> 
>>>> Well seems that we have a set of unresolved issues with workqueues and cpu
>>>> hotplug.
>> How about storing 'cpu' explicitly in the work queue instead of relying on the
>> smp_processor_id() and friends ? That way there is no ambiguity when
>> threads/timers get
>> moved around.
> 
> The slab functionality is designed to work on the processor with the 
> queue. These tricks will only cause more trouble in the future. The 
> cache_reaper needs to be either disabled or run on the right processor. It 
> should never run on the wrong processor.
> The cache_reaper() is of no importance to hotplug. You just need to make 
> sure that it is not in the way (disable it and if its running wait 
> until the cache_reaper has finished).

I agree that running the reaper on the wrong CPU is not the best way to go about it.
But it seems like disabling it is even worse, unless I'm missing something, i.e. wasting
memory.

btw, what kind of trouble were you talking about? Performance or robustness?
As I said, performance-wise it does not make sense to run the reaper on the wrong CPU, but
it does seem to work just fine from the correctness (locking, etc) perspective. Again,
it's totally possible that I'm missing something here :).

Thanks
Max


* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 20:05             ` Oleg Nesterov
@ 2007-02-20 21:22               ` Max Krasnyansky
  2007-02-20 21:35                 ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Max Krasnyansky @ 2007-02-20 21:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Christoph Lameter, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

Oleg Nesterov wrote:
> On 02/20, Christoph Lameter wrote:
>> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
>>
>>>>> Well seems that we have a set of unresolved issues with workqueues and cpu
>>>>> hotplug.
>>> How about storing 'cpu' explicitly in the work queue instead of relying on the
>>> smp_processor_id() and friends ? That way there is no ambiguity when
>>> threads/timers get
>>> moved around.
>> The slab functionality is designed to work on the processor with the 
>> queue. These tricks will only cause more trouble in the future. The 
>> cache_reaper needs to be either disabled or run on the right processor. It 
>> should never run on the wrong processor.
> 
> I personally agree. Besides, cache_reaper is not alone. Note the comment
> in debug_smp_processor_id() about cpu-bound processes. The slab does correct
> thing right now, stops the timer on CPU_DEAD. Other problems imho should be
> solved by fixing cpu-hotplug. Gautham and Srivatsa are working on that.

I guess I kind of hijacked the thread. The second part of my first email was
dropped. Basically I was saying that I'm working on CPU isolation extensions,
where an isolated CPU is not supposed to do much kernel work. In that case
you'd want to run the slab cache reaper on some other CPU on behalf of the isolated
one. Hence the proposal to explicitly pass cpu_id to the reaper. I guess now
that you guys have fixed the hotplug case it does not help in that scenario.

Max


* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 21:05             ` slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Max Krasnyansky
@ 2007-02-20 21:34               ` Christoph Lameter
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Lameter @ 2007-02-20 21:34 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: Oleg Nesterov, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> I agree that running the reaper on the wrong CPU is not the best way to go
> about it.
> But it seems like disabling it is even worse, unless I missing something. ie
> wasting
> memory.

Disabling during shutdown is no problem because the per cpu caches are 
going to be drained anyways by the cpu down code.


* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 21:22               ` Max Krasnyansky
@ 2007-02-20 21:35                 ` Christoph Lameter
  2007-02-20 22:01                   ` Max Krasnyansky
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-02-20 21:35 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: Oleg Nesterov, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> I guess I kind of hijacked the thread. The second part of my first email was
> dropped. Basically I was saying that I'm working on CPU isolation extensions.
> Where an isolated CPU is not supposed to do much kernel work. In which case
> you'd want to run slab cache reaper on some other CPU on behalf of the
> isolated
> one. Hence the proposal to explicitly pass cpu_id to the reaper. I guess now
> that you guys fixed the hotplug case it does not help in that scenario.

A cpu must have a per cpu cache in order to do slab allocations. The 
locking in the slab allocator depends on it.

If the isolated cpus have no need for slab allocations then you will also 
not need to run the slab_reaper().



* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 21:35                 ` Christoph Lameter
@ 2007-02-20 22:01                   ` Max Krasnyansky
  2007-02-20 22:14                     ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Max Krasnyansky @ 2007-02-20 22:01 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Oleg Nesterov, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

Christoph Lameter wrote:
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
> 
>> I guess I kind of hijacked the thread. The second part of my first email was
>> dropped. Basically I was saying that I'm working on CPU isolation extensions.
>> Where an isolated CPU is not supposed to do much kernel work. In which case
>> you'd want to run slab cache reaper on some other CPU on behalf of the
>> isolated
>> one. Hence the proposal to explicitly pass cpu_id to the reaper. I guess now
>> that you guys fixed the hotplug case it does not help in that scenario.
> 
> A cpu must have a per cpu cache in order to do slab allocations. The 
> locking in the slab allocator depends on it.
> 
> If the isolated cpus have no need for slab allocations then you will also 
> not need to run the slab_reaper().

Ok. Sounds like disabling cache_reaper is a better option for now. Like you said
it's unlikely that slabs will grow much if that cpu is not heavily used by the
kernel.

Thanks
Max


* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 22:01                   ` Max Krasnyansky
@ 2007-02-20 22:14                     ` Christoph Lameter
  2007-02-20 22:48                       ` SLAB cache reaper on isolated cpus Max Krasnyansky
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-02-20 22:14 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: Oleg Nesterov, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> Ok. Sounds like disabling cache_reaper is a better option for now. Like you
> said
> it's unlikely that slabs will grow much if that cpu is not heavily used by the
> kernel.

Running for prolonged times without cache_reaper is no good.

What we are talking about here is to disable the cache_reaper during cpu 
shutdown. The slab cpu shutdown will clean the per cpu caches anyways so 
we really do not need the slab_reaper running during cpu shutdown.


* SLAB cache reaper on isolated cpus
  2007-02-20 22:14                     ` Christoph Lameter
@ 2007-02-20 22:48                       ` Max Krasnyansky
  2007-02-20 23:19                         ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Max Krasnyansky @ 2007-02-20 22:48 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel

Christoph Lameter wrote:
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
> 
>> Ok. Sounds like disabling cache_reaper is a better option for now. Like you said
>> it's unlikely that slabs will grow much if that cpu is not heavily used by the kernel.
> 
> Running for prolonged times without cache_reaper is no good.
> 
> What we are talking about here is to disable the cache_reaper during cpu 
> shutdown. The slab cpu shutdown will clean the per cpu caches anyways so 
> we really do not need the slab_reaper running during cpu shutdown.

Ok. Let me restart the thread so that we're not confusing two issues :).

I'm not talking about CPU shutdown or CPU hotplug in general. My proposal seemed related
to the CPU shutdown issue that you guys were discussing, but it turns out it's not.

Suppose I need to isolate a CPU. We already support isolation at the scheduler and irq levels (irq affinity).
But I want to go a bit further and avoid doing kernel work on isolated cpus as much as possible.
For example, I would not want to schedule work queues and stuff on them.
Currently there are just a few users of schedule_delayed_work_on(): cpufreq (don't care for
isolation purposes), oprofile (same here) and slab.
For the slab it'd be nice to run the reaper on some other CPU. But you're saying that locking
depends on CPU pinning. Is there any other option besides disabling cache reap? Is there a way,
for example, to constrain the slabs on CPU X to not exceed N megs?

Max






* Re: SLAB cache reaper on isolated cpus
  2007-02-20 22:48                       ` SLAB cache reaper on isolated cpus Max Krasnyansky
@ 2007-02-20 23:19                         ` Christoph Lameter
  2007-02-21  3:41                           ` Max Krasnyansky
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-02-20 23:19 UTC (permalink / raw)
  To: Max Krasnyansky; +Cc: linux-kernel

On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> Suppose I need to isolate a CPU. We already support at the scheduler and 
> irq levels (irq affinity). But I want to go a bit further and avoid 
> doing kernel work on isolated cpus as much as possible. For example I 
> would not want to schedule work queues and stuff on them. Currently 
> there are just a few users of the schedule_delayed_work_on(). cpufreq 
> (don't care for isolation purposes), oprofile (same here) and slab. For 
> the slab it'd be nice to run the reaper on some other CPU. But you're 
> saying that locking depends on CPU pinning. Is there any other option 
> besides disabling cache reap ? Is there a way for example to constraint 
> the slabs on CPU X to not exceed N megs ?

There is no way to constrain the amount of slab work. In order to make the 
above work we would have to disable the per cpu caches for a certain cpu. 
Then there would be no need to run the cache reaper at all.

To some extent such functionality already exists. F.e. kmalloc_node() 
already bypasses the per cpu caches (most of the time). kmalloc_node will 
have to take a spinlock on a shared cacheline on each invocation. kmalloc 
only touches per cpu data during regular operations. Thus kmalloc() is much 
faster than kmalloc_node() and the cachelines for kmalloc() can be kept in 
the per cpu cache.

If we could disable all per cpu caches for certain cpus then you could 
make this work. All slab OS interference would be off the processor.


* Re: SLAB cache reaper on isolated cpus
  2007-02-20 23:19                         ` Christoph Lameter
@ 2007-02-21  3:41                           ` Max Krasnyansky
  0 siblings, 0 replies; 25+ messages in thread
From: Max Krasnyansky @ 2007-02-21  3:41 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel

Christoph Lameter wrote:
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
> 
>> Suppose I need to isolate a CPU. We already support at the scheduler and 
>> irq levels (irq affinity). But I want to go a bit further and avoid 
>> doing kernel work on isolated cpus as much as possible. For example I 
>> would not want to schedule work queues and stuff on them. Currently 
>> there are just a few users of the schedule_delayed_work_on(). cpufreq 
>> (don't care for isolation purposes), oprofile (same here) and slab. For 
>> the slab it'd be nice to run the reaper on some other CPU. But you're 
>> saying that locking depends on CPU pinning. Is there any other option 
>> besides disabling cache reap ? Is there a way for example to constraint 
>> the slabs on CPU X to not exceed N megs ?
> 
> There is no way to constrain the amount of slab work. In order to make the 
> above work we would have to disable the per cpu caches for a certain cpu. 
> Then there would be no need to run the cache reaper at all.
> 
> To some extend such functionality already exists. F.e. kmalloc_node() 
> already bypasses the per cpu caches (most of the time).  kmalloc_node will 
> have to take a spinlock on a shared cacheline on each invocation. kmalloc 
> does only touch per cpu data during regular operations. Thus kmalloc() is much 
> faster than kmalloc_node() and the cachelines for kmalloc() can be kept in 
> the per cpu cache.
> 
> If we could disable all per cpu caches for certain cpus then you could 
> make this work. All slab OS interference would be off the processor.

Hmm. That's an idea. I'll play with it later today or tomorrow.

Thanks
Max



end of thread, other threads:[~2007-02-21  3:42 UTC | newest]

Thread overview: 25+ messages
2007-01-29  1:13 slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Oleg Nesterov
2007-01-29 16:54 ` Christoph Lameter
2007-01-29 17:19   ` Oleg Nesterov
2007-01-29 17:27     ` Christoph Lameter
2007-01-29 18:27       ` Oleg Nesterov
2007-01-29 19:09         ` Christoph Lameter
2007-01-29 19:29           ` Oleg Nesterov
2007-01-29 19:25         ` Christoph Lameter
2007-01-29 19:49           ` Oleg Nesterov
2007-01-29 20:29             ` Christoph Lameter
2007-01-29 21:05               ` Oleg Nesterov
2007-01-29 21:48                 ` Christoph Lameter
2007-01-29 22:14                   ` Oleg Nesterov
2007-02-20 18:39         ` Max Krasnyansky
2007-02-20 18:45           ` Christoph Lameter
2007-02-20 20:05             ` Oleg Nesterov
2007-02-20 21:22               ` Max Krasnyansky
2007-02-20 21:35                 ` Christoph Lameter
2007-02-20 22:01                   ` Max Krasnyansky
2007-02-20 22:14                     ` Christoph Lameter
2007-02-20 22:48                       ` SLAB cache reaper on isolated cpus Max Krasnyansky
2007-02-20 23:19                         ` Christoph Lameter
2007-02-21  3:41                           ` Max Krasnyansky
2007-02-20 21:05             ` slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Max Krasnyansky
2007-02-20 21:34               ` Christoph Lameter
