* slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
@ 2007-01-29  1:13 Oleg Nesterov
  2007-01-29 16:54 ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread

From: Oleg Nesterov @ 2007-01-29 1:13 UTC (permalink / raw)
  To: Christoph Lameter, Ingo Molnar
  Cc: Srivatsa Vaddagiri, Pallipadi, Venkatesh, Gautham shenoy,
      Andrew Morton, linux-kernel

To begin with, another (but related) minor problem, debug_smp_processor_id:

	/*
	 * Kernel threads bound to a single CPU can safely use
	 * smp_processor_id():
	 */

This is only true without CONFIG_HOTPLUG_CPU. Otherwise the CPU can go
away while the task takes a preemption or sleeps. I think we need an
#ifndef here.

Now,

	static void __devinit start_cpu_timer(int cpu)
	{
		struct delayed_work *reap_work = &per_cpu(reap_work, cpu);

		if (keventd_up() && reap_work->work.func == NULL) {
			init_reap_node(cpu);
			INIT_DELAYED_WORK(reap_work, cache_reap);
			schedule_delayed_work_on(cpu, reap_work,
					__round_jiffies_relative(HZ, cpu));
		}
	}

This is wrong. Suppose we have a CPU_UP, CPU_DOWN, CPU_UP sequence. The
last CPU_UP will not restart the per-cpu "cache_reap timer".

With or without the recent changes, it is possible that work->func() will
run on another CPU (not the one it was submitted to) if that CPU goes
down. In fact, this can happen while work->func() is running, so even
smp_processor_id() is not safe to use in work->func().

However, cache_reap() seems to wrongly assume that smp_processor_id() is
stable; this is the second problem.

Is my understanding correct?

Oleg.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29  1:13 slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Oleg Nesterov
@ 2007-01-29 16:54 ` Christoph Lameter
  2007-01-29 17:19   ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread

From: Christoph Lameter @ 2007-01-29 16:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
      Gautham shenoy, Andrew Morton, linux-kernel

On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> Now,
> 	static void __devinit start_cpu_timer(int cpu)
> 	{
> 		struct delayed_work *reap_work = &per_cpu(reap_work, cpu);
>
> 		if (keventd_up() && reap_work->work.func == NULL) {
> 			init_reap_node(cpu);
> 			INIT_DELAYED_WORK(reap_work, cache_reap);
> 			schedule_delayed_work_on(cpu, reap_work,
> 					__round_jiffies_relative(HZ, cpu));
> 		}
> 	}
>
> This is wrong. Suppose we have a CPU_UP, CPU_DOWN, CPU_UP sequence. The
> last CPU_UP will not restart the per-cpu "cache_reap timer".

Why?

> With or without the recent changes, it is possible that work->func() will
> run on another CPU (not the one it was submitted to) if that CPU goes
> down. In fact, this can happen while work->func() is running, so even
> smp_processor_id() is not safe to use in work->func().

But the work func was scheduled by schedule_delayed_work_on(). Isn't that
a general problem with schedule_delayed_work_on() and keventd?

> However, cache_reap() seems to wrongly assume that smp_processor_id() is
> stable; this is the second problem.
>
> Is my understanding correct?

cache_reap assumes that the processor id is stable based on the kevent
thread being pinned to a processor.
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 16:54 ` Christoph Lameter
@ 2007-01-29 17:19   ` Oleg Nesterov
  2007-01-29 17:27     ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread

From: Oleg Nesterov @ 2007-01-29 17:19 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
      Gautham shenoy, Andrew Morton, linux-kernel

On 01/29, Christoph Lameter wrote:
>
> On Mon, 29 Jan 2007, Oleg Nesterov wrote:
>
> > Now,
> > 	static void __devinit start_cpu_timer(int cpu)
> > 	{
> > 		struct delayed_work *reap_work = &per_cpu(reap_work, cpu);
> >
> > 		if (keventd_up() && reap_work->work.func == NULL) {
> > 			init_reap_node(cpu);
> > 			INIT_DELAYED_WORK(reap_work, cache_reap);
> > 			schedule_delayed_work_on(cpu, reap_work,
> > 					__round_jiffies_relative(HZ, cpu));
> > 		}
> > 	}
> >
> > This is wrong. Suppose we have a CPU_UP, CPU_DOWN, CPU_UP sequence. The
> > last CPU_UP will not restart the per-cpu "cache_reap timer".
>
> Why?

Because the last CPU_UP calls start_cpu_timer(), but since ->work.func !=
NULL we don't do schedule_delayed_work_on(). I think (if I am right) this
is a slab bug.

> > With or without the recent changes, it is possible that work->func()
> > will run on another CPU (not the one it was submitted to) if that CPU
> > goes down. In fact, this can happen while work->func() is running, so
> > even smp_processor_id() is not safe to use in work->func().
>
> But the work func was scheduled by schedule_delayed_work_on(). Isn't that
> a general problem with schedule_delayed_work_on() and keventd?

I think this is yet another problem with the workqueues/cpu-hotplug
interaction. Probably the problem is more general: with CONFIG_HOTPLUG_CPU
we can't guarantee that smp_processor_id() is stable even if the thread is
pinned to a specific processor.

> > However, cache_reap() seems to wrongly assume that smp_processor_id()
> > is stable; this is the second problem.
> >
> > Is my understanding correct?
>
> cache_reap assumes that the processor id is stable based on the kevent
> thread being pinned to a processor.

I think cache_reap() is not alone, and this is not its fault.

But please note another minor problem:

	void cache_reap(struct work_struct *unused)
	{
		...
		schedule_delayed_work(&__get_cpu_var(reap_work), ...);
	}

Even if smp_processor_id() was stable during the execution of
cache_reap(), this work_struct can be moved to another CPU if CPU_DEAD
happens. We can't avoid this, and this is correct.

This means that __get_cpu_var(reap_work) returns a "wrong" struct
delayed_work. This is absolutely harmless right now, but maybe it's
better to use container_of(unused, struct delayed_work, work).

Oleg.
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 17:19 ` Oleg Nesterov
@ 2007-01-29 17:27   ` Christoph Lameter
  2007-01-29 18:27     ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread

From: Christoph Lameter @ 2007-01-29 17:27 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
      Gautham shenoy, Andrew Morton, linux-kernel

On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> > > This is wrong. Suppose we have a CPU_UP, CPU_DOWN, CPU_UP sequence.
> > > The last CPU_UP will not restart the per-cpu "cache_reap timer".
> >
> > Why?
>
> Because the last CPU_UP calls start_cpu_timer(), but since ->work.func !=
> NULL we don't do schedule_delayed_work_on(). I think (if I am right) this
> is a slab bug.

The CPU_DOWN would need to set work.func == NULL for this to work. But
then the slab does not shut down the work queues for the processor. Isn't
this another issue with workqueues? The slab would need a notification
that the workqueue for a processor was shut down in order to set
work.func = NULL.

> I think cache_reap() is not alone, and this is not its fault.
>
> But please note another minor problem:
>
> 	void cache_reap(struct work_struct *unused)
> 	{
> 		...
> 		schedule_delayed_work(&__get_cpu_var(reap_work), ...);
> 	}
>
> Even if smp_processor_id() was stable during the execution of
> cache_reap(), this work_struct can be moved to another CPU if CPU_DEAD
> happens. We can't avoid this, and this is correct.

Uhh.... This may not be correct in terms of how the slab operates.

> This means that __get_cpu_var(reap_work) returns a "wrong" struct
> delayed_work. This is absolutely harmless right now, but maybe it's
> better to use container_of(unused, struct delayed_work, work).

Well, it seems that we have a set of unresolved issues with workqueues
and cpu hotplug.
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 17:27 ` Christoph Lameter
@ 2007-01-29 18:27   ` Oleg Nesterov
  2007-01-29 19:09     ` Christoph Lameter
  ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread

From: Oleg Nesterov @ 2007-01-29 18:27 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
      Gautham shenoy, Andrew Morton, linux-kernel

On 01/29, Christoph Lameter wrote:
>
> On Mon, 29 Jan 2007, Oleg Nesterov wrote:
>
> > > > This is wrong. Suppose we have a CPU_UP, CPU_DOWN, CPU_UP sequence.
> > > > The last CPU_UP will not restart the per-cpu "cache_reap timer".
> > >
> > > Why?
> >
> > Because the last CPU_UP calls start_cpu_timer(), but since ->work.func
> > != NULL we don't do schedule_delayed_work_on(). I think (if I am right)
> > this is a slab bug.
>
> The CPU_DOWN would need to set work.func == NULL for this to work. But
> then the slab does not shut down the work queues for the processor. Isn't
> this another issue with workqueues?

I think not, please see below. Actually, this is more related to timers
for this particular case.

> The slab would need a notification
> that the workqueue for a processor was shut down in order to set
> work.func = NULL.

The slab has a notification: CPU_XXX events. It should cancel the pending
per-cpu "reap_work timer".

> > I think cache_reap() is not alone, and this is not its fault.
> >
> > But please note another minor problem:
> >
> > 	void cache_reap(struct work_struct *unused)
> > 	{
> > 		...
> > 		schedule_delayed_work(&__get_cpu_var(reap_work), ...);
> > 	}
> >
> > Even if smp_processor_id() was stable during the execution of
> > cache_reap(), this work_struct can be moved to another CPU if CPU_DEAD
> > happens. We can't avoid this, and this is correct.
>
> Uhh.... This may not be correct in terms of how the slab operates.

But this is practically impossible to avoid. We can't delay CPU_DOWN until
all workqueues flush their cwq->worklist. This is livelockable: the work
can re-queue itself, and new works can be added since the dying CPU is
still on cpu_online_map. This means that some pending works will be
processed on another CPU.

delayed_work is even worse, the timer can migrate as well.

The first problem (smp_processor_id() is not stable) could be solved if we
use the freezer, or with the help of the not-yet-implemented scalable
lock_cpu_hotplug.

> > This means that __get_cpu_var(reap_work) returns a "wrong" struct
> > delayed_work. This is absolutely harmless right now, but maybe it's
> > better to use container_of(unused, struct delayed_work, work).
>
> Well, it seems that we have a set of unresolved issues with workqueues
> and cpu hotplug.

Yes.

Oleg.
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 18:27 ` Oleg Nesterov
@ 2007-01-29 19:09   ` Christoph Lameter
  2007-01-29 19:29     ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread

From: Christoph Lameter @ 2007-01-29 19:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
      Gautham shenoy, Andrew Morton, linux-kernel

On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> > The slab would need a notification
> > that the workqueue for a processor was shut down in order to set
> > work.func = NULL.
>
> The slab has a notification: CPU_XXX events. It should cancel the pending
> per-cpu "reap_work timer".

Ahh. Okay, then the following patch would fix it?

Shutdown cache_reaper when cpu goes down

Shut down the cache_reaper in slab.c if the cpu is brought down
and set cache_reap.func to NULL. Otherwise hotplug shuts
down the reaper for good.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6/mm/slab.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/slab.c	2007-01-24 20:19:28.000000000 -0600
+++ linux-2.6.20-rc6/mm/slab.c	2007-01-29 13:08:05.773928988 -0600
@@ -1269,6 +1269,10 @@ static int __cpuinit cpuup_callback(stru
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_DOWN_PREPARE:
+		/* Shutdown cache reaper */
+		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+		per_cpu(reap_work, cpu).work.func = NULL;
+
 		mutex_lock(&cache_chain_mutex);
 		break;
 	case CPU_DOWN_FAILED:
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 19:09 ` Christoph Lameter
@ 2007-01-29 19:29   ` Oleg Nesterov
  0 siblings, 0 replies; 25+ messages in thread

From: Oleg Nesterov @ 2007-01-29 19:29 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
      Gautham shenoy, Andrew Morton, linux-kernel

On 01/29, Christoph Lameter wrote:
>
> Ahh. Okay, then the following patch would fix it?
>
> Shutdown cache_reaper when cpu goes down
>
> Shut down the cache_reaper in slab.c if the cpu is brought down
> and set cache_reap.func to NULL. Otherwise hotplug shuts
> down the reaper for good.
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
>
> Index: linux-2.6.20-rc6/mm/slab.c
> ===================================================================
> --- linux-2.6.20-rc6.orig/mm/slab.c	2007-01-24 20:19:28.000000000 -0600
> +++ linux-2.6.20-rc6/mm/slab.c	2007-01-29 13:08:05.773928988 -0600
> @@ -1269,6 +1269,10 @@ static int __cpuinit cpuup_callback(stru
>  		break;
>  #ifdef CONFIG_HOTPLUG_CPU
>  	case CPU_DOWN_PREPARE:
> +		/* Shutdown cache reaper */
> +		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
> +		per_cpu(reap_work, cpu).work.func = NULL;
> +
>  		mutex_lock(&cache_chain_mutex);
>  		break;
>  	case CPU_DOWN_FAILED:

Then CPU_DOWN_FAILED should do start_cpu_timer(cpu).

Oleg.
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 18:27 ` Oleg Nesterov
  2007-01-29 19:09 ` Christoph Lameter
@ 2007-01-29 19:25   ` Christoph Lameter
  2007-01-29 19:49     ` Oleg Nesterov
  2007-02-20 18:39   ` Max Krasnyansky
  2 siblings, 1 reply; 25+ messages in thread

From: Christoph Lameter @ 2007-01-29 19:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
      Gautham shenoy, Andrew Morton, linux-kernel

On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> > > Even if smp_processor_id() was stable during the execution of
> > > cache_reap(), this work_struct can be moved to another CPU if
> > > CPU_DEAD happens. We can't avoid this, and this is correct.
> >
> > Uhh.... This may not be correct in terms of how the slab operates.
>
> But this is practically impossible to avoid. We can't delay CPU_DOWN
> until all workqueues flush their cwq->worklist. This is livelockable:
> the work can re-queue itself, and new works can be added since the
> dying CPU is still on cpu_online_map. This means that some pending
> works will be processed on another CPU.

But we could delay CPU_DOWN in the handler for the slab until we know
that the cache_reaper is no longer running?

> > > This means that __get_cpu_var(reap_work) returns a "wrong" struct
> > > delayed_work. This is absolutely harmless right now, but maybe it's
> > > better to use container_of(unused, struct delayed_work, work).

There is more where that is coming from. cache_reap determines the
current cpu in order to find the correct per cpu cache and also
determines the current node. If you move cache_reap to another
processor / node then it will just clean that one and not do anything
for the processor that you wanted it to run for. If we change processors
in the middle of the run then it may do some actions on one cpu and some
on another....

Having said that, here is a patch that makes cache_reap no longer use
per_cpu to access the work structure. This may be a cleanup on its own,
but it will not address your issues.

Use parameter passed to cache_reap to determine pointer to work structure

Use the pointer passed to cache_reap to determine the work pointer
and consolidate the exit paths.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6/mm/slab.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/slab.c	2007-01-29 13:08:05.000000000 -0600
+++ linux-2.6.20-rc6/mm/slab.c	2007-01-29 13:17:01.695680898 -0600
@@ -4021,18 +4021,17 @@ void drain_array(struct kmem_cache *cach
  * If we cannot acquire the cache chain mutex then just give up - we'll try
  * again on the next iteration.
  */
-static void cache_reap(struct work_struct *unused)
+static void cache_reap(struct work_struct *w)
 {
 	struct kmem_cache *searchp;
 	struct kmem_list3 *l3;
 	int node = numa_node_id();
+	struct delayed_work *work =
+		container_of(w, struct delayed_work, work);
 
-	if (!mutex_trylock(&cache_chain_mutex)) {
+	if (!mutex_trylock(&cache_chain_mutex))
 		/* Give up. Setup the next iteration. */
-		schedule_delayed_work(&__get_cpu_var(reap_work),
-				round_jiffies_relative(REAPTIMEOUT_CPUC));
-		return;
-	}
+		goto out;
 
 	list_for_each_entry(searchp, &cache_chain, next) {
 		check_irq_on();
@@ -4075,9 +4074,9 @@ next:
 	mutex_unlock(&cache_chain_mutex);
 	next_reap_node();
 	refresh_cpu_vm_stats(smp_processor_id());
+out:
 	/* Set up the next iteration */
-	schedule_delayed_work(&__get_cpu_var(reap_work),
-			round_jiffies_relative(REAPTIMEOUT_CPUC));
+	schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_CPUC));
 }
 
 #ifdef CONFIG_PROC_FS
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 19:25 ` Christoph Lameter
@ 2007-01-29 19:49   ` Oleg Nesterov
  2007-01-29 20:29     ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread

From: Oleg Nesterov @ 2007-01-29 19:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
      Gautham shenoy, Andrew Morton, linux-kernel

On 01/29, Christoph Lameter wrote:
>
> On Mon, 29 Jan 2007, Oleg Nesterov wrote:
>
> > > > Even if smp_processor_id() was stable during the execution of
> > > > cache_reap(), this work_struct can be moved to another CPU if
> > > > CPU_DEAD happens. We can't avoid this, and this is correct.
> > >
> > > Uhh.... This may not be correct in terms of how the slab operates.
> >
> > But this is practically impossible to avoid. We can't delay CPU_DOWN
> > until all workqueues flush their cwq->worklist. This is livelockable:
> > the work can re-queue itself, and new works can be added since the
> > dying CPU is still on cpu_online_map. This means that some pending
> > works will be processed on another CPU.
>
> But we could delay CPU_DOWN in the handler for the slab until we know
> that the cache_reaper is no longer running?

Hmm... I don't understand this. We can delay CPU_DOWN if we cancel the
cache_reaper like you did in the previous patch. Did you mean this?
If yes - then yes :)

> > > > This means that __get_cpu_var(reap_work) returns a "wrong" struct
> > > > delayed_work. This is absolutely harmless right now, but maybe it's
> > > > better to use container_of(unused, struct delayed_work, work).
>
> There is more where that is coming from. cache_reap determines the
> current cpu in order to find the correct per cpu cache and also
> determines the current node. If you move cache_reap to another
> processor / node then it will just clean that one and not do anything
> for the processor that you wanted it to run for.

Worse, we can have 2 handlers running in parallel on the same CPU. But
this is fixed by your previous patch, I believe.

> If we change processors in the middle of the run
> then it may do some actions on one cpu and some on another....

Yep. For example, next_reap_node() will not be happy if we change CPU in
the middle. But this is _extremely_ unlikely, can only happen on
CPU_DOWN, and cache_reap() should not care about this.

> -static void cache_reap(struct work_struct *unused)
> +static void cache_reap(struct work_struct *w)
>  {
>  	struct kmem_cache *searchp;
>  	struct kmem_list3 *l3;
>  	int node = numa_node_id();
> +	struct delayed_work *work =
> +		container_of(w, struct delayed_work, work);
>
> ...
>
> +	schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_CPUC));

Actually, I was wrong. Yes, this should work, but only with your previous
patch. Otherwise, if the handler ran on the "wrong" CPU (this is not
possible since you added cancel_rearming_delayed_work()), we would in
fact be starting a second reaper on the same CPU, not good.

Oleg.
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 19:49 ` Oleg Nesterov
@ 2007-01-29 20:29   ` Christoph Lameter
  2007-01-29 21:05     ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread

From: Christoph Lameter @ 2007-01-29 20:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
      Gautham shenoy, Andrew Morton, linux-kernel

On Mon, 29 Jan 2007, Oleg Nesterov wrote:

> > But we could delay CPU_DOWN in the handler for the slab until we know
> > that the cache_reaper is no longer running?
>
> Hmm... I don't understand this. We can delay CPU_DOWN if we cancel the
> cache_reaper like you did in the previous patch. Did you mean this?
> If yes - then yes :)

Yes, sure.

> Worse, we can have 2 handlers running in parallel on the same CPU. But
> this is fixed by your previous patch, I believe.

Good. Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and
CPU_DOWN_FAILED somehow vanished in mm?

Shutdown cache_reaper when cpu goes down

Shut down the cache_reaper in slab.c if the cpu is brought down
and set cache_reap.func to NULL. Otherwise hotplug shuts
down the reaper for good.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6-mm1/mm/slab.c
===================================================================
--- linux-2.6.20-rc6-mm1.orig/mm/slab.c	2007-01-29 14:18:37.000000000 -0600
+++ linux-2.6.20-rc6-mm1/mm/slab.c	2007-01-29 14:21:18.119155877 -0600
@@ -1271,6 +1271,17 @@ static int __cpuinit cpuup_callback(stru
 		start_cpu_timer(cpu);
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
+	case CPU_DOWN_PREPARE:
+		/* Shutdown cache reaper */
+		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+		per_cpu(reap_work, cpu).work.func = NULL;
+
+		mutex_lock(&cache_chain_mutex);
+		break;
+	case CPU_DOWN_FAILED:
+		mutex_unlock(&cache_chain_mutex);
+		start_cpu_timer(cpu);
+		break;
 	case CPU_DEAD:
 		/*
 		 * Even if all the cpus of a node are down, we don't free the
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 20:29 ` Christoph Lameter
@ 2007-01-29 21:05   ` Oleg Nesterov
  2007-01-29 21:48     ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread

From: Oleg Nesterov @ 2007-01-29 21:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
      Gautham shenoy, Andrew Morton, linux-kernel

On 01/29, Christoph Lameter wrote:
>
> Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and
> CPU_DOWN_FAILED somehow vanished in mm?

No, no, they are still in place, so I believe your patch is good.

Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE, so
cpuup_callback() can use them to lock/unlock cache_chain_mutex, but this
is not related.

> Shutdown cache_reaper when cpu goes down
>
> Shut down the cache_reaper in slab.c if the cpu is brought down
> and set cache_reap.func to NULL. Otherwise hotplug shuts
> down the reaper for good.
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
>
> Index: linux-2.6.20-rc6-mm1/mm/slab.c
> ===================================================================
> --- linux-2.6.20-rc6-mm1.orig/mm/slab.c	2007-01-29 14:18:37.000000000 -0600
> +++ linux-2.6.20-rc6-mm1/mm/slab.c	2007-01-29 14:21:18.119155877 -0600
> @@ -1271,6 +1271,17 @@ static int __cpuinit cpuup_callback(stru
>  		start_cpu_timer(cpu);
>  		break;
>  #ifdef CONFIG_HOTPLUG_CPU
> +	case CPU_DOWN_PREPARE:
> +		/* Shutdown cache reaper */
> +		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
> +		per_cpu(reap_work, cpu).work.func = NULL;
> +
> +		mutex_lock(&cache_chain_mutex);
> +		break;
> +	case CPU_DOWN_FAILED:
> +		mutex_unlock(&cache_chain_mutex);
> +		start_cpu_timer(cpu);
> +		break;
>  	case CPU_DEAD:
>  		/*
>  		 * Even if all the cpus of a node are down, we don't free the
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 21:05 ` Oleg Nesterov
@ 2007-01-29 21:48   ` Christoph Lameter
  2007-01-29 22:14     ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread

From: Christoph Lameter @ 2007-01-29 21:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
      Gautham shenoy, Andrew Morton, linux-kernel

On Tue, 30 Jan 2007, Oleg Nesterov wrote:

> Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE, so
> cpuup_callback() can use them to lock/unlock cache_chain_mutex, but this
> is not related.

Then we won't need to do the mutex_lock/unlock in CPU_DOWN_XX anymore,
right? Which brings us to this form of the patch:

Shutdown cache_reaper when cpu goes down

Shut down the cache_reaper in slab.c if the cpu is brought down
and set cache_reap.func to NULL. Otherwise hotplug shuts
down the reaper for good.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6-mm2/mm/slab.c
===================================================================
--- linux-2.6.20-rc6-mm2.orig/mm/slab.c	2007-01-29 14:27:34.199229828 -0600
+++ linux-2.6.20-rc6-mm2/mm/slab.c	2007-01-29 15:47:18.293962726 -0600
@@ -1271,6 +1271,14 @@ static int __cpuinit cpuup_callback(stru
 		start_cpu_timer(cpu);
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
+	case CPU_DOWN_PREPARE:
+		/* Shutdown cache reaper */
+		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+		per_cpu(reap_work, cpu).work.func = NULL;
+		break;
+	case CPU_DOWN_FAILED:
+		start_cpu_timer(cpu);
+		break;
 	case CPU_DEAD:
 		/*
 		 * Even if all the cpus of a node are down, we don't free the
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-01-29 21:48 ` Christoph Lameter
@ 2007-01-29 22:14   ` Oleg Nesterov
  0 siblings, 0 replies; 25+ messages in thread

From: Oleg Nesterov @ 2007-01-29 22:14 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh,
      Gautham shenoy, Andrew Morton, linux-kernel

On 01/29, Christoph Lameter wrote:
>
> On Tue, 30 Jan 2007, Oleg Nesterov wrote:
>
> > Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE, so
> > cpuup_callback() can use them to lock/unlock cache_chain_mutex, but
> > this is not related.
>
> Then we won't need to do the mutex_lock/unlock in CPU_DOWN_XX anymore,
> right? Which brings us to this form of the patch:

Yes, if CPU_LOCK_ACQUIRE takes cache_chain_mutex, we don't need to do so
on CPU_DOWN_PREPARE.

I didn't know cpuup_callback() was already converted, sorry for the
confusion!

Note: with the patch below we are doing cancel_rearming_delayed_work()
under cache_chain_mutex. This is ok since cache_reap() does
mutex_trylock(), so deadlock is not possible. However, this means that
mutex_trylock() becomes mandatory for cache_reap(); probably a short
comment there would be good.

> Shutdown cache_reaper when cpu goes down
>
> Shut down the cache_reaper in slab.c if the cpu is brought down
> and set cache_reap.func to NULL. Otherwise hotplug shuts
> down the reaper for good.
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
>
> Index: linux-2.6.20-rc6-mm2/mm/slab.c
> ===================================================================
> --- linux-2.6.20-rc6-mm2.orig/mm/slab.c	2007-01-29 14:27:34.199229828 -0600
> +++ linux-2.6.20-rc6-mm2/mm/slab.c	2007-01-29 15:47:18.293962726 -0600
> @@ -1271,6 +1271,14 @@ static int __cpuinit cpuup_callback(stru
>  		start_cpu_timer(cpu);
>  		break;
>  #ifdef CONFIG_HOTPLUG_CPU
> +	case CPU_DOWN_PREPARE:
> +		/* Shutdown cache reaper */
> +		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
> +		per_cpu(reap_work, cpu).work.func = NULL;
> +		break;
> +	case CPU_DOWN_FAILED:
> +		start_cpu_timer(cpu);
> +		break;
>  	case CPU_DEAD:
>  		/*
>  		 * Even if all the cpus of a node are down, we don't free the
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems 2007-01-29 18:27 ` Oleg Nesterov 2007-01-29 19:09 ` Christoph Lameter 2007-01-29 19:25 ` Christoph Lameter @ 2007-02-20 18:39 ` Max Krasnyansky 2007-02-20 18:45 ` Christoph Lameter 2 siblings, 1 reply; 25+ messages in thread From: Max Krasnyansky @ 2007-02-20 18:39 UTC (permalink / raw) To: Oleg Nesterov Cc: Christoph Lameter, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi, Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel Folks, Oleg Nesterov wrote: >>> Even if smp_processor_id() was stable during the execution of cache_reap(), >>> this work_struct can be moved to another CPU if CPU_DEAD happens. We can't >>> avoid this, and this is correct. >> Uhh.... This may not be correct in terms of how the slab operates. > > But this is practically impossible to avoid. We can't delay CPU_DOWN until all > workqueues flush their cwq->worklist. This is livelockable, the work can re-queue > itself, and new works can be added since the dying CPU is still on cpu_online_map. > This means that some pending works will be processed on another CPU. > > delayed_work is even worse, the timer can migrate as well. > > The first problem (smp_processor_id() is not stable) could be solved if we > use freezer or with the help of not-yet-implemented scalable lock_cpu_hotplug. > >>> This means that __get_cpu_var(reap_work) returns a "wrong" struct delayed_work. >>> This is absolutely harmless right now, but may be it's better to use >>> container_of(unused, struct delayed_work, work). >> Well seems that we have a set of unresolved issues with workqueues and cpu >> hotplug. How about storing 'cpu' explicitly in the work queue instead of relying on the smp_processor_id() and friends ? That way there is no ambiguity when threads/timers get moved around. I'm cooking a set of patches to extend cpu isolation concept a bit. In which case I'd like one CPU to run cache_reap timer on behalf of another cpu. See the patch below. 
diff --git a/mm/slab.c b/mm/slab.c
index c610062..0f46d11 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -766,7 +766,17 @@ int slab_is_available(void)
 	return g_cpucache_up == FULL;
 }
 
-static DEFINE_PER_CPU(struct delayed_work, reap_work);
+struct slab_work {
+	struct delayed_work dw;
+	unsigned int cpu;
+};
+
+static DEFINE_PER_CPU(struct slab_work, reap_work);
+
+static inline struct array_cache *cpu_cache_get_on(struct kmem_cache *cachep, unsigned int cpu)
+{
+	return cachep->array[cpu];
+}
 
 static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 {
@@ -915,9 +925,9 @@ static void init_reap_node(int cpu)
 	per_cpu(reap_node, cpu) = node;
 }
 
-static void next_reap_node(void)
+static void next_reap_node(unsigned int cpu)
 {
-	int node = __get_cpu_var(reap_node);
+	int node = per_cpu(reap_node, cpu);
 
 	/*
 	 * Also drain per cpu pages on remote zones
@@ -928,12 +938,12 @@ static void next_reap_node(void)
 	node = next_node(node, node_online_map);
 	if (unlikely(node >= MAX_NUMNODES))
 		node = first_node(node_online_map);
-	__get_cpu_var(reap_node) = node;
+	per_cpu(reap_node, cpu) = node;
 }
 
 #else
 #define init_reap_node(cpu) do { } while (0)
-#define next_reap_node(void) do { } while (0)
+#define next_reap_node(cpu) do { } while (0)
 #endif
 
 /*
@@ -945,17 +955,18 @@ static void next_reap_node(void)
  */
 static void __devinit start_cpu_timer(int cpu)
 {
-	struct delayed_work *reap_work = &per_cpu(reap_work, cpu);
+	struct slab_work *reap_work = &per_cpu(reap_work, cpu);
 
 	/*
 	 * When this gets called from do_initcalls via cpucache_init(),
 	 * init_workqueues() has already run, so keventd will be setup
 	 * at that time.
 	 */
-	if (keventd_up() && reap_work->work.func == NULL) {
+	if (keventd_up() && reap_work->dw.work.func == NULL) {
 		init_reap_node(cpu);
-		INIT_DELAYED_WORK(reap_work, cache_reap);
-		schedule_delayed_work_on(cpu, reap_work,
+		INIT_DELAYED_WORK(&reap_work->dw, cache_reap);
+		reap_work->cpu = cpu;
+		schedule_delayed_work_on(cpu, &reap_work->dw,
 					__round_jiffies_relative(HZ, cpu));
 	}
 }
@@ -1004,7 +1015,7 @@ static int transfer_objects(struct array_cache *to,
 #ifndef CONFIG_NUMA
 
 #define drain_alien_cache(cachep, alien) do { } while (0)
-#define reap_alien(cachep, l3) do { } while (0)
+#define reap_alien(cachep, l3, cpu) do { } while (0)
 
 static inline struct array_cache **alloc_alien_cache(int node, int limit)
 {
@@ -1099,9 +1110,9 @@ static void __drain_alien_cache(struct kmem_cache *cachep,
 /*
  * Called from cache_reap() to regularly drain alien caches round robin.
  */
-static void reap_alien(struct kmem_cache *cachep, struct kmem_list3 *l3)
+static void reap_alien(struct kmem_cache *cachep, struct kmem_list3 *l3, unsigned int cpu)
 {
-	int node = __get_cpu_var(reap_node);
+	int node = per_cpu(reap_node, cpu);
 
 	if (l3->alien) {
 		struct array_cache *ac = l3->alien[node];
@@ -4017,16 +4028,17 @@ void drain_array(struct kmem_cache *cachep, struct kmem_list3 *l3,
  * If we cannot acquire the cache chain mutex then just give up - we'll try
  * again on the next iteration.
  */
-static void cache_reap(struct work_struct *unused)
+static void cache_reap(struct work_struct *_work)
 {
 	struct kmem_cache *searchp;
 	struct kmem_list3 *l3;
 	int node = numa_node_id();
+	struct slab_work *work = (struct slab_work *) _work;
+
 	if (!mutex_trylock(&cache_chain_mutex)) {
 		/* Give up. Setup the next iteration. */
-		schedule_delayed_work(&__get_cpu_var(reap_work),
-				      round_jiffies_relative(REAPTIMEOUT_CPUC));
+		schedule_delayed_work(&work->dw, round_jiffies_relative(REAPTIMEOUT_CPUC));
 		return;
 	}
@@ -4040,9 +4052,9 @@ static void cache_reap(struct work_struct *unused)
 		 */
 		l3 = searchp->nodelists[node];
 
-		reap_alien(searchp, l3);
+		reap_alien(searchp, l3, work->cpu);
 
-		drain_array(searchp, l3, cpu_cache_get(searchp), 0, node);
+		drain_array(searchp, l3, cpu_cache_get_on(searchp, work->cpu), 0, node);
 
 		/*
 		 * These are racy checks but it does not matter
@@ -4069,11 +4081,11 @@ next:
 	}
 	check_irq_on();
 	mutex_unlock(&cache_chain_mutex);
-	next_reap_node();
-	refresh_cpu_vm_stats(smp_processor_id());
+	next_reap_node(work->cpu);
+	refresh_cpu_vm_stats(work->cpu);
+
 	/* Set up the next iteration */
-	schedule_delayed_work(&__get_cpu_var(reap_work),
-			      round_jiffies_relative(REAPTIMEOUT_CPUC));
+	schedule_delayed_work(&work->dw, round_jiffies_relative(REAPTIMEOUT_CPUC));
 }
 
 #ifdef CONFIG_PROC_FS

Max
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 18:39 ` Max Krasnyansky
@ 2007-02-20 18:45 ` Christoph Lameter
  2007-02-20 20:05   ` Oleg Nesterov
  2007-02-20 21:05   ` slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Max Krasnyansky
  0 siblings, 2 replies; 25+ messages in thread
From: Christoph Lameter @ 2007-02-20 18:45 UTC (permalink / raw)
To: Max Krasnyansky
Cc: Oleg Nesterov, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> > > Well seems that we have a set of unresolved issues with workqueues and cpu
> > > hotplug.
>
> How about storing 'cpu' explicitly in the work queue instead of relying on the
> smp_processor_id() and friends ? That way there is no ambiguity when
> threads/timers get moved around.

The slab functionality is designed to work on the processor with the
queue. These tricks will only cause more trouble in the future. The
cache_reaper needs to be either disabled or run on the right processor. It
should never run on the wrong processor.

The cache_reaper() is of no importance to hotplug. You just need to make
sure that it is not in the way (disable it and if its running wait
until the cache_reaper has finished).
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 18:45 ` Christoph Lameter
@ 2007-02-20 20:05 ` Oleg Nesterov
  2007-02-20 21:22   ` Max Krasnyansky
  2007-02-20 21:05 ` slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Max Krasnyansky
  1 sibling, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2007-02-20 20:05 UTC (permalink / raw)
To: Christoph Lameter
Cc: Max Krasnyansky, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

On 02/20, Christoph Lameter wrote:
>
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
>
> > > > Well seems that we have a set of unresolved issues with workqueues and cpu
> > > > hotplug.
> >
> > How about storing 'cpu' explicitly in the work queue instead of relying on the
> > smp_processor_id() and friends ? That way there is no ambiguity when
> > threads/timers get moved around.
>
> The slab functionality is designed to work on the processor with the
> queue. These tricks will only cause more trouble in the future. The
> cache_reaper needs to be either disabled or run on the right processor. It
> should never run on the wrong processor.

I personally agree. Besides, cache_reaper is not alone. Note the comment
in debug_smp_processor_id() about cpu-bound processes. The slab does the
correct thing right now: it stops the timer on CPU_DEAD. Other problems
imho should be solved by fixing cpu-hotplug. Gautham and Srivatsa are
working on that.

Oleg.
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 20:05 ` Oleg Nesterov
@ 2007-02-20 21:22 ` Max Krasnyansky
  2007-02-20 21:35   ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Max Krasnyansky @ 2007-02-20 21:22 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Christoph Lameter, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

Oleg Nesterov wrote:
> On 02/20, Christoph Lameter wrote:
>> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
>>
>>>>> Well seems that we have a set of unresolved issues with workqueues and cpu
>>>>> hotplug.
>>> How about storing 'cpu' explicitly in the work queue instead of relying on the
>>> smp_processor_id() and friends ? That way there is no ambiguity when
>>> threads/timers get moved around.
>> The slab functionality is designed to work on the processor with the
>> queue. These tricks will only cause more trouble in the future. The
>> cache_reaper needs to be either disabled or run on the right processor. It
>> should never run on the wrong processor.
>
> I personally agree. Besides, cache_reaper is not alone. Note the comment
> in debug_smp_processor_id() about cpu-bound processes. The slab does correct
> thing right now, stops the timer on CPU_DEAD. Other problems imho should be
> solved by fixing cpu-hotplug. Gautham and Srivatsa are working on that.

I guess I kind of hijacked the thread; the second part of my first email
was dropped. Basically I was saying that I'm working on CPU isolation
extensions, where an isolated CPU is not supposed to do much kernel work.
In that case you'd want to run the slab cache reaper on some other CPU on
behalf of the isolated one. Hence the proposal to explicitly pass the cpu
id to the reaper. I guess now that you guys fixed the hotplug case it does
not help in that scenario.

Max
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 21:22 ` Max Krasnyansky
@ 2007-02-20 21:35 ` Christoph Lameter
  2007-02-20 22:01   ` Max Krasnyansky
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-02-20 21:35 UTC (permalink / raw)
To: Max Krasnyansky
Cc: Oleg Nesterov, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> I guess I kind of hijacked the thread. The second part of my first email was
> dropped. Basically I was saying that I'm working on CPU isolation extensions.
> Where an isolated CPU is not supposed to do much kernel work. In which case
> you'd want to run slab cache reaper on some other CPU on behalf of the
> isolated one. Hence the proposal to explicitly pass cpu_id to the reaper.
> I guess now that you guys fixed the hotplug case it does not help in that
> scenario.

A cpu must have a per cpu cache in order to do slab allocations. The
locking in the slab allocator depends on it.

If the isolated cpus have no need for slab allocations then you will also
not need to run the slab_reaper().
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 21:35 ` Christoph Lameter
@ 2007-02-20 22:01 ` Max Krasnyansky
  2007-02-20 22:14   ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Max Krasnyansky @ 2007-02-20 22:01 UTC (permalink / raw)
To: Christoph Lameter
Cc: Oleg Nesterov, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

Christoph Lameter wrote:
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
>
>> I guess I kind of hijacked the thread. The second part of my first email was
>> dropped. Basically I was saying that I'm working on CPU isolation extensions.
>> Where an isolated CPU is not supposed to do much kernel work. In which case
>> you'd want to run slab cache reaper on some other CPU on behalf of the
>> isolated one. Hence the proposal to explicitly pass cpu_id to the reaper.
>> I guess now that you guys fixed the hotplug case it does not help in that
>> scenario.
>
> A cpu must have a per cpu cache in order to do slab allocations. The
> locking in the slab allocator depends on it.
>
> If the isolated cpus have no need for slab allocations then you will also
> not need to run the slab_reaper().

Ok. Sounds like disabling cache_reaper is a better option for now. Like you
said, it's unlikely that slabs will grow much if that cpu is not heavily
used by the kernel.

Thanks
Max
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 22:01 ` Max Krasnyansky
@ 2007-02-20 22:14 ` Christoph Lameter
  2007-02-20 22:48   ` SLAB cache reaper on isolated cpus Max Krasnyansky
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-02-20 22:14 UTC (permalink / raw)
To: Max Krasnyansky
Cc: Oleg Nesterov, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> Ok. Sounds like disabling cache_reaper is a better option for now. Like you
> said it's unlikely that slabs will grow much if that cpu is not heavily used
> by the kernel.

Running for prolonged times without cache_reaper is no good.

What we are talking about here is to disable the cache_reaper during cpu
shutdown. The slab cpu shutdown will clean the per cpu caches anyways so
we really do not need the slab_reaper running during cpu shutdown.
* SLAB cache reaper on isolated cpus
  2007-02-20 22:14 ` Christoph Lameter
@ 2007-02-20 22:48 ` Max Krasnyansky
  2007-02-20 23:19   ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Max Krasnyansky @ 2007-02-20 22:48 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-kernel

Christoph Lameter wrote:
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
>
>> Ok. Sounds like disabling cache_reaper is a better option for now. Like you said
>> it's unlikely that slabs will grow much if that cpu is not heavily used by the kernel.
>
> Running for prolonged times without cache_reaper is no good.
>
> What we are talking about here is to disable the cache_reaper during cpu
> shutdown. The slab cpu shutdown will clean the per cpu caches anyways so
> we really do not need the slab_reaper running during cpu shutdown.

Ok. Let me restart the thread so that we're not confusing two issues :).
I'm not talking about CPU shutdown or CPU hotplug in general. My proposal
seemed related to the CPU shutdown issue you guys were discussing, but it
turns out it's not.

Suppose I need to isolate a CPU. We already support that at the scheduler
and irq levels (irq affinity). But I want to go a bit further and avoid
doing kernel work on isolated cpus as much as possible. For example, I
would not want to schedule work queues and such on them. Currently there
are just a few users of schedule_delayed_work_on(): cpufreq (don't care
for isolation purposes), oprofile (same here) and slab. For the slab it'd
be nice to run the reaper on some other CPU, but you're saying that the
locking depends on CPU pinning. Is there any other option besides
disabling cache reap? Is there a way, for example, to constrain the slabs
on CPU X to not exceed N megs?

Max
* Re: SLAB cache reaper on isolated cpus
  2007-02-20 22:48 ` SLAB cache reaper on isolated cpus Max Krasnyansky
@ 2007-02-20 23:19 ` Christoph Lameter
  2007-02-21  3:41   ` Max Krasnyansky
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-02-20 23:19 UTC (permalink / raw)
To: Max Krasnyansky; +Cc: linux-kernel

On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> Suppose I need to isolate a CPU. We already support at the scheduler and
> irq levels (irq affinity). But I want to go a bit further and avoid
> doing kernel work on isolated cpus as much as possible. For example I
> would not want to schedule work queues and stuff on them. Currently
> there are just a few users of the schedule_delayed_work_on(). cpufreq
> (don't care for isolation purposes), oprofile (same here) and slab. For
> the slab it'd be nice to run the reaper on some other CPU. But you're
> saying that locking depends on CPU pinning. Is there any other option
> besides disabling cache reap ? Is there a way for example to constrain
> the slabs on CPU X to not exceed N megs ?

There is no way to constrain the amount of slab work. In order to make the
above work we would have to disable the per cpu caches for a certain cpu.
Then there would be no need to run the cache reaper at all.

To some extent such functionality already exists. F.e. kmalloc_node()
already bypasses the per cpu caches (most of the time). kmalloc_node will
have to take a spinlock on a shared cacheline on each invocation. kmalloc
only touches per cpu data during regular operations. Thus kmalloc() is much
faster than kmalloc_node() and the cachelines for kmalloc() can be kept in
the per cpu cache.

If we could disable all per cpu caches for certain cpus then you could
make this work. All slab OS interference would be off the processor.
* Re: SLAB cache reaper on isolated cpus
  2007-02-20 23:19 ` Christoph Lameter
@ 2007-02-21  3:41 ` Max Krasnyansky
  0 siblings, 0 replies; 25+ messages in thread
From: Max Krasnyansky @ 2007-02-21  3:41 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-kernel

Christoph Lameter wrote:
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
>
>> Suppose I need to isolate a CPU. We already support at the scheduler and
>> irq levels (irq affinity). But I want to go a bit further and avoid
>> doing kernel work on isolated cpus as much as possible. For example I
>> would not want to schedule work queues and stuff on them. Currently
>> there are just a few users of the schedule_delayed_work_on(). cpufreq
>> (don't care for isolation purposes), oprofile (same here) and slab. For
>> the slab it'd be nice to run the reaper on some other CPU. But you're
>> saying that locking depends on CPU pinning. Is there any other option
>> besides disabling cache reap ? Is there a way for example to constraint
>> the slabs on CPU X to not exceed N megs ?
>
> There is no way to constrain the amount of slab work. In order to make the
> above work we would have to disable the per cpu caches for a certain cpu.
> Then there would be no need to run the cache reaper at all.
>
> To some extend such functionality already exists. F.e. kmalloc_node()
> already bypasses the per cpu caches (most of the time). kmalloc_node will
> have to take a spinlock on a shared cacheline on each invocation. kmalloc
> does only touch per cpu data during regular operations. Thus kmalloc() is much
> faster than kmalloc_node() and the cachelines for kmalloc() can be kept in
> the per cpu cache.
>
> If we could disable all per cpu caches for certain cpus then you could
> make this work. All slab OS interference would be off the processor.

Hmm. That's an idea. I'll play with it later today or tomorrow.

Thanks
Max
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 18:45 ` Christoph Lameter
  2007-02-20 20:05   ` Oleg Nesterov
@ 2007-02-20 21:05 ` Max Krasnyansky
  2007-02-20 21:34   ` Christoph Lameter
  1 sibling, 1 reply; 25+ messages in thread
From: Max Krasnyansky @ 2007-02-20 21:05 UTC (permalink / raw)
To: Christoph Lameter
Cc: Oleg Nesterov, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

Christoph Lameter wrote:
> On Tue, 20 Feb 2007, Max Krasnyansky wrote:
>
>>>> Well seems that we have a set of unresolved issues with workqueues and cpu
>>>> hotplug.
>> How about storing 'cpu' explicitly in the work queue instead of relying on the
>> smp_processor_id() and friends ? That way there is no ambiguity when
>> threads/timers get moved around.
>
> The slab functionality is designed to work on the processor with the
> queue. These tricks will only cause more trouble in the future. The
> cache_reaper needs to be either disabled or run on the right processor. It
> should never run on the wrong processor.
> The cache_reaper() is of no importance to hotplug. You just need to make
> sure that it is not in the way (disable it and if its running wait
> until the cache_reaper has finished).

I agree that running the reaper on the wrong CPU is not the best way to go
about it. But it seems like disabling it is even worse (ie wasting memory),
unless I'm missing something.

btw What kind of troubles were you talking about? Performance or
robustness? As I said, performance wise it does not make sense to run the
reaper on the wrong CPU, but it does seem to work just fine from the
correctness (locking, etc) perspective. Again, it's totally possible that
I'm missing something here :).

Thanks
Max
* Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
  2007-02-20 21:05 ` slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Max Krasnyansky
@ 2007-02-20 21:34 ` Christoph Lameter
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Lameter @ 2007-02-20 21:34 UTC (permalink / raw)
To: Max Krasnyansky
Cc: Oleg Nesterov, Ingo Molnar, Srivatsa Vaddagiri, Pallipadi,
	Venkatesh, Gautham shenoy, Andrew Morton, linux-kernel

On Tue, 20 Feb 2007, Max Krasnyansky wrote:

> I agree that running the reaper on the wrong CPU is not the best way to go
> about it. But it seems like disabling it is even worse, unless I missing
> something. ie wasting memory.

Disabling during shutdown is no problem because the per cpu caches are
going to be drained anyways by the cpu down code.
end of thread, other threads:[~2007-02-21  3:42 UTC | newest]

Thread overview: 25+ messages
2007-01-29  1:13 slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Oleg Nesterov
2007-01-29 16:54 ` Christoph Lameter
2007-01-29 17:19   ` Oleg Nesterov
2007-01-29 17:27     ` Christoph Lameter
2007-01-29 18:27       ` Oleg Nesterov
2007-01-29 19:09         ` Christoph Lameter
2007-01-29 19:29           ` Oleg Nesterov
2007-01-29 19:25         ` Christoph Lameter
2007-01-29 19:49           ` Oleg Nesterov
2007-01-29 20:29             ` Christoph Lameter
2007-01-29 21:05               ` Oleg Nesterov
2007-01-29 21:48                 ` Christoph Lameter
2007-01-29 22:14                   ` Oleg Nesterov
2007-02-20 18:39         ` Max Krasnyansky
2007-02-20 18:45           ` Christoph Lameter
2007-02-20 20:05             ` Oleg Nesterov
2007-02-20 21:22               ` Max Krasnyansky
2007-02-20 21:35                 ` Christoph Lameter
2007-02-20 22:01                   ` Max Krasnyansky
2007-02-20 22:14                     ` Christoph Lameter
2007-02-20 22:48                       ` SLAB cache reaper on isolated cpus Max Krasnyansky
2007-02-20 23:19                         ` Christoph Lameter
2007-02-21  3:41                           ` Max Krasnyansky
2007-02-20 21:05             ` slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Max Krasnyansky
2007-02-20 21:34               ` Christoph Lameter