Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [RFC] mm, slab: reschedule cache_reap() on the same CPU
@ 2018-04-10  8:15 Vlastimil Babka
  2018-04-10 14:12 ` Christopher Lameter
  2018-04-11  7:00 ` [PATCH] " Vlastimil Babka
  0 siblings, 2 replies; 12+ messages in thread
From: Vlastimil Babka @ 2018-04-10  8:15 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Vlastimil Babka, Joonsoo Kim, David Rientjes,
	Pekka Enberg, Christoph Lameter, Tejun Heo, Lai Jiangshan,
	John Stultz, Thomas Gleixner, Stephen Boyd

cache_reap() is initially scheduled in start_cpu_timer() via
schedule_delayed_work_on(). But then the next iterations are scheduled via
schedule_delayed_work(), thus using WORK_CPU_UNBOUND.

AFAIU there is thus no guarantee the future iterations will happen on the
intended cpu, although it's preferred. I was able to demonstrate this with
/sys/module/workqueue/parameters/debug_force_rr_cpu. IIUC the timer code, it
may also happen due to migrating timers in nohz context. As a result, some
cpu's would be calling cache_reap() more frequently and others never.

What would be even worse is a potential scenario where WORK_CPU_UNBOUND would
result in being run via kworker thread that's not pinned to any single CPU
(although I haven't observed that in my simple tests). Migration to another CPU
during cache_reap() e.g. between cpu_cache_get() and drain_array() would result
in operating on non-local cpu array cache and might race with the other cpu.
Migration to another numa node than the one obtained with numa_mem_id() could
result in slabs being moved to a list on a wrong node, which would then be
modified with a wrong lock, againn potentially racing.

This patch makes sure schedule_delayed_work_on() is used with the proper cpu
when scheduling the next iteration. The cpu is stored with delayed_work on a
new slab_reap_work_struct super-structure.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
---
Hi,

this patch is a result of hunting some rare crashes in our (4.4-based) kernel
where slabs misplaced on wrong nodes were identified in the crash dumps. I
don't yet know if cache_reap() is the culprit and if this patch fill fix it,
but the problem seems real to me nevertheless. I CC'd workqueue and timer
maintainers and would like to check if my assumptions in changelog are correct,
and especially if there's a guarantee that work scheduled with
schedule_delayed_work_on(cpu) will never migrate to another cpu. If that's not
guaranteed (including past stable kernel versions), we will have to be even
more careful and e.g. disable interrupts sooner.

Thanks,
Vlastimil

 mm/slab.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 9095c3945425..b3e3d082099c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -429,7 +429,12 @@ static struct kmem_cache kmem_cache_boot = {
 	.name = "kmem_cache",
 };
 
-static DEFINE_PER_CPU(struct delayed_work, slab_reap_work);
+struct slab_reap_work_struct {
+	struct delayed_work dwork;
+	int cpu;
+};
+
+static DEFINE_PER_CPU(struct slab_reap_work_struct, slab_reap_work);
 
 static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
 {
@@ -551,12 +556,15 @@ static void next_reap_node(void)
  */
 static void start_cpu_timer(int cpu)
 {
-	struct delayed_work *reap_work = &per_cpu(slab_reap_work, cpu);
 
-	if (reap_work->work.func == NULL) {
+	struct slab_reap_work_struct *reap_work = &per_cpu(slab_reap_work, cpu);
+	struct delayed_work *dwork = &reap_work->dwork;
+
+	if (dwork->work.func == NULL) {
+		reap_work->cpu = cpu;
 		init_reap_node(cpu);
-		INIT_DEFERRABLE_WORK(reap_work, cache_reap);
-		schedule_delayed_work_on(cpu, reap_work,
+		INIT_DEFERRABLE_WORK(dwork, cache_reap);
+		schedule_delayed_work_on(cpu, dwork,
 					__round_jiffies_relative(HZ, cpu));
 	}
 }
@@ -1120,9 +1128,9 @@ static int slab_offline_cpu(unsigned int cpu)
 	 * expensive but will only modify reap_work and reschedule the
 	 * timer.
 	 */
-	cancel_delayed_work_sync(&per_cpu(slab_reap_work, cpu));
+	cancel_delayed_work_sync(&per_cpu(slab_reap_work, cpu).dwork);
 	/* Now the cache_reaper is guaranteed to be not running. */
-	per_cpu(slab_reap_work, cpu).work.func = NULL;
+	per_cpu(slab_reap_work, cpu).dwork.work.func = NULL;
 	return 0;
 }
 
@@ -4027,11 +4035,15 @@ static void cache_reap(struct work_struct *w)
 	struct kmem_cache_node *n;
 	int node = numa_mem_id();
 	struct delayed_work *work = to_delayed_work(w);
+	struct slab_reap_work_struct *reap_work =
+		container_of(work, struct slab_reap_work_struct, dwork);
 
 	if (!mutex_trylock(&slab_mutex))
 		/* Give up. Setup the next iteration. */
 		goto out;
 
+	WARN_ON_ONCE(reap_work->cpu != smp_processor_id());
+
 	list_for_each_entry(searchp, &slab_caches, list) {
 		check_irq_on();
 
@@ -4074,7 +4086,8 @@ static void cache_reap(struct work_struct *w)
 	next_reap_node();
 out:
 	/* Set up the next iteration */
-	schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_AC));
+	schedule_delayed_work_on(reap_work->cpu, work,
+					round_jiffies_relative(REAPTIMEOUT_AC));
 }
 
 void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
-- 
2.16.3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] mm, slab: reschedule cache_reap() on the same CPU
  2018-04-10  8:15 [RFC] mm, slab: reschedule cache_reap() on the same CPU Vlastimil Babka
@ 2018-04-10 14:12 ` Christopher Lameter
  2018-04-10 14:17   ` Tejun Heo
  2018-04-10 19:40   ` Vlastimil Babka
  2018-04-11  7:00 ` [PATCH] " Vlastimil Babka
  1 sibling, 2 replies; 12+ messages in thread
From: Christopher Lameter @ 2018-04-10 14:12 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Joonsoo Kim, David Rientjes,
	Pekka Enberg, Tejun Heo, Lai Jiangshan, John Stultz,
	Thomas Gleixner, Stephen Boyd

On Tue, 10 Apr 2018, Vlastimil Babka wrote:

> cache_reap() is initially scheduled in start_cpu_timer() via
> schedule_delayed_work_on(). But then the next iterations are scheduled via
> schedule_delayed_work(), thus using WORK_CPU_UNBOUND.

That is a bug.. cache_reap must run on the same cpu since it deals with
the per cpu queues of the current cpu. Scheduled_delayed_work() used to
guarantee running on teh same cpu.

> This patch makes sure schedule_delayed_work_on() is used with the proper cpu
> when scheduling the next iteration. The cpu is stored with delayed_work on a
> new slab_reap_work_struct super-structure.

The current cpu is readily available via smp_processor_id(). Why a
super structure?

> @@ -4074,7 +4086,8 @@ static void cache_reap(struct work_struct *w)
>  	next_reap_node();
>  out:
>  	/* Set up the next iteration */
> -	schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_AC));
> +	schedule_delayed_work_on(reap_work->cpu, work,
> +					round_jiffies_relative(REAPTIMEOUT_AC));

schedule_delayed_work_on(smp_processor_id(), work, round_jiffies_relative(REAPTIMEOUT_AC));

instead all of the other changes?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] mm, slab: reschedule cache_reap() on the same CPU
  2018-04-10 14:12 ` Christopher Lameter
@ 2018-04-10 14:17   ` Tejun Heo
  2018-04-10 19:40   ` Vlastimil Babka
  1 sibling, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2018-04-10 14:17 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Vlastimil Babka, linux-mm, linux-kernel, Joonsoo Kim,
	David Rientjes, Pekka Enberg, Lai Jiangshan, John Stultz,
	Thomas Gleixner, Stephen Boyd

On Tue, Apr 10, 2018 at 09:12:08AM -0500, Christopher Lameter wrote:
> > @@ -4074,7 +4086,8 @@ static void cache_reap(struct work_struct *w)
> >  	next_reap_node();
> >  out:
> >  	/* Set up the next iteration */
> > -	schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_AC));
> > +	schedule_delayed_work_on(reap_work->cpu, work,
> > +					round_jiffies_relative(REAPTIMEOUT_AC));
> 
> schedule_delayed_work_on(smp_processor_id(), work, round_jiffies_relative(REAPTIMEOUT_AC));
> 
> instead all of the other changes?

Yeah, that'd make more sense.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] mm, slab: reschedule cache_reap() on the same CPU
  2018-04-10 14:12 ` Christopher Lameter
  2018-04-10 14:17   ` Tejun Heo
@ 2018-04-10 19:40   ` Vlastimil Babka
  2018-04-10 19:53     ` Tejun Heo
  1 sibling, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2018-04-10 19:40 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: linux-mm, linux-kernel, Joonsoo Kim, David Rientjes,
	Pekka Enberg, Tejun Heo, Lai Jiangshan, John Stultz,
	Thomas Gleixner, Stephen Boyd

On 04/10/2018 04:12 PM, Christopher Lameter wrote:
> On Tue, 10 Apr 2018, Vlastimil Babka wrote:
> 
>> cache_reap() is initially scheduled in start_cpu_timer() via
>> schedule_delayed_work_on(). But then the next iterations are scheduled via
>> schedule_delayed_work(), thus using WORK_CPU_UNBOUND.
> 
> That is a bug.. cache_reap must run on the same cpu since it deals with
> the per cpu queues of the current cpu. Scheduled_delayed_work() used to
> guarantee running on teh same cpu.

Did it? When did it stop? (which stable kernels should we backport to?)
So is my assumption correct that without specifying a CPU, the next work
might be processed on a different cpu than the current one, *and also*
be executed with a kthread/u* that can migrate to another cpu *in the
middle of the work*? Tejun?

>> This patch makes sure schedule_delayed_work_on() is used with the proper cpu
>> when scheduling the next iteration. The cpu is stored with delayed_work on a
>> new slab_reap_work_struct super-structure.
> 
> The current cpu is readily available via smp_processor_id(). Why a
> super structure?

Mostly for the WARN_ON_ONCE, and general paranoia.

>> @@ -4074,7 +4086,8 @@ static void cache_reap(struct work_struct *w)
>>  	next_reap_node();
>>  out:
>>  	/* Set up the next iteration */
>> -	schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_AC));
>> +	schedule_delayed_work_on(reap_work->cpu, work,
>> +					round_jiffies_relative(REAPTIMEOUT_AC));
> 
> schedule_delayed_work_on(smp_processor_id(), work, round_jiffies_relative(REAPTIMEOUT_AC));
> 
> instead all of the other changes?

If we can rely on that 100%, sure.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] mm, slab: reschedule cache_reap() on the same CPU
  2018-04-10 19:40   ` Vlastimil Babka
@ 2018-04-10 19:53     ` Tejun Heo
  2018-04-10 20:13       ` Vlastimil Babka
  0 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2018-04-10 19:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christopher Lameter, linux-mm, linux-kernel, Joonsoo Kim,
	David Rientjes, Pekka Enberg, Lai Jiangshan, John Stultz,
	Thomas Gleixner, Stephen Boyd

Hello,

On Tue, Apr 10, 2018 at 09:40:19PM +0200, Vlastimil Babka wrote:
> On 04/10/2018 04:12 PM, Christopher Lameter wrote:
> > On Tue, 10 Apr 2018, Vlastimil Babka wrote:
> > 
> >> cache_reap() is initially scheduled in start_cpu_timer() via
> >> schedule_delayed_work_on(). But then the next iterations are scheduled via
> >> schedule_delayed_work(), thus using WORK_CPU_UNBOUND.
> > 
> > That is a bug.. cache_reap must run on the same cpu since it deals with
> > the per cpu queues of the current cpu. Scheduled_delayed_work() used to
> > guarantee running on teh same cpu.
> 
> Did it? When did it stop? (which stable kernels should we backport to?)

It goes back to v4.5 - ef557180447f ("workqueue: schedule
WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs") which made
WQ_CPU_UNBOUND on percpu workqueues honor wq_unbound_cpusmask so that
cpu isolation works better.  Unless the force_rr option or
unbound_cpumask is set, it still follows local cpu.

> So is my assumption correct that without specifying a CPU, the next work
> might be processed on a different cpu than the current one, *and also*
> be executed with a kthread/u* that can migrate to another cpu *in the
> middle of the work*? Tejun?

For percpu work items, they'll keep executing on the same cpu it
started on unless the cpu goes down while executing.

> > schedule_delayed_work_on(smp_processor_id(), work, round_jiffies_relative(REAPTIMEOUT_AC));
> > 
> > instead all of the other changes?
> 
> If we can rely on that 100%, sure.

Yeah, you can.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] mm, slab: reschedule cache_reap() on the same CPU
  2018-04-10 19:53     ` Tejun Heo
@ 2018-04-10 20:13       ` Vlastimil Babka
  2018-04-10 20:23         ` Tejun Heo
  0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2018-04-10 20:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Christopher Lameter, linux-mm, linux-kernel, Joonsoo Kim,
	David Rientjes, Pekka Enberg, Lai Jiangshan, John Stultz,
	Thomas Gleixner, Stephen Boyd

On 04/10/2018 09:53 PM, Tejun Heo wrote:
> Hello,
> 
> On Tue, Apr 10, 2018 at 09:40:19PM +0200, Vlastimil Babka wrote:
>> On 04/10/2018 04:12 PM, Christopher Lameter wrote:
>>> On Tue, 10 Apr 2018, Vlastimil Babka wrote:
>>>
>>>> cache_reap() is initially scheduled in start_cpu_timer() via
>>>> schedule_delayed_work_on(). But then the next iterations are scheduled via
>>>> schedule_delayed_work(), thus using WORK_CPU_UNBOUND.
>>>
>>> That is a bug.. cache_reap must run on the same cpu since it deals with
>>> the per cpu queues of the current cpu. Scheduled_delayed_work() used to
>>> guarantee running on teh same cpu.
>>
>> Did it? When did it stop? (which stable kernels should we backport to?)
> 
> It goes back to v4.5 - ef557180447f ("workqueue: schedule
> WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs") which made
> WQ_CPU_UNBOUND on percpu workqueues honor wq_unbound_cpusmask so that
> cpu isolation works better.  Unless the force_rr option or
> unbound_cpumask is set, it still follows local cpu.

I see, thanks.

>> So is my assumption correct that without specifying a CPU, the next work
>> might be processed on a different cpu than the current one, *and also*
>> be executed with a kthread/u* that can migrate to another cpu *in the
>> middle of the work*? Tejun?
> 
> For percpu work items, they'll keep executing on the same cpu it
> started on unless the cpu goes down while executing.

Right, but before this patch, with just schedule_delayed_work() i.e.
non-percpu? If such work can migrate in the middle, the slab bug is
potentially much more serious.

>>> schedule_delayed_work_on(smp_processor_id(), work, round_jiffies_relative(REAPTIMEOUT_AC));
>>>
>>> instead all of the other changes?
>>
>> If we can rely on that 100%, sure.
> 
> Yeah, you can.

Great, thanks.

> Thanks.
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] mm, slab: reschedule cache_reap() on the same CPU
  2018-04-10 20:13       ` Vlastimil Babka
@ 2018-04-10 20:23         ` Tejun Heo
  0 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2018-04-10 20:23 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christopher Lameter, linux-mm, linux-kernel, Joonsoo Kim,
	David Rientjes, Pekka Enberg, Lai Jiangshan, John Stultz,
	Thomas Gleixner, Stephen Boyd

Hello,

On Tue, Apr 10, 2018 at 10:13:33PM +0200, Vlastimil Babka wrote:
> > For percpu work items, they'll keep executing on the same cpu it
> > started on unless the cpu goes down while executing.
> 
> Right, but before this patch, with just schedule_delayed_work() i.e.
> non-percpu? If such work can migrate in the middle, the slab bug is
> potentially much more serious.

That's still per-cpu.  The only time the local binding breaks is when
the kernel is explicitly told to do so through explicit unbound_mask
or force_rr debug option.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] mm, slab: reschedule cache_reap() on the same CPU
  2018-04-10  8:15 [RFC] mm, slab: reschedule cache_reap() on the same CPU Vlastimil Babka
  2018-04-10 14:12 ` Christopher Lameter
@ 2018-04-11  7:00 ` Vlastimil Babka
  2018-04-11 10:53   ` Pekka Enberg
  2018-04-12  0:47   ` Minchan Kim
  1 sibling, 2 replies; 12+ messages in thread
From: Vlastimil Babka @ 2018-04-11  7:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Vlastimil Babka, stable, Joonsoo Kim,
	David Rientjes, Pekka Enberg, Christoph Lameter, Tejun Heo,
	Lai Jiangshan, John Stultz, Thomas Gleixner, Stephen Boyd

cache_reap() is initially scheduled in start_cpu_timer() via
schedule_delayed_work_on(). But then the next iterations are scheduled via
schedule_delayed_work(), i.e. using WORK_CPU_UNBOUND.

Thus since commit ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on
wq_unbound_cpumask CPUs") there is no guarantee the future iterations will run
on the originally intended cpu, although it's still preferred. I was able to
demonstrate this with /sys/module/workqueue/parameters/debug_force_rr_cpu.
IIUC, it may also happen due to migrating timers in nohz context. As a result,
some cpu's would be calling cache_reap() more frequently and others never.

This patch uses schedule_delayed_work_on() with the current cpu when scheduling
the next iteration.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
CC: <stable@vger.kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
---
 mm/slab.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index 9095c3945425..a76006aae857 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4074,7 +4074,8 @@ static void cache_reap(struct work_struct *w)
 	next_reap_node();
 out:
 	/* Set up the next iteration */
-	schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_AC));
+	schedule_delayed_work_on(smp_processor_id(), work,
+				round_jiffies_relative(REAPTIMEOUT_AC));
 }
 
 void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
-- 
2.16.3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm, slab: reschedule cache_reap() on the same CPU
  2018-04-11  7:00 ` [PATCH] " Vlastimil Babka
@ 2018-04-11 10:53   ` Pekka Enberg
  2018-04-11 13:41     ` Christopher Lameter
  2018-04-12  0:47   ` Minchan Kim
  1 sibling, 1 reply; 12+ messages in thread
From: Pekka Enberg @ 2018-04-11 10:53 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: linux-mm, linux-kernel, stable, Joonsoo Kim, David Rientjes,
	Pekka Enberg, Christoph Lameter, Tejun Heo, Lai Jiangshan,
	John Stultz, Thomas Gleixner, Stephen Boyd



On 11/04/2018 10.00, Vlastimil Babka wrote:
> cache_reap() is initially scheduled in start_cpu_timer() via
> schedule_delayed_work_on(). But then the next iterations are scheduled via
> schedule_delayed_work(), i.e. using WORK_CPU_UNBOUND.
> 
> Thus since commit ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on
> wq_unbound_cpumask CPUs") there is no guarantee the future iterations will run
> on the originally intended cpu, although it's still preferred. I was able to
> demonstrate this with /sys/module/workqueue/parameters/debug_force_rr_cpu.
> IIUC, it may also happen due to migrating timers in nohz context. As a result,
> some cpu's would be calling cache_reap() more frequently and others never.
> 
> This patch uses schedule_delayed_work_on() with the current cpu when scheduling
> the next iteration.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Fixes: ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
> CC: <stable@vger.kernel.org>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Stephen Boyd <sboyd@kernel.org>

Acked-by: Pekka Enberg <penberg@kernel.org>

> ---
>   mm/slab.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index 9095c3945425..a76006aae857 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -4074,7 +4074,8 @@ static void cache_reap(struct work_struct *w)
>   	next_reap_node();
>   out:
>   	/* Set up the next iteration */
> -	schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_AC));
> +	schedule_delayed_work_on(smp_processor_id(), work,
> +				round_jiffies_relative(REAPTIMEOUT_AC));
>   }
>   
>   void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm, slab: reschedule cache_reap() on the same CPU
  2018-04-11 10:53   ` Pekka Enberg
@ 2018-04-11 13:41     ` Christopher Lameter
  0 siblings, 0 replies; 12+ messages in thread
From: Christopher Lameter @ 2018-04-11 13:41 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Vlastimil Babka, Andrew Morton, linux-mm, linux-kernel, stable,
	Joonsoo Kim, David Rientjes, Pekka Enberg, Tejun Heo,
	Lai Jiangshan, John Stultz, Thomas Gleixner, Stephen Boyd

On Wed, 11 Apr 2018, Pekka Enberg wrote:

> Acked-by: Pekka Enberg <penberg@kernel.org>

Good to hear from you again.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm, slab: reschedule cache_reap() on the same CPU
  2018-04-11  7:00 ` [PATCH] " Vlastimil Babka
  2018-04-11 10:53   ` Pekka Enberg
@ 2018-04-12  0:47   ` Minchan Kim
  2018-04-13  8:44     ` Vlastimil Babka
  1 sibling, 1 reply; 12+ messages in thread
From: Minchan Kim @ 2018-04-12  0:47 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, stable, Joonsoo Kim,
	David Rientjes, Pekka Enberg, Christoph Lameter, Tejun Heo,
	Lai Jiangshan, John Stultz, Thomas Gleixner, Stephen Boyd

On Wed, Apr 11, 2018 at 09:00:07AM +0200, Vlastimil Babka wrote:
> cache_reap() is initially scheduled in start_cpu_timer() via
> schedule_delayed_work_on(). But then the next iterations are scheduled via
> schedule_delayed_work(), i.e. using WORK_CPU_UNBOUND.
> 
> Thus since commit ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on
> wq_unbound_cpumask CPUs") there is no guarantee the future iterations will run
> on the originally intended cpu, although it's still preferred. I was able to
> demonstrate this with /sys/module/workqueue/parameters/debug_force_rr_cpu.
> IIUC, it may also happen due to migrating timers in nohz context. As a result,
> some cpu's would be calling cache_reap() more frequently and others never.
> 
> This patch uses schedule_delayed_work_on() with the current cpu when scheduling
> the next iteration.

Could you write down part about "so what's the user effect on some condition?".
It would really help to pick up the patch.

> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Fixes: ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
> CC: <stable@vger.kernel.org>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Stephen Boyd <sboyd@kernel.org>
> ---
>  mm/slab.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index 9095c3945425..a76006aae857 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -4074,7 +4074,8 @@ static void cache_reap(struct work_struct *w)
>  	next_reap_node();
>  out:
>  	/* Set up the next iteration */
> -	schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_AC));
> +	schedule_delayed_work_on(smp_processor_id(), work,
> +				round_jiffies_relative(REAPTIMEOUT_AC));
>  }
>  
>  void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
> -- 
> 2.16.3
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm, slab: reschedule cache_reap() on the same CPU
  2018-04-12  0:47   ` Minchan Kim
@ 2018-04-13  8:44     ` Vlastimil Babka
  0 siblings, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2018-04-13  8:44 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, stable, Joonsoo Kim,
	David Rientjes, Pekka Enberg, Christoph Lameter, Tejun Heo,
	Lai Jiangshan, John Stultz, Thomas Gleixner, Stephen Boyd

On 04/12/2018 02:47 AM, Minchan Kim wrote:
> On Wed, Apr 11, 2018 at 09:00:07AM +0200, Vlastimil Babka wrote:
>> cache_reap() is initially scheduled in start_cpu_timer() via
>> schedule_delayed_work_on(). But then the next iterations are scheduled via
>> schedule_delayed_work(), i.e. using WORK_CPU_UNBOUND.
>>
>> Thus since commit ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on
>> wq_unbound_cpumask CPUs") there is no guarantee the future iterations will run
>> on the originally intended cpu, although it's still preferred. I was able to
>> demonstrate this with /sys/module/workqueue/parameters/debug_force_rr_cpu.
>> IIUC, it may also happen due to migrating timers in nohz context. As a result,
>> some cpu's would be calling cache_reap() more frequently and others never.
>>
>> This patch uses schedule_delayed_work_on() with the current cpu when scheduling
>> the next iteration.
> 
> Could you write down part about "so what's the user effect on some condition?".
> It would really help to pick up the patch.

Ugh, so let's continue the last paragraph with something like:

After this patch, cache_reap() executions will always remain on the
expected cpu. This can save some memory that could otherwise remain
indefinitely in array caches of some cpus or nodes, and prevent waste of
cpu cycles by executing cache_reap() too often on other cpus.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, back to index

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-10  8:15 [RFC] mm, slab: reschedule cache_reap() on the same CPU Vlastimil Babka
2018-04-10 14:12 ` Christopher Lameter
2018-04-10 14:17   ` Tejun Heo
2018-04-10 19:40   ` Vlastimil Babka
2018-04-10 19:53     ` Tejun Heo
2018-04-10 20:13       ` Vlastimil Babka
2018-04-10 20:23         ` Tejun Heo
2018-04-11  7:00 ` [PATCH] " Vlastimil Babka
2018-04-11 10:53   ` Pekka Enberg
2018-04-11 13:41     ` Christopher Lameter
2018-04-12  0:47   ` Minchan Kim
2018-04-13  8:44     ` Vlastimil Babka

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git