linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.
@ 2012-05-09  6:10 Rusty Russell
  2012-05-09  8:44 ` Ingo Molnar
  2012-05-10  1:32 ` KOSAKI Motohiro
  0 siblings, 2 replies; 12+ messages in thread
From: Rusty Russell @ 2012-05-09  6:10 UTC (permalink / raw)
  To: Ingo Molnar, x86
  Cc: LKML, anton, Arnd Bergmann, KOSAKI Motohiro, Mike Travis,
	Thomas Gleixner, Linus Torvalds, Al Viro

Hi Ingo,

        I finally rebased this on top of your tip tree, and tested it
locally.  Some more old-style cpumask usages have crept in, but it's a
fairly simple series.

The final result is that if you enable CONFIG_CPUMASK_OFFSTACK, then
'struct cpumask' becomes an undefined type.  You can't accidentally take
the size of it, assign it, or pass it by value.  And thus it's safe for
us to make it smaller if nr_cpu_ids < NR_CPUS, as the final patch does.

It unfortunately requires the lglock cleanup patch, which Al already has
queued, so I've included it here.

Cheers,
Rusty.

The following changes since commit 76b12156b42f876ae399dd9db1cbef27c24a4899:

  Merge branch 'x86/apic' (2012-05-07 19:21:48 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rusty/cpumask.git master

for you to fetch changes up to 45e788415aced8d518817d6be968ab4a132724ed:

  cpumask: reduce cpumask_size (2012-05-09 15:02:15 +0930)

----------------------------------------------------------------
Rusty Russell (9):
      lglock: remove online variants of lock
      page_alloc: use cpumask_var_t.
      cpumask: prepare for reduced cpumask_allocation.
      cpumask: make task_struct.cpus_allowed a cpumask_var_t
      cpumask: select disabling obsolete cpumask functions on x86
      cpumask: get rid of cpumask_t.
      irq: remove sizeof(struct cpumask)
      cpumask: remove struct cpumask definition when CONFIG_CPUMASK_OFFSTACK=y
      cpumask: reduce cpumask_size

 arch/arm/mach-integrator/cpu.c               |    4 +-
 arch/ia64/kernel/cpufreq/acpi-cpufreq.c      |    4 +-
 arch/ia64/kernel/mca.c                       |    2 +-
 arch/ia64/kernel/salinfo.c                   |    2 +-
 arch/ia64/kernel/topology.c                  |    2 +-
 arch/ia64/sn/kernel/sn2/sn_hwperf.c          |    2 +-
 arch/mips/kernel/cpufreq/loongson2_cpufreq.c |    2 +-
 arch/mips/kernel/traps.c                     |    8 ++--
 arch/sh/kernel/cpufreq.c                     |    2 +-
 arch/x86/kernel/cpu/mcheck/mce_intel.c       |    2 +-
 drivers/acpi/processor_throttling.c          |    4 +-
 drivers/firmware/dcdbas.c                    |    2 +-
 fs/proc/array.c                              |    4 +-
 include/linux/cpumask.h                      |   25 ++++++++---
 include/linux/init_task.h                    |    9 +++-
 include/linux/interrupt.h                    |    2 +-
 include/linux/lglock.h                       |   58 +-------------------------
 include/linux/mm_types.h                     |   25 +++++++++--
 include/linux/sched.h                        |    6 +--
 kernel/cpuset.c                              |    2 +-
 kernel/fork.c                                |   31 +++++++++-----
 kernel/irq/irqdesc.c                         |    2 +-
 kernel/rcutree_plugin.h                      |    4 +-
 kernel/sched/core.c                          |    6 +--
 kernel/sched/cpupri.c                        |    4 +-
 kernel/trace/trace_workqueue.c               |    6 +--
 kernel/workqueue.c                           |    2 +-
 lib/Kconfig                                  |    8 +++-
 lib/cpu_rmap.c                               |    2 +-
 lib/cpumask.c                                |    2 +
 mm/page_alloc.c                              |   17 ++++----
 31 files changed, 124 insertions(+), 127 deletions(-)


commit a9e14f56ae7519d3c98651c927be53d094a3841a
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Wed May 9 14:55:15 2012 +0930

    lglock: remove online variants of lock
    
    Optimizing the slow paths adds a lot of complexity.  If you need to
    grab every lock often, you have other problems.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Acked-by: Nick Piggin <npiggin@kernel.dk>

commit fab26054c0500d426cf1bc2ce227a6a428dc815e
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Wed May 9 14:55:15 2012 +0930

    page_alloc: use cpumask_var_t.
    
    The BSS trick works, but it still wastes space.  Especially since there's
    a nice fallback in the case where we fail to allocate a temporary cpumask.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit 0196da928b332fba819877f5ab7aa02d8cd78e9b
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Wed May 9 14:56:15 2012 +0930

    cpumask: prepare for reduced cpumask_allocation.
    
    Thomas and Linus already made CONFIG_CPUMASK_OFFSTACK use a cpumask
    at the end of struct mm_struct, this just changes it into a bitmap
    and allocates it using cpumask_size().
    
    This means it will shrink when cpumask_size() is changed to reflect
    nr_cpu_ids not NR_CPUS.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: anton@samba.org
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Mike Travis <travis@sgi.com>

commit 5583b004a294063cbb03e586680de91f316955b8
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Wed May 9 14:57:15 2012 +0930

    cpumask: make task_struct.cpus_allowed a cpumask_var_t
    
    This turns it into a pointer for everyone.  No change for those
    already using the tsk_cpus_allowed() accessor; I've enhanced some of
    the sched/ code to use that.  For others, I just changed them
    directly.
    
    For CONFIG_CPUMASK_OFFSTACK=y, we now allocate it off the end; it
    would be better to avoid the indirection and use a dangling bitmap,
    but I didn't want to alter the layout of task_struct and risk breaking
    carefully balanced caches.
    
    Even better would be to point to the fixed "one cpu" and "all cpus"
    masks where possible, and make a copy when setting it to something
    else.  But you'd have to track down those naughty places which frob it
    directly...
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: anton@samba.org
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Mike Travis <travis@sgi.com>

commit c04a0a0d6ec248a533401f284ef2173c706a8e94
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Wed May 9 14:58:15 2012 +0930

    cpumask: select disabling obsolete cpumask functions on x86
    
    It currently depends on CONFIG_BROKEN; change to be set by
    CONFIG_CPUMASK_OFFSTACK now it all compiles.
    
    We also complete it: the header shouldn't refer to the deprected
    CPU_MASK_LAST_WORD, and the deprecated implementations are removed.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: anton@samba.org
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Mike Travis <travis@sgi.com>

commit d690ee2ccc74e2cb936ddb47322617a544417d6a
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Wed May 9 14:59:15 2012 +0930

    cpumask: get rid of cpumask_t.
    
    Very shortly, struct cpumask will be declared but undefined for
    CONFIG_CPUMASK_OFFSTACK, so use 'struct cpumask' rather than
    the obsolescent 'cpumask_t'.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit 124e4f2c2b72117c303a86691881f3abd952b31c
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Wed May 9 15:00:15 2012 +0930

    irq: remove sizeof(struct cpumask)
    
    Very shortly, struct cpumask will be declared but undefined for
    CONFIG_CPUMASK_OFFSTACK, so sizeof() won't compile.  This is a Good
    Thing, since we want to use cpumask_size() here anyway.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit 898eb73305e2277be91b931c5a75484f8c87ae36
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Wed May 9 15:01:15 2012 +0930

    cpumask: remove struct cpumask definition when CONFIG_CPUMASK_OFFSTACK=y
    
    We're about to change CONFIG_CPUMASK_OFFSTACK so it only allocate
    nr_cpu_ids bits for all cpumasks.  We need to make sure that when
    CONFIG_CPUMASK_OFFSTACK is set:
    
    1) Noone uses the old bitmap ops, which use NR_CPUS bits (use cpumask_*)
    2) Noone uses assignment of struct cpumask (use cpumask_copy)
    3) Noone passes a struct cpumask (pass a pointer)
    4) Noone declares them on the stack (use cpumask_var_t)
    
    So we finally remove the definition of struct cpumask when
    CONFIG_CPUMASK_OFFSTACK=y.  This means that these usages will hit a compile
    error the moment that config option is turned on.
    
    Note that it also means you can't declare a static cpumask.  You
    should avoid this anyway (use cpumask_var_t), but there's a
    deliberately-ugly workaround for special cases, using DECLARE_BITMAP()
    and to_cpumask().
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: anton@samba.org
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Mike Travis <travis@sgi.com>

commit 45e788415aced8d518817d6be968ab4a132724ed
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Wed May 9 15:02:15 2012 +0930

    cpumask: reduce cpumask_size
    
    Now we're sure noone is using old cpumask operators, nor *cpumask, we can
    allocate less bits safely.  This reduces the memory usage of off-stack
    cpumasks when CONFIG_CPUMASK_OFFSTACK=y but we don't have NR_CPUS actual
    cpus.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: anton@samba.org
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Mike Travis <travis@sgi.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.
  2012-05-09  6:10 [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK Rusty Russell
@ 2012-05-09  8:44 ` Ingo Molnar
  2012-05-10  0:29   ` Rusty Russell
  2012-05-10  1:32 ` KOSAKI Motohiro
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2012-05-09  8:44 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, x86, LKML, anton, Arnd Bergmann, KOSAKI Motohiro,
	Mike Travis, Thomas Gleixner, Linus Torvalds, Al Viro


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> Hi Ingo,
> 
>         I finally rebased this on top of your tip tree, and tested it
> locally.  Some more old-style cpumask usages have crept in, but it's a
> fairly simple series.

Cool! Most of it looks pretty sane. I have a question about the 
gist of the series:

> commit 898eb73305e2277be91b931c5a75484f8c87ae36
> Author: Rusty Russell <rusty@rustcorp.com.au>
> Date:   Wed May 9 15:01:15 2012 +0930
> 
>     cpumask: remove struct cpumask definition when CONFIG_CPUMASK_OFFSTACK=y
>     
>     We're about to change CONFIG_CPUMASK_OFFSTACK so it only allocate
>     nr_cpu_ids bits for all cpumasks.  We need to make sure that when
>     CONFIG_CPUMASK_OFFSTACK is set:
>     
>     1) Noone uses the old bitmap ops, which use NR_CPUS bits (use cpumask_*)
>     2) Noone uses assignment of struct cpumask (use cpumask_copy)
>     3) Noone passes a struct cpumask (pass a pointer)
>     4) Noone declares them on the stack (use cpumask_var_t)
>     
>     So we finally remove the definition of struct cpumask when
>     CONFIG_CPUMASK_OFFSTACK=y.  This means that these usages will hit a compile
>     error the moment that config option is turned on.
>     
>     Note that it also means you can't declare a static cpumask.  You
>     should avoid this anyway (use cpumask_var_t), but there's a
>     deliberately-ugly workaround for special cases, using DECLARE_BITMAP()
>     and to_cpumask().
>     
>     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>     Cc: Arnd Bergmann <arnd@arndb.de>
>     Cc: anton@samba.org
>     Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>     Cc: Mike Travis <travis@sgi.com>

Is there any good reason to not remove it altogether, regardless 
of whether the OFFSTACK config is set? I mean, triggering build 
failures for a relatively rarely turned on config option is 
asking for constant maintenance trouble.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.
  2012-05-09  8:44 ` Ingo Molnar
@ 2012-05-10  0:29   ` Rusty Russell
  2012-05-10  7:42     ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Rusty Russell @ 2012-05-10  0:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ingo Molnar, x86, LKML, anton, Arnd Bergmann, KOSAKI Motohiro,
	Mike Travis, Thomas Gleixner, Linus Torvalds, Al Viro

On Wed, 9 May 2012 10:44:53 +0200, Ingo Molnar <mingo@kernel.org> wrote:
> 
> * Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > Hi Ingo,
> > 
> >         I finally rebased this on top of your tip tree, and tested it
> > locally.  Some more old-style cpumask usages have crept in, but it's a
> > fairly simple series.
> 
> Cool! Most of it looks pretty sane. I have a question about the 
> gist of the series:
> 
> > commit 898eb73305e2277be91b931c5a75484f8c87ae36
> > Author: Rusty Russell <rusty@rustcorp.com.au>
> > Date:   Wed May 9 15:01:15 2012 +0930
> > 
> >     cpumask: remove struct cpumask definition when CONFIG_CPUMASK_OFFSTACK=y
> >     
> >     We're about to change CONFIG_CPUMASK_OFFSTACK so it only allocate
> >     nr_cpu_ids bits for all cpumasks.  We need to make sure that when
> >     CONFIG_CPUMASK_OFFSTACK is set:
> >     
> >     1) Noone uses the old bitmap ops, which use NR_CPUS bits (use cpumask_*)
> >     2) Noone uses assignment of struct cpumask (use cpumask_copy)
> >     3) Noone passes a struct cpumask (pass a pointer)
> >     4) Noone declares them on the stack (use cpumask_var_t)
> >     
> >     So we finally remove the definition of struct cpumask when
> >     CONFIG_CPUMASK_OFFSTACK=y.  This means that these usages will hit a compile
> >     error the moment that config option is turned on.
> >     
> >     Note that it also means you can't declare a static cpumask.  You
> >     should avoid this anyway (use cpumask_var_t), but there's a
> >     deliberately-ugly workaround for special cases, using DECLARE_BITMAP()
> >     and to_cpumask().
> >     
> >     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> >     Cc: Arnd Bergmann <arnd@arndb.de>
> >     Cc: anton@samba.org
> >     Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> >     Cc: Mike Travis <travis@sgi.com>
> 
> Is there any good reason to not remove it altogether, regardless 
> of whether the OFFSTACK config is set? I mean, triggering build 
> failures for a relatively rarely turned on config option is 
> asking for constant maintenance trouble.

Mainly because I didn't want to disturb the archs which don't care at
all about large cpumasks.  After all, putting a struct cpumask on the
stack is pretty convenient.

But we could add a new arch config which removes it, and set it from
x86.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.
  2012-05-09  6:10 [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK Rusty Russell
  2012-05-09  8:44 ` Ingo Molnar
@ 2012-05-10  1:32 ` KOSAKI Motohiro
  2012-05-10  2:16   ` Rusty Russell
  1 sibling, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2012-05-10  1:32 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, x86, LKML, anton, Arnd Bergmann, KOSAKI Motohiro,
	Mike Travis, Thomas Gleixner, Linus Torvalds, Al Viro,
	kosaki.motohiro

(5/9/12 2:10 AM), Rusty Russell wrote:
> Hi Ingo,
>
>          I finally rebased this on top of your tip tree, and tested it
> locally.  Some more old-style cpumask usages have crept in, but it's a
> fairly simple series.
>
> The final result is that if you enable CONFIG_CPUMASK_OFFSTACK, then
> 'struct cpumask' becomes an undefined type.  You can't accidentally take
> the size of it, assign it, or pass it by value.  And thus it's safe for
> us to make it smaller if nr_cpu_ids<  NR_CPUS, as the final patch does.
>
> It unfortunately requires the lglock cleanup patch, which Al already has
> queued, so I've included it here.

Hi

Thanks this effort. This is very cleaner than I expected.
However I should NAK following one patch. sorry. because of, lru-drain is
called from memory reclaim context. It mean, additional allocation may not
work. Please just use bare NR_CPUS bitmap instead. space wasting is minor
issue than that.

> commit fab26054c0500d426cf1bc2ce227a6a428dc815e
> Author: Rusty Russell<rusty@rustcorp.com.au>
> Date:   Wed May 9 14:55:15 2012 +0930
>
>      page_alloc: use cpumask_var_t.
>
>      The BSS trick works, but it still wastes space.  Especially since there's
>      a nice fallback in the case where we fail to allocate a temporary cpumask.
>
>      Signed-off-by: Rusty Russell<rusty@rustcorp.com.au>



Other patches looks very goold.
	Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Thanks.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.
  2012-05-10  1:32 ` KOSAKI Motohiro
@ 2012-05-10  2:16   ` Rusty Russell
  2012-05-10  2:43     ` KOSAKI Motohiro
  0 siblings, 1 reply; 12+ messages in thread
From: Rusty Russell @ 2012-05-10  2:16 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Ingo Molnar, x86, LKML, anton, Arnd Bergmann, KOSAKI Motohiro,
	Mike Travis, Thomas Gleixner, Linus Torvalds, Al Viro,
	kosaki.motohiro

On Wed, 09 May 2012 21:32:57 -0400, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:
> (5/9/12 2:10 AM), Rusty Russell wrote:
> > Hi Ingo,
> >
> >          I finally rebased this on top of your tip tree, and tested it
> > locally.  Some more old-style cpumask usages have crept in, but it's a
> > fairly simple series.
> >
> > The final result is that if you enable CONFIG_CPUMASK_OFFSTACK, then
> > 'struct cpumask' becomes an undefined type.  You can't accidentally take
> > the size of it, assign it, or pass it by value.  And thus it's safe for
> > us to make it smaller if nr_cpu_ids<  NR_CPUS, as the final patch does.
> >
> > It unfortunately requires the lglock cleanup patch, which Al already has
> > queued, so I've included it here.
> 
> Hi
> 
> Thanks this effort. This is very cleaner than I expected.
> However I should NAK following one patch. sorry. because of, lru-drain is
> called from memory reclaim context. It mean, additional allocation may not
> work. Please just use bare NR_CPUS bitmap instead. space wasting is minor
> issue than that.

But if it fails the allocation, that's fine: we just send a few more
IPIs to every CPU:

+	if (!zalloc_cpumask_var(&cpus_with_pcps, GFP_KERNEL)) {
+		on_each_cpu(drain_local_pages, NULL, 1);
+		return;
+	}

We can do it the other way, but it sets a bad example, and after we get
rid of cpumask, it becomes:

        static DECLARE_BITMAP(cpus_with_pcps, NR_CPUS);

        ......

 		if (has_pcps)
			cpumask_set_cpu(cpu, to_cpumask(cpus_with_pcps));
		else
			cpumask_clear_cpu(cpu, to_cpumask(cpus_with_pcps));
 	}
	on_each_cpu_mask(to_cpumask(cpus_with_pcps), drain_local_pages, NULL, 1);

Or is there a reason we shouldn't even try to allocate here?

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.
  2012-05-10  2:16   ` Rusty Russell
@ 2012-05-10  2:43     ` KOSAKI Motohiro
  2012-05-10  4:54       ` Rusty Russell
  0 siblings, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2012-05-10  2:43 UTC (permalink / raw)
  To: Rusty Russell
  Cc: KOSAKI Motohiro, Ingo Molnar, x86, LKML, anton, Arnd Bergmann,
	KOSAKI Motohiro, Mike Travis, Thomas Gleixner, Linus Torvalds,
	Al Viro

(5/9/12 10:16 PM), Rusty Russell wrote:
> On Wed, 09 May 2012 21:32:57 -0400, KOSAKI Motohiro<kosaki.motohiro@gmail.com>  wrote:
>> (5/9/12 2:10 AM), Rusty Russell wrote:
>>> Hi Ingo,
>>>
>>>           I finally rebased this on top of your tip tree, and tested it
>>> locally.  Some more old-style cpumask usages have crept in, but it's a
>>> fairly simple series.
>>>
>>> The final result is that if you enable CONFIG_CPUMASK_OFFSTACK, then
>>> 'struct cpumask' becomes an undefined type.  You can't accidentally take
>>> the size of it, assign it, or pass it by value.  And thus it's safe for
>>> us to make it smaller if nr_cpu_ids<   NR_CPUS, as the final patch does.
>>>
>>> It unfortunately requires the lglock cleanup patch, which Al already has
>>> queued, so I've included it here.
>>
>> Hi
>>
>> Thanks this effort. This is very cleaner than I expected.
>> However I should NAK following one patch. sorry. because of, lru-drain is
>> called from memory reclaim context. It mean, additional allocation may not
>> work. Please just use bare NR_CPUS bitmap instead. space wasting is minor
>> issue than that.
>
> But if it fails the allocation, that's fine: we just send a few more
> IPIs to every CPU:
>
> +	if (!zalloc_cpumask_var(&cpus_with_pcps, GFP_KERNEL)) {
> +		on_each_cpu(drain_local_pages, NULL, 1);
> +		return;
> +	}
>
> We can do it the other way, but it sets a bad example, and after we get
> rid of cpumask, it becomes:
>
>          static DECLARE_BITMAP(cpus_with_pcps, NR_CPUS);
>
>          ......
>
>   		if (has_pcps)
> 			cpumask_set_cpu(cpu, to_cpumask(cpus_with_pcps));
> 		else
> 			cpumask_clear_cpu(cpu, to_cpumask(cpus_with_pcps));
>   	}
> 	on_each_cpu_mask(to_cpumask(cpus_with_pcps), drain_local_pages, NULL, 1);
>
> Or is there a reason we shouldn't even try to allocate here?

1) your code always use GFP_KERNEL. it is trouble maker when alloc_pages w/ GFP_ATOMIC.
2) When CONFIG_CPUMASK_OFFSTACK=n and NR_CPUS is relatively large, cpumask on stack may
cause stack overflow. because of, alloc_pages() can be called from very deep call stack.

Thought?



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.
  2012-05-10  2:43     ` KOSAKI Motohiro
@ 2012-05-10  4:54       ` Rusty Russell
  2012-05-10  6:42         ` KOSAKI Motohiro
  0 siblings, 1 reply; 12+ messages in thread
From: Rusty Russell @ 2012-05-10  4:54 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: KOSAKI Motohiro, Ingo Molnar, x86, LKML, anton, Arnd Bergmann,
	KOSAKI Motohiro, Mike Travis, Thomas Gleixner, Linus Torvalds,
	Al Viro

On Wed, 09 May 2012 22:43:39 -0400, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:
> > Or is there a reason we shouldn't even try to allocate here?
> 
> 1) your code always use GFP_KERNEL. it is trouble maker when alloc_pages w/ GFP_ATOMIC.

Oh :(

How about the below instead?

> 2) When CONFIG_CPUMASK_OFFSTACK=n and NR_CPUS is relatively large, cpumask on stack may
> cause stack overflow. because of, alloc_pages() can be called from
> very deep call stack.

You can't have large NR_CPUS without CONFIG_CPUMASK_OFFSTACK=y,
otherwise you'll get many other stack overflows, too.

Thanks,
Rusty.

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 581e74b..7c1db9c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -367,7 +367,7 @@ extern void free_hot_cold_page_list(struct list_head *list, int cold);
 
 void page_alloc_init(void);
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
-void drain_all_pages(void);
+void drain_all_pages(gfp_t gfp_flags);
 void drain_local_pages(void *dummy);
 
 /*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 97cc273..daf0d7b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -237,7 +237,7 @@ void shake_page(struct page *p, int access)
 		lru_add_drain_all();
 		if (PageLRU(p))
 			return;
-		drain_all_pages();
+		drain_all_pages(GFP_KERNEL);
 		if (PageLRU(p) || is_free_buddy_page(p))
 			return;
 	}
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6629faf..1372a9b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -922,7 +922,7 @@ repeat:
 	if (drain) {
 		lru_add_drain_all();
 		cond_resched();
-		drain_all_pages();
+		drain_all_pages(GFP_KERNEL);
 	}
 
 	pfn = scan_lru_pages(start_pfn, end_pfn);
@@ -944,7 +944,7 @@ repeat:
 	lru_add_drain_all();
 	yield();
 	/* drain pcp pages , this is synchrouns. */
-	drain_all_pages();
+	drain_all_pages(GFP_KERNEL);
 	/* check again */
 	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
 	if (offlined_pages < 0) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a712fb9..aaac25c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1169,17 +1169,17 @@ void drain_local_pages(void *arg)
  * nothing keeps CPUs from showing up after we populated the cpumask and
  * before the call to on_each_cpu_mask().
  */
-void drain_all_pages(void)
+void drain_all_pages(gfp_t gfp_flags)
 {
 	int cpu;
 	struct per_cpu_pageset *pcp;
 	struct zone *zone;
+	cpumask_var_t cpus_with_pcps;
 
-	/*
-	 * Allocate in the BSS so we wont require allocation in
-	 * direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
-	 */
-	static cpumask_t cpus_with_pcps;
+	if (!zalloc_cpumask_var(&cpus_with_pcps, gfp_flags)) {
+		on_each_cpu(drain_local_pages, NULL, 1);
+		return;
+	}
 
 	/*
 	 * We don't care about racing with CPU hotplug event
@@ -1197,11 +1197,10 @@ void drain_all_pages(void)
 			}
 		}
 		if (has_pcps)
-			cpumask_set_cpu(cpu, &cpus_with_pcps);
-		else
-			cpumask_clear_cpu(cpu, &cpus_with_pcps);
+			cpumask_set_cpu(cpu, cpus_with_pcps);
 	}
-	on_each_cpu_mask(&cpus_with_pcps, drain_local_pages, NULL, 1);
+	on_each_cpu_mask(cpus_with_pcps, drain_local_pages, NULL, 1);
+	free_cpumask_var(cpus_with_pcps);
 }
 
 #ifdef CONFIG_HIBERNATION
@@ -2132,7 +2131,7 @@ retry:
 	 * pages are pinned on the per-cpu lists. Drain them and try again
 	 */
 	if (!page && !drained) {
-		drain_all_pages();
+		drain_all_pages(GFP_ATOMIC);
 		drained = true;
 		goto retry;
 	}
@@ -5532,7 +5531,7 @@ out:
 
 	spin_unlock_irqrestore(&zone->lock, flags);
 	if (!ret)
-		drain_all_pages();
+		drain_all_pages(GFP_KERNEL);
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.
  2012-05-10  4:54       ` Rusty Russell
@ 2012-05-10  6:42         ` KOSAKI Motohiro
  2012-05-14  2:58           ` Rusty Russell
  0 siblings, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2012-05-10  6:42 UTC (permalink / raw)
  To: Rusty Russell
  Cc: KOSAKI Motohiro, Ingo Molnar, x86, LKML, anton, Arnd Bergmann,
	KOSAKI Motohiro, Mike Travis, Thomas Gleixner, Linus Torvalds,
	Al Viro

(5/10/12 12:54 AM), Rusty Russell wrote:
> On Wed, 09 May 2012 22:43:39 -0400, KOSAKI Motohiro<kosaki.motohiro@gmail.com>  wrote:
>>> Or is there a reason we shouldn't even try to allocate here?
>>
>> 1) your code always use GFP_KERNEL. it is trouble maker when alloc_pages w/ GFP_ATOMIC.
>
> Oh :(
>
> How about the below instead?

This code still slow than original. when calling reclaim path, new allocation is almost always
fail. then, your code almost always invoke all cpu batch invalidation. i.e. many ipi.


>> 2) When CONFIG_CPUMASK_OFFSTACK=n and NR_CPUS is relatively large, cpumask on stack may
>> cause stack overflow. because of, alloc_pages() can be called from
>> very deep call stack.
>
> You can't have large NR_CPUS without CONFIG_CPUMASK_OFFSTACK=y,
> otherwise you'll get many other stack overflows, too.

Original code put cpumask bss instead stack then. :-)



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.
  2012-05-10  0:29   ` Rusty Russell
@ 2012-05-10  7:42     ` Ingo Molnar
  2012-05-14  3:22       ` Rusty Russell
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2012-05-10  7:42 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, x86, LKML, anton, Arnd Bergmann, KOSAKI Motohiro,
	Mike Travis, Thomas Gleixner, Linus Torvalds, Al Viro


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> Mainly because I didn't want to disturb the archs which don't 
> care at all about large cpumasks.  After all, putting a struct 
> cpumask on the stack is pretty convenient.

Yes.

> But we could add a new arch config which removes it, and set 
> it from x86.

Could we just use a single cpumask type, cpumask_t or so, which 
would be the *only* generic method to use cpumasks?

(Current cpumask_t would move to cpumask_full_t.)

This would be the 'final' destiation for the cpumask code: the 
natural type to use in new code is cpumask_t, while in special 
cases we could use cpumask_full_t - but the name signals that 
it's a potentially large structure.

On architectures that don't worry about large cpumasks (yet ...) 
cpumask_t and cpumask_full_t maps to the same thing, so there's 
no difference.

This would make things more natural IMO.

There would be no 'struct cpumask'. (and 'cpumask_var_t' would 
disappear too due to the rename.)

Thoughts?

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.
  2012-05-10  6:42         ` KOSAKI Motohiro
@ 2012-05-14  2:58           ` Rusty Russell
  2012-05-15  1:38             ` KOSAKI Motohiro
  0 siblings, 1 reply; 12+ messages in thread
From: Rusty Russell @ 2012-05-14  2:58 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: KOSAKI Motohiro, Ingo Molnar, x86, LKML, anton, Arnd Bergmann,
	KOSAKI Motohiro, Mike Travis, Thomas Gleixner, Linus Torvalds,
	Al Viro

On Thu, 10 May 2012 02:42:43 -0400, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:
> (5/10/12 12:54 AM), Rusty Russell wrote:
> > On Wed, 09 May 2012 22:43:39 -0400, KOSAKI Motohiro<kosaki.motohiro@gmail.com>  wrote:
> >>> Or is there a reason we shouldn't even try to allocate here?
> >>
> >> 1) your code always use GFP_KERNEL. it is trouble maker when alloc_pages w/ GFP_ATOMIC.
> >
> > Oh :(
> >
> > How about the below instead?
> 
> This code still slow than original. when calling reclaim path, new allocation is almost always
> fail. then, your code almost always invoke all cpu batch invalidation. i.e. many ipi.

I don't know this code.  Does that happen often?  Do we really need to
optimize the out-of-memory path?

But I should have used on_each_cpu_cond() helper which does this for us
(except it falls back to individial IPIs) which would make this code
neater.

> >> 2) When CONFIG_CPUMASK_OFFSTACK=n and NR_CPUS is relatively large, cpumask on stack may
> >> cause stack overflow. because of, alloc_pages() can be called from
> >> very deep call stack.
> >
> > You can't have large NR_CPUS without CONFIG_CPUMASK_OFFSTACK=y,
> > otherwise you'll get many other stack overflows, too.
> 
> Original code put cpumask bss instead stack then. :-)

Yes, and this is what it looks like if we convert it directly, but I
still don't want to encourage people to do this :(

Cheers,
Rusty.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1179,7 +1179,7 @@ void drain_all_pages(void)
 	 * Allocate in the BSS so we wont require allocation in
 	 * direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
 	 */
-	static cpumask_t cpus_with_pcps;
+	static DECLARE_BITMAP(cpus_with_pcps, NR_CPUS);
 
 	/*
 	 * We don't care about racing with CPU hotplug event
@@ -1197,11 +1197,12 @@ void drain_all_pages(void)
 			}
 		}
 		if (has_pcps)
-			cpumask_set_cpu(cpu, &cpus_with_pcps);
+			cpumask_set_cpu(cpu, to_cpumask(cpus_with_pcps));
 		else
-			cpumask_clear_cpu(cpu, &cpus_with_pcps);
+			cpumask_clear_cpu(cpu, to_cpumask(cpus_with_pcps));
 	}
-	on_each_cpu_mask(&cpus_with_pcps, drain_local_pages, NULL, 1);
+	on_each_cpu_mask(to_cpumask(cpus_with_pcps),
+			 drain_local_pages, NULL, 1);
 }
 
 #ifdef CONFIG_HIBERNATION

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.
  2012-05-10  7:42     ` Ingo Molnar
@ 2012-05-14  3:22       ` Rusty Russell
  0 siblings, 0 replies; 12+ messages in thread
From: Rusty Russell @ 2012-05-14  3:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ingo Molnar, x86, LKML, anton, Arnd Bergmann, KOSAKI Motohiro,
	Mike Travis, Thomas Gleixner, Linus Torvalds, Al Viro

On Thu, 10 May 2012 09:42:15 +0200, Ingo Molnar <mingo@kernel.org> wrote:
> 
> * Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > Mainly because I didn't want to disturb the archs which don't 
> > care at all about large cpumasks.  After all, putting a struct 
> > cpumask on the stack is pretty convenient.
> 
> Yes.
> 
> > But we could add a new arch config which removes it, and set 
> > it from x86.
> 
> Could we just use a single cpumask type, cpumask_t or so, which 
> would be the *only* generic method to use cpumasks?
> 
> (Current cpumask_t would move to cpumask_full_t.)
> 
> This would be the 'final' destiation for the cpumask code: the 
> natural type to use in new code is cpumask_t, while in special 
> cases we could use cpumask_full_t - but the name signals that 
> it's a potentially large structure.
> 
> On architectures that don't worry about large cpumasks (yet ...) 
> cpumask_t and cpumask_full_t maps to the same thing, so there's 
> no difference.
> 
> This would make things more natural IMO.
> 
> There would be no 'struct cpumask'. (and 'cpumask_var_t' would 
> disappear too due to the rename.)
> 
> Thoughts?

I don't understand, sorry.  I think I'd need some code to understand.

Unfortunately I was wrong about being able to remove struct cpumask's
definition when CONFIG_CPUMASK_OFFSTACK=n: we need it for cpumask_var_t
in that case :(

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK.
  2012-05-14  2:58           ` Rusty Russell
@ 2012-05-15  1:38             ` KOSAKI Motohiro
  0 siblings, 0 replies; 12+ messages in thread
From: KOSAKI Motohiro @ 2012-05-15  1:38 UTC (permalink / raw)
  To: Rusty Russell
  Cc: KOSAKI Motohiro, Ingo Molnar, x86, LKML, anton, Arnd Bergmann,
	KOSAKI Motohiro, Mike Travis, Thomas Gleixner, Linus Torvalds,
	Al Viro

>> This code still slow than original. when calling reclaim path, new allocation is almost always
>> fail. then, your code almost always invoke all cpu batch invalidation. i.e. many ipi.
>
> I don't know this code.  Does that happen often?Do we really need to
> optimize the out-of-memory path?

we don't need optimize out-of-memory path. but it's not out-of-memory path. our reclaim code
has two steps 1) purge small file cache (try_to_free_pages) 2) get new page (get_page_from_freelist).
but if you have smp box, it's racy. To success 1) doesn't guarantee to success 2). then, drain_all_pages()
is called frequently than you expected.



> But I should have used on_each_cpu_cond() helper which does this for us
> (except it falls back to individial IPIs) which would make this code
> neater.

Ah, yes. that definitely makes sense.


>>>> 2) When CONFIG_CPUMASK_OFFSTACK=n and NR_CPUS is relatively large, cpumask on stack may
>>>> cause stack overflow. because of, alloc_pages() can be called from
>>>> very deep call stack.
>>>
>>> You can't have large NR_CPUS without CONFIG_CPUMASK_OFFSTACK=y,
>>> otherwise you'll get many other stack overflows, too.
>>
>> Original code put cpumask bss instead stack then. :-)
>
> Yes, and this is what it looks like if we convert it directly, but I
> still don't want to encourage people to do this :(
>
> Cheers,
> Rusty.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1179,7 +1179,7 @@ void drain_all_pages(void)
>   	 * Allocate in the BSS so we wont require allocation in
>   	 * direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
>   	 */
> -	static cpumask_t cpus_with_pcps;
> +	static DECLARE_BITMAP(cpus_with_pcps, NR_CPUS);
>
>   	/*
>   	 * We don't care about racing with CPU hotplug event
> @@ -1197,11 +1197,12 @@ void drain_all_pages(void)
>   			}
>   		}
>   		if (has_pcps)
> -			cpumask_set_cpu(cpu,&cpus_with_pcps);
> +			cpumask_set_cpu(cpu, to_cpumask(cpus_with_pcps));
>   		else
> -			cpumask_clear_cpu(cpu,&cpus_with_pcps);
> +			cpumask_clear_cpu(cpu, to_cpumask(cpus_with_pcps));
>   	}
> -	on_each_cpu_mask(&cpus_with_pcps, drain_local_pages, NULL, 1);
> +	on_each_cpu_mask(to_cpumask(cpus_with_pcps),
> +			 drain_local_pages, NULL, 1);
>   }

Looks good to me. thanks.

   Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>






^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-05-15  1:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-09  6:10 [PULL] cpumask: finally make them variable size w/ CPUMASK_OFFSTACK Rusty Russell
2012-05-09  8:44 ` Ingo Molnar
2012-05-10  0:29   ` Rusty Russell
2012-05-10  7:42     ` Ingo Molnar
2012-05-14  3:22       ` Rusty Russell
2012-05-10  1:32 ` KOSAKI Motohiro
2012-05-10  2:16   ` Rusty Russell
2012-05-10  2:43     ` KOSAKI Motohiro
2012-05-10  4:54       ` Rusty Russell
2012-05-10  6:42         ` KOSAKI Motohiro
2012-05-14  2:58           ` Rusty Russell
2012-05-15  1:38             ` KOSAKI Motohiro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).