* [PATCH] cpumask: convert cpumask_of_cpu() with cpumask_of()
@ 2011-04-25  9:41 KOSAKI Motohiro
  2011-04-26 10:42 ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: KOSAKI Motohiro @ 2011-04-25  9:41 UTC (permalink / raw)
  To: LKML, Andrew Morton, Peter Zijlstra, Mike Galbraith, Ingo Molnar
  Cc: kosaki.motohiro

This patch adapts the code to the new cpumask API style.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
---
 kernel/kthread.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3b34d27..4102518 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -202,7 +202,7 @@ void kthread_bind(struct task_struct *p, unsigned int cpu)
 		return;
 	}
 
-	p->cpus_allowed = cpumask_of_cpu(cpu);
+	cpumask_copy(&p->cpus_allowed, cpumask_of(cpu));
 	p->rt.nr_cpus_allowed = 1;
 	p->flags |= PF_THREAD_BOUND;
 }
-- 
1.7.3.1





* Re: [PATCH] cpumask: convert cpumask_of_cpu() with cpumask_of()
  2011-04-25  9:41 [PATCH] cpumask: convert cpumask_of_cpu() with cpumask_of() KOSAKI Motohiro
@ 2011-04-26 10:42 ` Peter Zijlstra
  2011-04-26 11:33   ` KOSAKI Motohiro
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2011-04-26 10:42 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, Andrew Morton, Mike Galbraith, Ingo Molnar

On Mon, 2011-04-25 at 18:41 +0900, KOSAKI Motohiro wrote:
> This patch adapts the code to the new cpumask API style.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Ingo Molnar <mingo@elte.hu>
> ---
>  kernel/kthread.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 3b34d27..4102518 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -202,7 +202,7 @@ void kthread_bind(struct task_struct *p, unsigned int cpu)
>  		return;
>  	}
>  
> -	p->cpus_allowed = cpumask_of_cpu(cpu);
> +	cpumask_copy(&p->cpus_allowed, cpumask_of(cpu));
>  	p->rt.nr_cpus_allowed = 1;
>  	p->flags |= PF_THREAD_BOUND;
>  }

But why? Are we going to get rid of cpumask_t (which is a fixed-size
struct, so direct assignment is perfectly fine)?

Also, do we want to convert cpus_allowed to cpumask_var_t? It would save
quite a lot of memory on distro configs that set NR_CPUS silly high.
Currently NR_CPUS=4096 configs allocate 512 bytes per task for this
bitmap, 511 of which will never be used on most machines (510 in the
near future).

The cost is of course an extra memory dereference in scheduler hot
paths.. also not nice.
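
For context, the two shapes of cpumask_var_t (simplified from
include/linux/cpumask.h; a sketch, not the exact definition) are:

	#ifdef CONFIG_CPUMASK_OFFSTACK
	typedef struct cpumask *cpumask_var_t;   /* bitmap in a separate allocation */
	#else
	typedef struct cpumask cpumask_var_t[1]; /* bitmap embedded, no extra pointer */
	#endif

With the off-stack variant a task would carry only a pointer and the bitmap
would live in a separate allocation; that is where the per-task saving comes
from once the allocation is sized by nr_cpu_ids rather than NR_CPUS, and it
is also where the extra dereference in the hot paths comes in.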




* Re: [PATCH] cpumask: convert cpumask_of_cpu() with cpumask_of()
  2011-04-26 10:42 ` Peter Zijlstra
@ 2011-04-26 11:33   ` KOSAKI Motohiro
  2011-04-27 10:32     ` KOSAKI Motohiro
  0 siblings, 1 reply; 6+ messages in thread
From: KOSAKI Motohiro @ 2011-04-26 11:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kosaki.motohiro, LKML, Andrew Morton, Mike Galbraith, Ingo Molnar

> On Mon, 2011-04-25 at 18:41 +0900, KOSAKI Motohiro wrote:
> > This patch adapts the code to the new cpumask API style.
> > 
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Mike Galbraith <efault@gmx.de>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > ---
> >  kernel/kthread.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index 3b34d27..4102518 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -202,7 +202,7 @@ void kthread_bind(struct task_struct *p, unsigned int cpu)
> >  		return;
> >  	}
> >  
> > -	p->cpus_allowed = cpumask_of_cpu(cpu);
> > +	cpumask_copy(&p->cpus_allowed, cpumask_of(cpu));
> >  	p->rt.nr_cpus_allowed = 1;
> >  	p->flags |= PF_THREAD_BOUND;
> >  }
> 
> But why? Are we going to get rid of cpumask_t (which is a fixed-size
> struct, so direct assignment is perfectly fine)?
> 
> Also, do we want to convert cpus_allowed to cpumask_var_t? It would save
> quite a lot of memory on distro configs that set NR_CPUS silly high.
> Currently NR_CPUS=4096 configs allocate 512 bytes per task for this
> bitmap, 511 of which will never be used on most machines (510 in the
> near future).
>
> The cost is of course an extra memory dereference in scheduler hot
> paths.. also not nice.

To be honest, I dislike that the current cpumask_size() always returns NR_CPUS.
It defeats almost all of the benefit of cpumask_var_t. But we have to eliminate
every dangerous '=' assignment before changing that implementation.

(Btw, this limitation isn't documented anywhere in the code. Should
 we add that?)
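
Purely as an illustration (not from any patch in this mail; the function name
is made up), the dangerous assignment versus the safe replacement looks
roughly like this:

	#include <linux/cpumask.h>
	#include <linux/errno.h>
	#include <linux/gfp.h>

	static int example_bind(unsigned int cpu)
	{
		cpumask_var_t tmp;

		if (!alloc_cpumask_var(&tmp, GFP_KERNEL))
			return -ENOMEM;

		/* BAD:  *tmp = cpumask_of_cpu(cpu);
		 * A struct assignment copies sizeof(struct cpumask), i.e. the
		 * full NR_CPUS bits, regardless of how the target was sized. */

		/* GOOD: copies only nr_cpumask_bits. */
		cpumask_copy(tmp, cpumask_of(cpu));

		free_cpumask_var(tmp);
		return 0;
	}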

Thus, I'm now finding every '=' assignment with a tool and replacing it. The
second motivation for eliminating the old API is to make obsolete usage easy
to detect by tool.

So I personally hope task->cpus_allowed can be converted to cpumask_var_t,
as cpuset->cpus_allowed already has been. For some years the storage folks
have repeatedly warned that the chance of stack overflow keeps growing.

However, yes, the extra memory dereference is also bad. If the scheduler folks
dislike cpumask_var_t, I can drop the idea of converting task->cpus_allowed.

But if we can't convert cpus_allowed, I'd like to move it to the end of
task_struct, because usually only the first bytes of cpus_allowed are accessed
(as you described).

Thanks.



From 8844bba7597ac1c7fd2e88406da17d818ce271d1 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Tue, 26 Apr 2011 19:58:39 +0900
Subject: [PATCH] cpumask: add cpumask_var_t documentation

cpumask_var_t has one notable difference from cpumask_t.
This patch adds an explanation of it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/cpumask.h |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index a3ff24f..3d09505 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -617,6 +617,12 @@ static inline size_t cpumask_size(void)
  *	  ... use 'tmpmask' like a normal struct cpumask * ...
  *
  *	free_cpumask_var(tmpmask);
+ *
+ * However, there is one notable exception: a cpumask_var_t is allocated
+ * with only nr_cpu_ids bits (whereas a real cpumask_t always has NR_CPUS
+ * bits). Therefore the '=' operator must not be used on a cpumask_var_t;
+ * it would memcpy the full NR_CPUS bits and cause memory corruption. Use
+ * cpumask_copy() instead.
  */
 #ifdef CONFIG_CPUMASK_OFFSTACK
 typedef struct cpumask *cpumask_var_t;
-- 
1.7.3.1





* Re: [PATCH] cpumask: convert cpumask_of_cpu() with cpumask_of()
  2011-04-26 11:33   ` KOSAKI Motohiro
@ 2011-04-27 10:32     ` KOSAKI Motohiro
  2011-05-26 20:38       ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: KOSAKI Motohiro @ 2011-04-27 10:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kosaki.motohiro, LKML, Andrew Morton, Mike Galbraith, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 2141 bytes --]

> > But why? Are we going to get rid of cpumask_t (which is a fixed-size
> > struct, so direct assignment is perfectly fine)?
> > 
> > Also, do we want to convert cpus_allowed to cpumask_var_t? It would save
> > quite a lot of memory on distro configs that set NR_CPUS silly high.
> > Currently NR_CPUS=4096 configs allocate 512 bytes per task for this
> > bitmap, 511 of which will never be used on most machines (510 in the
> > near future).
> >
> > The cost is of course an extra memory dereference in scheduler hot
> > paths.. also not nice.

Measurement data is probably more persuasive than my poor English...

I've made a proof-of-concept patch today. The result is better than I expected.

<before>
 Performance counter stats for 'hackbench 10 thread 1000' (10 runs):

         1603777813  cache-references         #     56.987 M/sec   ( +-   1.824% )  (scaled from 25.36%)
           13780381  cache-misses             #      0.490 M/sec   ( +-   1.360% )  (scaled from 25.55%)
        24872032348  L1-dcache-loads          #    883.770 M/sec   ( +-   0.666% )  (scaled from 25.51%)
          640394580  L1-dcache-load-misses    #     22.755 M/sec   ( +-   0.796% )  (scaled from 25.47%)

       14.162411769  seconds time elapsed   ( +-   0.675% )

<after>
 Performance counter stats for 'hackbench 10 thread 1000' (10 runs):

         1416147603  cache-references         #     51.566 M/sec   ( +-   4.407% )  (scaled from 25.40%)
           10920284  cache-misses             #      0.398 M/sec   ( +-   5.454% )  (scaled from 25.56%)
        24666962632  L1-dcache-loads          #    898.196 M/sec   ( +-   1.747% )  (scaled from 25.54%)
          598640329  L1-dcache-load-misses    #     21.798 M/sec   ( +-   2.504% )  (scaled from 25.50%)

       13.812193312  seconds time elapsed   ( +-   1.696% )

 * detailed data is in result.txt


The trick is:
 - Typical Linux userland applications don't use the mempolicy and/or cpusets
   APIs at all.
 - Therefore, 99.99% of threads have tsk->cpus_allowed equal to cpu_all_mask.
 - In the cpu_all_mask case, every thread can share the same bitmap. That may
   help reduce L1 cache misses in the scheduler.

What do you think?


[-- Attachment #2: result.txt --]
[-- Type: application/octet-stream, Size: 3177 bytes --]

<before>

 % perf stat -e 'task-clock cs page-faults cycles instructions branches branch-misses cache-references cache-misses L1-dcache-load L1-dcache-load-misses' --repeat 10 hackbench  10 thread 1000

Time: 14.297
Time: 14.163
Time: 13.853
Time: 13.845
Time: 14.493
Time: 14.222
Time: 14.086
Time: 14.177
Time: 13.436
Time: 14.289

 Performance counter stats for 'hackbench 10 thread 1000' (10 runs):

       28143.117149  task-clock-msecs         #      1.987 CPUs    ( +-   0.672% )
            1794735  context-switches         #      0.064 M/sec   ( +-   3.190% )
                992  page-faults              #      0.000 M/sec   ( +-   0.020% )
        82001920882  cycles                   #   2913.747 M/sec   ( +-   0.682% )  (scaled from 25.46%)
        85383145305  instructions             #      1.041 IPC     ( +-   0.484% )  (scaled from 38.18%)
        15450521139  branches                 #    548.998 M/sec   ( +-   0.443% )  (scaled from 37.94%)
          120279550  branch-misses            #      0.778 %       ( +-   0.563% )  (scaled from 37.98%)
         1603777813  cache-references         #     56.987 M/sec   ( +-   1.824% )  (scaled from 25.36%)
           13780381  cache-misses             #      0.490 M/sec   ( +-   1.360% )  (scaled from 25.55%)
        24872032348  L1-dcache-loads          #    883.770 M/sec   ( +-   0.666% )  (scaled from 25.51%)
          640394580  L1-dcache-load-misses    #     22.755 M/sec   ( +-   0.796% )  (scaled from 25.47%)

       14.162411769  seconds time elapsed   ( +-   0.675% )



<after>

 % perf stat -e 'task-clock cs page-faults cycles instructions branches branch-misses cache-references cache-misses L1-dcache-load L1-dcache-load-misses' --repeat 10 hackbench  10 thread 1000

Time: 13.533
Time: 14.019
Time: 15.578
Time: 13.906
Time: 13.497
Time: 12.930
Time: 13.044
Time: 13.699
Time: 13.863
Time: 13.290

 Performance counter stats for 'hackbench 10 thread 1000' (10 runs):

       27462.769739  task-clock-msecs         #      1.988 CPUs    ( +-   1.708% )
            1655307  context-switches         #      0.060 M/sec   ( +-   7.206% )
                992  page-faults              #      0.000 M/sec   ( +-   0.015% )
        80029833041  cycles                   #   2914.121 M/sec   ( +-   1.766% )  (scaled from 25.43%)
        84901647154  instructions             #      1.061 IPC     ( +-   1.211% )  (scaled from 38.14%)
        15381961224  branches                 #    560.102 M/sec   ( +-   1.049% )  (scaled from 37.90%)
          118862277  branch-misses            #      0.773 %       ( +-   1.213% )  (scaled from 38.00%)
         1416147603  cache-references         #     51.566 M/sec   ( +-   4.407% )  (scaled from 25.40%)
           10920284  cache-misses             #      0.398 M/sec   ( +-   5.454% )  (scaled from 25.56%)
        24666962632  L1-dcache-loads          #    898.196 M/sec   ( +-   1.747% )  (scaled from 25.54%)
          598640329  L1-dcache-load-misses    #     21.798 M/sec   ( +-   2.504% )  (scaled from 25.50%)

       13.812193312  seconds time elapsed   ( +-   1.696% )



[-- Attachment #4: 0001-s-task-cpus_allowed-tsk_cpus_allowed.patch --]
[-- Type: application/octet-stream, Size: 19412 bytes --]

From 18dbb425dc0d805894a163997518b6ce034632f6 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Wed, 27 Apr 2011 11:33:33 +0900
Subject: [PATCH 1/2] s/task->cpus_allowed/tsk_cpus_allowed/

TODO: some arch-dependent code is still unchanged.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 arch/powerpc/kernel/smp.c                    |    2 +-
 arch/tile/kernel/hardwall.c                  |   10 +++++-----
 arch/x86/kernel/cpu/mcheck/mce_intel.c       |    2 +-
 drivers/acpi/processor_throttling.c          |    4 ++--
 drivers/crypto/n2_core.c                     |    2 +-
 drivers/firmware/dcdbas.c                    |    2 +-
 drivers/infiniband/hw/ipath/ipath_file_ops.c |    6 +++---
 fs/proc/array.c                              |    4 ++--
 include/linux/cpuset.h                       |    2 +-
 kernel/cpuset.c                              |    8 ++++----
 kernel/kthread.c                             |    2 +-
 kernel/rcutree.c                             |    4 ++--
 kernel/sched.c                               |   16 ++++++++--------
 kernel/sched_cpupri.c                        |    4 ++--
 kernel/sched_fair.c                          |   12 ++++++------
 kernel/sched_rt.c                            |    6 +++---
 kernel/trace/trace_workqueue.c               |    6 +++---
 kernel/workqueue.c                           |    2 +-
 lib/smp_processor_id.c                       |    2 +-
 19 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 9f9c204..2253bfa 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -608,7 +608,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
 	 * se we pin us down to CPU 0 for a short while
 	 */
 	alloc_cpumask_var(&old_mask, GFP_NOWAIT);
-	cpumask_copy(old_mask, &current->cpus_allowed);
+	cpumask_copy(old_mask, tsk_cpus_allowed(current));
 	set_cpus_allowed_ptr(current, cpumask_of(boot_cpuid));
 	
 	if (smp_ops && smp_ops->setup_cpu)
diff --git a/arch/tile/kernel/hardwall.c b/arch/tile/kernel/hardwall.c
index e910530..5d7f4e0 100644
--- a/arch/tile/kernel/hardwall.c
+++ b/arch/tile/kernel/hardwall.c
@@ -413,12 +413,12 @@ static int hardwall_activate(struct hardwall_info *rect)
 	 * Get our affinity; if we're not bound to this tile uniquely,
 	 * we can't access the network registers.
 	 */
-	if (cpumask_weight(&p->cpus_allowed) != 1)
+	if (cpumask_weight(tsk_cpus_allowed(p)) != 1)
 		return -EPERM;
 
 	/* Make sure we are bound to a cpu in this rectangle. */
 	cpu = smp_processor_id();
-	BUG_ON(cpumask_first(&p->cpus_allowed) != cpu);
+	BUG_ON(cpumask_first(tsk_cpus_allowed(p)) != cpu);
 	x = cpu_x(cpu);
 	y = cpu_y(cpu);
 	if (!contains(rect, x, y))
@@ -451,11 +451,11 @@ static void _hardwall_deactivate(struct task_struct *task)
 {
 	struct thread_struct *ts = &task->thread;
 
-	if (cpumask_weight(&task->cpus_allowed) != 1) {
+	if (cpumask_weight(tsk_cpus_allowed(task)) != 1) {
 		pr_err("pid %d (%s) releasing networks with"
 		       " an affinity mask containing %d cpus!\n",
 		       task->pid, task->comm,
-		       cpumask_weight(&task->cpus_allowed));
+		       cpumask_weight(tsk_cpus_allowed(task)));
 		BUG();
 	}
 
@@ -674,7 +674,7 @@ int proc_tile_hardwall_show(struct seq_file *sf, void *v)
 		seq_printf(sf, "%dx%d %d,%d pids:",
 			   r->width, r->height, r->ulhc_x, r->ulhc_y);
 		list_for_each_entry(p, &r->task_head, thread.hardwall_list) {
-			unsigned int cpu = cpumask_first(&p->cpus_allowed);
+			unsigned int cpu = cpumask_first(tsk_cpus_allowed(p));
 			unsigned int x = cpu % smp_width;
 			unsigned int y = cpu / smp_width;
 			seq_printf(sf, " %d@%d,%d", p->pid, x, y);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 8694ef56..5bd27ee 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -177,7 +177,7 @@ void cmci_rediscover(int dying)
 		return;
 	if (!alloc_cpumask_var(&old, GFP_KERNEL))
 		return;
-	cpumask_copy(old, &current->cpus_allowed);
+	cpumask_copy(old, tsk_cpus_allowed(current));
 
 	for_each_online_cpu(cpu) {
 		if (cpu == dying)
diff --git a/drivers/acpi/processor_throttling.c b/drivers/acpi/processor_throttling.c
index 605a295..648f14c 100644
--- a/drivers/acpi/processor_throttling.c
+++ b/drivers/acpi/processor_throttling.c
@@ -910,7 +910,7 @@ static int acpi_processor_get_throttling(struct acpi_processor *pr)
 	/*
 	 * Migrate task to the cpu pointed by pr.
 	 */
-	cpumask_copy(saved_mask, &current->cpus_allowed);
+	cpumask_copy(saved_mask, tsk_cpus_allowed(current));
 	/* FIXME: use work_on_cpu() */
 	if (set_cpus_allowed_ptr(current, cpumask_of(pr->id))) {
 		/* Can't migrate to the target pr->id CPU. Exit */
@@ -1099,7 +1099,7 @@ int acpi_processor_set_throttling(struct acpi_processor *pr,
 		return -ENODEV;
 	}
 
-	cpumask_copy(saved_mask, &current->cpus_allowed);
+	cpumask_copy(saved_mask, tsk_cpus_allowed(current));
 	t_state.target_state = state;
 	p_throttling = &(pr->throttling);
 	cpumask_and(online_throttling_cpus, cpu_online_mask,
diff --git a/drivers/crypto/n2_core.c b/drivers/crypto/n2_core.c
index 2e5b204..4e620eb 100644
--- a/drivers/crypto/n2_core.c
+++ b/drivers/crypto/n2_core.c
@@ -1664,7 +1664,7 @@ static int spu_queue_register(struct spu_queue *p, unsigned long q_type)
 	if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
 		return -ENOMEM;
 
-	cpumask_copy(old_allowed, &current->cpus_allowed);
+	cpumask_copy(old_allowed, tsk_cpus_allowed(current));
 
 	set_cpus_allowed_ptr(current, &p->sharing);
 
diff --git a/drivers/firmware/dcdbas.c b/drivers/firmware/dcdbas.c
index ea5ac2d..a1be63c 100644
--- a/drivers/firmware/dcdbas.c
+++ b/drivers/firmware/dcdbas.c
@@ -258,7 +258,7 @@ int dcdbas_smi_request(struct smi_cmd *smi_cmd)
 	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
 		return -ENOMEM;
 
-	cpumask_copy(old_mask, &current->cpus_allowed);
+	cpumask_copy(old_mask, tsk_cpus_allowed(current));
 	set_cpus_allowed_ptr(current, cpumask_of(0));
 	if (smp_processor_id() != 0) {
 		dev_dbg(&dcdbas_pdev->dev, "%s: failed to get CPU 0\n",
diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
index bdf4422..5462447 100644
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c
@@ -1684,11 +1684,11 @@ static int find_best_unit(struct file *fp,
 	 * information.  There may be some issues with dual core numbering
 	 * as well.  This needs more work prior to release.
 	 */
-	if (!cpumask_empty(&current->cpus_allowed) &&
-	    !cpumask_full(&current->cpus_allowed)) {
+	if (!cpumask_empty(tsk_cpus_allowed(current)) &&
+	    !cpumask_full(tsk_cpus_allowed(current))) {
 		int ncpus = num_online_cpus(), curcpu = -1, nset = 0;
 		for (i = 0; i < ncpus; i++)
-			if (cpumask_test_cpu(i, &current->cpus_allowed)) {
+			if (cpumask_test_cpu(i, tsk_cpus_allowed(current))) {
 				ipath_cdbg(PROC, "%s[%u] affinity set for "
 					   "cpu %d/%d\n", current->comm,
 					   current->pid, i, ncpus);
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 9b45ee8..b860301 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -330,10 +330,10 @@ static inline void task_context_switch_counts(struct seq_file *m,
 static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
 {
 	seq_puts(m, "Cpus_allowed:\t");
-	seq_cpumask(m, &task->cpus_allowed);
+	seq_cpumask(m, tsk_cpus_allowed(task));
 	seq_putc(m, '\n');
 	seq_puts(m, "Cpus_allowed_list:\t");
-	seq_cpumask_list(m, &task->cpus_allowed);
+	seq_cpumask_list(m, tsk_cpus_allowed(task));
 	seq_putc(m, '\n');
 }
 
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index f20eb8f..684fe71 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -146,7 +146,7 @@ static inline void cpuset_cpus_allowed(struct task_struct *p,
 
 static inline int cpuset_cpus_allowed_fallback(struct task_struct *p)
 {
-	cpumask_copy(&p->cpus_allowed, cpu_possible_mask);
+	cpumask_copy(tsk_cpus_allowed(p), cpu_possible_mask);
 	return cpumask_any(cpu_active_mask);
 }
 
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 1ceeb04..0deb871 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -797,7 +797,7 @@ void rebuild_sched_domains(void)
 static int cpuset_test_cpumask(struct task_struct *tsk,
 			       struct cgroup_scanner *scan)
 {
-	return !cpumask_equal(&tsk->cpus_allowed,
+	return !cpumask_equal(tsk_cpus_allowed(tsk),
 			(cgroup_cs(scan->cg))->cpus_allowed);
 }
 
@@ -2190,7 +2190,7 @@ int cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 	rcu_read_lock();
 	cs = task_cs(tsk);
 	if (cs)
-		cpumask_copy(&tsk->cpus_allowed, cs->cpus_allowed);
+		cpumask_copy(tsk_cpus_allowed(tsk), cs->cpus_allowed);
 	rcu_read_unlock();
 
 	/*
@@ -2208,7 +2208,7 @@ int cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 	 * the pending set_cpus_allowed_ptr() will fix things.
 	 */
 
-	cpu = cpumask_any_and(&tsk->cpus_allowed, cpu_active_mask);
+	cpu = cpumask_any_and(tsk_cpus_allowed(tsk), cpu_active_mask);
 	if (cpu >= nr_cpu_ids) {
 		/*
 		 * Either tsk->cpus_allowed is wrong (see above) or it
@@ -2217,7 +2217,7 @@ int cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 		 * Like above we can temporary set any mask and rely on
 		 * set_cpus_allowed_ptr() as synchronization point.
 		 */
-		cpumask_copy(&tsk->cpus_allowed, cpu_possible_mask);
+		cpumask_copy(tsk_cpus_allowed(tsk), cpu_possible_mask);
 		cpu = cpumask_any(cpu_active_mask);
 	}
 
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 4102518..5f35501 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -202,7 +202,7 @@ void kthread_bind(struct task_struct *p, unsigned int cpu)
 		return;
 	}
 
-	cpumask_copy(&p->cpus_allowed, cpumask_of(cpu));
+	cpumask_copy(tsk_cpus_allowed(p), cpumask_of(cpu));
 	p->rt.nr_cpus_allowed = 1;
 	p->flags |= PF_THREAD_BOUND;
 }
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 01b1dca..ac530d9 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1488,13 +1488,13 @@ static void rcu_yield(int cpu)
 static int rcu_cpu_kthread_should_stop(int cpu)
 {
 	while (cpu_is_offline(cpu) ||
-	       !cpumask_equal(&current->cpus_allowed, cpumask_of(cpu)) ||
+	       !cpumask_equal(tsk_cpus_allowed(current), cpumask_of(cpu)) ||
 	       smp_processor_id() != cpu) {
 		if (kthread_should_stop())
 			return 1;
 		local_bh_enable();
 		schedule_timeout_uninterruptible(1);
-		if (!cpumask_equal(&current->cpus_allowed, cpumask_of(cpu)))
+		if (!cpumask_equal(tsk_cpus_allowed(current), cpumask_of(cpu)))
 			set_cpus_allowed_ptr(current, cpumask_of(cpu));
 		local_bh_disable();
 	}
diff --git a/kernel/sched.c b/kernel/sched.c
index fd4625f..254d299 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2344,11 +2344,11 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 
 	/* Look for allowed, online CPU in same node. */
 	for_each_cpu_and(dest_cpu, nodemask, cpu_active_mask)
-		if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
+		if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
 			return dest_cpu;
 
 	/* Any allowed, online CPU? */
-	dest_cpu = cpumask_any_and(&p->cpus_allowed, cpu_active_mask);
+	dest_cpu = cpumask_any_and(tsk_cpus_allowed(p), cpu_active_mask);
 	if (dest_cpu < nr_cpu_ids)
 		return dest_cpu;
 
@@ -2385,7 +2385,7 @@ int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
 	 * [ this allows ->select_task() to simply return task_cpu(p) and
 	 *   not worry about this generic constraint ]
 	 */
-	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
+	if (unlikely(!cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) ||
 		     !cpu_online(cpu)))
 		cpu = select_fallback_rq(task_cpu(p), p);
 
@@ -5387,7 +5387,7 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
 		goto out_unlock;
 
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	cpumask_and(mask, &p->cpus_allowed, cpu_online_mask);
+	cpumask_and(mask, tsk_cpus_allowed(p), cpu_online_mask);
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 
 out_unlock:
@@ -5815,7 +5815,7 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
 	idle->state = TASK_RUNNING;
 	idle->se.exec_start = sched_clock();
 
-	cpumask_copy(&idle->cpus_allowed, cpumask_of(cpu));
+	cpumask_copy(tsk_cpus_allowed(idle), cpumask_of(cpu));
 	/*
 	 * We're having a chicken and egg problem, even though we are
 	 * holding rq->lock, the cpu isn't yet set to this cpu so the
@@ -5944,7 +5944,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 	}
 
 	if (unlikely((p->flags & PF_THREAD_BOUND) && p != current &&
-		     !cpumask_equal(&p->cpus_allowed, new_mask))) {
+		     !cpumask_equal(tsk_cpus_allowed(p), new_mask))) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -5952,7 +5952,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 	if (p->sched_class->set_cpus_allowed)
 		p->sched_class->set_cpus_allowed(p, new_mask);
 	else {
-		cpumask_copy(&p->cpus_allowed, new_mask);
+		cpumask_copy(tsk_cpus_allowed(p), new_mask);
 		p->rt.nr_cpus_allowed = cpumask_weight(new_mask);
 	}
 
@@ -6004,7 +6004,7 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
 	if (task_cpu(p) != src_cpu)
 		goto done;
 	/* Affinity changed (again). */
-	if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
+	if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
 		goto fail;
 
 	/*
diff --git a/kernel/sched_cpupri.c b/kernel/sched_cpupri.c
index 2722dc1..199a231 100644
--- a/kernel/sched_cpupri.c
+++ b/kernel/sched_cpupri.c
@@ -77,11 +77,11 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
 		if (idx >= task_pri)
 			break;
 
-		if (cpumask_any_and(&p->cpus_allowed, vec->mask) >= nr_cpu_ids)
+		if (cpumask_any_and(tsk_cpus_allowed(p), vec->mask) >= nr_cpu_ids)
 			continue;
 
 		if (lowest_mask) {
-			cpumask_and(lowest_mask, &p->cpus_allowed, vec->mask);
+			cpumask_and(lowest_mask, tsk_cpus_allowed(p), vec->mask);
 
 			/*
 			 * We have to ensure that we have at least one bit
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 8744593..bccd303 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1554,7 +1554,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 
 		/* Skip over this group if it has no CPUs allowed */
 		if (!cpumask_intersects(sched_group_cpus(group),
-					&p->cpus_allowed))
+					tsk_cpus_allowed(p)))
 			continue;
 
 		local_group = cpumask_test_cpu(this_cpu,
@@ -1600,7 +1600,7 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 	int i;
 
 	/* Traverse only the allowed CPUs */
-	for_each_cpu_and(i, sched_group_cpus(group), &p->cpus_allowed) {
+	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
 		load = weighted_cpuload(i);
 
 		if (load < min_load || (load == min_load && i == this_cpu)) {
@@ -1644,7 +1644,7 @@ static int select_idle_sibling(struct task_struct *p, int target)
 		if (!(sd->flags & SD_SHARE_PKG_RESOURCES))
 			break;
 
-		for_each_cpu_and(i, sched_domain_span(sd), &p->cpus_allowed) {
+		for_each_cpu_and(i, sched_domain_span(sd), tsk_cpus_allowed(p)) {
 			if (idle_cpu(i)) {
 				target = i;
 				break;
@@ -1687,7 +1687,7 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 	int sync = wake_flags & WF_SYNC;
 
 	if (sd_flag & SD_BALANCE_WAKE) {
-		if (cpumask_test_cpu(cpu, &p->cpus_allowed))
+		if (cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
 			want_affine = 1;
 		new_cpu = prev_cpu;
 	}
@@ -2046,7 +2046,7 @@ int can_migrate_task(struct task_struct *p, struct rq *rq, int this_cpu,
 	 * 2) cannot be migrated to this CPU due to cpus_allowed, or
 	 * 3) are cache-hot on their current CPU.
 	 */
-	if (!cpumask_test_cpu(this_cpu, &p->cpus_allowed)) {
+	if (!cpumask_test_cpu(this_cpu, tsk_cpus_allowed(p))) {
 		schedstat_inc(p, se.statistics.nr_failed_migrations_affine);
 		return 0;
 	}
@@ -3399,7 +3399,7 @@ redo:
 			 * moved to this_cpu
 			 */
 			if (!cpumask_test_cpu(this_cpu,
-					      &busiest->curr->cpus_allowed)) {
+					      tsk_cpus_allowed(busiest->curr))) {
 				raw_spin_unlock_irqrestore(&busiest->lock,
 							    flags);
 				all_pinned = 1;
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 19ecb31..92cdf8c 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1164,7 +1164,7 @@ static void deactivate_task(struct rq *rq, struct task_struct *p, int sleep);
 static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
 {
 	if (!task_running(rq, p) &&
-	    (cpu < 0 || cpumask_test_cpu(cpu, &p->cpus_allowed)) &&
+	    (cpu < 0 || cpumask_test_cpu(cpu, tsk_cpus_allowed(p))) &&
 	    (p->rt.nr_cpus_allowed > 1))
 		return 1;
 	return 0;
@@ -1299,7 +1299,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 			 */
 			if (unlikely(task_rq(task) != rq ||
 				     !cpumask_test_cpu(lowest_rq->cpu,
-						       &task->cpus_allowed) ||
+						       tsk_cpus_allowed(task)) ||
 				     task_running(rq, task) ||
 				     !task->on_rq)) {
 
@@ -1583,7 +1583,7 @@ static void set_cpus_allowed_rt(struct task_struct *p,
 		update_rt_migration(&rq->rt);
 	}
 
-	cpumask_copy(&p->cpus_allowed, new_mask);
+	cpumask_copy(tsk_cpus_allowed(p), new_mask);
 	p->rt.nr_cpus_allowed = weight;
 }
 
diff --git a/kernel/trace/trace_workqueue.c b/kernel/trace/trace_workqueue.c
index 209b379..c557119 100644
--- a/kernel/trace/trace_workqueue.c
+++ b/kernel/trace/trace_workqueue.c
@@ -53,7 +53,7 @@ probe_workqueue_insertion(void *ignore,
 			  struct task_struct *wq_thread,
 			  struct work_struct *work)
 {
-	int cpu = cpumask_first(&wq_thread->cpus_allowed);
+	int cpu = cpumask_first(tsk_cpus_allowed(wq_thread));
 	struct cpu_workqueue_stats *node;
 	unsigned long flags;
 
@@ -75,7 +75,7 @@ probe_workqueue_execution(void *ignore,
 			  struct task_struct *wq_thread,
 			  struct work_struct *work)
 {
-	int cpu = cpumask_first(&wq_thread->cpus_allowed);
+	int cpu = cpumask_first(tsk_cpus_allowed(wq_thread));
 	struct cpu_workqueue_stats *node;
 	unsigned long flags;
 
@@ -121,7 +121,7 @@ static void
 probe_workqueue_destruction(void *ignore, struct task_struct *wq_thread)
 {
 	/* Workqueue only execute on one cpu */
-	int cpu = cpumask_first(&wq_thread->cpus_allowed);
+	int cpu = cpumask_first(tsk_cpus_allowed(wq_thread));
 	struct cpu_workqueue_stats *node, *next;
 	unsigned long flags;
 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b082d70..acaf8a9 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1286,7 +1286,7 @@ __acquires(&gcwq->lock)
 		if (gcwq->flags & GCWQ_DISASSOCIATED)
 			return false;
 		if (task_cpu(task) == gcwq->cpu &&
-		    cpumask_equal(&current->cpus_allowed,
+		    cpumask_equal(tsk_cpus_allowed(current),
 				  get_cpu_mask(gcwq->cpu)))
 			return true;
 		spin_unlock_irq(&gcwq->lock);
diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
index 4689cb0..503f087 100644
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -22,7 +22,7 @@ notrace unsigned int debug_smp_processor_id(void)
 	 * Kernel threads bound to a single CPU can safely use
 	 * smp_processor_id():
 	 */
-	if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
+	if (cpumask_equal(tsk_cpus_allowed(current), cpumask_of(this_cpu)))
 		goto out;
 
 	/*
-- 
1.7.3.1


[-- Attachment #5: 0002-change-task-cpus_allowed-to-pointer.patch --]
[-- Type: application/octet-stream, Size: 8123 bytes --]

From 9693b69bb8897580752265ad34df03343ef78e4d Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Wed, 27 Apr 2011 13:51:08 +0900
Subject: [PATCH 2/2] change task->cpus_allowed to pointer

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 arch/x86/kernel/init_task.c |    2 ++
 include/linux/cpuset.h      |    3 ++-
 include/linux/init_task.h   |    4 ++--
 include/linux/sched.h       |   15 +++++++++++----
 kernel/cpuset.c             |    9 ++++++---
 kernel/fork.c               |   19 +++++++++++++++++++
 kernel/kthread.c            |    3 ++-
 kernel/sched.c              |   12 ++++++++++--
 kernel/sched_rt.c           |    2 +-
 9 files changed, 55 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/init_task.c b/arch/x86/kernel/init_task.c
index 43e9ccf..4715d82 100644
--- a/arch/x86/kernel/init_task.c
+++ b/arch/x86/kernel/init_task.c
@@ -23,6 +23,8 @@ static struct sighand_struct init_sighand = INIT_SIGHAND(init_sighand);
 union thread_union init_thread_union __init_task_data =
 	{ INIT_THREAD_INFO(init_task) };
 
+struct cpumask init_cpus_allowed = CPU_MASK_ALL;
+
 /*
  * Initial task structure.
  *
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 684fe71..c20a45d 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -146,7 +146,8 @@ static inline void cpuset_cpus_allowed(struct task_struct *p,
 
 static inline int cpuset_cpus_allowed_fallback(struct task_struct *p)
 {
-	cpumask_copy(tsk_cpus_allowed(p), cpu_possible_mask);
+	cpumask_copy(p->cpus_allowed_ptr, cpu_possible_mask);
+	p->flags |= PF_THREAD_UNBOUND;
 	return cpumask_any(cpu_active_mask);
 }
 
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 10bdf82..4142dda 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -135,13 +135,13 @@ extern struct cred init_cred;
 	.state		= 0,						\
 	.stack		= &init_thread_info,				\
 	.usage		= ATOMIC_INIT(2),				\
-	.flags		= PF_KTHREAD,					\
+	.flags		= PF_KTHREAD | PF_THREAD_UNBOUND,		\
 	.lock_depth	= -1,						\
 	.prio		= MAX_PRIO-20,					\
 	.static_prio	= MAX_PRIO-20,					\
 	.normal_prio	= MAX_PRIO-20,					\
 	.policy		= SCHED_NORMAL,					\
-	.cpus_allowed	= CPU_MASK_ALL,					\
+	.cpus_allowed_ptr = &init_cpus_allowed,				\
 	.mm		= NULL,						\
 	.active_mm	= &init_mm,					\
 	.se		= {						\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3f7d3f9..716b24a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1233,7 +1233,7 @@ struct task_struct {
 #endif
 
 	unsigned int policy;
-	cpumask_t cpus_allowed;
+	struct cpumask *cpus_allowed_ptr;
 
 #ifdef CONFIG_PREEMPT_RCU
 	int rcu_read_lock_nesting;
@@ -1544,9 +1544,6 @@ struct task_struct {
 #endif
 };
 
-/* Future-safe accessor for struct task_struct's cpus_allowed. */
-#define tsk_cpus_allowed(tsk) (&(tsk)->cpus_allowed)
-
 /*
  * Priority of a process goes from 0..MAX_PRIO-1, valid RT
  * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
@@ -1729,6 +1726,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 /*
  * Per process flags
  */
+#define PF_THREAD_UNBOUND 0x00000001
 #define PF_STARTING	0x00000002	/* being created */
 #define PF_EXITING	0x00000004	/* getting shut down */
 #define PF_EXITPIDONE	0x00000008	/* pi exit done on shut down */
@@ -1759,6 +1757,15 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define PF_FREEZER_SKIP	0x40000000	/* Freezer should not count it as freezable */
 #define PF_FREEZER_NOSIG 0x80000000	/* Freezer won't send signals to it */
 
+/* Future-safe accessor for struct task_struct's cpus_allowed. */
+static inline const struct cpumask* tsk_cpus_allowed(struct task_struct *task)
+{
+	if (task->flags & PF_THREAD_UNBOUND)
+		return cpu_possible_mask;
+
+	return task->cpus_allowed_ptr;
+}
+
 /*
  * Only the _current_ task can read/write to tsk->flags, but other
  * tasks can access tsk->flags in readonly mode for example
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 0deb871..ccb4890 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2189,8 +2189,10 @@ int cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 
 	rcu_read_lock();
 	cs = task_cs(tsk);
-	if (cs)
-		cpumask_copy(tsk_cpus_allowed(tsk), cs->cpus_allowed);
+	if (cs) {
+		cpumask_copy(tsk->cpus_allowed_ptr, cs->cpus_allowed);
+		tsk->flags &= ~PF_THREAD_UNBOUND;
+	}
 	rcu_read_unlock();
 
 	/*
@@ -2217,7 +2219,8 @@ int cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 		 * Like above we can temporary set any mask and rely on
 		 * set_cpus_allowed_ptr() as synchronization point.
 		 */
-		cpumask_copy(tsk_cpus_allowed(tsk), cpu_possible_mask);
+		cpumask_copy(tsk->cpus_allowed_ptr, cpu_possible_mask);
+		tsk->flags |= PF_THREAD_UNBOUND;
 		cpu = cpumask_any(cpu_active_mask);
 	}
 
diff --git a/kernel/fork.c b/kernel/fork.c
index cc04197..485ab7d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -169,6 +169,8 @@ void free_task(struct task_struct *tsk)
 	free_thread_info(tsk->stack);
 	rt_mutex_debug_task_free(tsk);
 	ftrace_graph_exit_task(tsk);
+	if (tsk->cpus_allowed_ptr)
+		kfree(tsk->cpus_allowed_ptr);
 	free_task_struct(tsk);
 }
 EXPORT_SYMBOL(free_task);
@@ -250,6 +252,19 @@ int __attribute__((weak)) arch_dup_task_struct(struct task_struct *dst,
 	return 0;
 }
 
+static int dup_task_cpus_allowed(struct task_struct *task, struct task_struct *orig)
+{
+	struct cpumask *cpumask;
+
+	cpumask = kmalloc(cpumask_size(), GFP_KERNEL);
+	if (!cpumask)
+		return -ENOMEM;
+	cpumask_copy(cpumask, orig->cpus_allowed_ptr);
+	task->cpus_allowed_ptr = cpumask;
+
+	return 0;
+}
+
 static struct task_struct *dup_task_struct(struct task_struct *orig)
 {
 	struct task_struct *tsk;
@@ -280,6 +295,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig)
 	if (err)
 		goto out;
 
+	err = dup_task_cpus_allowed(tsk, orig);
+	if (err)
+		goto out;
+
 	setup_thread_stack(tsk, orig);
 	clear_user_return_notifier(tsk);
 	clear_tsk_need_resched(tsk);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 5f35501..6d32c72 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -202,8 +202,9 @@ void kthread_bind(struct task_struct *p, unsigned int cpu)
 		return;
 	}
 
-	cpumask_copy(tsk_cpus_allowed(p), cpumask_of(cpu));
+	cpumask_copy(p->cpus_allowed_ptr, cpumask_of(cpu));
 	p->rt.nr_cpus_allowed = 1;
+	p->flags &= ~PF_THREAD_UNBOUND;
 	p->flags |= PF_THREAD_BOUND;
 }
 EXPORT_SYMBOL(kthread_bind);
diff --git a/kernel/sched.c b/kernel/sched.c
index 254d299..764576c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5815,7 +5815,10 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
 	idle->state = TASK_RUNNING;
 	idle->se.exec_start = sched_clock();
 
-	cpumask_copy(tsk_cpus_allowed(idle), cpumask_of(cpu));
+	WARN_ON(!idle->cpus_allowed_ptr);
+	cpumask_copy(idle->cpus_allowed_ptr, cpumask_of(cpu));
+	idle->flags &= ~PF_THREAD_UNBOUND;
+
 	/*
 	 * We're having a chicken and egg problem, even though we are
 	 * holding rq->lock, the cpu isn't yet set to this cpu so the
@@ -5952,10 +5955,15 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 	if (p->sched_class->set_cpus_allowed)
 		p->sched_class->set_cpus_allowed(p, new_mask);
 	else {
-		cpumask_copy(tsk_cpus_allowed(p), new_mask);
+		cpumask_copy(p->cpus_allowed_ptr, new_mask);
 		p->rt.nr_cpus_allowed = cpumask_weight(new_mask);
 	}
 
+	if (cpumask_equal(new_mask, cpu_possible_mask))
+		p->flags |= PF_THREAD_UNBOUND;
+	else
+		p->flags &= ~PF_THREAD_UNBOUND;
+
 	/* Can the task run on the task's current CPU? If so, we're done */
 	if (cpumask_test_cpu(task_cpu(p), new_mask))
 		goto out;
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 92cdf8c..291f33e 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1583,7 +1583,7 @@ static void set_cpus_allowed_rt(struct task_struct *p,
 		update_rt_migration(&rq->rt);
 	}
 
-	cpumask_copy(tsk_cpus_allowed(p), new_mask);
+	cpumask_copy(p->cpus_allowed_ptr, new_mask);
 	p->rt.nr_cpus_allowed = weight;
 }
 
-- 
1.7.3.1



* Re: [PATCH] cpumask: convert cpumask_of_cpu() with cpumask_of()
  2011-04-27 10:32     ` KOSAKI Motohiro
@ 2011-05-26 20:38       ` Peter Zijlstra
  2011-05-30  1:40         ` KOSAKI Motohiro
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2011-05-26 20:38 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, Andrew Morton, Mike Galbraith, Ingo Molnar

On Wed, 2011-04-27 at 19:32 +0900, KOSAKI Motohiro wrote:
> 
> I've made a proof-of-concept patch today. The result is better than I expected.
> 
> <before>
>  Performance counter stats for 'hackbench 10 thread 1000' (10 runs):
> 
>          1603777813  cache-references         #     56.987 M/sec   ( +-   1.824% )  (scaled from 25.36%)
>            13780381  cache-misses             #      0.490 M/sec   ( +-   1.360% )  (scaled from 25.55%)
>         24872032348  L1-dcache-loads          #    883.770 M/sec   ( +-   0.666% )  (scaled from 25.51%)
>           640394580  L1-dcache-load-misses    #     22.755 M/sec   ( +-   0.796% )  (scaled from 25.47%)
> 
>        14.162411769  seconds time elapsed   ( +-   0.675% )
> 
> <after>
>  Performance counter stats for 'hackbench 10 thread 1000' (10 runs):
> 
>          1416147603  cache-references         #     51.566 M/sec   ( +-   4.407% )  (scaled from 25.40%)
>            10920284  cache-misses             #      0.398 M/sec   ( +-   5.454% )  (scaled from 25.56%)
>         24666962632  L1-dcache-loads          #    898.196 M/sec   ( +-   1.747% )  (scaled from 25.54%)
>           598640329  L1-dcache-load-misses    #     21.798 M/sec   ( +-   2.504% )  (scaled from 25.50%)
> 
>        13.812193312  seconds time elapsed   ( +-   1.696% )
> 
>  * detailed data is in result.txt
> 
> 
> The trick is:
>  - Typical Linux userland applications don't use the mempolicy and/or cpusets
>    APIs at all.
>  - Therefore, 99.99% of threads have tsk->cpus_allowed equal to cpu_all_mask.
>  - In the cpu_all_mask case, every thread can share the same bitmap. That may
>    help reduce L1 cache misses in the scheduler.
> 
> What do you think? 

Nice!

If you finish the first patch (sort the TODOs) I'll take it.

I'm unsure about the PF_THREAD_UNBOUND thing though; then again, the
alternative is adding another struct cpumask * and having that point to
the shared mask or the private mask.

But yeah, looks quite feasible.



* Re: [PATCH] cpumask: convert cpumask_of_cpu() with cpumask_of()
  2011-05-26 20:38       ` Peter Zijlstra
@ 2011-05-30  1:40         ` KOSAKI Motohiro
  0 siblings, 0 replies; 6+ messages in thread
From: KOSAKI Motohiro @ 2011-05-30  1:40 UTC (permalink / raw)
  To: a.p.zijlstra; +Cc: linux-kernel, akpm, efault, mingo

>> The trick is:
>>  - Typical Linux userland applications don't use the mempolicy and/or cpusets
>>    APIs at all.
>>  - Therefore, 99.99% of threads have tsk->cpus_allowed equal to cpu_all_mask.
>>  - In the cpu_all_mask case, every thread can share the same bitmap. That may
>>    help reduce L1 cache misses in the scheduler.
>>
>> What do you think? 
> 
> Nice!
> 
> If you finish the first patch (sort the TODOs) I'll take it.

Yeah, I'm now submitting a lot of cpumask cleanup patches to various arches and
subsystems, so I expect I can finish this work in June.

> I'm unsure about the PF_THREAD_UNBOUND thing though, then again, the
> alternative is adding another struct cpumask * and have that point to
> the shared mask or the private mask.

Ahhh, I'm sorry, my explanation was bad. PF_THREAD_UNBOUND is not my point;
it's only a proof-of-concept patch, not meant for submission. Yes, I did cheat to
get numbers easily. I think the better way is probably to add another cpumask *
and implement a COW shared mask, but I'm OK with another approach too.


> But yeah, looks quite feasible.

Thank you to pay attention my patch!



