* Re: [PATCH] cpumask: convert cpumask_of_cpu() with cpumask_of()
From: KOSAKI Motohiro @ 2011-04-27 10:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: kosaki.motohiro, LKML, Andrew Morton, Mike Galbraith, Ingo Molnar
[-- Attachment #1: Type: text/plain, Size: 2141 bytes --]
> > But why? Are we going to get rid of cpumask_t (which is a fixed sized
> > struct so direct assignment is perfectly fine)?
> >
> > Also, do we want to convert cpus_allowed to cpumask_var_t? It would save
> > quite a lot of memory on distro configs that set NR_CPUS silly high.
> > Currently NR_CPUS=4096 configs allocate 512 bytes per task for this
> > bitmap, 511 of which will never be used on most machines (510 in the
> > near future).
> >
> > The cost if of course an extra memory dereference in scheduler hot
> > paths.. also not nice.
Perhaps measurement data speaks more clearly than my poor English...
I've made a proof-of-concept patch today. The result is better than I expected.
<before>
Performance counter stats for 'hackbench 10 thread 1000' (10 runs):
1603777813 cache-references # 56.987 M/sec ( +- 1.824% ) (scaled from 25.36%)
13780381 cache-misses # 0.490 M/sec ( +- 1.360% ) (scaled from 25.55%)
24872032348 L1-dcache-loads # 883.770 M/sec ( +- 0.666% ) (scaled from 25.51%)
640394580 L1-dcache-load-misses # 22.755 M/sec ( +- 0.796% ) (scaled from 25.47%)
14.162411769 seconds time elapsed ( +- 0.675% )
<after>
Performance counter stats for 'hackbench 10 thread 1000' (10 runs):
1416147603 cache-references # 51.566 M/sec ( +- 4.407% ) (scaled from 25.40%)
10920284 cache-misses # 0.398 M/sec ( +- 5.454% ) (scaled from 25.56%)
24666962632 L1-dcache-loads # 898.196 M/sec ( +- 1.747% ) (scaled from 25.54%)
598640329 L1-dcache-load-misses # 21.798 M/sec ( +- 2.504% ) (scaled from 25.50%)
13.812193312 seconds time elapsed ( +- 1.696% )
* detailed data is in result.txt
The trick is,
- Typical Linux userland applications don't use the mempolicy and/or cpusets
APIs at all.
- Thus, 99.99% of threads' tsk->cpus_allowed is cpu_all_mask.
- In the cpu_all_mask case, every thread can share the same bitmap. That may
help reduce L1 cache misses in the scheduler.
What do you think?
[-- Attachment #2: result.txt --]
[-- Type: application/octet-stream, Size: 3177 bytes --]
<before>
% perf stat -e 'task-clock cs page-faults cycles instructions branches branch-misses cache-references cache-misses L1-dcache-load L1-dcache-load-misses' --repeat 10 hackbench 10 thread 1000
Time: 14.297
Time: 14.163
Time: 13.853
Time: 13.845
Time: 14.493
Time: 14.222
Time: 14.086
Time: 14.177
Time: 13.436
Time: 14.289
Performance counter stats for 'hackbench 10 thread 1000' (10 runs):
28143.117149 task-clock-msecs # 1.987 CPUs ( +- 0.672% )
1794735 context-switches # 0.064 M/sec ( +- 3.190% )
992 page-faults # 0.000 M/sec ( +- 0.020% )
82001920882 cycles # 2913.747 M/sec ( +- 0.682% ) (scaled from 25.46%)
85383145305 instructions # 1.041 IPC ( +- 0.484% ) (scaled from 38.18%)
15450521139 branches # 548.998 M/sec ( +- 0.443% ) (scaled from 37.94%)
120279550 branch-misses # 0.778 % ( +- 0.563% ) (scaled from 37.98%)
1603777813 cache-references # 56.987 M/sec ( +- 1.824% ) (scaled from 25.36%)
13780381 cache-misses # 0.490 M/sec ( +- 1.360% ) (scaled from 25.55%)
24872032348 L1-dcache-loads # 883.770 M/sec ( +- 0.666% ) (scaled from 25.51%)
640394580 L1-dcache-load-misses # 22.755 M/sec ( +- 0.796% ) (scaled from 25.47%)
14.162411769 seconds time elapsed ( +- 0.675% )
<after>
% perf stat -e 'task-clock cs page-faults cycles instructions branches branch-misses cache-references cache-misses L1-dcache-load L1-dcache-load-misses' --repeat 10 hackbench 10 thread 1000
Time: 13.533
Time: 14.019
Time: 15.578
Time: 13.906
Time: 13.497
Time: 12.930
Time: 13.044
Time: 13.699
Time: 13.863
Time: 13.290
Performance counter stats for 'hackbench 10 thread 1000' (10 runs):
27462.769739 task-clock-msecs # 1.988 CPUs ( +- 1.708% )
1655307 context-switches # 0.060 M/sec ( +- 7.206% )
992 page-faults # 0.000 M/sec ( +- 0.015% )
80029833041 cycles # 2914.121 M/sec ( +- 1.766% ) (scaled from 25.43%)
84901647154 instructions # 1.061 IPC ( +- 1.211% ) (scaled from 38.14%)
15381961224 branches # 560.102 M/sec ( +- 1.049% ) (scaled from 37.90%)
118862277 branch-misses # 0.773 % ( +- 1.213% ) (scaled from 38.00%)
1416147603 cache-references # 51.566 M/sec ( +- 4.407% ) (scaled from 25.40%)
10920284 cache-misses # 0.398 M/sec ( +- 5.454% ) (scaled from 25.56%)
24666962632 L1-dcache-loads # 898.196 M/sec ( +- 1.747% ) (scaled from 25.54%)
598640329 L1-dcache-load-misses # 21.798 M/sec ( +- 2.504% ) (scaled from 25.50%)
13.812193312 seconds time elapsed ( +- 1.696% )
[-- Attachment #4: 0001-s-task-cpus_allowed-tsk_cpus_allowed.patch --]
[-- Type: application/octet-stream, Size: 19412 bytes --]
From 18dbb425dc0d805894a163997518b6ce034632f6 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Wed, 27 Apr 2011 11:33:33 +0900
Subject: [PATCH 1/2] s/task->cpus_allowed/tsk_cpus_allowed/
TODO: some arch-dependent code is still unchanged.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
arch/powerpc/kernel/smp.c | 2 +-
arch/tile/kernel/hardwall.c | 10 +++++-----
arch/x86/kernel/cpu/mcheck/mce_intel.c | 2 +-
drivers/acpi/processor_throttling.c | 4 ++--
drivers/crypto/n2_core.c | 2 +-
drivers/firmware/dcdbas.c | 2 +-
drivers/infiniband/hw/ipath/ipath_file_ops.c | 6 +++---
fs/proc/array.c | 4 ++--
include/linux/cpuset.h | 2 +-
kernel/cpuset.c | 8 ++++----
kernel/kthread.c | 2 +-
kernel/rcutree.c | 4 ++--
kernel/sched.c | 16 ++++++++--------
kernel/sched_cpupri.c | 4 ++--
kernel/sched_fair.c | 12 ++++++------
kernel/sched_rt.c | 6 +++---
kernel/trace/trace_workqueue.c | 6 +++---
kernel/workqueue.c | 2 +-
lib/smp_processor_id.c | 2 +-
19 files changed, 48 insertions(+), 48 deletions(-)
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 9f9c204..2253bfa 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -608,7 +608,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
* se we pin us down to CPU 0 for a short while
*/
alloc_cpumask_var(&old_mask, GFP_NOWAIT);
- cpumask_copy(old_mask, &current->cpus_allowed);
+ cpumask_copy(old_mask, tsk_cpus_allowed(current));
set_cpus_allowed_ptr(current, cpumask_of(boot_cpuid));
if (smp_ops && smp_ops->setup_cpu)
diff --git a/arch/tile/kernel/hardwall.c b/arch/tile/kernel/hardwall.c
index e910530..5d7f4e0 100644
--- a/arch/tile/kernel/hardwall.c
+++ b/arch/tile/kernel/hardwall.c
@@ -413,12 +413,12 @@ static int hardwall_activate(struct hardwall_info *rect)
* Get our affinity; if we're not bound to this tile uniquely,
* we can't access the network registers.
*/
- if (cpumask_weight(&p->cpus_allowed) != 1)
+ if (cpumask_weight(tsk_cpus_allowed(p)) != 1)
return -EPERM;
/* Make sure we are bound to a cpu in this rectangle. */
cpu = smp_processor_id();
- BUG_ON(cpumask_first(&p->cpus_allowed) != cpu);
+ BUG_ON(cpumask_first(tsk_cpus_allowed(p)) != cpu);
x = cpu_x(cpu);
y = cpu_y(cpu);
if (!contains(rect, x, y))
@@ -451,11 +451,11 @@ static void _hardwall_deactivate(struct task_struct *task)
{
struct thread_struct *ts = &task->thread;
- if (cpumask_weight(&task->cpus_allowed) != 1) {
+ if (cpumask_weight(tsk_cpus_allowed(task)) != 1) {
pr_err("pid %d (%s) releasing networks with"
" an affinity mask containing %d cpus!\n",
task->pid, task->comm,
- cpumask_weight(&task->cpus_allowed));
+ cpumask_weight(tsk_cpus_allowed(task)));
BUG();
}
@@ -674,7 +674,7 @@ int proc_tile_hardwall_show(struct seq_file *sf, void *v)
seq_printf(sf, "%dx%d %d,%d pids:",
r->width, r->height, r->ulhc_x, r->ulhc_y);
list_for_each_entry(p, &r->task_head, thread.hardwall_list) {
- unsigned int cpu = cpumask_first(&p->cpus_allowed);
+ unsigned int cpu = cpumask_first(tsk_cpus_allowed(p));
unsigned int x = cpu % smp_width;
unsigned int y = cpu / smp_width;
seq_printf(sf, " %d@%d,%d", p->pid, x, y);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 8694ef56..5bd27ee 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -177,7 +177,7 @@ void cmci_rediscover(int dying)
return;
if (!alloc_cpumask_var(&old, GFP_KERNEL))
return;
- cpumask_copy(old, &current->cpus_allowed);
+ cpumask_copy(old, tsk_cpus_allowed(current));
for_each_online_cpu(cpu) {
if (cpu == dying)
diff --git a/drivers/acpi/processor_throttling.c b/drivers/acpi/processor_throttling.c
index 605a295..648f14c 100644
--- a/drivers/acpi/processor_throttling.c
+++ b/drivers/acpi/processor_throttling.c
@@ -910,7 +910,7 @@ static int acpi_processor_get_throttling(struct acpi_processor *pr)
/*
* Migrate task to the cpu pointed by pr.
*/
- cpumask_copy(saved_mask, &current->cpus_allowed);
+ cpumask_copy(saved_mask, tsk_cpus_allowed(current));
/* FIXME: use work_on_cpu() */
if (set_cpus_allowed_ptr(current, cpumask_of(pr->id))) {
/* Can't migrate to the target pr->id CPU. Exit */
@@ -1099,7 +1099,7 @@ int acpi_processor_set_throttling(struct acpi_processor *pr,
return -ENODEV;
}
- cpumask_copy(saved_mask, &current->cpus_allowed);
+ cpumask_copy(saved_mask, tsk_cpus_allowed(current));
t_state.target_state = state;
p_throttling = &(pr->throttling);
cpumask_and(online_throttling_cpus, cpu_online_mask,
diff --git a/drivers/crypto/n2_core.c b/drivers/crypto/n2_core.c
index 2e5b204..4e620eb 100644
--- a/drivers/crypto/n2_core.c
+++ b/drivers/crypto/n2_core.c
@@ -1664,7 +1664,7 @@ static int spu_queue_register(struct spu_queue *p, unsigned long q_type)
if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
return -ENOMEM;
- cpumask_copy(old_allowed, &current->cpus_allowed);
+ cpumask_copy(old_allowed, tsk_cpus_allowed(current));
set_cpus_allowed_ptr(current, &p->sharing);
diff --git a/drivers/firmware/dcdbas.c b/drivers/firmware/dcdbas.c
index ea5ac2d..a1be63c 100644
--- a/drivers/firmware/dcdbas.c
+++ b/drivers/firmware/dcdbas.c
@@ -258,7 +258,7 @@ int dcdbas_smi_request(struct smi_cmd *smi_cmd)
if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
return -ENOMEM;
- cpumask_copy(old_mask, &current->cpus_allowed);
+ cpumask_copy(old_mask, tsk_cpus_allowed(current));
set_cpus_allowed_ptr(current, cpumask_of(0));
if (smp_processor_id() != 0) {
dev_dbg(&dcdbas_pdev->dev, "%s: failed to get CPU 0\n",
diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
index bdf4422..5462447 100644
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c
@@ -1684,11 +1684,11 @@ static int find_best_unit(struct file *fp,
* information. There may be some issues with dual core numbering
* as well. This needs more work prior to release.
*/
- if (!cpumask_empty(&current->cpus_allowed) &&
- !cpumask_full(&current->cpus_allowed)) {
+ if (!cpumask_empty(tsk_cpus_allowed(current)) &&
+ !cpumask_full(tsk_cpus_allowed(current))) {
int ncpus = num_online_cpus(), curcpu = -1, nset = 0;
for (i = 0; i < ncpus; i++)
- if (cpumask_test_cpu(i, &current->cpus_allowed)) {
+ if (cpumask_test_cpu(i, tsk_cpus_allowed(current))) {
ipath_cdbg(PROC, "%s[%u] affinity set for "
"cpu %d/%d\n", current->comm,
current->pid, i, ncpus);
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 9b45ee8..b860301 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -330,10 +330,10 @@ static inline void task_context_switch_counts(struct seq_file *m,
static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
{
seq_puts(m, "Cpus_allowed:\t");
- seq_cpumask(m, &task->cpus_allowed);
+ seq_cpumask(m, tsk_cpus_allowed(task));
seq_putc(m, '\n');
seq_puts(m, "Cpus_allowed_list:\t");
- seq_cpumask_list(m, &task->cpus_allowed);
+ seq_cpumask_list(m, tsk_cpus_allowed(task));
seq_putc(m, '\n');
}
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index f20eb8f..684fe71 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -146,7 +146,7 @@ static inline void cpuset_cpus_allowed(struct task_struct *p,
static inline int cpuset_cpus_allowed_fallback(struct task_struct *p)
{
- cpumask_copy(&p->cpus_allowed, cpu_possible_mask);
+ cpumask_copy(tsk_cpus_allowed(p), cpu_possible_mask);
return cpumask_any(cpu_active_mask);
}
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 1ceeb04..0deb871 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -797,7 +797,7 @@ void rebuild_sched_domains(void)
static int cpuset_test_cpumask(struct task_struct *tsk,
struct cgroup_scanner *scan)
{
- return !cpumask_equal(&tsk->cpus_allowed,
+ return !cpumask_equal(tsk_cpus_allowed(tsk),
(cgroup_cs(scan->cg))->cpus_allowed);
}
@@ -2190,7 +2190,7 @@ int cpuset_cpus_allowed_fallback(struct task_struct *tsk)
rcu_read_lock();
cs = task_cs(tsk);
if (cs)
- cpumask_copy(&tsk->cpus_allowed, cs->cpus_allowed);
+ cpumask_copy(tsk_cpus_allowed(tsk), cs->cpus_allowed);
rcu_read_unlock();
/*
@@ -2208,7 +2208,7 @@ int cpuset_cpus_allowed_fallback(struct task_struct *tsk)
* the pending set_cpus_allowed_ptr() will fix things.
*/
- cpu = cpumask_any_and(&tsk->cpus_allowed, cpu_active_mask);
+ cpu = cpumask_any_and(tsk_cpus_allowed(tsk), cpu_active_mask);
if (cpu >= nr_cpu_ids) {
/*
* Either tsk->cpus_allowed is wrong (see above) or it
@@ -2217,7 +2217,7 @@ int cpuset_cpus_allowed_fallback(struct task_struct *tsk)
* Like above we can temporary set any mask and rely on
* set_cpus_allowed_ptr() as synchronization point.
*/
- cpumask_copy(&tsk->cpus_allowed, cpu_possible_mask);
+ cpumask_copy(tsk_cpus_allowed(tsk), cpu_possible_mask);
cpu = cpumask_any(cpu_active_mask);
}
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 4102518..5f35501 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -202,7 +202,7 @@ void kthread_bind(struct task_struct *p, unsigned int cpu)
return;
}
- cpumask_copy(&p->cpus_allowed, cpumask_of(cpu));
+ cpumask_copy(tsk_cpus_allowed(p), cpumask_of(cpu));
p->rt.nr_cpus_allowed = 1;
p->flags |= PF_THREAD_BOUND;
}
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 01b1dca..ac530d9 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1488,13 +1488,13 @@ static void rcu_yield(int cpu)
static int rcu_cpu_kthread_should_stop(int cpu)
{
while (cpu_is_offline(cpu) ||
- !cpumask_equal(&current->cpus_allowed, cpumask_of(cpu)) ||
+ !cpumask_equal(tsk_cpus_allowed(current), cpumask_of(cpu)) ||
smp_processor_id() != cpu) {
if (kthread_should_stop())
return 1;
local_bh_enable();
schedule_timeout_uninterruptible(1);
- if (!cpumask_equal(&current->cpus_allowed, cpumask_of(cpu)))
+ if (!cpumask_equal(tsk_cpus_allowed(current), cpumask_of(cpu)))
set_cpus_allowed_ptr(current, cpumask_of(cpu));
local_bh_disable();
}
diff --git a/kernel/sched.c b/kernel/sched.c
index fd4625f..254d299 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2344,11 +2344,11 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
/* Look for allowed, online CPU in same node. */
for_each_cpu_and(dest_cpu, nodemask, cpu_active_mask)
- if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
+ if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
return dest_cpu;
/* Any allowed, online CPU? */
- dest_cpu = cpumask_any_and(&p->cpus_allowed, cpu_active_mask);
+ dest_cpu = cpumask_any_and(tsk_cpus_allowed(p), cpu_active_mask);
if (dest_cpu < nr_cpu_ids)
return dest_cpu;
@@ -2385,7 +2385,7 @@ int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
* [ this allows ->select_task() to simply return task_cpu(p) and
* not worry about this generic constraint ]
*/
- if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
+ if (unlikely(!cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) ||
!cpu_online(cpu)))
cpu = select_fallback_rq(task_cpu(p), p);
@@ -5387,7 +5387,7 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
goto out_unlock;
raw_spin_lock_irqsave(&p->pi_lock, flags);
- cpumask_and(mask, &p->cpus_allowed, cpu_online_mask);
+ cpumask_and(mask, tsk_cpus_allowed(p), cpu_online_mask);
raw_spin_unlock_irqrestore(&p->pi_lock, flags);
out_unlock:
@@ -5815,7 +5815,7 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
idle->state = TASK_RUNNING;
idle->se.exec_start = sched_clock();
- cpumask_copy(&idle->cpus_allowed, cpumask_of(cpu));
+ cpumask_copy(tsk_cpus_allowed(idle), cpumask_of(cpu));
/*
* We're having a chicken and egg problem, even though we are
* holding rq->lock, the cpu isn't yet set to this cpu so the
@@ -5944,7 +5944,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
}
if (unlikely((p->flags & PF_THREAD_BOUND) && p != current &&
- !cpumask_equal(&p->cpus_allowed, new_mask))) {
+ !cpumask_equal(tsk_cpus_allowed(p), new_mask))) {
ret = -EINVAL;
goto out;
}
@@ -5952,7 +5952,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
if (p->sched_class->set_cpus_allowed)
p->sched_class->set_cpus_allowed(p, new_mask);
else {
- cpumask_copy(&p->cpus_allowed, new_mask);
+ cpumask_copy(tsk_cpus_allowed(p), new_mask);
p->rt.nr_cpus_allowed = cpumask_weight(new_mask);
}
@@ -6004,7 +6004,7 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
if (task_cpu(p) != src_cpu)
goto done;
/* Affinity changed (again). */
- if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
+ if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
goto fail;
/*
diff --git a/kernel/sched_cpupri.c b/kernel/sched_cpupri.c
index 2722dc1..199a231 100644
--- a/kernel/sched_cpupri.c
+++ b/kernel/sched_cpupri.c
@@ -77,11 +77,11 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
if (idx >= task_pri)
break;
- if (cpumask_any_and(&p->cpus_allowed, vec->mask) >= nr_cpu_ids)
+ if (cpumask_any_and(tsk_cpus_allowed(p), vec->mask) >= nr_cpu_ids)
continue;
if (lowest_mask) {
- cpumask_and(lowest_mask, &p->cpus_allowed, vec->mask);
+ cpumask_and(lowest_mask, tsk_cpus_allowed(p), vec->mask);
/*
* We have to ensure that we have at least one bit
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 8744593..bccd303 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1554,7 +1554,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
/* Skip over this group if it has no CPUs allowed */
if (!cpumask_intersects(sched_group_cpus(group),
- &p->cpus_allowed))
+ tsk_cpus_allowed(p)))
continue;
local_group = cpumask_test_cpu(this_cpu,
@@ -1600,7 +1600,7 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
int i;
/* Traverse only the allowed CPUs */
- for_each_cpu_and(i, sched_group_cpus(group), &p->cpus_allowed) {
+ for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
load = weighted_cpuload(i);
if (load < min_load || (load == min_load && i == this_cpu)) {
@@ -1644,7 +1644,7 @@ static int select_idle_sibling(struct task_struct *p, int target)
if (!(sd->flags & SD_SHARE_PKG_RESOURCES))
break;
- for_each_cpu_and(i, sched_domain_span(sd), &p->cpus_allowed) {
+ for_each_cpu_and(i, sched_domain_span(sd), tsk_cpus_allowed(p)) {
if (idle_cpu(i)) {
target = i;
break;
@@ -1687,7 +1687,7 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
int sync = wake_flags & WF_SYNC;
if (sd_flag & SD_BALANCE_WAKE) {
- if (cpumask_test_cpu(cpu, &p->cpus_allowed))
+ if (cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
want_affine = 1;
new_cpu = prev_cpu;
}
@@ -2046,7 +2046,7 @@ int can_migrate_task(struct task_struct *p, struct rq *rq, int this_cpu,
* 2) cannot be migrated to this CPU due to cpus_allowed, or
* 3) are cache-hot on their current CPU.
*/
- if (!cpumask_test_cpu(this_cpu, &p->cpus_allowed)) {
+ if (!cpumask_test_cpu(this_cpu, tsk_cpus_allowed(p))) {
schedstat_inc(p, se.statistics.nr_failed_migrations_affine);
return 0;
}
@@ -3399,7 +3399,7 @@ redo:
* moved to this_cpu
*/
if (!cpumask_test_cpu(this_cpu,
- &busiest->curr->cpus_allowed)) {
+ tsk_cpus_allowed(busiest->curr))) {
raw_spin_unlock_irqrestore(&busiest->lock,
flags);
all_pinned = 1;
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 19ecb31..92cdf8c 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1164,7 +1164,7 @@ static void deactivate_task(struct rq *rq, struct task_struct *p, int sleep);
static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
{
if (!task_running(rq, p) &&
- (cpu < 0 || cpumask_test_cpu(cpu, &p->cpus_allowed)) &&
+ (cpu < 0 || cpumask_test_cpu(cpu, tsk_cpus_allowed(p))) &&
(p->rt.nr_cpus_allowed > 1))
return 1;
return 0;
@@ -1299,7 +1299,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
*/
if (unlikely(task_rq(task) != rq ||
!cpumask_test_cpu(lowest_rq->cpu,
- &task->cpus_allowed) ||
+ tsk_cpus_allowed(task)) ||
task_running(rq, task) ||
!task->on_rq)) {
@@ -1583,7 +1583,7 @@ static void set_cpus_allowed_rt(struct task_struct *p,
update_rt_migration(&rq->rt);
}
- cpumask_copy(&p->cpus_allowed, new_mask);
+ cpumask_copy(tsk_cpus_allowed(p), new_mask);
p->rt.nr_cpus_allowed = weight;
}
diff --git a/kernel/trace/trace_workqueue.c b/kernel/trace/trace_workqueue.c
index 209b379..c557119 100644
--- a/kernel/trace/trace_workqueue.c
+++ b/kernel/trace/trace_workqueue.c
@@ -53,7 +53,7 @@ probe_workqueue_insertion(void *ignore,
struct task_struct *wq_thread,
struct work_struct *work)
{
- int cpu = cpumask_first(&wq_thread->cpus_allowed);
+ int cpu = cpumask_first(tsk_cpus_allowed(wq_thread));
struct cpu_workqueue_stats *node;
unsigned long flags;
@@ -75,7 +75,7 @@ probe_workqueue_execution(void *ignore,
struct task_struct *wq_thread,
struct work_struct *work)
{
- int cpu = cpumask_first(&wq_thread->cpus_allowed);
+ int cpu = cpumask_first(tsk_cpus_allowed(wq_thread));
struct cpu_workqueue_stats *node;
unsigned long flags;
@@ -121,7 +121,7 @@ static void
probe_workqueue_destruction(void *ignore, struct task_struct *wq_thread)
{
/* Workqueue only execute on one cpu */
- int cpu = cpumask_first(&wq_thread->cpus_allowed);
+ int cpu = cpumask_first(tsk_cpus_allowed(wq_thread));
struct cpu_workqueue_stats *node, *next;
unsigned long flags;
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b082d70..acaf8a9 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1286,7 +1286,7 @@ __acquires(&gcwq->lock)
if (gcwq->flags & GCWQ_DISASSOCIATED)
return false;
if (task_cpu(task) == gcwq->cpu &&
- cpumask_equal(&current->cpus_allowed,
+ cpumask_equal(tsk_cpus_allowed(current),
get_cpu_mask(gcwq->cpu)))
return true;
spin_unlock_irq(&gcwq->lock);
diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
index 4689cb0..503f087 100644
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -22,7 +22,7 @@ notrace unsigned int debug_smp_processor_id(void)
* Kernel threads bound to a single CPU can safely use
* smp_processor_id():
*/
- if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
+ if (cpumask_equal(tsk_cpus_allowed(current), cpumask_of(this_cpu)))
goto out;
/*
--
1.7.3.1
[-- Attachment #5: 0002-change-task-cpus_allowed-to-pointer.patch --]
[-- Type: application/octet-stream, Size: 8123 bytes --]
From 9693b69bb8897580752265ad34df03343ef78e4d Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Wed, 27 Apr 2011 13:51:08 +0900
Subject: [PATCH 2/2] change task->cpus_allowed to pointer
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
arch/x86/kernel/init_task.c | 2 ++
include/linux/cpuset.h | 3 ++-
include/linux/init_task.h | 4 ++--
include/linux/sched.h | 15 +++++++++++----
kernel/cpuset.c | 9 ++++++---
kernel/fork.c | 19 +++++++++++++++++++
kernel/kthread.c | 3 ++-
kernel/sched.c | 12 ++++++++++--
kernel/sched_rt.c | 2 +-
9 files changed, 55 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/init_task.c b/arch/x86/kernel/init_task.c
index 43e9ccf..4715d82 100644
--- a/arch/x86/kernel/init_task.c
+++ b/arch/x86/kernel/init_task.c
@@ -23,6 +23,8 @@ static struct sighand_struct init_sighand = INIT_SIGHAND(init_sighand);
union thread_union init_thread_union __init_task_data =
{ INIT_THREAD_INFO(init_task) };
+struct cpumask init_cpus_allowed = CPU_MASK_ALL;
+
/*
* Initial task structure.
*
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 684fe71..c20a45d 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -146,7 +146,8 @@ static inline void cpuset_cpus_allowed(struct task_struct *p,
static inline int cpuset_cpus_allowed_fallback(struct task_struct *p)
{
- cpumask_copy(tsk_cpus_allowed(p), cpu_possible_mask);
+ cpumask_copy(p->cpus_allowed_ptr, cpu_possible_mask);
+ p->flags |= PF_THREAD_UNBOUND;
return cpumask_any(cpu_active_mask);
}
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 10bdf82..4142dda 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -135,13 +135,13 @@ extern struct cred init_cred;
.state = 0, \
.stack = &init_thread_info, \
.usage = ATOMIC_INIT(2), \
- .flags = PF_KTHREAD, \
+ .flags = PF_KTHREAD | PF_THREAD_UNBOUND, \
.lock_depth = -1, \
.prio = MAX_PRIO-20, \
.static_prio = MAX_PRIO-20, \
.normal_prio = MAX_PRIO-20, \
.policy = SCHED_NORMAL, \
- .cpus_allowed = CPU_MASK_ALL, \
+ .cpus_allowed_ptr = &init_cpus_allowed, \
.mm = NULL, \
.active_mm = &init_mm, \
.se = { \
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3f7d3f9..716b24a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1233,7 +1233,7 @@ struct task_struct {
#endif
unsigned int policy;
- cpumask_t cpus_allowed;
+ struct cpumask *cpus_allowed_ptr;
#ifdef CONFIG_PREEMPT_RCU
int rcu_read_lock_nesting;
@@ -1544,9 +1544,6 @@ struct task_struct {
#endif
};
-/* Future-safe accessor for struct task_struct's cpus_allowed. */
-#define tsk_cpus_allowed(tsk) (&(tsk)->cpus_allowed)
-
/*
* Priority of a process goes from 0..MAX_PRIO-1, valid RT
* priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
@@ -1729,6 +1726,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
/*
* Per process flags
*/
+#define PF_THREAD_UNBOUND 0x00000001
#define PF_STARTING 0x00000002 /* being created */
#define PF_EXITING 0x00000004 /* getting shut down */
#define PF_EXITPIDONE 0x00000008 /* pi exit done on shut down */
@@ -1759,6 +1757,15 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define PF_FREEZER_SKIP 0x40000000 /* Freezer should not count it as freezable */
#define PF_FREEZER_NOSIG 0x80000000 /* Freezer won't send signals to it */
+/* Future-safe accessor for struct task_struct's cpus_allowed. */
+static inline const struct cpumask* tsk_cpus_allowed(struct task_struct *task)
+{
+ if (task->flags & PF_THREAD_UNBOUND)
+ return cpu_possible_mask;
+
+ return task->cpus_allowed_ptr;
+}
+
/*
* Only the _current_ task can read/write to tsk->flags, but other
* tasks can access tsk->flags in readonly mode for example
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 0deb871..ccb4890 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2189,8 +2189,10 @@ int cpuset_cpus_allowed_fallback(struct task_struct *tsk)
rcu_read_lock();
cs = task_cs(tsk);
- if (cs)
- cpumask_copy(tsk_cpus_allowed(tsk), cs->cpus_allowed);
+ if (cs) {
+ cpumask_copy(tsk->cpus_allowed_ptr, cs->cpus_allowed);
+ tsk->flags &= ~PF_THREAD_UNBOUND;
+ }
rcu_read_unlock();
/*
@@ -2217,7 +2219,8 @@ int cpuset_cpus_allowed_fallback(struct task_struct *tsk)
* Like above we can temporary set any mask and rely on
* set_cpus_allowed_ptr() as synchronization point.
*/
- cpumask_copy(tsk_cpus_allowed(tsk), cpu_possible_mask);
+ cpumask_copy(tsk->cpus_allowed_ptr, cpu_possible_mask);
+ tsk->flags |= PF_THREAD_UNBOUND;
cpu = cpumask_any(cpu_active_mask);
}
diff --git a/kernel/fork.c b/kernel/fork.c
index cc04197..485ab7d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -169,6 +169,8 @@ void free_task(struct task_struct *tsk)
free_thread_info(tsk->stack);
rt_mutex_debug_task_free(tsk);
ftrace_graph_exit_task(tsk);
+ if (tsk->cpus_allowed_ptr)
+ kfree(tsk->cpus_allowed_ptr);
free_task_struct(tsk);
}
EXPORT_SYMBOL(free_task);
@@ -250,6 +252,19 @@ int __attribute__((weak)) arch_dup_task_struct(struct task_struct *dst,
return 0;
}
+static int dup_task_cpus_allowed(struct task_struct *task, struct task_struct *orig)
+{
+ struct cpumask *cpumask;
+
+ cpumask = kmalloc(cpumask_size(), GFP_KERNEL);
+ if (!cpumask)
+ return -ENOMEM;
+ cpumask_copy(cpumask, orig->cpus_allowed_ptr);
+ task->cpus_allowed_ptr = cpumask;
+
+ return 0;
+}
+
static struct task_struct *dup_task_struct(struct task_struct *orig)
{
struct task_struct *tsk;
@@ -280,6 +295,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig)
if (err)
goto out;
+ err = dup_task_cpus_allowed(tsk, orig);
+ if (err)
+ goto out;
+
setup_thread_stack(tsk, orig);
clear_user_return_notifier(tsk);
clear_tsk_need_resched(tsk);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 5f35501..6d32c72 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -202,8 +202,9 @@ void kthread_bind(struct task_struct *p, unsigned int cpu)
return;
}
- cpumask_copy(tsk_cpus_allowed(p), cpumask_of(cpu));
+ cpumask_copy(p->cpus_allowed_ptr, cpumask_of(cpu));
p->rt.nr_cpus_allowed = 1;
+ p->flags &= ~PF_THREAD_UNBOUND;
p->flags |= PF_THREAD_BOUND;
}
EXPORT_SYMBOL(kthread_bind);
diff --git a/kernel/sched.c b/kernel/sched.c
index 254d299..764576c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5815,7 +5815,10 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
idle->state = TASK_RUNNING;
idle->se.exec_start = sched_clock();
- cpumask_copy(tsk_cpus_allowed(idle), cpumask_of(cpu));
+ WARN_ON(!idle->cpus_allowed_ptr);
+ cpumask_copy(idle->cpus_allowed_ptr, cpumask_of(cpu));
+ idle->flags &= ~PF_THREAD_UNBOUND;
+
/*
* We're having a chicken and egg problem, even though we are
* holding rq->lock, the cpu isn't yet set to this cpu so the
@@ -5952,10 +5955,15 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
if (p->sched_class->set_cpus_allowed)
p->sched_class->set_cpus_allowed(p, new_mask);
else {
- cpumask_copy(tsk_cpus_allowed(p), new_mask);
+ cpumask_copy(p->cpus_allowed_ptr, new_mask);
p->rt.nr_cpus_allowed = cpumask_weight(new_mask);
}
+ if (cpumask_equal(new_mask, cpu_possible_mask))
+ p->flags |= PF_THREAD_UNBOUND;
+ else
+ p->flags &= ~PF_THREAD_UNBOUND;
+
/* Can the task run on the task's current CPU? If so, we're done */
if (cpumask_test_cpu(task_cpu(p), new_mask))
goto out;
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 92cdf8c..291f33e 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1583,7 +1583,7 @@ static void set_cpus_allowed_rt(struct task_struct *p,
update_rt_migration(&rq->rt);
}
- cpumask_copy(tsk_cpus_allowed(p), new_mask);
+ cpumask_copy(p->cpus_allowed_ptr, new_mask);
p->rt.nr_cpus_allowed = weight;
}
--
1.7.3.1